Protection Domains - Learning Module



Protection Rings

The Hierarchical Model of Trust

Imagine a medieval castle with concentric walls. The outermost wall protects the city below. The next wall guards the castle courtyard. The innermost wall protects the keep, where the king resides. Each wall represents a level of trust—enemies must breach multiple barriers to reach the most sensitive areas.

Protection rings apply this same principle to computing. The CPU implements concentric privilege levels, with the most critical code (the kernel) at the innermost ring and the least trusted code (user applications) at the outermost. Each ring can access its own resources and those of outer rings, but cannot directly access inner rings without going through controlled gates.

This architecture, pioneered by the Multics operating system in the 1960s and implemented in the Intel 80286 and all subsequent x86 processors, remains the foundation of hardware protection in modern systems.

What You Will Learn

By the end of this page, you will understand the protection ring architecture, how hardware enforces ring boundaries, the role of each ring in typical operating systems, the evolution from multi-ring to two-ring systems, and how virtualization has resurrected unused rings.

The Protection Ring Model

Protection rings are a hierarchical mechanism for organizing protection domains. Each ring is assigned a privilege level, with lower numbers indicating higher privilege. Code in ring k can access resources belonging to rings k through the outermost ring, but cannot directly access resources in rings 0 through k-1.

Formal Definition:

A protection ring system consists of:

A set of N privilege levels numbered 0 to N-1
Ring 0 is most privileged (innermost); Ring N-1 is least privileged (outermost)
Code in ring k can access the resources of every ring j ≥ k
Transitions from outer to inner rings require controlled gates

x86 Protection Rings:

Intel x86 processors implement 4 protection rings (0-3), encoded in 2 bits of the code segment selector:

Ring 0: Kernel mode (highest privilege)
Ring 1: Device drivers (unused by most operating systems)
Ring 2: System services (unused by most operating systems)
Ring 3: User mode (lowest privilege)


Why Only 2 Rings Are Used

Despite having 4 rings available, most operating systems use only Ring 0 (kernel) and Ring 3 (user). Rings 1 and 2 go unused because: (1) Unix was designed around a two-mode (kernel/user) model; (2) many CPU features distinguish only two privilege classes, such as Ring 0 versus everything else; (3) portability: other architectures may not have 4 rings.

Hardware Enforcement of Rings

The CPU enforces ring boundaries through multiple hardware mechanisms. Understanding these mechanisms reveals what protection rings actually guarantee.

Current Privilege Level (CPL):

The CPL is stored in bits 0-1 of the CS (Code Segment) register. It represents the ring in which the currently executing code operates.

CS Register: [Index (bits 3-15)] [TI (bit 2)] [RPL (bits 0-1)]
When code is executing: CPL = RPL of the current CS

Ring 0: CPL = 0b00 (binary 00)
Ring 1: CPL = 0b01 (binary 01)
Ring 2: CPL = 0b10 (binary 10)
Ring 3: CPL = 0b11 (binary 11)
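To make the bit layout concrete, here is a minimal sketch that splits a 16-bit selector into its index, TI, and RPL fields and reads the CPL from a CS value. The struct and function names are hypothetical, and the sample selector values in the usage below are arbitrary illustrations rather than values taken from any particular OS.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 16-bit segment selector, following the layout shown above. */
struct selector_fields {
    unsigned index; /* bits 3-15: descriptor index into the GDT or LDT */
    unsigned ti;    /* bit 2: table indicator (0 = GDT, 1 = LDT) */
    unsigned rpl;   /* bits 0-1: requested privilege level */
};

static struct selector_fields decode_selector(uint16_t sel) {
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1u;
    f.rpl   = sel & 3u;
    return f;
}

/* For the selector currently loaded in CS, the low two bits are the CPL. */
static unsigned cpl_from_cs(uint16_t cs) {
    return cs & 3u;
}
```

For example, decoding 0x0033 yields index 6, TI 0, RPL 3, so code running with that value in CS is at Ring 3.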

Descriptor Privilege Level (DPL):

Every segment descriptor (in the GDT/LDT) and gate descriptor has a DPL field specifying the minimum privilege required to access it:

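#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t u16;

// 8-byte GDT/LDT entry; bit-field order assumes GCC/Clang on little-endian x86
struct segment_descriptor {
    u16 limit_low;        // Limit bits 0-15
    u16 base_low;         // Base bits 0-15
    u8  base_mid;         // Base bits 16-23
    u8  type:4;           // Segment type (code/data, readable/writable, conforming, etc.)
    u8  s:1;              // 1=code/data, 0=system
    u8  dpl:2;            // Descriptor Privilege Level (0-3)
    u8  p:1;              // Present
    u8  limit_high:4;     // Limit bits 16-19
    u8  avl:1;            // Available for system software
    u8  l:1;              // 64-bit code segment
    u8  d:1;              // Default operand size (D/B)
    u8  g:1;              // Granularity (limit counted in 4 KiB units)
    u8  base_high;        // Base bits 24-31
} __attribute__((packed));

_Static_assert(sizeof(struct segment_descriptor) == 8, "descriptor must be 8 bytes");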

Access Control Rules:

When code attempts to access a segment or call through a gate, the CPU performs privilege checks:

For Data Segments:

Access allowed if: CPL ≤ DPL and RPL ≤ DPL

Code in Ring 0 can access Ring 0, 1, 2, 3 data. Code in Ring 3 can only access Ring 3 data.
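The data-segment rule can be written as a one-line predicate. This is a sketch of the privilege comparison only, with a hypothetical function name; the real CPU also checks the segment type, the present bit, and limits. The RPL term matters when the kernel handles a selector supplied by user code: an RPL of 3 weakens the effective privilege so the kernel cannot be tricked into loading a Ring 0 data segment on the caller's behalf.

```c
#include <assert.h>
#include <stdbool.h>

/* Privilege part of the data-segment check: both the CPL and the selector's
   RPL must be numerically <= the descriptor's DPL. */
static bool data_segment_access_ok(unsigned cpl, unsigned rpl, unsigned dpl) {
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);   /* privilege levels are 2-bit values */
    return cpl <= dpl && rpl <= dpl;
}
```

With CPL 0 but RPL 3, access to a DPL 0 segment is refused, which is exactly the protection RPL exists to provide.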

For Code Segments (via CALL/JMP):

Non-conforming code: CPL must equal DPL (exact match required)
Conforming code: CPL ≥ DPL (can call from outer rings, keeps caller's CPL)

For Call Gates (privilege transition):

Access allowed if: CPL ≤ DPL_gate and RPL ≤ DPL_gate and DPL_code_segment ≤ CPL
After transition: CPL = DPL_code_segment (for a non-conforming target)

A Ring 3 process can only use gates with DPL=3. Upon traversing the gate, CPL changes to the target segment's DPL.
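The code-segment and call-gate rules can be sketched the same way. This is a simplified model with hypothetical function names: it covers only the privilege comparisons for CALL, assumes a non-conforming target behind the gate, and ignores stack switching, present bits, and the stricter rule for JMP through a gate.

```c
#include <assert.h>
#include <stdbool.h>

/* Direct CALL/JMP to a code segment: no privilege change is possible. */
static bool direct_transfer_ok(unsigned cpl, unsigned rpl, unsigned dpl, bool conforming) {
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    if (conforming)
        return cpl >= dpl;             /* callable from outer rings; CPL unchanged */
    return cpl == dpl && rpl <= cpl;   /* non-conforming: same ring only */
}

/* CALL through a call gate to a non-conforming code segment.
   Returns the CPL after the call, or -1 where the CPU would raise #GP. */
static int call_gate_new_cpl(unsigned cpl, unsigned rpl,
                             unsigned gate_dpl, unsigned target_dpl) {
    assert(cpl <= 3 && rpl <= 3 && gate_dpl <= 3 && target_dpl <= 3);
    if (cpl > gate_dpl || rpl > gate_dpl)
        return -1;                     /* caller is not privileged enough to use the gate */
    if (target_dpl > cpl)
        return -1;                     /* a CALL gate never leads to a less privileged ring */
    return (int)target_dpl;            /* CPL becomes the target segment's DPL */
}
```

A Ring 3 caller using a DPL 3 gate that points at a DPL 0 segment ends up at CPL 0; the same caller is refused by a DPL 0 gate.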

Ring Transition Rules Summary
From → To	Mechanism	Condition	CPL After
Ring 3 → Ring 0	Interrupt/Trap Gate (INT n)	Gate DPL = 3	0
Ring 3 → Ring 0	SYSCALL instruction	MSRs configured by kernel	0
Ring 0 → Ring 3	IRET instruction	Target CS has RPL = 3	3
Ring 0 → Ring 3	SYSRET instruction	Return selectors derived from IA32_STAR	3
Ring N → Ring N	Normal CALL/JMP	CPL = DPL_target	N (unchanged)
Ring 3 → Ring 3	All normal operations	Always allowed within ring	3 (unchanged)
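For the SYSCALL and SYSRET rows, the target selectors come from the IA32_STAR MSR that the kernel programs at boot. The sketch below mirrors the selector arithmetic in Intel's pseudocode for 64-bit SYSCALL and SYSRET as I understand it. The struct and function names are hypothetical, and the example STAR value in the usage below is illustrative. The CPU also loads fixed flat descriptor attributes rather than reading the GDT; that part is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Selectors that SYSCALL / 64-bit SYSRET load, derived from IA32_STAR. */
struct star_selectors {
    uint16_t syscall_cs, syscall_ss;  /* after SYSCALL: CPL = 0 */
    uint16_t sysret_cs, sysret_ss;    /* after 64-bit SYSRET: CPL = 3 */
};

static struct star_selectors decode_star(uint64_t star) {
    uint16_t kernel_base = (uint16_t)(star >> 32);     /* IA32_STAR[47:32] */
    uint16_t user_base   = (uint16_t)(star >> 48);     /* IA32_STAR[63:48] */
    struct star_selectors s;
    s.syscall_cs = (uint16_t)(kernel_base & 0xFFFCu);  /* RPL forced to 0 */
    s.syscall_ss = (uint16_t)(kernel_base + 8);        /* descriptor right after CS */
    s.sysret_cs  = (uint16_t)((user_base + 16) | 3u);  /* RPL forced to 3 */
    s.sysret_ss  = (uint16_t)((user_base + 8) | 3u);   /* RPL forced to 3 */
    return s;
}
```

With STAR[63:48] = 0x23 and STAR[47:32] = 0x10, SYSCALL loads CS 0x10 and SS 0x18, and SYSRET returns with CS 0x33 and SS 0x2B, both at RPL 3.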

Ring-Specific Capabilities

Beyond segment access, protection rings determine which CPU instructions and operations are permitted. Ring 0 has exclusive access to many critical operations.

Privileged Instructions (Ring 0 Only):

Most of these instructions can only execute when CPL = 0; attempting them at any other CPL triggers a General Protection Fault (#GP). The interrupt-flag and port I/O instructions are the exception: they are governed by IOPL, as described below.

Category	Instructions	Purpose
Control Registers	MOV CR0/CR2/CR3/CR4	Control CPU modes, paging, features
Debug Registers	MOV DR0-DR7	Hardware breakpoints, debug control
Model-Specific Registers	RDMSR, WRMSR	CPU configuration, syscall setup
Interrupt Flag	CLI, STI	Disable/enable maskable interrupts (allowed if CPL ≤ IOPL)
I/O Port Access	IN, OUT, INS, OUTS	Read/write I/O ports (allowed if CPL ≤ IOPL, otherwise decided by the I/O permission bitmap)
Descriptor Tables	LGDT, LIDT, LLDT, LTR	Load descriptor table registers and the task register
Cache Control	INVD, WBINVD	Invalidate CPU caches (WBINVD writes back dirty lines first)
TLB Control	INVLPG, INVPCID	Invalidate cached translations (TLB entries)
Halt	HLT	Halt CPU until interrupt

I/O Privilege Level (IOPL):

The IOPL field in RFLAGS (bits 12-13) provides additional control over I/O operations:

RFLAGS: [...] [IOPL (12-13)] [...]

If CPL ≤ IOPL:
    IN, OUT, INS, OUTS are allowed
    CLI, STI are allowed (interrupt flag control)
    
If CPL > IOPL:
    I/O ops check the I/O Permission Bitmap in TSS
    CLI, STI cause #GP exception
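The decision above reduces to a comparison between CPL and the two IOPL bits of RFLAGS. A minimal sketch, with hypothetical names, assuming protected mode and ignoring virtual-8086 mode and the VME/PVI extensions:

```c
#include <assert.h>
#include <stdint.h>

enum io_outcome { IO_ALLOWED, IO_CHECK_BITMAP, IO_GP_FAULT };

/* IOPL lives in RFLAGS bits 12-13. */
static unsigned iopl_from_rflags(uint64_t rflags) {
    return (unsigned)((rflags >> 12) & 3u);
}

/* IN/OUT/INS/OUTS in protected mode. */
static enum io_outcome port_io_check(unsigned cpl, unsigned iopl) {
    assert(cpl <= 3 && iopl <= 3);
    return cpl <= iopl ? IO_ALLOWED : IO_CHECK_BITMAP;
}

/* CLI/STI in protected mode. */
static enum io_outcome cli_sti_check(unsigned cpl, unsigned iopl) {
    assert(cpl <= 3 && iopl <= 3);
    return cpl <= iopl ? IO_ALLOWED : IO_GP_FAULT;
}
```

With IOPL = 0, as general-purpose kernels normally leave it, Ring 3 port I/O always falls through to the bitmap and Ring 3 CLI/STI always faults.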

The I/O Permission Bitmap:

The TSS contains a bitmap where each bit controls access to one I/O port (65536 ports = 8192 bytes, followed by a terminating byte with all bits set). When CPL > IOPL (for example, in Ring 3), access to a port is still allowed if its bit is 0:

struct tss_struct {
    // ...
    u16 io_map_base;  // Offset to I/O permission bitmap
    // The bitmap follows the TSS
    // Bit N = 0: Port N accessible even when CPL > IOPL (e.g., from Ring 3)
    // Bit N = 1: Port N access from CPL > IOPL raises #GP
};
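The bitmap lookup itself is simple bit arithmetic. The sketch below uses hypothetical helper names and models only the lookup, not where the bitmap sits in the TSS. It follows the rule that a multi-byte access (for example a 16-bit IN) must have the bit for every covered port clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IO_PORTS        65536u
#define IO_BITMAP_BYTES (IO_PORTS / 8u)   /* 8192 bytes */

/* Start with every port denied (all bits set). */
static void io_bitmap_deny_all(uint8_t bitmap[IO_BITMAP_BYTES]) {
    memset(bitmap, 0xFF, IO_BITMAP_BYTES);
}

/* Clear the bits for ports [first, first + count) so they become accessible. */
static void io_bitmap_grant(uint8_t bitmap[IO_BITMAP_BYTES], uint16_t first, unsigned count) {
    for (uint32_t p = first; p < (uint32_t)first + count && p < IO_PORTS; p++)
        bitmap[p / 8] &= (uint8_t)~(1u << (p % 8));
}

/* An access of 'size' bytes at 'port' is allowed only if every covered
   port's bit is 0; otherwise the CPU raises #GP. */
static bool io_bitmap_allows(const uint8_t bitmap[IO_BITMAP_BYTES], uint16_t port, unsigned size) {
    assert(size == 1 || size == 2 || size == 4);
    for (uint32_t p = port; p < (uint32_t)port + size; p++) {
        if (p >= IO_PORTS || (bitmap[p / 8] & (1u << (p % 8))))
            return false;
    }
    return true;
}
```

Granting only port 0x60 permits a byte access there, but a 16-bit access at 0x60 is still denied until port 0x61 is granted as well.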

Ring 0 Has Total Control

Ring 0 code can modify any CPU state, access any memory, disable all interrupts, and completely subvert the operating system. This is why kernel vulnerabilities are so severe—a bug in Ring 0 code cannot be contained by the ring system.

The Two-Ring Reality

Although x86 provides four rings, modern operating systems (Linux, Windows, macOS, BSD) use only Ring 0 and Ring 3. This simplification has both historical and practical reasons.

Why Not Use All Four Rings?

Unix Legacy: Unix was developed on the PDP-11 and used only two of its modes (kernel and user), even on models such as the PDP-11/45 that also provided a supervisor mode. When ported to x86, developers kept the two-mode model.
Portability: Other architectures (ARM, MIPS, SPARC) traditionally had only two privilege levels. Using only Ring 0/3 maximizes portability.
x86 Implementation Details: Many x86 features recognize only two privilege classes rather than four:
- SYSENTER/SYSEXIT assume Ring 0 ↔ Ring 3
- Most supervisor-mode checks are "CPL == 0" not "CPL ≤ N"
- Page table entries have a single U/S bit (User/Supervisor), so paging cannot tell Rings 0, 1, and 2 apart
Diminishing Returns: Intermediate rings add complexity without proportional security benefits. If driver code in Ring 1 is compromised, it can often escalate to Ring 0 anyway.

Advantages of Two Rings

•Simpler mental model
•Portable across architectures
•Lower switching overhead
•Fewer privilege transitions
•Easier security auditing
•Well-understood attack surface

Disadvantages of Two Rings

•Drivers have full kernel privilege
•All kernel code equally trusted
•Driver bugs crash the system
•No isolation within kernel
•Large attack surface in Ring 0
•Harder to contain compromises

Alternative Approaches:

Some operating systems have attempted to isolate drivers from the kernel:

Approach	Example	How It Works
Microkernel	MINIX, seL4	Drivers run in Ring 3; kernel is minimal
User-mode drivers	Windows UMDF	Certain drivers run as user processes
Paravirtualization	Xen (32-bit x86 PV guests)	Guest kernel, drivers included, runs in Ring 1; hypervisor runs in Ring 0
Hardware isolation	SR-IOV	Hardware provides per-device isolation
IOMMU	Intel VT-d	Restricts device DMA to specific memory

These approaches provide driver isolation at the cost of complexity and performance.

Virtualization and Ring Compression

Virtualization introduces interesting complications to the ring model. When running a guest operating system, both the hypervisor and the guest kernel want to be in Ring 0—but there can be only one true Ring 0.

The Original Problem:

Without hardware virtualization support:

The hypervisor (VMM) must run in Ring 0 to control the CPU
The guest kernel thinks it's in Ring 0, but actually runs in Ring 1 or Ring 3
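This technique is called ring deprivileging. It causes ring aliasing (the guest can observe that it is not really running in Ring 0, for example by reading the low bits of CS) and, when the guest kernel shares Ring 3 with guest applications, ring compression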

Software Virtualization Approaches:

The 0/1/3 Model (Guest kernel in Ring 1):

Hypervisor: Ring 0 (true Ring 0)
Guest Kernel: Ring 1 (thinks it's Ring 0)
Guest User: Ring 3 (true Ring 3)

Problem: Ring 1 lacks some Ring 0 features. Some privileged instructions must be emulated.

The 0/3/3 Model (Guest kernel in Ring 3):

Hypervisor: Ring 0 (true Ring 0)
Guest Kernel: Ring 3 (thinks it's Ring 0, heavily emulated)
Guest User: Ring 3 (same as guest kernel!)

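Problem: Ring compression. Because paging distinguishes only user (Ring 3) from supervisor, page protection alone cannot isolate the guest kernel from guest user space.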

Hardware Virtualization Solution:

Intel VT-x and AMD-V introduce a new layer below Ring 0:

┌─────────────────────────────────────┐
│         Guest Mode (VMX non-root)   │
│  ┌──────────────────────────────┐   │
│  │ Ring 0: Guest Kernel         │   │
│  │ Ring 3: Guest User Space     │   │
│  └──────────────────────────────┘   │
├─────────────────────────────────────┤
│         Host Mode (VMX root)        │
│  Ring 0: Hypervisor                 │
└─────────────────────────────────────┘

VMX Root Mode: The hypervisor runs here, with full control of the machine.
VMX Non-Root Mode: The guest OS runs here. Guest Ring 0 is real Ring 0 from the guest's perspective, but the hypervisor can configure the CPU to exit to VMX root mode on privileged operations and emulate them.

Ring -1 Is a Misnomer

People sometimes call the hypervisor mode "Ring -1," but this is informal. The hypervisor runs in Ring 0 of VMX root mode. The guest kernel runs in Ring 0 of VMX non-root mode. They're both Ring 0, just in different CPU modes with different privileges.

Privilege Levels with Hardware Virtualization
Mode	Ring	What Runs Here	Power Level
VMX root	Ring 0	Hypervisor	Maximum (true supervisor)
VMX root	Ring 3	Hypervisor user tools	Limited (under hypervisor)
VMX non-root	Ring 0	Guest kernel	Controlled (can be intercepted)
VMX non-root	Ring 3	Guest applications	Minimal (under guest kernel)

ARM Exception Levels

ARM processors use a different privilege model called Exception Levels (EL). Unlike x86, where Rings 1 and 2 go largely unused, ARM's exception levels were designed with virtualization and secure execution built in.

ARM Exception Levels:

Level	Name	Typical Use	x86 Equivalent
EL0	User	Applications	Ring 3
EL1	Supervisor	OS Kernel	Ring 0
EL2	Hypervisor	Virtualization	VMX root Ring 0
EL3	Secure Monitor	TrustZone	(no direct equivalent)

Key Differences from x86:

Designed for virtualization: EL2 is specifically for hypervisors, not a repurposed ring
Security built-in: EL3 manages transitions between Normal and Secure worlds (TrustZone)
Clean separation: Each level has clearly defined responsibilities
Defined roles: Unlike x86 Rings 1 and 2, each ARM EL has an assigned purpose (although EL2 and EL3 are optional in a given implementation)

TrustZone: The Secure World:

ARM TrustZone creates two parallel "worlds"—Normal and Secure—that run side by side:

┌────────────────────────────────────────────────────┐
│                     EL3: Secure Monitor            │
│         (Manages transitions between worlds)       │
├─────────────────────────┬──────────────────────────┤
│     Normal World        │      Secure World        │
│                         │                          │
│  EL2: Hypervisor        │  EL2: (optional)         │
│  EL1: Linux Kernel      │  EL1: Secure Kernel      │
│  EL0: Applications      │  EL0: Trusted Apps       │
└─────────────────────────┴──────────────────────────┘

Secure World code has access to memory, keys, and hardware that Normal World cannot see. This enables:

Secure boot: EL3 validates the boot chain before passing control
Cryptographic key storage: Keys never leave the Secure World
DRM: Media decryption happens in Secure EL1, Normal World sees only rendered output
Biometric authentication: Fingerprint/face data processed in Secure World

Exception Level Transitions

ARM EL transitions are asymmetric: you can only increase EL through exceptions (interrupts, syscalls, faults), and can only decrease EL through explicit return instructions (ERET). This prevents unprivileged code from directly jumping to higher ELs.
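The asymmetry can be captured as a tiny state machine. This is a toy model with hypothetical names, not an emulator: it encodes only the rules that an exception keeps or raises the EL and never targets EL0, while ERET keeps or lowers the EL and cannot be used from EL0. In real hardware the target EL is chosen by the exception type and configuration (by default, SVC is taken to EL1, HVC to EL2, and SMC to EL3).

```c
#include <assert.h>
#include <stdbool.h>

struct cpu { unsigned el; };

/* Taking an exception: may stay at the current EL or move up, never to EL0. */
static bool take_exception(struct cpu *c, unsigned target_el) {
    assert(c->el <= 3);
    if (target_el == 0 || target_el > 3 || target_el < c->el)
        return false;             /* not a legal exception target */
    c->el = target_el;
    return true;
}

/* ERET: may stay at the current EL or move down, never up. */
static bool eret(struct cpu *c, unsigned target_el) {
    assert(c->el <= 3);
    if (c->el == 0 || target_el > c->el)
        return false;             /* ERET cannot raise privilege */
    c->el = target_el;
    return true;
}
```

Starting at EL0, a system call moves the model to EL1; an attempt to ERET "up" to EL2 is rejected, and ERET back to EL0 succeeds.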

Page Tables and Ring Protection

Protection rings interact with the paging system to provide memory protection. Page table entries contain bits that enforce ring-based access control on every memory access.

The User/Supervisor (U/S) Bit:

Each page table entry contains a U/S bit:

U/S = 1: Page is accessible from any ring (user page)
U/S = 0: Page is accessible only in supervisor mode, i.e. at CPL 0, 1, or 2 (supervisor page)

Page Table Entry (64-bit):
┌──────────────────────┬─────────────┬─────┬─────┬───┐
│ Address (bits 12-51) │ Other flags │ U/S │ R/W │ P │
└──────────────────────┴─────────────┴─────┴─────┴───┘
                                        ↑
                           User/Supervisor bit (bit 2)

If U/S = 0 and CPL = 3:
    → Page Fault (#PF) on any access
    
If U/S = 1:
    → Accessible from Ring 0, 1, 2, or 3
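A single-entry permission check makes the rule explicit. On x86 only CPL 3 counts as user mode; CPL 0, 1, and 2 are all supervisor mode. This sketch uses hypothetical names, assumes CR0.WP = 1 (so read-only pages are read-only for the kernel too), and ignores NX, SMEP/SMAP, protection keys, and the way real hardware combines permission bits across all paging levels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_P  (1ull << 0)   /* Present */
#define PTE_RW (1ull << 1)   /* Writable */
#define PTE_US (1ull << 2)   /* User/Supervisor: 1 = user page */

static bool pte_access_ok(uint64_t pte, unsigned cpl, bool write) {
    assert(cpl <= 3);
    if (!(pte & PTE_P))
        return false;                       /* not present: #PF */
    if (cpl == 3 && !(pte & PTE_US))
        return false;                       /* user access to supervisor page: #PF */
    if (write && !(pte & PTE_RW))
        return false;                       /* write to read-only page: #PF */
    return true;
}
```

A present, writable supervisor page is usable from CPL 0 or 1 but faults at CPL 3; a read-only user page can be read from CPL 3 but not written.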

SMEP, SMAP, and PKU:

Modern processors add extra protections beyond the basic U/S bit:

Supervisor Mode Execution Prevention (SMEP): Prevents supervisor-mode code (CPL < 3, in practice Ring 0) from executing code in user pages. If an attacker controls user memory and tricks the kernel into jumping there, SMEP blocks execution.

CR4.SMEP = 1:
    If CPL < 3 and page has U/S = 1:
        Instruction fetch → #PF

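Supervisor Mode Access Prevention (SMAP): Prevents supervisor-mode code from reading or writing user pages unless the kernel explicitly allows it by setting the AC flag. This protects against confused deputy attacks, in which a user-supplied pointer tricks the kernel into accessing memory the attacker controls.

CR4.SMAP = 1:
    If CPL < 3 and page has U/S = 1 and AC flag = 0:
        Data access → #PF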
        
Kernel must use STAC/CLAC instructions:
    STAC: Set AC flag (allow user access)
    CLAC: Clear AC flag (restore protection)
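Both features reduce to a few bit tests on CR4 and RFLAGS when supervisor code touches a user page. A sketch with hypothetical names, covering explicit accesses only; implicit supervisor-mode accesses (such as the CPU reading descriptor tables) follow different rules and are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_SMEP  (1ull << 20)
#define CR4_SMAP  (1ull << 21)
#define RFLAGS_AC (1ull << 18)

enum access_kind { ACCESS_READ, ACCESS_WRITE, ACCESS_FETCH };

/* Would SMEP/SMAP block a supervisor-mode access to a user page (U/S = 1)? */
static bool smep_smap_blocks(uint64_t cr4, uint64_t rflags, enum access_kind kind) {
    if (kind == ACCESS_FETCH)
        return (cr4 & CR4_SMEP) != 0;
    return (cr4 & CR4_SMAP) != 0 && (rflags & RFLAGS_AC) == 0;
}
```

This is why a kernel's copy-to/from-user routines bracket the copy with STAC and CLAC: only while AC is set does the data access pass.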

Protection Keys for Userspace (PKU): Lets a process tag each of its user-mode pages with one of 16 protection keys and set per-key access-disable and write-disable rights in the PKRU register, which user code can update directly (WRPKRU) without a system call.
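PKRU holds two bits per key: bit 2k is AD (access disable) and bit 2k+1 is WD (write disable). The sketch below, with hypothetical names, models only that arithmetic for data accesses; the key itself comes from the page-table entry, and on Linux a program would obtain and assign keys through pkey_alloc and pkey_mprotect rather than by choosing numbers itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Is a data access to a page tagged with 'key' allowed under this PKRU value? */
static bool pkru_allows(uint32_t pkru, unsigned key, bool write) {
    assert(key < 16);
    bool ad = (pkru >> (2 * key)) & 1u;       /* access disable */
    bool wd = (pkru >> (2 * key + 1)) & 1u;   /* write disable */
    return !ad && !(write && wd);
}

/* Return a copy of 'pkru' with the rights for 'key' replaced. */
static uint32_t pkru_with(uint32_t pkru, unsigned key, bool access_disable, bool write_disable) {
    assert(key < 16);
    uint32_t shift = 2 * key;
    uint32_t bits = (uint32_t)access_disable | ((uint32_t)write_disable << 1);
    return (pkru & ~(3u << shift)) | (bits << shift);
}
```

Write-disabling key 1 gives PKRU = 0x8: pages with key 1 stay readable but not writable, while key 0 is unaffected.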

Memory Protection Features by CPU Generation
Feature	Introduced	Protection Provided	Controlled By
U/S bit	i386 (1985)	User vs. supervisor pages	Page table entry
NX bit	Pentium 4/AMD K8	Non-executable pages	Page table entry
SMEP	Ivy Bridge (2012)	No user code exec in Ring 0	CR4 register
SMAP	Broadwell (2014)	No user data access in Ring 0	CR4 + RFLAGS.AC
PKU	Skylake-SP (2017)	16 user-space protection keys	PKRU register
CET	Tiger Lake (2020)	Shadow stack, IBT	CR4 + MSRs

Rings in Embedded and Real-Time Systems

Not all systems use protection rings the same way. Embedded and real-time environments often have different requirements that influence ring usage.

Embedded Systems:

Many embedded systems disable protection rings entirely or run everything in Ring 0:

┌─────────────────────────────────┐
│      Single Ring (Ring 0)       │
│  Application + RTOS + Drivers   │
│         All privileged          │
└─────────────────────────────────┘

Why?

Deterministic timing (no mode switch overhead)
Resource constraints (no MMU, limited memory)
Simplified development (no syscall interface needed)
Single-purpose systems (one trusted application)
Physical security (device is not multi-user)

Real-Time Systems:

Hard real-time systems may avoid ring transitions to guarantee timing:

Mode switches take time (hundreds of cycles)
TLB flushes may be required
Interrupt latency affected by mode
Worst-case execution time harder to analyze with rings

ARM Cortex-M: Two-Level Protection

ARM Cortex-M processors (common in microcontrollers) use a simpler two-level model:

Mode	Privilege	Purpose
Thread Mode	Privileged or unprivileged (selected by CONTROL.nPRIV)	Normal application code
Handler Mode	Always privileged	Exception/interrupt handlers

The Memory Protection Unit (MPU) on Cortex-M provides region-based protection without a full MMU:

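// Configure MPU region 0 (ARMv7-M MPU, CMSIS register names from the device header)
MPU->RNR  = 0;            // Select region 0
MPU->RBAR = 0x20000000;   // Base address (SRAM); must be aligned to the region size
MPU->RASR =
    (0x13u << 1) |        // SIZE = 0x13: 2^(0x13+1) bytes = 1 MB
    (0x3u << 24) |        // AP = 0b011: full access (privileged and unprivileged)
    (1u << 0);            // Enable region
MPU->CTRL = MPU_CTRL_ENABLE_Msk | MPU_CTRL_PRIVDEFENA_Msk;  // Turn on the MPU
__DSB();                  // Make sure the new settings are in effect
__ISB();                  // before subsequent instructions execute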

This provides isolation without the overhead of full paging and ring transitions.
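The magic numbers in the RASR write come from a simple encoding: a region covers 2^(SIZE+1) bytes, and the access permissions sit in bits 26:24. A small sketch with hypothetical helper names, assuming the ARMv7-M register layout (ARMv8-M MPUs use a different base/limit format, so this does not apply there); because sizes are passed as uint32_t, the 4 GB region size is not representable here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR: ENABLE = bit 0, SIZE = bits 5:1, AP = bits 26:24.
   A region covers 2^(SIZE+1) bytes; 32 bytes is the minimum. */
static bool mpu_size_field(uint32_t bytes, uint32_t *size_field) {
    if (bytes < 32 || (bytes & (bytes - 1)) != 0)
        return false;                      /* must be a power of two >= 32 */
    uint32_t shift = 0;
    while ((1u << shift) < bytes)
        shift++;                           /* shift = log2(bytes) */
    *size_field = shift - 1;
    return true;
}

static uint32_t mpu_rasr(uint32_t size_field, uint32_t ap, bool enable) {
    assert(size_field <= 0x1F && ap <= 0x7);
    return (ap << 24) | (size_field << 1) | (enable ? 1u : 0u);
}
```

For a 1 MB region this yields SIZE = 0x13, and mpu_rasr(0x13, 0x3, true) reproduces the value written to MPU->RASR in the example above.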

Security Trade-offs in Embedded

Running without protection rings means any bug can crash the system and any vulnerability grants full control. For connected IoT devices, this is increasingly problematic. Newer embedded architectures (ARMv8-M TrustZone-M, RISC-V PMP) add protection features specifically for embedded use cases.

Summary: Protection Rings

We've explored the hierarchical protection model that underpins modern processor security. Let's consolidate the key insights:

Key Takeaways

•Protection rings create hierarchical trust levels — Ring 0 (innermost) has maximum privilege; Ring 3 (outermost) has minimum privilege
•Hardware enforces ring boundaries — CPL, DPL, and segment descriptors control what code can access what resources
•Ring 0 has exclusive capabilities — Privileged instructions, I/O control, and system configuration are Ring 0 only
•Most systems use only two rings — Ring 0 for kernel, Ring 3 for user; Ring 1/2 are unused for historical and practical reasons
•Virtualization adds complexity — Hardware virtualization (VT-x) creates a layer below Ring 0 for hypervisors
•ARM uses Exception Levels — EL0-EL3 provide clean separation for user, kernel, hypervisor, and secure monitor
•Page tables enforce per-page protection — U/S bit, SMEP, SMAP, and PKU add fine-grained memory access control
•Embedded systems may skip rings — Real-time and resource-constrained systems often run everything in privileged mode

What's Next:

Protection rings provide the mechanism for privilege separation, but they don't tell us how much privilege code should have. The next page explores the Principle of Least Privilege—the fundamental security guideline that code should have only the minimum rights necessary for its task. This principle guides the design of secure systems and the proper use of protection domains.


You now understand how protection rings provide hierarchical privilege separation enforced by hardware. This knowledge is essential for understanding kernel architecture, system call implementation, and privilege escalation attacks.