With deep understanding of both Port-Mapped I/O (PMIO) and Memory-Mapped I/O (MMIO) established, we can now engage in meaningful comparative analysis. This isn't merely an academic exercise—choosing the appropriate I/O paradigm affects hardware design, software architecture, performance, and long-term system maintainability.
The choice between PMIO and MMIO involves trade-offs across multiple dimensions: instruction set requirements, address space consumption, performance characteristics, caching behavior, protection mechanisms, portability, and legacy compatibility. No single paradigm is universally superior; instead, each excels in specific contexts.
This page synthesizes everything we've learned into a comprehensive comparison, providing the analytical framework needed to make informed I/O architecture decisions.
By the end of this page, you will understand: (1) The fundamental architectural differences between PMIO and MMIO, (2) Performance characteristics and when each paradigm excels, (3) Protection and security implications of each approach, (4) Historical context influencing modern prevalence, (5) Hybrid systems that use both paradigms, and (6) Decision frameworks for selecting the appropriate paradigm.
At their core, PMIO and MMIO represent fundamentally different philosophies about how CPUs should communicate with peripheral devices. These philosophical differences manifest in concrete architectural distinctions.
Address Space Philosophy
PMIO Philosophy: "Devices are fundamentally different from memory and deserve their own address realm." The CPU maintains two parallel address spaces—one for memory, one for I/O.
MMIO Philosophy: "Everything is memory. Devices are just memory-mapped resources." A single unified address space encompasses both RAM and device registers.
Instruction Set Implications
This philosophical difference propagates into instruction set design:
| Aspect | Port-Mapped I/O | Memory-Mapped I/O |
|---|---|---|
| Instructions Required | Dedicated IN/OUT instructions | Standard LOAD/STORE instructions |
| Addressing Modes | Direct/Indirect port number only | All memory addressing modes (base+offset, indexed, etc.) |
| Register Constraints | Must use specific registers (AL/AX/EAX for data, DX for indirect) | Any general-purpose register |
| Block Transfer | REP INS/OUTS (limited) | Standard memcpy, SIMD instructions |
| Compiler Support | Requires inline assembly or intrinsics | Native pointer operations |
| Language Portability | Non-portable (architecture-specific) | Portable across MMIO architectures |
Hardware Signal Differentiation
PMIO: Requires dedicated control signal (M/IO# on Intel) to distinguish I/O cycles from memory cycles. Address decoders must monitor this signal.
MMIO: Uses only address bits for routing. No special signals needed—standard memory transaction signals suffice.
Address Space Size
| Paradigm | Typical Address Space | Modern System Perspective |
|---|---|---|
| PMIO (x86) | 64 KB (16-bit ports) | Severely limited, mostly legacy-claimed |
| MMIO (64-bit) | Effectively unlimited | Terabytes available for devices |
The 64 KB PMIO limit forces devices with large register sets to use sliding window techniques or switch to MMIO for additional registers.
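As a concrete illustration of the sliding-window idea, here is a minimal sketch assuming a hypothetical device that exposes only an index port and a data port (the same pattern used by the CMOS/RTC and VGA register files). The port numbers and the Linux-kernel-style accessors are illustrative, not a real device's interface.

```c
#include <linux/io.h>

#define DEV_INDEX_PORT 0x3F0   /* hypothetical index register */
#define DEV_DATA_PORT  0x3F1   /* hypothetical data window    */

/*
 * Reach any of the device's internal registers through just two ports:
 * write the internal register number to the index port, then read or
 * write its value through the data port.
 */
static u8 dev_read_indexed(u8 reg)
{
	outb(reg, DEV_INDEX_PORT);
	return inb(DEV_DATA_PORT);
}

static void dev_write_indexed(u8 reg, u8 val)
{
	outb(reg, DEV_INDEX_PORT);
	outb(val, DEV_DATA_PORT);
}
```

The cost of this trick is that every register access becomes two port operations and the index/data pair must be protected against concurrent use, which is one more reason large register sets moved to MMIO.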
Performance differences between PMIO and MMIO are substantial on modern processors, though the gap was smaller on older systems. Understanding these characteristics is essential for performance-critical device drivers.
Instruction Latency
On modern x86 processors, IN/OUT instructions are significantly slower than memory operations:
| Operation | Typical Cycles | Notes |
|---|---|---|
| IN/OUT (8/16/32-bit) | ~20-50 cycles | Serializing, non-pipelined |
| INS/OUTS (string) | ~10-30 cycles per element | Better for bulk but still costly |
| MMIO read (UC) | ~100-300 cycles | Non-posted: the CPU must wait for the device's response |
| MMIO write (UC) | ~10-50 cycles | Posted write, may not wait for completion |
| MMIO write (WC) | ~10-50 cycles combined | Multiple writes combined, high throughput |
| Cached memory access | ~1-4 cycles | For comparison (L1 cache hit) |
Why Port I/O is Slow
IN/OUT instructions are serializing—the processor completes all pending operations before executing them and doesn't pipeline subsequent instructions until completion. This serialization prevents out-of-order execution benefits and stalls the pipeline.
Historically, this made sense: serial access guaranteed writes reached slow devices before reads returned status. But modern devices are fast, and this serialization becomes pure overhead.
Why MMIO Can Be Faster
Memory-mapped writes can be posted—the CPU hands off the write to the memory controller and continues without waiting for device acknowledgment. This works because memory writes on PCI/PCIe are posted transactions: they are buffered in CPU write buffers and bridge FIFOs and delivered in order, so the driver only needs to wait when it explicitly reads back a register or issues a barrier.
For burst transfers, MMIO with the write-combining memory type achieves dramatically higher throughput: consecutive stores are coalesced into full bus-width bursts instead of one transaction per store (a sketch of requesting such a mapping, and a benchmark, follow below).
This performance gap is why modern high-speed devices (NVMe, GPU, 10+ Gbps NICs) exclusively use MMIO.
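Before the benchmark, here is a sketch of how a Linux driver might request such a write-combining mapping. The choice of BAR 1 as a bulk data aperture is hypothetical, and WC mappings are only appropriate for data regions that tolerate store coalescing, never for control registers that rely on strict ordering.

```c
#include <linux/io.h>
#include <linux/pci.h>

/* Map a (hypothetical) bulk-data BAR with write-combining semantics. */
static void __iomem *map_bulk_aperture_wc(struct pci_dev *pdev)
{
	resource_size_t start = pci_resource_start(pdev, 1);
	resource_size_t len   = pci_resource_len(pdev, 1);

	/*
	 * ioremap_wc() requests a write-combining mapping: the CPU may
	 * coalesce consecutive stores into larger bus bursts.
	 */
	return ioremap_wc(start, len);
}

static void push_block_wc(void __iomem *dst, const void *src, size_t n)
{
	memcpy_toio(dst, src, n);  /* stream the buffer into the aperture */
	wmb();                     /* fence before telling the device the data is ready */
}
```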
```c
/*
 * Performance Comparison: Port I/O vs Memory-Mapped I/O
 *
 * This example demonstrates the throughput difference when
 * writing data to a device using both paradigms.
 */

#include <linux/io.h>
#include <linux/ktime.h>
#include <linux/types.h>

/* Hypothetical device with both port and MMIO access */
#define DEVICE_DATA_PORT  0x3F0     /* 8-bit data port   */
#define DEVICE_MMIO_SIZE  0x10000   /* 64 KB MMIO region */

/*
 * TEST 1: Write 64 KB using Port I/O (REP OUTSB)
 *
 * This is the fastest possible port I/O method on x86.
 */
static void benchmark_port_io(const u8 *buffer, size_t size)
{
	ktime_t start, end;
	s64 elapsed_ns;
	size_t bytes = size;   /* REP clobbers the count, so keep a copy */

	start = ktime_get();

	/* REP OUTSB - fastest port string output */
	asm volatile("rep outsb"
		     : "+S"(buffer), "+c"(size)
		     : "d"((u16)DEVICE_DATA_PORT)
		     : "memory");

	end = ktime_get();
	elapsed_ns = ktime_to_ns(ktime_sub(end, start));

	/* Kernel printk has no floating point; report integer MB/s */
	pr_info("Port I/O: %zu bytes in %lld ns (%llu MB/s)\n",
		bytes, elapsed_ns,
		(unsigned long long)bytes * 1000 / elapsed_ns);
}

/*
 * TEST 2: Write 64 KB using MMIO with write-combining
 *
 * Much faster due to posted writes and write combining.
 */
static void benchmark_mmio_wc(void __iomem *dest, const u8 *buffer, size_t size)
{
	ktime_t start, end;
	s64 elapsed_ns;

	start = ktime_get();

	/* memcpy_toio handles alignment and proper MMIO semantics */
	memcpy_toio(dest, buffer, size);

	/* Ensure all writes are flushed before measuring time */
	wmb();

	end = ktime_get();
	elapsed_ns = ktime_to_ns(ktime_sub(end, start));

	pr_info("MMIO (WC): %zu bytes in %lld ns (%llu MB/s)\n",
		size, elapsed_ns,
		(unsigned long long)size * 1000 / elapsed_ns);
}

/*
 * TEST 3: Single register access comparison
 *
 * Latency for individual accesses.
 */
static void benchmark_single_access(void __iomem *mmio_base)
{
	ktime_t start, end;
	volatile u32 val;
	int i;
	const int iterations = 10000;

	/* Benchmark port I/O read */
	start = ktime_get();
	for (i = 0; i < iterations; i++)
		asm volatile("inl %1, %0" : "=a"(val) : "Nd"((u16)0x3F0));
	end = ktime_get();
	pr_info("Port IN (x%d): %lld ns avg\n", iterations,
		ktime_to_ns(ktime_sub(end, start)) / iterations);

	/* Benchmark MMIO read */
	start = ktime_get();
	for (i = 0; i < iterations; i++)
		val = readl(mmio_base);
	end = ktime_get();
	pr_info("MMIO readl (x%d): %lld ns avg\n", iterations,
		ktime_to_ns(ktime_sub(end, start)) / iterations);
}

/*
 * Typical Results (example, varies by hardware):
 *
 *   Port I/O:  65536 bytes in 1,500,000 ns (~43 MB/s)
 *   MMIO (WC): 65536 bytes in    45,000 ns (~1456 MB/s)
 *
 *   Port IN (x10000):    250 ns avg
 *   MMIO readl (x10000): 180 ns avg
 *
 * For bulk transfers, MMIO is 30-40x faster!
 * For individual accesses, MMIO has a slight latency advantage.
 */
```

Use MMIO whenever possible for performance-sensitive paths. Reserve port I/O for legacy compatibility only. Modern devices designed without legacy constraints should exclusively use MMIO, especially for data-intensive operations.
Both I/O paradigms offer protection mechanisms to prevent unauthorized device access, but the approaches differ significantly in granularity and integration with existing security infrastructure.
Port I/O Protection
x86 provides two complementary mechanisms:
IOPL (I/O Privilege Level): A 2-bit field in EFLAGS controlling port access by privilege ring: code running with CPL ≤ IOPL may execute IN/OUT directly, while code at a less privileged ring falls through to the I/O permission bitmap check instead.
I/O Permission Bitmap (IOPB): Per-task bitmap in the TSS enabling selective port access: each bit corresponds to one port, a clear bit permits the access, and a set bit (or a port beyond the bitmap's limit) raises a general-protection fault.
Advantages of IOPL/IOPB: protection is byte-granular (individual ports can be granted to a task), it works independently of paging, and the hardware performs the check on every IN/OUT with no OS involvement.
Disadvantages: the mechanism is x86-specific, the per-task bitmap must be maintained in the TSS, it does not integrate with the page-table protection used for everything else, and it offers no defense against device DMA.
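To make the bitmap mechanism concrete, here is a minimal user-space sketch using Linux's ioperm() syscall, which asks the kernel to set IOPB bits for the calling task. The choice of ports (the legacy CMOS index/data pair at 0x70/0x71) is purely illustrative; the program needs root or CAP_SYS_RAWIO, and poking ports behind the kernel's back is appropriate only for experiments.

```c
#include <stdio.h>
#include <sys/io.h>

int main(void)
{
	/* Ask the kernel to set IOPB bits for ports 0x70-0x71 in our task */
	if (ioperm(0x70, 2, 1) != 0) {
		perror("ioperm");
		return 1;
	}

	outb(0x00, 0x70);                  /* select CMOS register 0 (seconds) */
	unsigned char sec = inb(0x71);     /* read it through the data port    */
	printf("CMOS seconds register: 0x%02x\n", sec);

	ioperm(0x70, 2, 0);                /* drop the access again */
	return 0;
}
```

Any IN/OUT to a port not granted this way faults with a general-protection exception, which is exactly the per-port granularity the table below contrasts with MMIO's per-page model.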
| Aspect | Port I/O (PMIO) | Memory-Mapped I/O (MMIO) |
|---|---|---|
| Primary Mechanism | IOPL + IOPB bitmap | Page tables + IOMMU |
| Granularity | Per-port (byte) | Per-page (4 KB minimum) |
| User-Space Access | Via ioperm()/iopl() syscalls | Via mmap() on /dev/mem or device files |
| Virtualization | VM must trap IN/OUT | EPT/NPT nested page tables |
| DMA Protection | Not applicable | IOMMU provides isolation |
| Integration | Separate mechanism | Uses existing memory protection |
MMIO Protection
MMIO leverages the processor's memory protection infrastructure:
Page Table Permissions: MMIO pages can be marked as: read-only or read/write, supervisor-only versus user-accessible, non-executable, and uncacheable (UC) or write-combining (WC) via the memory-type attributes.
Virtual Address Isolation: Each process has its own page tables. MMIO isn't mapped into user-space unless explicitly granted.
IOMMU for DMA: Modern systems use IOMMUs (Intel VT-d, AMD-Vi) to control device DMA: each device can reach only the physical pages the OS has explicitly mapped for it, so a buggy or malicious device cannot scribble over arbitrary memory.
Advantages of Page-Table Protection: it reuses the MMU infrastructure the OS already depends on, provides per-process isolation automatically, and virtualizes cleanly through EPT/NPT nested page tables.
Disadvantages: granularity is a full page (4 KB minimum), so device registers with different protection needs cannot share a page, and small register blocks still consume an entire page of address space.
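The following user-space sketch shows how page-table protection mediates MMIO access on Linux: mapping a PCI BAR through its sysfs resource file causes the kernel to install page-table entries for it, and those entries are what grant or deny access. The device address 0000:03:00.0 is hypothetical, the program needs root, and production user-space drivers should use VFIO so the IOMMU is engaged as well.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/sys/bus/pci/devices/0000:03:00.0/resource0",
		      O_RDWR | O_SYNC);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * The kernel grants access by installing page-table entries for
	 * the BAR; the page permissions (and any IOMMU policy) are what
	 * protect the rest of the system.
	 */
	volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
				       MAP_SHARED, fd, 0);
	if (regs == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}

	printf("register 0 = 0x%08x\n", regs[0]);

	munmap((void *)regs, 4096);
	close(fd);
	return 0;
}
```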
Port I/O protection doesn't prevent DMA attacks. A malicious device can DMA directly to memory, bypassing CPU protection. IOMMU (used with MMIO paradigm) is essential for DMA isolation—a critical security consideration in modern systems with untrusted PCIe devices (like Thunderbolt peripherals).
Software portability is a crucial consideration for code that must run across different architectures or be maintained for extended periods.
Port I/O Portability Challenges
Port I/O is fundamentally architecture-specific:
Instruction Set Dependency: Dedicated port instructions like IN/OUT exist on x86 and only a handful of other, mostly historical, architectures. ARM, RISC-V, MIPS, PowerPC—none have native port I/O instructions.
Compiler Support: Standard C has no port I/O primitives. Access requires one of:
- inline assembly (asm volatile blocks, as in the examples on this page)
- compiler intrinsics such as __inb()/__outb() (compiler-specific)
- OS-provided wrappers such as Linux's inb()/outb(), which exist only on x86 builds

Cross-Platform Code: Drivers using PMIO cannot be ported to non-x86 architectures without a complete rewrite of the I/O access layer.
```c
/*
 * Portability Comparison: PMIO vs MMIO
 *
 * This example demonstrates the portability differences
 * between port I/O and memory-mapped I/O in C code.
 */

#ifndef __KERNEL__

#include <stdint.h>

/* ============================================================
 * PORT I/O: Architecture-Specific, Requires Inline Assembly
 * ============================================================ */

/* x86-specific inline functions */
#if defined(__x86_64__) || defined(__i386__)

static inline uint8_t inb(uint16_t port)
{
	uint8_t value;
	asm volatile("inb %1, %0" : "=a"(value) : "Nd"(port));
	return value;
}

static inline void outb(uint16_t port, uint8_t value)
{
	asm volatile("outb %0, %1" : : "a"(value), "Nd"(port));
}

#else
/* Non-x86 architectures: PMIO not available */
#error "Port I/O not supported on this architecture"
#endif

/* Device driver using PMIO - completely non-portable */
void legacy_device_write_pmio(uint16_t base_port, uint8_t data)
{
	outb(base_port + 0, 0x01);	/* Select register */
	outb(base_port + 1, data);	/* Write data */
}

/* ============================================================
 * MEMORY-MAPPED I/O: Portable Across Architectures
 * ============================================================ */

/*
 * Portable MMIO accessors (simplified for illustration).
 * volatile stops the compiler from reordering or eliding the access;
 * real code also needs explicit CPU barriers on weakly ordered
 * architectures (ARM, RISC-V, ...).
 */
static inline void mmio_write32(volatile uint32_t *addr, uint32_t value)
{
	*addr = value;
}

static inline uint32_t mmio_read32(volatile uint32_t *addr)
{
	return *addr;
}

/* Device driver using MMIO - portable to any MMIO architecture */
void modern_device_write_mmio(volatile uint32_t *regs, uint32_t data)
{
	mmio_write32(&regs[0], 0x01);	/* Select register */
	mmio_write32(&regs[1], data);	/* Write data */
}

/*
 * This MMIO-based driver works on:
 *   - x86/x64
 *   - ARM/ARM64
 *   - RISC-V
 *   - MIPS
 *   - PowerPC
 *   - And any other architecture with memory-mapped devices
 */

#endif /* !__KERNEL__ */

/* ============================================================
 * LINUX KERNEL APPROACH: Abstraction Layer
 * ============================================================ */

#ifdef __KERNEL__
#include <linux/io.h>

/*
 * Linux provides portable accessor macros that handle
 * architecture-specific details internally.
 */
void linux_driver_example(void __iomem *base, const void *buffer, size_t size)
{
	/* These work on ALL Linux-supported architectures */

	/* Write 32-bit value */
	writel(0x12345678, base + 0x00);

	/* Read 32-bit value */
	u32 status = readl(base + 0x04);
	(void)status;

	/* Block copy to device */
	memcpy_toio(base + 0x100, buffer, size);

	/*
	 * Architecture-specific behavior hidden:
	 *   - ARM: includes barriers for weak ordering
	 *   - x86: direct memory access
	 *   - All: proper volatile semantics
	 */
}

/*
 * For PMIO on x86, Linux provides wrappers that are
 * conditionally compiled only for x86:
 */
#ifdef CONFIG_X86
void linux_pmio_example(unsigned long port)
{
	u8 val = inb(port);
	outb(0xFF, port);
	(void)val;
}
#endif

#endif /* __KERNEL__ */
```

MMIO Portability Benefits
Memory-mapped I/O is supported by virtually all modern processor architectures: x86/x64, ARM/ARM64, RISC-V, MIPS, PowerPC, and essentially every embedded core map device registers into the memory address space and access them with ordinary load/store instructions.
The Dominance of MMIO
This portability advantage, combined with performance benefits, explains why: architectures designed after x86 omit port I/O entirely, new-generation devices (NVMe, modern NICs, GPUs) expose MMIO-only register interfaces, and even PCI configuration access has migrated from the legacy port mechanism toward memory-mapped ECAM.
When writing new device drivers, always prefer MMIO over PMIO unless the hardware specifically requires port I/O. This ensures code can be ported to non-x86 platforms (ARM servers, embedded systems, new architectures) without fundamental rewrites.
Real-world x86 systems don't use PMIO or MMIO exclusively—they use both in a complementary fashion. Understanding this hybrid reality is essential for practical systems work.
Why Hybrids Exist
Legacy compatibility drives hybrid usage: the classic motherboard devices (PIC, PIT, PS/2 controller, legacy serial, VGA) sit at fixed port addresses that firmware and old boot code still expect, while everything designed in the PCI/PCIe era exposes MMIO registers, so every shipping x86 platform carries both mechanisms side by side.
Modern Hybrid Example: PCI/PCIe Configuration
PCI configuration space itself shows the evolution: the original mechanism reaches configuration registers through the 0xCF8 (CONFIG_ADDRESS) / 0xCFC (CONFIG_DATA) port pair, whereas PCIe adds ECAM, which maps every function's configuration space into memory so that ordinary loads and stores (and page-level protection) apply, as the sketch below illustrates.
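The sketch below contrasts the two configuration mechanisms. It assumes x86 kernel-level code and that ecam_base is the virtual mapping of the ECAM window reported by the ACPI MCFG table; the small inl/outl helpers are included only to keep the example self-contained.

```c
#include <stdint.h>

/* Minimal x86 port accessors so the example stands alone */
static inline void outl(uint32_t value, uint16_t port)
{
	asm volatile("outl %0, %1" : : "a"(value), "Nd"(port));
}

static inline uint32_t inl(uint16_t port)
{
	uint32_t value;
	asm volatile("inl %1, %0" : "=a"(value) : "Nd"(port));
	return value;
}

/* Legacy mechanism: port-mapped CONFIG_ADDRESS / CONFIG_DATA pair */
#define PCI_CONFIG_ADDRESS 0xCF8
#define PCI_CONFIG_DATA    0xCFC

uint32_t pci_cfg_read32_ports(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off)
{
	uint32_t addr = (1u << 31)             /* enable bit              */
		      | ((uint32_t)bus << 16)  /* bus number              */
		      | ((uint32_t)dev << 11)  /* device number           */
		      | ((uint32_t)fn  << 8)   /* function number         */
		      | (off & 0xFC);          /* dword-aligned offset    */

	outl(addr, PCI_CONFIG_ADDRESS);
	return inl(PCI_CONFIG_DATA);
}

/* PCIe ECAM: the same register, reached with an ordinary memory load */
uint32_t pci_cfg_read32_ecam(volatile uint8_t *ecam_base,
			     uint8_t bus, uint8_t dev, uint8_t fn, uint16_t off)
{
	volatile uint32_t *reg = (volatile uint32_t *)
		(ecam_base + (((uint32_t)bus << 20) | ((uint32_t)dev << 15) |
			      ((uint32_t)fn  << 12) | (off & 0xFFC)));
	return *reg;
}
```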
| Device Category | PMIO Usage | MMIO Usage | Trend |
|---|---|---|---|
| PIC (8259) | Ports 0x20-0x21, 0xA0-0xA1 | None (pure PMIO) | Legacy only |
| APIC | None | 0xFEE00000 region | MMIO only |
| PIT (8254) | Ports 0x40-0x43 | None (pure PMIO) | Legacy only |
| HPET | None (optional port) | 0xFED00000 region | MMIO preferred |
| PS/2 Keyboard | Ports 0x60/0x64 | None | Legacy only |
| USB Controller | None (EHCI+) | Large BAR regions | MMIO only |
| SATA Controller | Ports (compatibility mode) | ABAR region (AHCI) | MMIO preferred |
| NVMe Controller | None | All MMIO-based | MMIO only |
| GPU | VGA legacy ports | Multiple large BARs | MMIO dominant |
| Network Card (modern) | None | Full MMIO | MMIO only |
| Serial Port (legacy) | Ports 0x3F8/0x2F8 | None | Legacy only |
| Serial Port (modern) | None | MMIO-based UART | MMIO only |
Case Study: AHCI SATA Controller
AHCI (Advanced Host Controller Interface) demonstrates the hybrid transition elegantly: the same SATA controller can be presented in legacy IDE emulation mode, driven through the traditional port block (0x1F0-0x1F7, 0x3F6) so old boot code still works, or in native AHCI mode, where all control flows through a memory-mapped register set located via PCI BAR 5 (the ABAR).
Performance in native AHCI mode is significantly better: up to 32 commands can be queued per port (NCQ), data moves by DMA instead of CPU-driven PIO, and completions are posted to memory, so the CPU's only MMIO touch per command is a single doorbell write.
Operating systems enable AHCI mode during boot, but BIOS uses IDE mode for compatibility with older boot code.
Categorizing Device I/O Today
```c
/*
 * AHCI: The Hybrid Transition Example
 *
 * This illustrates how AHCI supports both legacy PMIO (IDE mode)
 * and modern MMIO (native mode) for the same SATA controller.
 */

#include <linux/io.h>
#include <linux/pci.h>

/* Legacy IDE Ports (PMIO) */
#define IDE_PRIMARY_DATA	0x1F0
#define IDE_PRIMARY_STATUS	0x1F7
#define IDE_PRIMARY_CONTROL	0x3F6

/* AHCI MMIO Registers (offsets from ABAR) */
#define AHCI_HOST_CAP	0x00	/* Host Capabilities */
#define AHCI_GHC	0x04	/* Global Host Control */
#define AHCI_IS		0x08	/* Interrupt Status */
#define AHCI_PI		0x0C	/* Ports Implemented */
#define AHCI_PORT_BASE	0x100	/* Port 0 registers start */
#define AHCI_PORT_SIZE	0x80	/* Size of each port's register block */

/*
 * Legacy IDE Mode Access (Port I/O - Slow, Limited)
 *
 * This is how DOS and early OSes accessed disks.
 * Limited to one outstanding command, PIO-based transfers.
 */
u8 ide_read_status_legacy(void)
{
	return inb(IDE_PRIMARY_STATUS);
}

void ide_write_sector_legacy(const u16 *buffer)
{
	int i;

	/* Wait for controller ready */
	while (inb(IDE_PRIMARY_STATUS) & 0x80)
		;	/* Spin until BSY clears */

	/* Write 256 words (512 bytes) one at a time */
	for (i = 0; i < 256; i++)
		outw(buffer[i], IDE_PRIMARY_DATA);
}

/*
 * Native AHCI Mode Access (MMIO - Fast, Full-Featured)
 *
 * Modern approach with command queuing, NCQ, etc.
 */
struct ahci_controller {
	void __iomem *abar;		/* AHCI Base Address (from BAR[5]) */
	struct ahci_port *ports;	/* Per-port structures */
};

void ahci_init(struct ahci_controller *ahci, struct pci_dev *pdev)
{
	u32 cap, ghc;
	int max_ports, supports_ncq;

	/* Get ABAR from PCI BAR 5 */
	ahci->abar = pci_iomap(pdev, 5, 0);

	/* Read capabilities via MMIO */
	cap = readl(ahci->abar + AHCI_HOST_CAP);
	max_ports = (cap & 0x1F) + 1;
	supports_ncq = (cap >> 30) & 1;

	/* Enable AHCI mode */
	ghc = readl(ahci->abar + AHCI_GHC);
	ghc |= (1U << 31);	/* AHCI Enable bit */
	writel(ghc, ahci->abar + AHCI_GHC);

	pr_info("AHCI: %d ports, NCQ=%s\n",
		max_ports, supports_ncq ? "yes" : "no");
}

/*
 * Submit a command using AHCI (MMIO-based)
 *
 * By contrast to IDE, this supports:
 *   - 32 commands queued per port (NCQ)
 *   - DMA transfers (no PIO overhead)
 *   - A single doorbell write to submit
 */
void ahci_submit_command(struct ahci_controller *ahci, int port, int slot)
{
	void __iomem *port_regs = ahci->abar + AHCI_PORT_BASE +
				  (port * AHCI_PORT_SIZE);

	/* Command already prepared in memory (command list, FIS, PRD table) */

	/* A single MMIO write submits the command! */
	writel(1 << slot, port_regs + 0x38);	/* CI (Command Issue) register */

	/*
	 * The controller now DMAs the command, executes it, DMAs the data,
	 * and posts the completion in memory. No further CPU intervention!
	 */
}

/*
 * Performance Comparison Summary:
 *
 *   IDE (PMIO):
 *     - ~10 MB/s max (PIO bottleneck)
 *     - 1 command at a time
 *     - CPU involved in every word transfer
 *
 *   AHCI (MMIO):
 *     - 600+ MB/s (SATA III speeds)
 *     - 32 commands queued (NCQ)
 *     - DMA handles all data movement
 */
```

When designing new hardware or writing device drivers, choosing the I/O paradigm requires systematic evaluation. Here's a comprehensive decision framework.
Primary Decision Factors
| Criterion | Choose PMIO | Choose MMIO |
|---|---|---|
| Target Architecture | x86 only, legacy | Any modern architecture |
| Register Count | < 256 bytes | Any size |
| Data Transfer Rate | < 1 MB/s | Any speed |
| Legacy Boot Required | Yes | No |
| Driver Portability | Not required | Important |
| VM Passthrough | Not planned | May be used |
| New Design (2020+) | Almost never | Always |
For any new hardware design or driver written today, MMIO should be the default choice. Use PMIO only when explicitly required for legacy compatibility—and even then, consider providing an MMIO alternative for modern systems.
This page has provided a comprehensive comparative analysis of Port-Mapped I/O and Memory-Mapped I/O. Let's consolidate the key insights: the two paradigms differ in address-space philosophy and instruction requirements; MMIO wins on throughput (posted and write-combined writes), portability (standard load/store on every architecture), and protection (it rides the existing page-table and IOMMU infrastructure); PMIO survives on x86 only for legacy devices and boot compatibility; and real systems are hybrids, with new designs defaulting to MMIO.
Looking Ahead
With the complete picture of I/O addressing paradigms, we're prepared to examine the hardware support mechanisms that make efficient I/O possible—the chipset features, bus architectures, and controller capabilities that bring these paradigms to life.
You now possess the analytical framework to evaluate I/O addressing choices for any system or device. This comparative understanding enables informed architectural decisions and effective debugging across the full spectrum of x86 I/O interfaces.