Memory-Mapped I/O (MMIO) represents a fundamentally different philosophy from Port-Mapped I/O: rather than treating devices as residents of a separate address realm, MMIO integrates device registers directly into the processor's memory address space. Every device register receives a memory address, and CPU memory instructions (load, store) become the universal interface for all hardware communication.
This architectural choice has profound implications. It eliminates the need for dedicated I/O instructions, enables the full power of the processor's addressing modes for device access, and allows devices with large register spaces or memory buffers to be efficiently addressed. The simplicity and elegance of "everything is memory" has made MMIO the dominant paradigm in modern computer architecture.
From graphics cards with gigabytes of VRAM to network interfaces with thousands of registers, from embedded microcontrollers to the most powerful server processors—Memory-Mapped I/O enables them all.
By the end of this page, you will understand: (1) The fundamental principles of memory-mapped device access, (2) How device registers appear as ordinary memory locations, (3) Memory region configuration and the role of MTRRs/PAT, (4) Programming techniques for MMIO with memory barrier considerations, (5) Hardware address translation and bus routing, and (6) How modern high-performance devices leverage MMIO for maximum throughput.
In Memory-Mapped I/O, device registers and memory buffers are assigned addresses within the processor's physical memory address space. When the CPU issues a memory transaction to such an address, the memory controller recognizes it as a device region and routes the transaction to the appropriate hardware.
The Unified Address Model
Consider a 64-bit processor with a 48-bit physical address space (256 TB). Within this vast space:
| Address Range | Size | Designation | Contents |
|---|---|---|---|
| 0x0000_0000_0000 - 0x0000_0009_FFFF | 640 KB | Conventional Memory | Legacy DOS area, BIOS data |
| 0x0000_000A_0000 - 0x0000_000B_FFFF | 128 KB | Video Memory | Legacy VGA frame buffer |
| 0x0000_000C_0000 - 0x0000_000F_FFFF | 256 KB | ROM Area | BIOS, option ROMs |
| 0x0000_0010_0000 - Low Memory End | Variable | Extended Memory | Main RAM |
| Low Memory End - 4 GB | Variable | PCI MMIO Region | 32-bit device BARs |
| 0x0000_FED0_0000 - 0x0000_FED0_3FFF | 16 KB | HPET | High Precision Event Timer |
| 0x0000_FEE0_0000 - 0x0000_FEE0_0FFF | 4 KB | APIC | Local APIC registers |
| Above 4 GB | Variable | High MMIO | 64-bit device BARs, large devices |
From the CPU's perspective, all accesses use the same instruction set: MOV, LOAD, STORE (or architecture equivalents). The address bus carries the target address, the data bus carries the data, and control signals indicate read or write. The destination (memory vs. device) is determined solely by address decoding, not by instruction type.
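As a concrete illustration of this uniformity, the short sketch below stores to ordinary RAM and to a hypothetical device register using the same C assignment; only the target address (and the volatile qualifier that device access requires) differs. The register address is illustrative, not a real assignment.

```c
#include <stdint.h>

/* Hypothetical MMIO register address - real addresses come from BARs/firmware */
#define DEVICE_CTRL_REG 0xFED00000UL

void unified_store_example(uint32_t *ram_word)
{
    /* Store to RAM: the address decoder routes this to the DRAM controller */
    *ram_word = 0x1234;

    /* Store to a device register: the compiler emits the same store
     * instruction, but the address decoder routes it to the device.
     * volatile prevents the compiler from caching or eliding the access. */
    volatile uint32_t *ctrl = (volatile uint32_t *)DEVICE_CTRL_REG;
    *ctrl = 0x1234;
}
```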
The Memory Hole Concept
Incorporating MMIO into the memory address space creates memory holes—regions of the address space that map to devices instead of RAM. Even if physical RAM exists at those addresses, it becomes inaccessible (hidden or remapped) when devices claim those ranges.
The most significant memory hole on x86 systems exists between the top of usable memory (typically 2-3 GB on legacy systems) and the 4 GB boundary. This MMIO gap accommodates 32-bit PCI device BARs, fixed chipset ranges such as the local APIC and HPET register pages, and the firmware ROM.
On modern systems with more than 4 GB of RAM, the "hidden" memory is remapped to physical addresses above 4 GB through memory controller remapping features.
Modern chipsets implement memory remapping to recover RAM addresses claimed by MMIO. For example, if MMIO occupies addresses 0xC000_0000 to 0xFFFF_FFFF (1 GB), the RAM that would have occupied that space is remapped to 0x1_0000_0000 and above, preserving total usable memory.
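The arithmetic behind remapping is straightforward. The sketch below works through an illustrative configuration (8 GB of installed RAM with a 1 GB MMIO hole below 4 GB); the sizes are assumptions chosen to mirror the example above.

```c
#include <stdint.h>

#define GiB (1ULL << 30)

/* Illustrative remapping arithmetic - the RAM and hole sizes are assumed */
void remap_example(void)
{
    uint64_t installed_ram = 8 * GiB;
    uint64_t mmio_hole     = 1 * GiB;               /* 0xC000_0000 - 0xFFFF_FFFF */
    uint64_t ram_below_4g  = 4 * GiB - mmio_hole;   /* 3 GiB visible below 4 GB */

    /* RAM displaced by the hole reappears above the 4 GB boundary */
    uint64_t remapped    = installed_ram - ram_below_4g;   /* 5 GiB */
    uint64_t top_of_high = 4 * GiB + remapped;              /* 0x2_4000_0000 */

    /* Usable RAM = below-4GB portion + remapped portion = all 8 GiB */
    (void)top_of_high;
}
```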
Understanding MMIO requires tracing a memory transaction from CPU instruction through bus hierarchies to the final device. Let's examine this journey in detail.
The Memory Transaction Lifecycle
When the CPU executes a memory instruction targeting an MMIO address, the following sequence occurs:
Virtual Address Translation: If virtual memory is enabled, the MMU translates the virtual address to a physical address. The page table entry may contain attributes (caching behavior, memory type) affecting MMIO.
Cache Lookup: The CPU checks if the address is cached. For properly configured MMIO regions, the access bypasses cache (marked uncacheable or write-combining).
Memory Controller Decode: The integrated or discrete memory controller examines the physical address against configured ranges. DRAM ranges route to memory; MMIO ranges route to I/O bus hierarchy.
Bus Bridge Traversal: On PCIe systems, the transaction traverses the Root Complex, possibly crossing PCIe switches, until reaching the target device's bridge.
Device BAR Match: The target device compares the address against its configured BAR ranges. A match triggers device register access.
Device Response: The device reads or writes the register and returns data (for reads) or acknowledgment (for writes).
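To tie the lifecycle together, here is a deliberately simplified model of the decode step: a table of physical address ranges, each routed either to DRAM or to the I/O hierarchy. The ranges and structure are illustrative, not how any particular memory controller is implemented.

```c
#include <stddef.h>
#include <stdint.h>

enum route { ROUTE_DRAM, ROUTE_MMIO };

struct addr_range {
    uint64_t base;
    uint64_t size;
    enum route target;
};

/* Illustrative decode map, loosely following the table earlier on this page */
static const struct addr_range decode_map[] = {
    { 0x0000000000000000ULL, 0xC0000000ULL,  ROUTE_DRAM },  /* low RAM */
    { 0x00000000FED00000ULL, 0x4000ULL,      ROUTE_MMIO },  /* HPET */
    { 0x00000000FEE00000ULL, 0x1000ULL,      ROUTE_MMIO },  /* Local APIC */
    { 0x0000000100000000ULL, 0x140000000ULL, ROUTE_DRAM },  /* remapped RAM */
};

/* Returns where a physical address would be routed, or -1 if unclaimed */
int decode_physical_address(uint64_t addr)
{
    for (size_t i = 0; i < sizeof(decode_map) / sizeof(decode_map[0]); i++) {
        if (addr >= decode_map[i].base &&
            addr < decode_map[i].base + decode_map[i].size)
            return decode_map[i].target;
    }
    return -1;  /* Unclaimed: master abort / unsupported request on real hardware */
}
```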
Base Address Registers (BARs)
PCI and PCIe devices advertise their MMIO requirements through Base Address Registers. BARs are configuration space registers that define whether a region decodes memory or I/O space, its base address and size, whether it is 64-bit addressable, and whether it is prefetchable.
During system initialization (BIOS/UEFI or OS enumeration), software assigns addresses to each device's BARs, constructing the system's MMIO map. This dynamic assignment allows the same device to reside at different addresses on different systems.
```c
/*
 * PCI BAR Enumeration and Size Detection
 *
 * This code demonstrates the standard algorithm for discovering
 * a PCI device's MMIO requirements by probing its Base Address Registers.
 */

#include <stdint.h>

/* PCI Configuration Space access (x86 uses ports 0xCF8/0xCFC) */
#define PCI_CONFIG_ADDR 0x0CF8
#define PCI_CONFIG_DATA 0x0CFC

/* BAR types */
#define BAR_TYPE_MASK          0x01
#define BAR_TYPE_MEMORY        0x00
#define BAR_TYPE_IO            0x01
#define BAR_MEM_TYPE_MASK      0x06
#define BAR_MEM_TYPE_32        0x00
#define BAR_MEM_TYPE_64        0x04
#define BAR_MEM_PREFETCH_MASK  0x08

extern void outl(uint16_t port, uint32_t value);
extern uint32_t inl(uint16_t port);

/*
 * Build PCI configuration address for a specific register.
 */
static uint32_t pci_config_addr(uint8_t bus, uint8_t device,
                                uint8_t function, uint8_t offset)
{
    return (1UL << 31) |                       /* Enable bit */
           ((uint32_t)bus << 16) |
           ((uint32_t)(device & 0x1F) << 11) |
           ((uint32_t)(function & 0x07) << 8) |
           (offset & 0xFC);                    /* Dword aligned */
}

/*
 * Read a 32-bit value from PCI configuration space.
 */
uint32_t pci_config_read32(uint8_t bus, uint8_t device,
                           uint8_t function, uint8_t offset)
{
    outl(PCI_CONFIG_ADDR, pci_config_addr(bus, device, function, offset));
    return inl(PCI_CONFIG_DATA);
}

/*
 * Write a 32-bit value to PCI configuration space.
 */
void pci_config_write32(uint8_t bus, uint8_t device,
                        uint8_t function, uint8_t offset, uint32_t value)
{
    outl(PCI_CONFIG_ADDR, pci_config_addr(bus, device, function, offset));
    outl(PCI_CONFIG_DATA, value);
}

/*
 * Determine the size of a MMIO BAR by the standard probe algorithm.
 *
 * Algorithm:
 *   1. Save original BAR value
 *   2. Write all 1s to BAR
 *   3. Read back - device returns size mask (cleared bits = size)
 *   4. Restore original BAR value
 *   5. Calculate size from mask
 */
struct bar_info {
    uint64_t base_address;   /* Current BAR value (assigned address) */
    uint64_t size;           /* Size in bytes */
    uint8_t  type;           /* 0=MMIO, 1=Port I/O */
    uint8_t  is_64bit;       /* 1 if 64-bit BAR */
    uint8_t  prefetchable;   /* 1 if prefetchable memory */
};

int pci_get_bar_info(uint8_t bus, uint8_t device, uint8_t function,
                     uint8_t bar_index, struct bar_info *info)
{
    /* BAR registers start at offset 0x10, each is 4 bytes */
    uint8_t bar_offset = 0x10 + (bar_index * 4);

    /* Read original BAR value */
    uint32_t original = pci_config_read32(bus, device, function, bar_offset);

    /* Determine BAR type */
    if (original & BAR_TYPE_IO) {
        /* I/O BAR - not MMIO */
        info->type = 1;
        info->base_address = original & ~0x03;

        /* Probe size */
        pci_config_write32(bus, device, function, bar_offset, 0xFFFFFFFF);
        uint32_t size_mask = pci_config_read32(bus, device, function, bar_offset);
        pci_config_write32(bus, device, function, bar_offset, original);

        size_mask &= ~0x03;                      /* Clear type bits */
        info->size = (~size_mask + 1) & 0xFFFF;  /* I/O limited to 64 KB */
        info->is_64bit = 0;
        info->prefetchable = 0;
        return 0;
    }

    /* Memory BAR */
    info->type = 0;
    info->is_64bit = ((original & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64);
    info->prefetchable = (original & BAR_MEM_PREFETCH_MASK) ? 1 : 0;

    /* Probe lower 32 bits */
    pci_config_write32(bus, device, function, bar_offset, 0xFFFFFFFF);
    uint32_t low_mask = pci_config_read32(bus, device, function, bar_offset);
    pci_config_write32(bus, device, function, bar_offset, original);

    low_mask &= ~0x0F;                           /* Clear type bits */

    if (info->is_64bit) {
        /* Read upper 32 bits from next BAR */
        uint32_t original_high = pci_config_read32(bus, device, function, bar_offset + 4);

        pci_config_write32(bus, device, function, bar_offset + 4, 0xFFFFFFFF);
        uint32_t high_mask = pci_config_read32(bus, device, function, bar_offset + 4);
        pci_config_write32(bus, device, function, bar_offset + 4, original_high);

        /* Combine to form 64-bit mask and calculate size */
        uint64_t full_mask = ((uint64_t)high_mask << 32) | low_mask;
        info->size = ~full_mask + 1;
        info->base_address = ((uint64_t)original_high << 32) | (original & ~0x0F);
    } else {
        info->size = (~low_mask + 1) & 0xFFFFFFFF;
        info->base_address = original & ~0x0F;
    }

    return 0;
}
```

One of the most critical aspects of MMIO is proper configuration of memory caching behavior. Unlike RAM, where caching improves performance, MMIO regions require careful cache control to ensure correct device behavior.
Why Caching MMIO is Dangerous
Device registers are side-effecting: reading a register might clear an interrupt flag, and writing might trigger an action. If these accesses were cached, reads could return stale values from the cache instead of the device's current state, writes could sit in the cache (or be merged with later writes) and never reach the device, and side effects would fire at unpredictable times, if at all.
To prevent these issues, x86 provides memory type configuration through MTRRs (Memory Type Range Registers) and the PAT (Page Attribute Table).
| Memory Type | Code | Characteristics | MMIO Use Case |
|---|---|---|---|
| Uncacheable (UC) | 0 | No caching, serialized access, no speculation | Standard device registers |
| Write Combining (WC) | 1 | No caching but writes combine, reads may be speculative | Frame buffers, DMA buffers |
| Write Through (WT) | 4 | Cached reads, writes propagate immediately | Rarely used for MMIO |
| Write Protect (WP) | 5 | Cached reads, writes bypass the cache and invalidate cached copies | Not for MMIO |
| Write Back (WB) | 6 | Fully cached, writes delayed | Never use for MMIO |
| Uncacheable Minus (UC-) | 7 | Like UC but can be overridden by WC MTRR | Default MMIO fallback |
Memory Type Range Registers (MTRRs)
MTRRs provide a mechanism to assign memory types to physical address ranges at the hardware level, independent of page tables. There are two types: fixed-range MTRRs, which cover the first 1 MB of the address space in small fixed chunks (for legacy regions such as VGA memory and ROM), and variable-range MTRRs, which describe programmable base/mask pairs covering the rest of physical memory.
Modern systems typically configure variable MTRRs during boot to mark RAM as WB and MMIO regions as UC. The BIOS/UEFI firmware establishes these settings.
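For a sense of what firmware establishes, the sketch below walks the variable-range MTRRs using their architectural MSR numbers; the rdmsr() and kprintf() helpers are assumed to exist in the surrounding environment.

```c
#include <stdint.h>

/* x86 MSR numbers for the MTRR registers */
#define MSR_MTRRCAP        0x0FE
#define MSR_MTRR_PHYSBASE0 0x200   /* PHYSBASEn = 0x200 + 2n */
#define MSR_MTRR_PHYSMASK0 0x201   /* PHYSMASKn = 0x201 + 2n */

/* Assumed helpers: read a 64-bit MSR and print a formatted message */
extern uint64_t rdmsr(uint32_t msr);
extern void kprintf(const char *fmt, ...);

void dump_variable_mtrrs(void)
{
    unsigned int count = rdmsr(MSR_MTRRCAP) & 0xFF;   /* VCNT: number of variable ranges */

    for (unsigned int i = 0; i < count; i++) {
        uint64_t base = rdmsr(MSR_MTRR_PHYSBASE0 + 2 * i);
        uint64_t mask = rdmsr(MSR_MTRR_PHYSMASK0 + 2 * i);

        if (!(mask & (1ULL << 11)))     /* Valid bit clear: range disabled */
            continue;

        /* Bits 7:0 of PHYSBASE hold the memory type (UC=0, WC=1, WB=6, ...) */
        kprintf("MTRR%u: base=%#llx type=%llu mask=%#llx\n",
                i, base & ~0xFFFULL, base & 0xFF, mask & ~0xFFFULL);
    }
}
```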
Page Attribute Table (PAT)
While MTRRs work at the physical level, the PAT allows memory type configuration in page tables—enabling per-page control visible to the operating system. Each page table entry can specify a PAT index that, combined with the PCD and PWT bits, selects one of eight memory types from the PAT configuration register.
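The index computation itself is simple, as the following snippet shows for 4 KB pages: the PAT, PCD, and PWT bits of the page table entry form a 3-bit index into the IA32_PAT MSR.

```c
#include <stdint.h>

/* Page table entry bits relevant to memory type selection (x86, 4 KB pages) */
#define PTE_PWT  (1ULL << 3)   /* Page Write-Through */
#define PTE_PCD  (1ULL << 4)   /* Page Cache Disable */
#define PTE_PAT  (1ULL << 7)   /* PAT bit (bit 12 for large pages) */

/* The 3-bit index PAT:PCD:PWT selects one of the eight PAT entries */
unsigned int pat_index(uint64_t pte)
{
    return ((pte & PTE_PAT) ? 4 : 0) |
           ((pte & PTE_PCD) ? 2 : 0) |
           ((pte & PTE_PWT) ? 1 : 0);
}
```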
The operating system's memory mapping functions must use appropriate flags when creating MMIO mappings. On Linux, this is handled through ioremap() variants.
```c
/*
 * Linux MMIO Mapping Functions
 *
 * The kernel provides several ioremap variants that configure
 * appropriate memory types for different MMIO characteristics.
 */

#include <linux/io.h>
#include <linux/pci.h>
#include <linux/types.h>

/*
 * ioremap() / ioremap_nocache()
 *
 * Maps MMIO region with Uncacheable (UC) memory type.
 * Use for: Standard device registers where every access must
 *          reach the device in order and without caching.
 *
 * @param phys_addr: Physical address of MMIO region (from BAR)
 * @param size:      Size of the region in bytes
 * @returns:         Virtual address for kernel access, or NULL on failure
 */
void __iomem *map_device_registers(phys_addr_t phys_addr, size_t size)
{
    void __iomem *base;

    /* Standard uncacheable mapping for device registers */
    base = ioremap(phys_addr, size);
    if (!base) {
        pr_err("Failed to map MMIO region at %pa\n", &phys_addr);
        return NULL;
    }

    return base;
}

/*
 * ioremap_wc()
 *
 * Maps MMIO region with Write-Combining (WC) memory type.
 * Use for: Frame buffers, large write-only or write-mostly regions
 *          where combining writes improves throughput.
 */
void __iomem *map_framebuffer(phys_addr_t phys_addr, size_t size)
{
    void __iomem *base;

    /* Write-combining for better frame buffer performance */
    base = ioremap_wc(phys_addr, size);
    if (!base) {
        /* Fall back to uncacheable if WC not available */
        base = ioremap(phys_addr, size);
    }

    return base;
}

/*
 * Example: Complete MMIO mapping and access workflow
 */
struct my_device_regs {
    uint32_t control;    /* Offset 0x00: Control register */
    uint32_t status;     /* Offset 0x04: Status register */
    uint32_t interrupt;  /* Offset 0x08: Interrupt status */
    uint32_t data;       /* Offset 0x0C: Data register */
};

struct my_device {
    void __iomem *regs;         /* MMIO register base */
    void __iomem *framebuffer;  /* Write-combining frame buffer */
    phys_addr_t   regs_phys;
    phys_addr_t   fb_phys;
    size_t        fb_size;
};

int my_device_probe(struct pci_dev *pdev, struct my_device *dev)
{
    /* Get BAR 0 for registers (uncacheable) */
    dev->regs_phys = pci_resource_start(pdev, 0);
    dev->regs = ioremap(dev->regs_phys, pci_resource_len(pdev, 0));
    if (!dev->regs)
        return -ENOMEM;

    /* Get BAR 1 for frame buffer (write-combining) */
    dev->fb_phys = pci_resource_start(pdev, 1);
    dev->fb_size = pci_resource_len(pdev, 1);
    dev->framebuffer = ioremap_wc(dev->fb_phys, dev->fb_size);
    if (!dev->framebuffer) {
        iounmap(dev->regs);
        return -ENOMEM;
    }

    pr_info("Device mapped: regs=%pa, fb=%pa (size %zu)\n",
            &dev->regs_phys, &dev->fb_phys, dev->fb_size);
    return 0;
}

void my_device_remove(struct my_device *dev)
{
    if (dev->framebuffer)
        iounmap(dev->framebuffer);
    if (dev->regs)
        iounmap(dev->regs);
}
```

Mapping MMIO as write-back (WB) cacheable memory will cause catastrophic failures. Writes may never reach the device, reads return stale data, and the resulting behavior is undefined. Always use ioremap() or equivalent uncacheable mapping functions for device registers.
Accessing MMIO locations requires more than simple pointer dereferences. Compilers may reorder, combine, or eliminate memory accesses as optimizations, and processors may reorder memory operations for performance. For MMIO, these behaviors can cause device miscommunication.
The Problem with Direct Pointer Access
Consider this naive MMIO access:
```c
void __iomem *regs = ioremap(phys_addr, size);
uint32_t *control = (uint32_t *)regs;

*control = 0x01;  // Start operation
*control = 0x02;  // Don't do this!
```
Problems: the compiler may merge the two stores into one, reorder them relative to surrounding accesses, or discard the first store as a redundant write; the CPU's write buffers may likewise combine or reorder them. The device would then observe a different register sequence than the one programmed.
For device correctness, every write must reach the device, in order, when programmed.
```c
/*
 * Proper MMIO Access Functions
 *
 * Modern kernels provide typed accessor functions that ensure:
 *   1. Volatile semantics (no compiler optimization)
 *   2. Correct memory ordering where needed
 *   3. Architecture-appropriate implementation
 */

#include <linux/io.h>

/* Example register offsets and bits for the hypothetical device used below */
#define STATUS_OFFSET  0x04
#define DATA_OFFSET    0x0C
#define FIFO_OFFSET    0x100
#define READY_BIT      (1U << 0)

/*
 * Basic Accessors (Linux: ioread/iowrite)
 *
 * These functions provide the minimum guarantee: the access will occur.
 * They do NOT provide ordering guarantees with respect to other accesses.
 */
void demonstrate_basic_accessors(void __iomem *regs)
{
    uint32_t status;

    /* Read 32-bit value from MMIO location */
    status = ioread32(regs + 0x04);

    /* Write 32-bit value to MMIO location */
    iowrite32(0x00000001, regs + 0x00);

    /* Size variants */
    iowrite8(0xFF, regs + 0x10);      /* 8-bit write */
    iowrite16(0x1234, regs + 0x12);   /* 16-bit write */

    /* Big-endian variants (if device uses BE register layout) */
    iowrite32be(0x12345678, regs + 0x20);
    uint32_t be_value = ioread32be(regs + 0x20);
}

/*
 * Ordered Accessors (Linux: readl/writel family)
 *
 * These provide stronger ordering guarantees:
 *   - writel: Write is complete before the function returns
 *             (typically includes a read-back fence on weakly-ordered archs)
 *   - readl:  Preceding writes are visible before this read
 */
void demonstrate_ordered_accessors(void __iomem *regs)
{
    uint32_t status;

    /* Strongly-ordered write: flushes write buffers */
    writel(0x00000001, regs + 0x00);

    /* Read back to confirm write completion */
    status = readl(regs + 0x04);

    /* The write at offset 0x00 is guaranteed to have reached
     * the device before the read at offset 0x04 returns */
}

/*
 * Relaxed Accessors (Linux: writel_relaxed/readl_relaxed)
 *
 * Maximum performance, minimum ordering. Use when:
 *   - Batching many writes before a final ordered access
 *   - Writes can be safely reordered
 *   - You explicitly manage barriers
 */
void demonstrate_relaxed_accessors(void __iomem *regs, const uint32_t *data)
{
    int i;

    /* Batch of relaxed writes - may be reordered and combined */
    for (i = 0; i < 100; i++) {
        writel_relaxed(data[i], regs + 0x100 + (i * 4));
    }

    /* Explicit write memory barrier ensures all above writes complete */
    wmb();

    /* Final ordered write to trigger device action */
    writel(0x01, regs + 0x00);   /* Start processing */

    /* Read status - guarantees all writes visible to device */
    uint32_t status = readl(regs + 0x04);
}

/*
 * Memory Barrier Types
 *
 * Different barriers provide different ordering guarantees.
 */
void demonstrate_barriers(void __iomem *regs)
{
    /* Read memory barrier: prior reads complete before subsequent reads */
    rmb();

    /* Write memory barrier: prior writes complete before subsequent writes */
    wmb();

    /* Full memory barrier: orders all prior memory ops before subsequent */
    mb();

    /* MMIO-specific barrier: ensures MMIO writes reach devices */
    mmiowb();   /* Deprecated in favor of spin_unlock ordering */

    /* Compiler barrier only: prevents compiler reordering but not CPU */
    barrier();

    /* Example: Read status until ready, then read data */
    uint32_t status;
    do {
        status = readl(regs + STATUS_OFFSET);
        cpu_relax();   /* Hint for spin-waiting, may include barrier */
    } while (!(status & READY_BIT));

    rmb();   /* Ensure status read completes before data read */
    uint32_t data = readl(regs + DATA_OFFSET);
}

/*
 * String/Block Operations
 *
 * For bulk data transfer, string variants operate on buffers.
 */
void demonstrate_block_transfers(void __iomem *regs, void *buffer, size_t count)
{
    /* Read block of 32-bit values */
    ioread32_rep(regs + FIFO_OFFSET, buffer, count);

    /* Write block of 32-bit values */
    iowrite32_rep(regs + FIFO_OFFSET, buffer, count);

    /* Memory copy to/from MMIO (with proper access width handling) */
    memcpy_toio(regs + 0x1000, buffer, count * 4);
    memcpy_fromio(buffer, regs + 0x1000, count * 4);
}
```

On x86/x64, MMIO regions are typically marked UC (uncacheable), which provides strong ordering guarantees. On ARM and other weakly-ordered architectures, explicit barriers become critical. The kernel's accessor macros abstract these differences—always use them instead of raw pointer dereferences.
Memory-Mapped I/O enables modern devices to achieve extraordinary performance levels by leveraging large address spaces, sophisticated memory types, and direct CPU-device communication patterns.
Graphics Processing Units (GPUs)
Modern GPUs exemplify advanced MMIO usage:
| BAR | Size | Memory Type | Purpose |
|---|---|---|---|
| BAR 0 | 16-256 MB | Uncacheable | Control registers, doorbells |
| BAR 1 | 256 MB - 16 GB | Write-Combining | Direct GPU memory aperture |
| BAR 2/3 | Variable | Write-Combining | Extended memory aperture |
| Resizable BAR | Up to 100% VRAM | Write-Combining | Full VRAM access (if enabled) |
The combination of large WC-mapped data regions and small UC-mapped control registers enables GPUs to achieve bandwidth-efficient bulk transfers while maintaining precise control semantics.
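The following sketch shows the control-plus-aperture pattern implied by this layout: bulk data streamed through the write-combining aperture, followed by a single uncacheable doorbell write. The register offsets are hypothetical, not taken from any real GPU.

```c
#include <linux/io.h>

/* Illustrative offsets for a hypothetical GPU - not a real device's layout */
#define GPU_DOORBELL_REG   0x0040    /* In the UC-mapped BAR 0 */
#define GPU_UPLOAD_OFFSET  0x00000   /* In the WC-mapped BAR 1 aperture */

/*
 * Stream bulk data through the write-combining aperture, then issue one
 * uncacheable doorbell write to tell the GPU the data is ready.
 */
void gpu_upload_and_kick(void __iomem *bar0_regs, void __iomem *bar1_aperture,
                         const void *vertices, size_t len)
{
    /* Bulk copy into GPU memory via the WC aperture: the write-combining
     * buffers merge these stores into efficient burst transactions */
    memcpy_toio(bar1_aperture + GPU_UPLOAD_OFFSET, vertices, len);

    /* Drain the write-combining buffers before notifying the GPU */
    wmb();

    /* Single UC doorbell write triggers processing */
    writel(1, bar0_regs + GPU_DOORBELL_REG);
}
```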
NVMe Solid-State Drives
NVMe leverages MMIO for its high-performance storage interface: controller capability, configuration, and status registers are exposed through BAR 0, per-queue doorbell registers follow at a stride advertised in the capability register, and the submission and completion queues themselves live in ordinary host memory.
The elegance of NVMe's MMIO design enables it to achieve millions of IOPS with minimal CPU overhead—each command submission requires only a single 4-byte doorbell write.
Network Interface Cards
High-speed NICs (40 Gbps, 100 Gbps, and beyond) use MMIO extensively: doorbell registers notify the hardware of newly posted transmit and receive descriptors, configuration and statistics registers are memory-mapped, and per-queue register blocks can be mapped directly into user space for kernel-bypass frameworks.
Modern NICs minimize MMIO latency through register layout optimization, cache line alignment of frequently accessed registers, and separation of read-heavy and write-heavy regions.
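A minimal sketch of the transmit path illustrates how little MMIO traffic is involved: the descriptor is written to host memory and a single doorbell write hands it to the NIC. The register offset and descriptor layout are invented for illustration, not taken from a specific NIC.

```c
#include <linux/io.h>

/* Hypothetical tail doorbell offset and descriptor layout */
#define NIC_TX_TAIL_REG 0x3818

struct tx_descriptor {
    uint64_t buffer_addr;   /* DMA address of the packet buffer */
    uint32_t length;
    uint32_t flags;
};

void nic_post_packet(void __iomem *regs, struct tx_descriptor *ring,
                     uint16_t *tail, uint16_t ring_size,
                     uint64_t dma_addr, uint32_t len)
{
    /* Fill the descriptor in host memory; the NIC fetches it via DMA */
    ring[*tail].buffer_addr = dma_addr;
    ring[*tail].length      = len;
    ring[*tail].flags       = 0x1;   /* e.g. end-of-packet marker */

    /* Descriptor must be visible in memory before the doorbell write */
    wmb();

    /* Advance the tail and perform the single MMIO doorbell write */
    *tail = (*tail + 1) % ring_size;
    writel(*tail, regs + NIC_TX_TAIL_REG);
}
```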
```c
/*
 * NVMe MMIO Register Access Example
 *
 * Demonstrates the MMIO structure of NVMe controllers,
 * showcasing minimal-overhead command submission.
 */

#include <linux/io.h>
#include <linux/types.h>

/* NVMe Controller Registers (from NVMe specification) */
struct nvme_controller_regs {
    uint64_t cap;        /* 0x00: Controller Capabilities */
    uint32_t vs;         /* 0x08: Version */
    uint32_t intms;      /* 0x0C: Interrupt Mask Set */
    uint32_t intmc;      /* 0x10: Interrupt Mask Clear */
    uint32_t cc;         /* 0x14: Controller Configuration */
    uint32_t reserved1;  /* 0x18 */
    uint32_t csts;       /* 0x1C: Controller Status */
    uint32_t nssr;       /* 0x20: NVM Subsystem Reset */
    uint32_t aqa;        /* 0x24: Admin Queue Attributes */
    uint64_t asq;        /* 0x28: Admin Submission Queue Base */
    uint64_t acq;        /* 0x30: Admin Completion Queue Base */
    /* ... additional registers ... */
};

/* Doorbell stride is determined by CAP.DSTRD, typically 4 bytes */
#define NVME_DOORBELL_BASE 0x1000

/* Simplified command, completion, and queue structures for this example */
struct nvme_command {
    uint32_t dwords[16];         /* 64-byte submission queue entry */
};

struct nvme_completion {
    uint32_t result;
    uint32_t reserved;
    uint16_t sq_head;
    uint16_t sq_id;
    uint16_t command_id;
    uint16_t status;             /* Bit 0 is the phase tag */
};

struct nvme_queue {
    struct nvme_command    *sq;  /* Submission queue (host memory) */
    struct nvme_completion *cq;  /* Completion queue (host memory) */
    uint16_t sq_depth, cq_depth;
    uint16_t cq_head;
    uint8_t  cq_phase;
};

struct nvme_device {
    void __iomem *regs;             /* Base of MMIO registers */
    void __iomem *doorbells;        /* Base of doorbell registers */
    uint32_t      doorbell_stride;  /* Bytes between doorbells */

    /* Admin and I/O queues live in host memory (not MMIO) */
    struct nvme_queue queues[16];
};

/*
 * Initialize NVMe controller - probe and configure.
 */
int nvme_init(struct nvme_device *dev, void __iomem *bar0)
{
    uint64_t cap;
    uint32_t vs;

    dev->regs = bar0;

    /* Read capabilities to determine controller parameters */
    cap = readq(dev->regs + offsetof(struct nvme_controller_regs, cap));

    /* Extract doorbell stride: 2^(2+DSTRD) bytes */
    uint8_t dstrd = (cap >> 32) & 0x0F;
    dev->doorbell_stride = 4 << dstrd;
    dev->doorbells = dev->regs + NVME_DOORBELL_BASE;

    /* Read version */
    vs = readl(dev->regs + offsetof(struct nvme_controller_regs, vs));
    pr_info("NVMe version: %u.%u.%u\n",
            (vs >> 16), (vs >> 8) & 0xFF, vs & 0xFF);

    /* Wait for controller ready (CSTS.RDY = 1) after enabling */
    /* ... configuration sequence ... */

    return 0;
}

/*
 * Submit a command to an NVMe queue.
 *
 * This demonstrates the minimal MMIO access pattern:
 *   1. Write command to host memory submission queue
 *   2. Single 4-byte doorbell write to notify controller
 */
void nvme_submit_command(struct nvme_device *dev, uint16_t qid,
                         struct nvme_command *cmd, uint16_t *sq_tail)
{
    /* Step 1: Copy command to submission queue (host memory, not MMIO) */
    memcpy(&dev->queues[qid].sq[*sq_tail], cmd, sizeof(*cmd));

    /* Ensure command is written before doorbell */
    wmb();

    /* Step 2: Advance tail pointer */
    *sq_tail = (*sq_tail + 1) % dev->queues[qid].sq_depth;

    /* Step 3: Write doorbell - single 4-byte MMIO write triggers controller */
    /* Submission Queue y Tail Doorbell offset = 0x1000 + (2y * doorbell_stride) */
    writel(*sq_tail, dev->doorbells + (2 * qid * dev->doorbell_stride));

    /* That's it - controller will now fetch and process the command */
}

/*
 * Check for completions - poll completion queue and ring doorbell.
 */
int nvme_poll_completions(struct nvme_device *dev, uint16_t qid,
                          struct nvme_completion *completions,
                          int max_completions)
{
    int count = 0;
    struct nvme_queue *q = &dev->queues[qid];

    while (count < max_completions) {
        struct nvme_completion *cqe = &q->cq[q->cq_head];

        /* Check phase tag to see if entry is valid */
        if ((cqe->status & 1) != q->cq_phase)
            break;   /* No more completions */

        /* Copy completion to caller's buffer */
        completions[count++] = *cqe;

        /* Advance head */
        q->cq_head++;
        if (q->cq_head >= q->cq_depth) {
            q->cq_head = 0;
            q->cq_phase ^= 1;   /* Toggle phase on wrap */
        }
    }

    if (count > 0) {
        /* Ring completion queue head doorbell */
        /* Completion Queue y Head Doorbell offset = 0x1000 + ((2y+1) * doorbell_stride) */
        writel(q->cq_head,
               dev->doorbells + ((2 * qid + 1) * dev->doorbell_stride));
    }

    return count;
}
```

Memory-Mapped I/O has become the dominant paradigm for good reasons. Its advantages extend across hardware design, software development, and system performance.
MMIO's greatest strength is uniformity. Memory allocation, protection, virtual addressing, and access primitives all work the same whether the target is DRAM or a device. This reduces cognitive load for developers and enables reuse of operating system infrastructure.
This page has provided a thorough exploration of Memory-Mapped I/O, the dominant paradigm for modern device communication. Let's consolidate the key concepts: device registers are assigned physical addresses and accessed with ordinary load/store instructions; PCI BARs advertise each device's address-space requirements and are assigned during enumeration; MMIO regions must be mapped with appropriate memory types (UC for registers, WC for frame buffers) through MTRRs and the PAT; kernel accessors and memory barriers ensure that accesses reach the device in program order; and modern GPUs, NVMe drives, and NICs build doorbell- and aperture-based designs on these foundations.
Looking Ahead
With both Port-Mapped and Memory-Mapped I/O understood, we're ready to examine how these paradigms consume address space—the critical system design consideration that influences everything from firmware layout to operating system memory management.
You now have mastery over Memory-Mapped I/O concepts—from hardware address translation through kernel access primitives to modern device usage patterns. This knowledge directly applies to understanding device drivers, debugging I/O issues, and designing high-performance systems.