When a process reads a file or sends a network packet, that high-level operation ultimately translates into low-level communication between the CPU and physical devices. But how does the processor actually "talk" to a disk, a network card, or a keyboard? The answer involves device communication mechanisms—the fundamental techniques that enable CPUs and peripherals to exchange commands, status, and data.
This communication happens across physical distances (centimeters to meters), clock domains, and architectural boundaries. Understanding these mechanisms is essential for device driver development, hardware debugging, and comprehending how the seemingly magical abstraction of file I/O actually works at the hardware level.
By the end of this page, you will understand:
- The fundamental mechanisms for CPU-device communication
- The differences between port-mapped and memory-mapped I/O
- How device registers control hardware behavior
- Modern bus architectures and their role in device communication
- The hardware abstractions that make uniform device access possible
All device communication follows a fundamental pattern: the CPU and device exchange information through well-defined interfaces, typically consisting of registers and data areas. Let's understand this model.
The Three Types of Device Registers:
Most devices expose their functionality through three categories of registers:
- **Status registers** — report device state (ready, busy, error). Reads may have side effects, such as clearing flags.
- **Control (command) registers** — writes trigger device actions or configure device behavior.
- **Data registers** — stage the data moving between the CPU and the device.
Basic I/O Operation Flow:
┌──────────┐ ┌──────────────┐
│ CPU │ │ Device │
└────┬─────┘ └──────┬───────┘
│ │
│ 1. Write command to control register │
├──────────────────────────────────────────>│
│ │
│ 2. (Optional) Write data to data register │
├──────────────────────────────────────────>│
│ │
│ Device processes command │
│ ┌──────┴──────┐
│ │(Processing) │
│ └──────┬──────┘
│ │
│ 3. Device sets status register │
│<──────────────────────────────────────────┤
│ │
│ 4. CPU reads status register (poll) │
│ OR device raises interrupt │
│ │
│ 5. CPU reads data from data register │
│<──────────────────────────────────────────┤
This pattern, with variations, applies to virtually all device communication.
Device registers are not memory in the traditional sense. Reading a status register might clear flags (read has side effects). Writing to a command register triggers actions. Device registers can return different values on consecutive reads even without writes. This is fundamentally different from RAM, where values persist until modified.
Port-Mapped I/O (also called Isolated I/O or I/O-Mapped I/O) uses a separate address space for device registers, distinct from memory addresses. This approach, most closely associated with the Intel x86 family, uses special CPU instructions to access devices.
How PMIO Works:
The CPU has a dedicated I/O address space (separate from the memory address space) containing I/O ports. Each device is assigned specific port addresses. Special instructions access these ports:
```asm
; Port-Mapped I/O examples (x86 architecture)
; The x86 I/O port address space is 64KB (ports 0x0000 - 0xFFFF)

; === Reading from a port ===

; Read byte from port 0x60 (keyboard data port)
in al, 0x60         ; Read into AL register

; Read word from port 0x1F0 (primary ATA data port)
; (ports above 0xFF must be addressed through DX)
mov dx, 0x1F0
in ax, dx           ; Read 16-bit word into AX

; Read byte from port specified in DX
mov dx, 0x3F8       ; COM1 serial port
in al, dx           ; Read byte

; === Writing to a port ===

; Write byte to port 0x64 (keyboard command port)
mov al, 0xFE        ; System reset command
out 0x64, al        ; Write to port

; Write byte to port specified in DX
mov dx, 0x20        ; PIC command port
mov al, 0x20        ; End of Interrupt (EOI)
out dx, al

; === String port I/O (for bulk transfers) ===

; rep insw: Read 256 words from port 0x1F0 into ES:DI
mov dx, 0x1F0       ; ATA data port
mov cx, 256         ; Word count
mov di, buffer      ; Destination buffer
rep insw            ; Repeat IN string word

; rep outsw: Write 256 words to port 0x1F0 from DS:SI
mov dx, 0x1F0
mov cx, 256
mov si, buffer
rep outsw
```

Common x86 I/O Port Assignments:
| Port Range | Device | Usage |
|---|---|---|
| 0x00-0x1F | DMA Controller 1 | DMA channel configuration |
| 0x20-0x3F | PIC (8259A) | Programmable Interrupt Controller |
| 0x40-0x5F | Timer (8254) | System timer, speaker |
| 0x60-0x6F | Keyboard Controller (8042) | Keyboard, PS/2 mouse |
| 0x70-0x7F | RTC/CMOS | Real-time clock, BIOS settings |
| 0x80 | POST Diagnostic | Debug codes during boot |
| 0x1F0-0x1F7 | Primary ATA Controller | Primary hard drive |
| 0x170-0x177 | Secondary ATA Controller | Secondary hard drive |
| 0x3F8-0x3FF | COM1 (Serial) | First serial port |
| 0x2F8-0x2FF | COM2 (Serial) | Second serial port |
| 0x378-0x37F | LPT1 (Parallel) | First parallel port |
On x86, IN and OUT instructions are privileged—they can only execute in Ring 0 (kernel mode). Userspace programs cannot directly access I/O ports. The kernel must provide system calls or ioctl interfaces for applications that need device access. On Linux, root can use iopl() or ioperm() to grant port access to userspace (extremely dangerous).
Memory-Mapped I/O addresses device registers through the regular memory address space. Device registers appear as memory locations, and the CPU accesses them using standard load/store instructions. This is the dominant approach in modern systems.
How MMIO Works:
Regions of the physical address space are assigned to devices instead of RAM. When the CPU accesses addresses in these regions, the memory controller routes the transaction to the appropriate device instead of main memory.
```c
#include <linux/errno.h>   /* -EIO, -ETIMEDOUT */
#include <linux/delay.h>   /* udelay() */
#include <linux/io.h>      /* ioremap() */
#include <linux/pci.h>     /* pci_resource_start()/len() */
#include <linux/printk.h>

/*
 * Memory-Mapped I/O access in a device driver
 *
 * Device registers mapped at physical address 0xFEDC0000
 */

/* Device register definitions (offsets from base) */
#define REG_STATUS     0x00  /* Read: device status */
#define REG_CONTROL    0x04  /* Write: device control */
#define REG_DATA       0x08  /* R/W: data transfer */
#define REG_INTERRUPT  0x0C  /* R/W: interrupt control */

/* Status register bits */
#define STATUS_READY    (1 << 0)
#define STATUS_BUSY     (1 << 1)
#define STATUS_ERROR    (1 << 2)
#define STATUS_DMA_DONE (1 << 3)

/* After mapping, this points to the device registers */
static volatile uint32_t *device_base;

/*
 * CRITICAL: volatile prevents compiler optimization
 *
 * Without volatile, the compiler might:
 * - Cache register reads in CPU registers
 * - Reorder or eliminate seemingly redundant accesses
 * - Merge multiple writes into one
 *
 * These optimizations would break device communication!
 */

/* Read device status */
static inline uint32_t device_read_status(void)
{
    return device_base[REG_STATUS / sizeof(uint32_t)];
}

/* Write to control register */
static inline void device_write_control(uint32_t value)
{
    device_base[REG_CONTROL / sizeof(uint32_t)] = value;
}

/* Wait for device ready (polling example) */
static int device_wait_ready(unsigned long timeout_us)
{
    while (timeout_us--) {
        uint32_t status = device_read_status();
        if (status & STATUS_ERROR)
            return -EIO;        /* Device error */
        if (status & STATUS_READY)
            return 0;           /* Success */
        udelay(1);              /* Wait 1 microsecond */
    }
    return -ETIMEDOUT;
}

/* Map device registers in driver initialization */
static int device_probe(struct pci_dev *pdev)
{
    resource_size_t mmio_start, mmio_len;

    /* Get MMIO region from BAR (Base Address Register) */
    mmio_start = pci_resource_start(pdev, 0);
    mmio_len   = pci_resource_len(pdev, 0);

    /* Map physical addresses to kernel virtual addresses */
    device_base = (volatile uint32_t *)ioremap(mmio_start, mmio_len);
    if (!device_base)
        return -ENOMEM;

    printk(KERN_INFO "Device MMIO mapped at %p\n", device_base);
    return 0;
}
```

MMIO vs. PMIO Comparison:
| Aspect | Port-Mapped I/O (PMIO) | Memory-Mapped I/O (MMIO) |
|---|---|---|
| Address Space | Separate I/O space | Shared memory space |
| Instructions | Special IN/OUT | Standard load/store (MOV, LDR) |
| Address Range | Limited (64KB on x86) | Large (full address space) |
| Compiler Support | Requires inline assembly | Standard C with volatile |
| CPU Architecture | x86 primarily | All modern architectures |
| Protection | IOPL/IOPERM privileges | Page tables/MMU |
| Performance | Slightly slower (special decode) | Full-width accesses; write-combining for prefetchable regions |
| Flexibility | Limited | Supports large register sets, buffers |
Modern compilers and CPUs reorder memory operations for performance. For MMIO, this is dangerous—a write to a control register might become visible before a preceding write to a data register. Compiler barriers (asm volatile("" ::: "memory")) stop the compiler from reordering accesses, while hardware barriers (mfence/sfence on x86, dmb/dsb on ARM) order the CPU's accesses. Linux provides readl()/writel(), which include the necessary barriers.
The CPU doesn't communicate directly with most devices. Instead, a hierarchy of controllers and buses intermediates, providing standardized interfaces and electrical connectivity.
The Device Controller:
Every device has a controller—an electronic subsystem that:
- Translates bus transactions into device-specific operations
- Exposes the status, control, and data registers the CPU accesses
- Buffers data between the device's internal timing and the bus
- Reports status and raises interrupts when the device needs attention
Simple devices (LEDs, buttons) might have trivial controllers. Complex devices (SSDs, GPUs) have sophisticated controllers with their own processors and firmware.
Modern PC Bus Hierarchy:
┌─────────────────────────────────────────────────────────────────┐
│ CPU │
│ (Internal interconnect) │
└───────────────────────────┬─────────────────────────────────────┘
│
┌───────────────────────────┼─────────────────────────────────────┐
│ System Agent / North Bridge │
│ │
│ ┌──────────────────────┼──────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Memory │ │ PCIe │ │ Integrated│ │
│ │Controller│ │ Root │ │ GPU │ │
│ │ │ │ Complex │ │ │ │
│ └─────────┘ └────┬─────┘ └───────────┘ │
│ │ │
└──────────────────────────┼───────────────────────────────────────┘
│
┌───────────────────┬─┴─────────────────┬────────────────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ NVMe │ │ GPU │ │ NIC │ │ Platform│
│ SSD │ │Discrete │ │ │ │Controller│
│ │ │ │ │ │ │ (PCH) │
└─────────┘ └─────────┘ └─────────┘ └────┬────┘
│
PCIe x4 PCIe x16 PCIe x4 │
│
┌──────────────────────────────────────────────────────────┬─┘
│ Platform Controller Hub (PCH) │
│ (Formerly "South Bridge") │
│ │
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │SATA │ │ USB │ │Audio │ │ LAN │ │ SPI │ │
│ │Ports │ │Ports │ │Codec │ │(opt.)│ │Flash │ │
│ └──────┘ └──────┘ └──────┘ └──────┘ └──────┘ │
└───────────────────────────────────────────────────────────┘
Major Bus Technologies:
| Bus | Topology | Speed (Current Gen) | Primary Use |
|---|---|---|---|
| PCIe 5.0 | Point-to-point | 32 GT/s per lane (up to x16) | GPUs, NVMe SSDs, NICs, accelerators |
| PCIe 6.0 | Point-to-point | 64 GT/s per lane | Next-gen high-bandwidth devices |
| USB 3.2 Gen 2x2 | Star (hub-based) | 20 Gbps | Peripherals, external storage |
| USB4/Thunderbolt 4 | Tunneled | 40 Gbps (80 Gbps optional) | Universal connectivity, docking |
| SATA III | Point-to-point | 6 Gbps | Legacy SSDs, HDDs, optical |
| SAS-4 | Point-to-point | 22.5 Gbps per lane | Enterprise storage arrays |
| NVMe over Fabrics | Network | Variable (network speed) | Distributed storage |
PCIe has become the universal high-performance bus. Modern systems route almost everything through PCIe: storage (NVMe), graphics (PCIe x16), networking (PCIe NICs), and even peripheral controllers (USB/SATA controllers are on PCIe). Understanding PCIe is essential for modern systems work.
PCI Express (PCIe) is the dominant interconnect for I/O devices in modern systems. Understanding PCIe illuminates how high-speed devices communicate with the CPU.
PCIe Architecture Overview:
PCIe uses a packet-based, point-to-point protocol with several key concepts:
- **Lanes and links** — each link consists of one or more serial lanes (x1, x4, x16); bandwidth scales with lane count
- **Transaction Layer Packets (TLPs)** — all reads, writes, and messages travel as packets
- **Root complex, switches, and endpoints** — a tree topology rooted at the CPU's root complex
- **Base Address Registers (BARs)** — define the MMIO regions each device exposes
PCIe Configuration Space:
Every PCIe device has a 256-byte (legacy) or 4KB (PCIe extended) configuration space containing device identification, capabilities, and BAR definitions:
```bash
# View PCI/PCIe devices
lspci
# 00:00.0 Host bridge: Intel Corporation Device 9a14 (rev 01)
# 00:02.0 VGA compatible controller: Intel Corporation Device 9a49 (rev 03)
# 00:14.0 USB controller: Intel Corporation Device a0ed (rev 20)
# 01:00.0 Non-Volatile memory controller: Samsung Electronics Device a80a

# Show detailed device information
lspci -vv -s 01:00.0
# Output includes:
# - Vendor/Device ID
# - Memory regions (BARs) and their sizes
# - Link capabilities (speed, width)
# - MSI/MSI-X interrupt configuration
# - Power management state

# Example BAR output:
# Region 0: Memory at b4000000 (64-bit, non-prefetchable) [size=16K]
# Region 4: Memory at b4100000 (64-bit, non-prefetchable) [size=1M]

# View the PCIe topology as a tree
lspci -tv
# -[0000:00]-+-00.0  Intel Corporation Device 9a14
#            +-01.0-[01]----00.0  Samsung Electronics Device a80a
#            +-02.0  Intel Corporation Device 9a49
#            +-14.0  Intel Corporation Device a0ed

# Read raw configuration space (first 64 bytes)
lspci -xxx -s 01:00.0
```

PCIe Transaction Types:
| TLP Type | Direction | Purpose |
|---|---|---|
| Memory Read | Requester → Target | Read data from target's memory (BAR region) |
| Memory Write | Requester → Target | Write data to target's memory (BAR region) |
| Completion | Target → Requester | Return requested data (for reads) |
| I/O Read/Write | Either | Legacy I/O port access (rarely used) |
| Configuration Read/Write | Root complex → Device | Access device configuration space |
| Message | Various | Interrupts, errors, power management events |
PCIe devices can be bus masters—they can initiate transactions without CPU involvement. This enables DMA: instead of the CPU explicitly moving data byte-by-byte, the NIC or SSD directly writes received data to system RAM, then interrupts the CPU. This dramatically reduces CPU overhead for I/O. The device driver sets up DMA descriptors pointing to memory buffers; the device handles the transfer.
Before the OS can communicate with devices, it must discover what devices exist and how to address them. This enumeration process happens at boot and when devices are hot-plugged.
PCI/PCIe Enumeration:
PCI uses a hierarchical address scheme: Bus:Device:Function (BDF). During enumeration:
1. Firmware or the OS probes each possible BDF by reading the Vendor ID from configuration space; 0xFFFF means no device is present.
2. For each device found, it reads the Device ID, class code, and header type.
3. It sizes each BAR by writing all 1s and reading back the result (the device hardwires the low bits to zero, revealing the region size).
4. It assigns non-overlapping physical address ranges to the BARs and programs any bridges.
5. It enables the device's memory and I/O decoding via the command register.
```c
#include <linux/module.h>
#include <linux/pci.h>

/*
 * PCI device driver registration in Linux
 *
 * The kernel matches devices by Vendor/Device ID
 * and calls our probe() function when found
 */

/* Define which devices this driver supports */
static const struct pci_device_id my_device_ids[] = {
    { PCI_DEVICE(0x1234, 0x5678) },  /* Vendor 0x1234, Device 0x5678 */
    { PCI_DEVICE(0x1234, 0x5679) },  /* Same vendor, different device */
    { 0, }                           /* Terminator */
};
MODULE_DEVICE_TABLE(pci, my_device_ids);

/* Called when a matching device is discovered */
static int my_device_probe(struct pci_dev *pdev,
                           const struct pci_device_id *id)
{
    int err;
    resource_size_t mmio_addr, mmio_len;

    /* Enable the device (transitions from D3 to D0 power state) */
    err = pci_enable_device(pdev);
    if (err)
        return err;

    /* Request exclusive access to MMIO regions */
    err = pci_request_regions(pdev, "my_device");
    if (err)
        goto err_enable;

    /* Get BAR 0 address and size */
    mmio_addr = pci_resource_start(pdev, 0);
    mmio_len  = pci_resource_len(pdev, 0);

    printk(KERN_INFO "my_device: found at %04x:%02x:%02x.%d\n",
           pci_domain_nr(pdev->bus), pdev->bus->number,
           PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
    printk(KERN_INFO "my_device: MMIO at 0x%llx, size 0x%llx\n",
           (unsigned long long)mmio_addr,
           (unsigned long long)mmio_len);

    /* Enable bus mastering for DMA */
    pci_set_master(pdev);

    return 0;

err_enable:
    pci_disable_device(pdev);
    return err;
}

/* Called when device is removed (or driver unloaded) */
static void my_device_remove(struct pci_dev *pdev)
{
    pci_release_regions(pdev);
    pci_disable_device(pdev);
}

static struct pci_driver my_pci_driver = {
    .name     = "my_device",
    .id_table = my_device_ids,
    .probe    = my_device_probe,
    .remove   = my_device_remove,
};

module_pci_driver(my_pci_driver);
```

Hot-plug (connecting devices while running) adds significant complexity. The OS must handle surprise removal gracefully—in-flight I/O must be failed, memory remapped, interrupts disabled, and resources freed.
USB and Thunderbolt make hot-plug common; even PCIe supports it in servers and docking stations. Drivers must be written to handle sudden device disappearance.
Beyond PCIe, several specialized interconnect technologies address specific device communication needs.
CXL (Compute Express Link):
CXL builds on the PCIe physical layer to enable cache-coherent memory expansion and device memory sharing through three sub-protocols:
- **CXL.io** — PCIe-compatible I/O semantics for discovery and configuration
- **CXL.cache** — lets a device coherently cache host memory
- **CXL.mem** — lets the host access device-attached memory like ordinary RAM
CXL enables memory-semantic accelerators and memory pooling—a device's memory appears as standard RAM to the CPU.
| Technology | Primary Use | Key Feature | Speed |
|---|---|---|---|
| CXL | Memory expansion, accelerators | Cache coherency | PCIe 5.0+ speeds |
| NVLink | GPU-to-GPU communication | High-bandwidth GPU interconnect | Up to 900 GB/s bidirectional |
| Infinity Fabric | AMD chiplet interconnect | Multi-die integration | Internal to package |
| UPI (Ultra Path Interconnect) | Multi-socket CPU | Cache coherency across CPUs | 16 GT/s per lane |
| CCIX | Accelerator coherency | Industry-standard cache coherence | PCIe-based |
| Gen-Z | Memory-centric fabric | Memory semantic operations | Up to 56 Gbps per lane |
USB: Universal Serial Bus
USB remains the primary external device interconnect.
USB Evolution:
USB 1.0/1.1 (1996): 1.5/12 Mbps (Low/Full Speed)
│
▼
USB 2.0 (2000): 480 Mbps (High Speed)
│
▼
USB 3.0/3.1/3.2 (2008-2017): 5/10/20 Gbps (SuperSpeed)
│
▼
USB4 (2019): 20/40 Gbps (based on Thunderbolt 3)
│
▼
USB4 v2 (2022): 80/120 Gbps asymmetric
USB uses a hub-based tree topology with a single host controller. Devices are automatically enumerated and can request bandwidth reservations for isochronous transfers (audio/video).
Modern trends show convergence: USB4 and Thunderbolt tunnel PCIe. NVMe can run over PCIe, Thunderbolt, or fabric (NVMe-oF). CXL uses PCIe physical layer. This convergence simplifies system design—PCIe becomes the universal substrate, with protocol tunneling providing flexibility.
Device communication is where software meets hardware—where load/store instructions translate into electrical signals that control physical devices. Understanding this interface is fundamental to systems programming, driver development, and hardware debugging.
Key Takeaways:
- Devices expose status, control, and data registers; reading and writing them is how the CPU commands hardware
- Port-mapped I/O uses a separate address space and IN/OUT instructions; memory-mapped I/O uses ordinary loads and stores and dominates modern systems
- Device registers have side effects, so drivers need volatile accesses and memory barriers (readl()/writel() on Linux)
- PCIe is the universal high-performance interconnect; devices are discovered by enumeration and addressed as Bus:Device:Function
- Bus mastering and DMA let devices move data without per-byte CPU involvement, which is essential for high-throughput I/O
Module Complete:
With this page, you've completed an exhaustive exploration of I/O device types—from the fundamental classification of block, character, and network devices, through device characteristics that define behavior, to the communication mechanisms that enable CPU-device interaction. This foundation prepares you for deeper study of specific device management topics: controllers, interrupts, DMA, and device driver development.
You now have a comprehensive understanding of I/O device types—their classification, characteristics, and the mechanisms through which CPUs communicate with hardware. This knowledge is fundamental to operating system design, device driver development, and systems performance optimization. The next module explores I/O controllers in greater depth.