At the heart of every device driver lies an interface—a formal contract that defines how the driver communicates with the rest of the operating system. This interface is not merely a collection of function signatures; it's a promise about behavior, semantics, and guarantees that both sides must honor.
The driver interface serves two fundamental purposes: it provides a standardized mechanism for user-space applications to access hardware, and it establishes a consistent abstraction that allows the kernel to manage devices without knowing their implementation details. Without well-defined interfaces, every application would need custom code for every device—a maintenance burden that would make general-purpose computing impractical.
By the end of this page, you will understand the file operations interface that drivers implement, the ioctl mechanism for device-specific commands, memory mapping between user and kernel space, the buffer and DMA interfaces for data transfer, and the sysfs interface for device attributes. This knowledge is essential for developing drivers, understanding kernel internals, and debugging device communication issues.
In Unix-like systems, the fundamental principle of "everything is a file" extends to devices. Character devices, block devices, and even some pseudo-devices are accessed through the file abstraction. When a user-space program opens /dev/sda or /dev/ttyUSB0, it receives a file descriptor that can be used with standard I/O system calls.
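To make this concrete, here is a minimal user-space sketch that treats a device exactly like a file. /dev/mydev0 is a hypothetical character device node; any readable, writable device node behaves the same way:

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[128];
    ssize_t len;

    /* Opening a device node yields an ordinary file descriptor */
    int fd = open("/dev/mydev0", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Standard I/O system calls are routed to the driver's handlers */
    len = write(fd, "hello", 5);
    printf("wrote %zd bytes\n", len);

    len = read(fd, buf, sizeof(buf));
    printf("read %zd bytes\n", len);

    /* Dropping the last reference triggers the driver's release() */
    close(fd);
    return 0;
}
```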
The file_operations structure is the cornerstone of this abstraction. It's a table of function pointers that the kernel invokes when user-space performs file operations on a device. By implementing these functions, a driver teaches the kernel how to handle reads, writes, and other operations for its specific hardware.
```c
/* Simplified file_operations structure (Linux kernel) */
struct file_operations {
    struct module *owner;   /* Module owning this structure */

    /* File lifecycle */
    int (*open)(struct inode *, struct file *);
    int (*release)(struct inode *, struct file *);
    int (*flush)(struct file *, fl_owner_t id);

    /* Data transfer */
    ssize_t (*read)(struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write)(struct file *, const char __user *, size_t, loff_t *);
    ssize_t (*read_iter)(struct kiocb *, struct iov_iter *);
    ssize_t (*write_iter)(struct kiocb *, struct iov_iter *);

    /* Position management */
    loff_t (*llseek)(struct file *, loff_t, int);

    /* Device control */
    long (*unlocked_ioctl)(struct file *, unsigned int, unsigned long);
    long (*compat_ioctl)(struct file *, unsigned int, unsigned long);

    /* Memory mapping */
    int (*mmap)(struct file *, struct vm_area_struct *);

    /* Polling and async notification */
    __poll_t (*poll)(struct file *, struct poll_table_struct *);
    int (*fasync)(int, struct file *, int);

    /* Locking */
    int (*lock)(struct file *, int, struct file_lock *);
    int (*flock)(struct file *, int, struct file_lock *);

    /* Splice support */
    ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *,
                           size_t, unsigned int);
    ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *,
                            size_t, unsigned int);
};
```

| Method | System Call | Purpose | Required? |
|---|---|---|---|
| open | open() | Initialize device, allocate per-file resources | Usually |
| release | close() | Clean up when last reference closes | Usually |
| read | read() | Transfer data from device to user buffer | For readable devices |
| write | write() | Transfer data from user buffer to device | For writable devices |
| llseek | lseek() | Change file position (for seekable devices) | For random-access devices |
| unlocked_ioctl | ioctl() | Device-specific control commands | For most devices |
| mmap | mmap() | Map device memory to user address space | For memory-mapped devices |
| poll | poll()/select()/epoll() | Report I/O readiness | For async-capable devices |
Implementation Semantics:
open(): Called when a process opens the device file. Responsibilities include:
- Checking that the device exists, is ready, and permits the requested access mode
- Allocating and initializing per-file state (conventionally stored in file->private_data)

release(): Called when the last reference to the file is closed. Responsibilities:
- Freeing the per-file state allocated in open()
- Shutting down or quiescing the hardware if this was the last user

read() and write(): These are the core data transfer functions. They must:
- Move data between kernel and user space only through copy_to_user()/copy_from_user()
- Return the number of bytes actually transferred, or a negative errno on failure
- Update the file position (*ppos) if the device is seekable
```c
#include <linux/fs.h>
#include <linux/uaccess.h>
#include <linux/slab.h>    /* kzalloc/kfree */
#include <linux/sched.h>   /* current */
#include <linux/mutex.h>
#include <linux/wait.h>

struct mydev_private {
    char buffer[4096];
    size_t data_len;
    struct mutex lock;
    wait_queue_head_t read_queue;
};

static int mydev_open(struct inode *inode, struct file *file)
{
    struct mydev_private *priv;

    /* Allocate per-file state */
    priv = kzalloc(sizeof(*priv), GFP_KERNEL);
    if (!priv)
        return -ENOMEM;

    mutex_init(&priv->lock);
    init_waitqueue_head(&priv->read_queue);
    file->private_data = priv;

    pr_info("Device opened by pid %d\n", current->pid);
    return 0; /* Success */
}

static int mydev_release(struct inode *inode, struct file *file)
{
    struct mydev_private *priv = file->private_data;

    pr_info("Device closed by pid %d\n", current->pid);
    kfree(priv);
    return 0;
}

static ssize_t mydev_read(struct file *file, char __user *buf,
                          size_t count, loff_t *ppos)
{
    struct mydev_private *priv = file->private_data;
    ssize_t retval = 0;

    if (mutex_lock_interruptible(&priv->lock))
        return -ERESTARTSYS;

    /* Wait for data if buffer is empty */
    while (priv->data_len == 0) {
        mutex_unlock(&priv->lock);
        if (file->f_flags & O_NONBLOCK)
            return -EAGAIN;
        if (wait_event_interruptible(priv->read_queue, priv->data_len > 0))
            return -ERESTARTSYS;
        if (mutex_lock_interruptible(&priv->lock))
            return -ERESTARTSYS;
    }

    /* Limit read to available data */
    if (count > priv->data_len)
        count = priv->data_len;

    /* Copy to user space - CRITICAL: never use memcpy! */
    if (copy_to_user(buf, priv->buffer, count)) {
        retval = -EFAULT;
        goto out;
    }

    /* Shift remaining data to start of buffer */
    priv->data_len -= count;
    memmove(priv->buffer, priv->buffer + count, priv->data_len);
    retval = count;

out:
    mutex_unlock(&priv->lock);
    return retval;
}

static ssize_t mydev_write(struct file *file, const char __user *buf,
                           size_t count, loff_t *ppos)
{
    struct mydev_private *priv = file->private_data;
    ssize_t retval;
    size_t space;

    if (mutex_lock_interruptible(&priv->lock))
        return -ERESTARTSYS;

    space = sizeof(priv->buffer) - priv->data_len;
    if (count > space)
        count = space;
    if (count == 0) {
        retval = -ENOSPC;
        goto out;
    }

    /* Copy from user space */
    if (copy_from_user(priv->buffer + priv->data_len, buf, count)) {
        retval = -EFAULT;
        goto out;
    }

    priv->data_len += count;
    retval = count;

    /* Wake up readers */
    wake_up_interruptible(&priv->read_queue);

out:
    mutex_unlock(&priv->lock);
    return retval;
}

static const struct file_operations mydev_fops = {
    .owner   = THIS_MODULE,
    .open    = mydev_open,
    .release = mydev_release,
    .read    = mydev_read,
    .write   = mydev_write,
};
```

NEVER use memcpy() or direct pointer dereference with user-space pointers. User pointers may be invalid, unmapped, or malicious. Always use copy_to_user(), copy_from_user(), get_user(), and put_user(). These functions validate addresses and handle faults safely. Violations can crash the kernel or create security vulnerabilities.
While read() and write() handle data transfer, devices often need control operations that don't fit the data stream model. How do you change a serial port's baud rate? Query a disk's geometry? Set a network interface's MAC address? The answer is ioctl (I/O Control).
The ioctl mechanism provides a general-purpose channel for sending commands and parameters to devices. Each device defines its own set of ioctl command codes, each with specific semantics and argument types.
ioctl Command Number Structure:
ioctl commands are not arbitrary integers. They follow a structured format that encodes:
- A "magic" type number that identifies the driver (8 bits)
- A command number within that driver (8 bits)
- The data transfer direction (read, write, both, or none)
- The size of the argument structure
This structure enables the kernel to validate ioctl calls and helps avoid command number collisions between different drivers.
```c
#include <linux/ioctl.h>

/*
 * ioctl command macros:
 *   _IO(type, nr)             - Command with no argument
 *   _IOR(type, nr, datatype)  - Command that reads data from device
 *   _IOW(type, nr, datatype)  - Command that writes data to device
 *   _IOWR(type, nr, datatype) - Command that does both
 */

/* Define a magic number for our driver (use unique value) */
#define MYDEV_IOC_MAGIC 'M'

/* Command definitions for a hypothetical device */

/* Get device status (read 4 bytes from device) */
#define MYDEV_GET_STATUS _IOR(MYDEV_IOC_MAGIC, 0, uint32_t)

/* Set device configuration (write config struct to device) */
#define MYDEV_SET_CONFIG _IOW(MYDEV_IOC_MAGIC, 1, struct mydev_config)

/* Get/set device parameters (bidirectional) */
#define MYDEV_XFER_PARAMS _IOWR(MYDEV_IOC_MAGIC, 2, struct mydev_params)

/* Reset device (no data argument) */
#define MYDEV_RESET _IO(MYDEV_IOC_MAGIC, 3)

/* Maximum command number for validation */
#define MYDEV_IOC_MAXNR 3

/* Data structures for ioctl arguments */
struct mydev_config {
    uint32_t mode;
    uint32_t speed;
    uint32_t flags;
};

struct mydev_params {
    uint32_t input_param;
    uint32_t output_result;
};
```
```c
static long mydev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
    struct mydev_private *priv = file->private_data;
    int retval = 0;

    /* Validate command type and number */
    if (_IOC_TYPE(cmd) != MYDEV_IOC_MAGIC)
        return -ENOTTY;
    if (_IOC_NR(cmd) > MYDEV_IOC_MAXNR)
        return -ENOTTY;

    /* Validate argument pointer accessibility */
    if (_IOC_DIR(cmd) & _IOC_READ) {
        if (!access_ok((void __user *)arg, _IOC_SIZE(cmd)))
            return -EFAULT;
    }
    if (_IOC_DIR(cmd) & _IOC_WRITE) {
        if (!access_ok((void __user *)arg, _IOC_SIZE(cmd)))
            return -EFAULT;
    }

    switch (cmd) {
    case MYDEV_GET_STATUS: {
        uint32_t status = priv->device_status;

        if (put_user(status, (uint32_t __user *)arg))
            return -EFAULT;
        break;
    }

    case MYDEV_SET_CONFIG: {
        struct mydev_config cfg;

        if (copy_from_user(&cfg, (void __user *)arg, sizeof(cfg)))
            return -EFAULT;

        /* Validate configuration */
        if (cfg.speed > MAX_SPEED || cfg.mode > MAX_MODE)
            return -EINVAL;

        /* Apply configuration to hardware */
        retval = apply_config(priv, &cfg);
        break;
    }

    case MYDEV_XFER_PARAMS: {
        struct mydev_params params;

        if (copy_from_user(&params, (void __user *)arg, sizeof(params)))
            return -EFAULT;

        /* Process input, generate output */
        params.output_result = process_params(priv, params.input_param);

        if (copy_to_user((void __user *)arg, &params, sizeof(params)))
            return -EFAULT;
        break;
    }

    case MYDEV_RESET:
        retval = reset_device(priv);
        break;

    default:
        return -ENOTTY; /* Unknown command */
    }

    return retval;
}
```

While ioctl is pervasive, it has drawbacks: opaque binary protocol, limited discoverability, and 32/64-bit compatibility challenges. Modern kernel interfaces often prefer sysfs attributes for simple values, netlink sockets for complex configuration, or configfs for device-specific filesystems. Consider these alternatives for new interfaces.
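For completeness, the user-space side of these commands is a plain ioctl(2) call on the open file descriptor. A minimal sketch, assuming the command macros above live in a shared header (mydev_ioctl.h is a hypothetical name) and the device node is /dev/mydev0:

```c
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include "mydev_ioctl.h"   /* hypothetical header with the MYDEV_* definitions */

int main(void)
{
    uint32_t status;

    int fd = open("/dev/mydev0", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Query device status; the driver fills in 'status' via put_user() */
    if (ioctl(fd, MYDEV_GET_STATUS, &status) < 0) {
        perror("ioctl(MYDEV_GET_STATUS)");
        close(fd);
        return 1;
    }
    printf("status: 0x%08x\n", status);

    close(fd);
    return 0;
}
```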
For high-performance I/O or devices with large memory regions, copying data between user and kernel space is inefficient. Memory mapping (mmap) provides a zero-copy alternative by mapping device memory or kernel buffers directly into a process's address space.
When a user-space program maps a device file, the driver's mmap handler establishes the virtual-to-physical address mapping. Subsequently, the process can read and write device memory as if it were regular memory—no system calls required for data access.
Types of Memory Mapping:
1. Device Register Mapping: Maps device control and status registers to user space. Used for GPU command submission, network interface card (NIC) registers, or custom hardware accelerators. This allows user space to manipulate hardware directly, without kernel transitions.
2. DMA Buffer Mapping: Maps kernel-allocated DMA buffers to user space. Used for video capture, audio, and network applications where avoiding copy overhead is critical. The kernel maintains the actual buffer; user space gets direct access.
3. Frame Buffer Mapping: Maps graphics memory for display. Applications write pixels directly to the mapped region, which the GPU reads for display output.
4. Shared Memory: Although not strictly device-related, mmap can create shared memory between processes, useful for inter-process communication with device data.
```c
#include <linux/mm.h>

/* mmap handler for device register mapping */
static int mydev_mmap(struct file *file, struct vm_area_struct *vma)
{
    struct mydev_private *priv = file->private_data;
    unsigned long size = vma->vm_end - vma->vm_start;
    unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
    phys_addr_t phys_addr;

    /* Validate mapping size */
    if (size > DEVICE_MEMORY_SIZE)
        return -EINVAL;

    /* Validate offset */
    if (offset + size > DEVICE_MEMORY_SIZE)
        return -EINVAL;

    /* Calculate physical address of device memory */
    phys_addr = priv->device_phys_base + offset;

    /* Set appropriate page protection for device memory */
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

    /* Prevent core dumps from including this region */
    vm_flags_set(vma, VM_IO | VM_DONTEXPAND | VM_DONTDUMP);

    /* Establish the mapping */
    if (remap_pfn_range(vma, vma->vm_start, phys_addr >> PAGE_SHIFT,
                        size, vma->vm_page_prot))
        return -EAGAIN;

    return 0;
}

/* mmap handler for DMA buffer mapping */
static int mydev_mmap_dma(struct file *file, struct vm_area_struct *vma)
{
    struct mydev_private *priv = file->private_data;
    unsigned long size = vma->vm_end - vma->vm_start;

    /* Validate against allocated DMA buffer size */
    if (size > priv->dma_buffer_size)
        return -EINVAL;

    /* Map coherent DMA buffer to user space */
    return dma_mmap_coherent(priv->dev, vma, priv->dma_buffer_virt,
                             priv->dma_buffer_phys, priv->dma_buffer_size);
}
```

| Protection | Purpose | When to Use |
|---|---|---|
| pgprot_noncached | Disable CPU caching | Device registers that must not be cached |
| pgprot_writecombine | Write-combining mode | Frame buffers for improved write performance |
| pgprot_device | Strong ordering | Memory-mapped I/O requiring strict ordering |
| Normal (default) | Cached memory | DMA buffers with software-managed coherence |
Mapping device memory to user space is a potential security risk. Malicious processes could manipulate hardware in unintended ways. Always validate permissions, limit mappable regions, and consider whether the mapping is truly necessary. Some security-critical systems disable device mmap entirely.
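From user space, mapping the device is a standard mmap(2) call on the device file descriptor; the kernel routes it to the driver's mmap handler shown above. A minimal sketch, assuming a hypothetical /dev/mydev0 node exposing a 4 KiB register window at offset 0:

```c
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    int fd = open("/dev/mydev0", O_RDWR | O_SYNC);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Map 4 KiB of device memory into our address space */
    volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
    if (regs == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    /* Register access is now a plain memory access - no system call */
    printf("reg[0] = 0x%08x\n", regs[0]);
    regs[1] = 0x1;   /* hypothetical control register write */

    munmap((void *)regs, 4096);
    close(fd);
    return 0;
}
```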
Efficient I/O requires knowing when a device is ready without continuously checking (polling). The poll interface lets applications wait for I/O readiness on multiple devices simultaneously, enabling event-driven programming without dedicated threads per device.
The Polling Problem:
Consider a program that reads from both a network socket and a hardware sensor. Blocking read() on one device prevents data from the other. Spinning in a polling loop wastes CPU. The solution is to let the kernel notify the application when any device becomes ready.
User-Space Interfaces:
- select(): the original readiness-multiplexing call; watches fixed-size fd sets and scales poorly
- poll(): the same idea without the hard descriptor limit
- epoll: Linux-specific; descriptors are registered once, then epoll_wait() scales to thousands of fds

All of these ultimately call the driver's poll function to query device readiness.
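As an illustration of the user-space side, here is a level-triggered epoll sketch that waits on the hypothetical /dev/mydev0 until the driver's poll handler reports readable data:

```c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/epoll.h>

int main(void)
{
    char buf[256];
    struct epoll_event ev, events[4];

    int dev_fd = open("/dev/mydev0", O_RDONLY | O_NONBLOCK);
    int epfd = epoll_create1(0);
    if (dev_fd < 0 || epfd < 0) {
        perror("setup");
        return 1;
    }

    /* Level-triggered read-readiness on the device descriptor */
    ev.events = EPOLLIN;
    ev.data.fd = dev_fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, dev_fd, &ev) < 0) {
        perror("epoll_ctl");
        return 1;
    }

    /* Block until the driver's poll handler sets EPOLLIN */
    int n = epoll_wait(epfd, events, 4, -1);
    for (int i = 0; i < n; i++) {
        ssize_t len = read(events[i].data.fd, buf, sizeof(buf));
        printf("read %zd bytes\n", len);
    }

    close(epfd);
    close(dev_fd);
    return 0;
}
```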
```c
#include <linux/poll.h>

static __poll_t mydev_poll(struct file *file, struct poll_table_struct *wait)
{
    struct mydev_private *priv = file->private_data;
    __poll_t mask = 0;

    /*
     * Register wait queues with poll infrastructure.
     * This doesn't block - it just tells epoll/select/poll
     * which wait queues to watch.
     */
    poll_wait(file, &priv->read_queue, wait);
    poll_wait(file, &priv->write_queue, wait);

    /*
     * Check current state and set appropriate flags.
     * These checks must be done WITHOUT holding locks
     * that might cause deadlock with wake_up calls.
     */

    /* Can read without blocking? */
    if (priv->read_buffer_count > 0)
        mask |= EPOLLIN | EPOLLRDNORM;

    /* Can write without blocking? */
    if (priv->write_buffer_space > 0)
        mask |= EPOLLOUT | EPOLLWRNORM;

    /* Device has been disconnected? */
    if (priv->disconnected)
        mask |= EPOLLHUP;

    /* Error condition? */
    if (priv->error_pending)
        mask |= EPOLLERR;

    return mask;
}

/*
 * Elsewhere in driver - when device state changes:
 */
static void data_received(struct mydev_private *priv, void *data, size_t len)
{
    /* Add data to buffer */
    add_to_read_buffer(priv, data, len);

    /* Wake up anyone waiting for read readiness */
    wake_up_interruptible(&priv->read_queue);
}

static void buffer_space_available(struct mydev_private *priv)
{
    /* Wake up anyone waiting to write */
    wake_up_interruptible(&priv->write_queue);
}
```

Asynchronous Notification with fasync:
Beyond polling, some applications prefer signal-based notification. The fasync interface sends SIGIO to a process when the device becomes ready, allowing event-driven programming without explicit polling.
This is less common than poll/epoll but still useful for simple applications or when integrating with signal-based event loops.
```c
static int mydev_fasync(int fd, struct file *file, int mode)
{
    struct mydev_private *priv = file->private_data;

    /* fasync_helper manages the async queue */
    return fasync_helper(fd, file, mode, &priv->async_queue);
}

/* When data/event occurs, notify async readers */
static void notify_async_readers(struct mydev_private *priv)
{
    if (priv->async_queue)
        kill_fasync(&priv->async_queue, SIGIO, POLL_IN);
}

/* In release(), clean up async state */
static int mydev_release(struct inode *inode, struct file *file)
{
    struct mydev_private *priv = file->private_data;

    /* Remove from async notification list */
    mydev_fasync(-1, file, 0);

    /* ... rest of cleanup ... */
    return 0;
}
```

epoll supports both level-triggered (default, like poll) and edge-triggered (EPOLLET) modes. Edge-triggered notifies only on state transitions, not while readable/writable. This can improve performance but requires careful handling—you must drain data completely or risk lost notifications. Level-triggered is safer for most drivers.
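Returning to fasync: on the user-space side, enabling SIGIO delivery takes two fcntl(2) calls, one to set the owning process and one to set O_ASYNC. A minimal sketch against the hypothetical /dev/mydev0:

```c
#include <stdio.h>
#include <signal.h>
#include <fcntl.h>
#include <unistd.h>

static volatile sig_atomic_t data_ready;

static void sigio_handler(int sig)
{
    data_ready = 1;   /* Just set a flag; do the read in the main loop */
}

int main(void)
{
    int fd = open("/dev/mydev0", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    signal(SIGIO, sigio_handler);

    /* Direct SIGIO to this process, then enable async notification;
     * this path invokes the driver's fasync handler */
    fcntl(fd, F_SETOWN, getpid());
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_ASYNC);

    for (;;) {
        pause();   /* Sleep until a signal arrives */
        if (data_ready) {
            char buf[256];
            ssize_t len = read(fd, buf, sizeof(buf));
            printf("SIGIO: read %zd bytes\n", len);
            data_ready = 0;
        }
    }
}
```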
Block devices (disks, SSDs, USB storage) use a specialized interface distinct from character devices. Instead of byte streams, they handle fixed-size blocks (sectors). The block layer provides sophisticated request queuing, I/O scheduling, and caching—services that would be redundant in character drivers.
Block vs Character Device Model:
Character devices process I/O requests one at a time through read/write calls. Block devices receive queued requests through a request processing function or block layer multiqueue (blk-mq) interface. The kernel batches, reorders, and merges requests for efficiency before delivering them to the driver.
```c
#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/* Block device operations structure */
static const struct block_device_operations myblk_fops = {
    .owner   = THIS_MODULE,
    .open    = myblk_open,
    .release = myblk_release,
    .ioctl   = myblk_ioctl,
    .getgeo  = myblk_getgeo,   /* Report disk geometry */
};

/* blk-mq operations - modern request handling */
static const struct blk_mq_ops myblk_mq_ops = {
    .queue_rq  = myblk_queue_rq,   /* Queue a request */
    .complete  = myblk_complete,   /* Complete a request */
    .init_hctx = myblk_init_hctx,  /* Init hardware context */
    .timeout   = myblk_timeout,    /* Handle timed-out request */
};

/* Process a single request (called by blk-mq layer) */
static blk_status_t myblk_queue_rq(struct blk_mq_hw_ctx *hctx,
                                   const struct blk_mq_queue_data *bd)
{
    struct request *rq = bd->rq;
    struct myblk_device *dev = hctx->queue->queuedata;

    /* Start tracking this request */
    blk_mq_start_request(rq);

    /* Determine operation type */
    switch (req_op(rq)) {
    case REQ_OP_READ:
        return myblk_do_read(dev, rq);
    case REQ_OP_WRITE:
        return myblk_do_write(dev, rq);
    case REQ_OP_FLUSH:
        return myblk_do_flush(dev, rq);
    case REQ_OP_DISCARD:
        return myblk_do_discard(dev, rq);
    default:
        return BLK_STS_IOERR;
    }
}

/* Report disk geometry (for compatibility) */
static int myblk_getgeo(struct block_device *bdev, struct hd_geometry *geo)
{
    struct myblk_device *dev = bdev->bd_disk->private_data;

    geo->heads = 64;
    geo->sectors = 32;
    geo->cylinders = dev->size / (64 * 32 * 512);
    return 0;
}
```

| Operation | Purpose | Driver Responsibility |
|---|---|---|
| REQ_OP_READ | Read sector data | DMA data from device to memory |
| REQ_OP_WRITE | Write sector data | DMA data from memory to device |
| REQ_OP_FLUSH | Ensure data persistence | Flush device write caches |
| REQ_OP_DISCARD | Mark sectors unused (TRIM) | Inform device sectors are free for optimization |
| REQ_OP_SECURE_ERASE | Cryptographic erasure | Securely destroy sector data |
| REQ_OP_WRITE_ZEROES | Zero sectors efficiently | Use hardware zero-fill if available |
The multi-queue block layer (blk-mq) replaced the legacy single-queue architecture to better utilize modern multi-core CPUs and NVMe devices with multiple hardware queues. Drivers using blk-mq can handle hundreds of thousands of IOPS by distributing requests across multiple hardware submission queues.
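As a rough sketch of how a driver declares its queues to blk-mq, here is tag-set setup at probe time. The struct blk_mq_tag_set fields are real kernel API, but myblk_cmd, the queue counts, and the flags are illustrative assumptions, and the exact field set varies somewhat across kernel versions:

```c
/* Declare hardware queue topology to the blk-mq layer (sketch) */
static int myblk_init_tag_set(struct myblk_device *dev)
{
    struct blk_mq_tag_set *set = &dev->tag_set;   /* assumed member */

    memset(set, 0, sizeof(*set));
    set->ops = &myblk_mq_ops;
    set->nr_hw_queues = 4;                     /* One per hardware submission queue */
    set->queue_depth = 128;                    /* Outstanding requests per queue */
    set->numa_node = NUMA_NO_NODE;
    set->cmd_size = sizeof(struct myblk_cmd);  /* Per-request driver data (hypothetical) */
    set->flags = BLK_MQ_F_SHOULD_MERGE;

    /* Allocates tags and per-queue resources; pair with
     * blk_mq_free_tag_set() on teardown */
    return blk_mq_alloc_tag_set(set);
}
```

The tag set is then handed to the block core when the disk and request queue are created; the allocation call for that step has changed names across recent kernel releases, so consult the documentation for your target version.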
sysfs is a virtual filesystem mounted at /sys that exposes kernel objects and their attributes as files. Unlike ioctl's binary protocol, sysfs provides a human-readable interface for device configuration and status. Users and administrators can interact with devices using simple shell commands.
sysfs Philosophy:
- One attribute per file, containing a single value in human-readable text
- The directory hierarchy mirrors the kernel's internal object (kobject) relationships
- Attributes appear when a device registers and disappear when it goes away
```c
#include <linux/device.h>
#include <linux/sysfs.h>
#include <linux/platform_device.h>

/* Read-only attribute: device status */
static ssize_t status_show(struct device *dev, struct device_attribute *attr,
                           char *buf)
{
    struct mydev_data *data = dev_get_drvdata(dev);
    const char *status_str;

    switch (data->status) {
    case STATUS_IDLE:
        status_str = "idle";
        break;
    case STATUS_ACTIVE:
        status_str = "active";
        break;
    case STATUS_ERROR:
        status_str = "error";
        break;
    default:
        status_str = "unknown";
        break;
    }

    return sysfs_emit(buf, "%s\n", status_str);
}
static DEVICE_ATTR_RO(status);

/* Read-write attribute: device speed setting */
static ssize_t speed_show(struct device *dev, struct device_attribute *attr,
                          char *buf)
{
    struct mydev_data *data = dev_get_drvdata(dev);

    return sysfs_emit(buf, "%u\n", data->speed);
}

static ssize_t speed_store(struct device *dev, struct device_attribute *attr,
                           const char *buf, size_t count)
{
    struct mydev_data *data = dev_get_drvdata(dev);
    unsigned int new_speed;
    int ret;

    ret = kstrtouint(buf, 10, &new_speed);
    if (ret)
        return ret;

    if (new_speed < MIN_SPEED || new_speed > MAX_SPEED)
        return -EINVAL;

    mutex_lock(&data->lock);
    data->speed = new_speed;
    apply_speed_setting(data);
    mutex_unlock(&data->lock);

    return count;
}
static DEVICE_ATTR_RW(speed);

/* Group all attributes */
static struct attribute *mydev_attrs[] = {
    &dev_attr_status.attr,
    &dev_attr_speed.attr,
    NULL
};
ATTRIBUTE_GROUPS(mydev);

/* Register with device model */
static int mydev_probe(struct platform_device *pdev)
{
    struct mydev_data *data;

    data = devm_kzalloc(&pdev->dev, sizeof(*data), GFP_KERNEL);
    if (!data)
        return -ENOMEM;

    platform_set_drvdata(pdev, data);
    return 0;
}

/* Attach attribute groups via the driver, not in probe(): by probe time
 * the device is already registered, so assigning dev->groups there would
 * have no effect. With dev_groups, the core creates the attributes when
 * a device binds and removes them on unbind. */
static struct platform_driver mydev_driver = {
    .probe = mydev_probe,
    .driver = {
        .name       = "mydevice",
        .dev_groups = mydev_groups,   /* From ATTRIBUTE_GROUPS(mydev) */
    },
};
```

Using sysfs from User Space:
```bash
# Read device status
$ cat /sys/devices/platform/mydevice/status
active

# Read current speed
$ cat /sys/devices/platform/mydevice/speed
115200

# Set new speed
$ echo 230400 > /sys/devices/platform/mydevice/speed

# Find all devices of a class
$ ls /sys/class/mydev/
mydev0  mydev1
```
sysfs vs ioctl:
| Aspect | sysfs | ioctl |
|---|---|---|
| Interface | Text files | Binary protocol |
| Discoverability | Self-documenting (ls/cat) | Requires documentation |
| Scripting | Trivial | Requires special tools |
| Complex data | Awkward (requires parsing) | Natural |
| Performance | Slower (text conversion) | Faster (direct) |
| Atomicity | Per-attribute only | Multi-value transaction possible |
Use sysfs for simple configuration values, status reporting, and debugging. Use ioctl or netlink for complex transactions, bulk data transfer, or operations requiring atomicity across multiple values. Document attribute semantics in kernel documentation. Follow established conventions (lowercase names, one value per file).
High-performance devices transfer data directly between device and memory using Direct Memory Access (DMA), bypassing the CPU. The kernel's DMA API provides a portable interface for allocating DMA-capable memory and managing the complexities of cache coherence, IOMMU translation, and platform-specific quirks.
The DMA Challenge:
DMA seems simple: give the device a memory address and let it transfer data. In reality, several complications arise:
- Cache coherence: the CPU cache may hold stale data for memory the device just wrote, or dirty lines the device cannot see
- Address translation: the address the device uses (a bus or DMA address) may differ from the CPU physical address, particularly behind an IOMMU
- Addressing limits: many devices can address only 32 bits or less, so buffers must be allocated low in memory or bounced
- Contiguity: a device without scatter-gather support needs physically contiguous buffers, which are scarce after boot
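The addressing-limit problem is handled up front by declaring the device's DMA mask; the kernel then allocates reachable memory or arranges bounce buffering. A typical probe-time sketch using the standard dma_set_mask_and_coherent() helper:

```c
#include <linux/dma-mapping.h>

static int my_probe_dma_setup(struct device *dev)
{
    /* Prefer full 64-bit addressing; fall back to 32-bit if the
     * device or platform cannot support it */
    if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64))) {
        if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)))
            return -EIO;   /* Even 32-bit DMA is unavailable */
    }
    return 0;
}
```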
| Type | Allocation | Use Case | Coherence |
|---|---|---|---|
| Coherent DMA | dma_alloc_coherent() | Long-lived descriptors, control structures | Always coherent (uncached or hardware-coherent) |
| Streaming DMA | dma_map_single()/dma_map_sg() | Data buffers for individual transfers | Requires explicit sync operations |
| DMA Pool | dma_pool_alloc() | Many small allocations of same size | Coherent, from pre-allocated pool |
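The table's third row, DMA pools, deserves a short illustration: when a driver needs many small, same-sized coherent blocks (per-command descriptors, say), allocating a full page each time via dma_alloc_coherent() is wasteful. A sketch, where myhw_cmd is a hypothetical 64-byte command descriptor and dev->cmd_pool an assumed member:

```c
#include <linux/dmapool.h>

#define CMD_SIZE   64
#define CMD_ALIGN  64

static int setup_cmd_pool(struct my_device *dev)
{
    dma_addr_t cmd_dma;
    void *cmd;

    /* Pool of CMD_SIZE-byte coherent blocks, CMD_ALIGN-aligned */
    dev->cmd_pool = dma_pool_create("myhw_cmd", dev->dev,
                                    CMD_SIZE, CMD_ALIGN, 0);
    if (!dev->cmd_pool)
        return -ENOMEM;

    /* Each allocation returns a CPU pointer plus a device-visible
     * DMA address, just like dma_alloc_coherent() */
    cmd = dma_pool_alloc(dev->cmd_pool, GFP_KERNEL, &cmd_dma);
    if (!cmd)
        return -ENOMEM;

    /* ... fill the command and hand cmd_dma to the hardware ... */

    dma_pool_free(dev->cmd_pool, cmd, cmd_dma);
    return 0;
}
```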
```c
#include <linux/dma-mapping.h>

/* Coherent DMA - for long-lived structures */
static int setup_dma_descriptors(struct my_device *dev)
{
    /* Allocate coherent memory for DMA descriptors */
    dev->desc_ring = dma_alloc_coherent(dev->dev, DESC_RING_SIZE,
                                        &dev->desc_ring_dma, GFP_KERNEL);
    if (!dev->desc_ring)
        return -ENOMEM;

    /* desc_ring is the CPU virtual address;
     * desc_ring_dma is the address to give to hardware */
    return 0;
}

/* Streaming DMA - for data transfers */
static int submit_tx_buffer(struct my_device *dev, void *data, size_t len)
{
    dma_addr_t dma_handle;

    /* Map kernel memory for DMA to device */
    dma_handle = dma_map_single(dev->dev, data, len, DMA_TO_DEVICE);
    if (dma_mapping_error(dev->dev, dma_handle))
        return -EIO;

    /* Ensure CPU writes are visible to device before DMA */
    dma_sync_single_for_device(dev->dev, dma_handle, len, DMA_TO_DEVICE);

    /* Program device with DMA address */
    writel(lower_32_bits(dma_handle), dev->regs + TX_ADDR_LO);
    writel(upper_32_bits(dma_handle), dev->regs + TX_ADDR_HI);
    writel(len, dev->regs + TX_LENGTH);

    /* Start DMA transfer */
    writel(TX_START, dev->regs + TX_CONTROL);

    /* Save handle for later unmapping */
    dev->tx_dma_handle = dma_handle;
    dev->tx_len = len;

    return 0;
}

/* Called from interrupt handler when DMA completes */
static void tx_complete_handler(struct my_device *dev)
{
    /* Unmap the buffer - no longer needed for DMA */
    dma_unmap_single(dev->dev, dev->tx_dma_handle,
                     dev->tx_len, DMA_TO_DEVICE);
}

/* Scatter-gather DMA for non-contiguous buffers */
static int submit_sg_transfer(struct my_device *dev,
                              struct scatterlist *sg, int nents)
{
    int mapped;
    struct scatterlist *s;
    int i;

    /* Map entire scatter-gather list */
    mapped = dma_map_sg(dev->dev, sg, nents, DMA_TO_DEVICE);
    if (mapped == 0)
        return -EIO;

    /* Program device descriptor ring with all segments */
    for_each_sg(sg, s, mapped, i) {
        struct dma_desc *desc = &dev->desc_ring[i];

        desc->addr = sg_dma_address(s);
        desc->len = sg_dma_len(s);
        desc->flags = (i == mapped - 1) ? DESC_LAST : 0;
    }

    /* Start DMA */
    writel(mapped, dev->regs + DESC_COUNT);
    writel(DMA_START, dev->regs + DMA_CONTROL);

    return 0;
}
```

For streaming DMA, you MUST call sync functions before and after transfers. Before DMA-to-device: sync_for_device to flush CPU caches. Before CPU reads after DMA-from-device: sync_for_cpu to invalidate caches. Forgetting these causes data corruption that's extremely difficult to debug.
We've explored the interfaces through which device drivers communicate with the rest of the operating system. Let's consolidate the key takeaways:
- The file_operations table maps standard system calls (open, read, write, mmap, poll) onto driver functions; user-space pointers are touched only via copy_to_user()/copy_from_user() and friends
- ioctl carries device-specific control commands; sysfs exposes simple attributes as human-readable files
- mmap provides zero-copy access to device registers and DMA buffers; poll and fasync report I/O readiness without busy-waiting
- Block devices queue requests through blk-mq; DMA uses coherent mappings for long-lived structures and streaming mappings (with explicit syncs) for data buffers
What's Next:
With the driver interface understood, we'll next explore driver development—the practical aspects of writing, building, and testing device drivers. We'll cover the development environment, debugging techniques, and best practices for creating reliable, maintainable driver code.
You now understand the interfaces through which device drivers communicate—from file operations and ioctl to memory mapping and DMA. This knowledge is essential for driver development, debugging device communication issues, and understanding how the operating system interacts with hardware.