When a hardware interrupt arrives, the CPU has already done its part—saving essential state and jumping to the designated handler address. But now the real work begins. The operating system must quickly determine what caused the interrupt, process the event appropriately, and return control—all while ensuring that other pending interrupts aren't starved and that the system remains stable.
Interrupt handling is one of the most performance-critical code paths in any operating system. On a busy system, handlers can run hundreds of thousands or even millions of times per second, so every nanosecond of handler latency accumulates. Yet handlers must also be robust: a bug in an interrupt handler can crash the entire system, since there is no higher-level code to catch the error.
By the end of this page, you will understand the complete interrupt handling lifecycle, master the top-half/bottom-half architecture that balances responsiveness with system stability, learn the critical constraints on interrupt handler code, and see how modern operating systems manage complex interrupt processing scenarios.
An interrupt handler's lifecycle begins when the CPU transfers control and ends when the handler returns to the interrupted context. Understanding this lifecycle is essential for writing correct, efficient interrupt handling code.
The Complete Interrupt Handling Sequence:
Phase 1: Entry and Source Identification
The first task of any interrupt handler is determining which device or condition caused the interrupt. In systems with dedicated interrupt vectors per device, this is trivial—the vector number tells you. But in systems where multiple devices share an IRQ line, the handler must query each potential source:
```c
// Shared IRQ handler - must identify which device(s) interrupted
irqreturn_t shared_irq_handler(int irq, void *dev_id)
{
    // When multiple devices share an IRQ, we must check each one

    // Check device A
    if (device_a->status_register & INTERRUPT_PENDING) {
        handle_device_a_interrupt();
        device_a->status_register = CLEAR_INTERRUPT;
        return IRQ_HANDLED;
    }

    // Check device B
    if (device_b->status_register & INTERRUPT_PENDING) {
        handle_device_b_interrupt();
        device_b->status_register = CLEAR_INTERRUPT;
        return IRQ_HANDLED;
    }

    // Check device C
    if (device_c->status_register & INTERRUPT_PENDING) {
        handle_device_c_interrupt();
        device_c->status_register = CLEAR_INTERRUPT;
        return IRQ_HANDLED;
    }

    // None of our devices caused the interrupt
    // (Some other handler on the shared line must handle it)
    return IRQ_NONE;
}
```

IRQ sharing was common in the ISA/PCI era but is increasingly rare with MSI (Message Signaled Interrupts). MSI provides thousands of unique interrupt vectors, eliminating the need for source identification polling. This is one reason modern NVMe SSDs and 10+ Gigabit network cards are faster—they waste no cycles figuring out who interrupted.
Phase 2: Interrupt Acknowledgment
For level-triggered interrupts (still used by legacy PCI INTx lines and many embedded devices), the handler must acknowledge the interrupt by clearing the device's interrupt-pending flag. Failure to do so causes the infamous interrupt storm—the CPU immediately re-enters the handler upon return, effectively hanging the system.
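As a sketch of this phase, a level-triggered handler might look like the following. The names struct my_device, MY_IRQ_PENDING, and MY_IRQ_ACK are hypothetical, used only for illustration:

```c
// Minimal sketch of Phase 2 for a level-triggered device.
// struct my_device and the register bits are hypothetical names.
static irqreturn_t level_triggered_handler(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    u32 status = ioread32(dev->status_reg);

    if (!(status & MY_IRQ_PENDING))
        return IRQ_NONE;                 // Not ours on a shared line

    // Clear the pending bit BEFORE returning. If we skip this, the line
    // stays asserted and the CPU re-enters this handler immediately after
    // the return-from-interrupt: an interrupt storm.
    iowrite32(MY_IRQ_ACK, dev->status_reg);

    // ... service the event ...
    return IRQ_HANDLED;
}
```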
Phase 3: Event Processing
The handler performs whatever work is necessary to service the device. This might include:

- Reading data or status from device registers
- Copying data between the device and kernel buffers
- Updating driver state and statistics
- Waking processes that are sleeping while waiting on the device
- Scheduling deferred (bottom-half) work for anything that is not time-critical
Phase 4: Return
The handler restores any saved registers and executes the return-from-interrupt instruction (x86: IRET, ARM: return from exception). The CPU atomically restores the previous execution context.
Interrupt handlers operate in a unique execution context with significant constraints. Understanding these constraints is critical for writing correct handler code—violations can cause system crashes, data corruption, or subtle race conditions that are nearly impossible to debug.
| Action | Allowed? | Reason |
|---|---|---|
| Read/write device registers | ✅ Yes | This is the primary purpose of handlers |
| Modify kernel data structures | ✅ Yes (with locks) | Core functionality, but needs synchronization |
| Wake sleeping processes | ✅ Yes | Uses run-queue manipulation, no sleeping |
| Schedule bottom-half work | ✅ Yes | Deferred work mechanism exists for this |
| Call printk (Linux) | ✅ Yes* | Special implementation avoids sleeping |
| Call kmalloc(GFP_ATOMIC) | ✅ Yes | Non-sleeping memory allocation |
| Call kmalloc(GFP_KERNEL) | ❌ No | May sleep waiting for memory |
| Acquire a mutex | ❌ No | Mutexes can sleep on contention |
| Acquire a spinlock | ✅ Yes | Busy-waits, doesn't sleep |
| Access user memory | ❌ No | Would require page fault handling |
| Call schedule() | ❌ No | Would switch away from interrupt context |
| Use sleeping file I/O | ❌ No | Disk I/O requires sleeping |
```c
// Examples of CORRECT and INCORRECT interrupt handler code

// WRONG: This handler can crash the system
irqreturn_t bad_handler(int irq, void *dev_id)
{
    // BUG: GFP_KERNEL may sleep waiting for memory
    void *buffer = kmalloc(1024, GFP_KERNEL);

    // BUG: Mutex may sleep waiting for the lock holder
    mutex_lock(&device_mutex);

    // BUG: copy_to_user may trigger a page fault
    copy_to_user(user_ptr, data, size);

    // BUG: This may sleep for disk I/O
    write_log_file(event_data);

    return IRQ_HANDLED;
}

// CORRECT: This handler follows all constraints
irqreturn_t good_handler(int irq, void *dev_id)
{
    struct device_state *dev = dev_id;
    struct pending_entry *entry;

    // Use GFP_ATOMIC for interrupt-context allocation
    entry = kmalloc(sizeof(*entry), GFP_ATOMIC);
    if (!entry) {
        // Just drop the event—we can't sleep to wait for memory
        dev->dropped_events++;
        goto ack_and_return;
    }

    // Use a spinlock, not a mutex
    spin_lock(&dev->spinlock);

    // Read device data directly into the entry (no user-space access)
    memcpy_fromio(entry->data, dev->hardware_buffer, sizeof(entry->data));
    list_add(&entry->list, &dev->pending_work);

    spin_unlock(&dev->spinlock);

    // Schedule work for non-critical processing later
    schedule_work(&dev->bottom_half_work);

ack_and_return:
    // Clear the interrupt condition
    writel(ACK_INTERRUPT, dev->status_reg);
    return IRQ_HANDLED;
}
```

The Linux kernel provides in_interrupt() and in_irq() to check execution context. Many functions internally check these and will BUG() or WARN() if called incorrectly. When debugging mysterious crashes, always verify your handler isn't calling sleeping functions—the kernel doesn't always catch these at compile time.
Given the severe constraints on interrupt handlers, a critical question arises: How do we handle complex, time-consuming tasks in response to interrupts?
The answer is the top-half / bottom-half (also called first-level / second-level) architecture. This design pattern splits interrupt processing into two distinct phases:

- Top half (the interrupt handler proper): runs immediately in hard interrupt context, does only the time-critical minimum (acknowledge the device, capture volatile hardware state), and schedules the rest.
- Bottom half: runs later, with interrupts enabled, and performs the bulk of the processing without holding up other interrupts.
Why This Split Matters:
Consider a network card receiving a packet. The interrupt handler must:

1. Acknowledge the interrupt so the card can signal the next packet
2. Copy the packet out of the card's receive buffer before it is overwritten
3. Parse the protocol headers (Ethernet, IP, TCP/UDP)
4. Verify checksums
5. Deliver the payload to the correct socket buffer
6. Wake any process blocked waiting on that socket
Steps 1-2 must happen immediately in the top half. Steps 3-6 can be deferred to the bottom half, allowing the CPU to quickly return to normal execution and handle other interrupts.
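A minimal sketch of that split, assuming a hypothetical NIC driver (the nic_* helpers, the rx_tasklet field, and deliver_to_socket() are illustrative names, not a real driver API; the tasklet mechanism used here is covered in detail below):

```c
// Hypothetical NIC driver illustrating the top-half / bottom-half split.
struct nic {
    struct tasklet_struct rx_tasklet;
    // ... hypothetical receive-ring state ...
};

static irqreturn_t nic_top_half(int irq, void *dev_id)
{
    struct nic *nic = dev_id;

    nic_ack_interrupt(nic);                 // Step 1: acknowledge
    nic_pull_packets_into_ring(nic);        // Step 2: secure the data

    tasklet_schedule(&nic->rx_tasklet);     // Defer steps 3-6
    return IRQ_HANDLED;
}

static void nic_bottom_half(unsigned long data)
{
    struct nic *nic = (struct nic *)data;
    struct sk_buff *skb;

    while ((skb = nic_next_packet(nic))) {
        parse_headers(skb);                 // Step 3
        verify_checksums(skb);              // Step 4
        deliver_to_socket(skb);             // Step 5; wakes readers (Step 6)
    }
}
```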
Early Unix had only interrupt handlers—no bottom halves. As network speeds increased, this became untenable. BSD introduced 'software interrupts' to defer work. Linux evolved through multiple bottom-half mechanisms: original 'bottom halves' (BHs), tasklets, softirqs, and workqueues—each with different trade-offs.
Linux provides multiple mechanisms for bottom-half processing, each with distinct characteristics. Choosing the right mechanism is crucial for performance and correctness.
| Mechanism | Context | Can Sleep | Concurrency | Use Case |
|---|---|---|---|---|
| Softirqs | Interrupt context | No | Same softirq runs on all CPUs concurrently | High-frequency, high-performance (networking, block I/O) |
| Tasklets | Interrupt context | No | Same tasklet serialized (no concurrent execution) | Per-device deferred work, simpler than softirqs |
| Workqueues | Process context | Yes | Can configure per-CPU or shared | Complex work, needs sleeping (file I/O, memory allocation) |
| Threaded IRQs | Process context | Yes | Per-IRQ kernel thread | Converting legacy handlers, real-time systems |
Softirqs: The Highest-Performance Option
Softirqs are the most primitive and highest-performance bottom-half mechanism. The kernel defines a fixed set (currently 10) of softirq types, each for a specific subsystem. Softirqs run with interrupts enabled but are still in interrupt context—they cannot sleep.
The key characteristic: The same softirq can run simultaneously on every CPU. This provides maximum parallelism but requires careful, lock-free programming.
```c
// The networking softirqs in Linux

// Defined softirq types (include/linux/interrupt.h)
enum {
    HI_SOFTIRQ = 0,      // High-priority tasklets
    TIMER_SOFTIRQ,       // Timer callbacks
    NET_TX_SOFTIRQ,      // Network transmit
    NET_RX_SOFTIRQ,      // Network receive
    BLOCK_SOFTIRQ,       // Block device completion
    IRQ_POLL_SOFTIRQ,    // IRQ polling
    TASKLET_SOFTIRQ,     // Regular tasklets
    SCHED_SOFTIRQ,       // Scheduler balancing
    HRTIMER_SOFTIRQ,     // High-res timers
    RCU_SOFTIRQ,         // RCU callbacks
    NR_SOFTIRQS          // Count
};

// Raising (scheduling) a softirq from an interrupt handler
irqreturn_t network_irq_handler(int irq, void *dev_id)
{
    struct net_device *dev = dev_id;

    // Minimal work: note that the hardware has packets waiting
    if (dev->hardware_has_packet()) {
        napi_schedule(&dev->napi);  // This eventually raises NET_RX_SOFTIRQ
    }

    // Clear the interrupt
    dev->ack_interrupt();

    return IRQ_HANDLED;
}

// Softirq handler runs later with interrupts enabled
void net_rx_softirq_handler(struct softirq_action *h)
{
    // Process all pending network receive work.
    // Can run for up to 2 ms (net.core.netdev_budget_usecs),
    // then yields to allow other softirqs and processes to run.
    while (packets_pending && !budget_exhausted) {
        struct sk_buff *skb = dequeue_packet();
        process_packet(skb);      // Parse headers, checksums, etc.
        deliver_to_socket(skb);
    }
}
```

Tasklets: Serialized Deferred Work
Tasklets are built on top of softirqs but provide a simpler programming model. Unlike softirqs, a given tasklet is guaranteed to run on only one CPU at a time—it will not be called concurrently with itself. This eliminates many concurrency concerns.
```c
// Tasklet usage example (pre-5.9 "data" API; newer kernels pass a
// struct tasklet_struct * to the callback instead)

// Static tasklet declaration
DECLARE_TASKLET(my_tasklet, my_tasklet_function, 0);

// Or dynamic initialization
struct tasklet_struct my_dynamic_tasklet;

void init_my_tasklet(void)
{
    tasklet_init(&my_dynamic_tasklet, my_tasklet_function,
                 (unsigned long)my_device_data);
}

// Tasklet function - runs in softirq context
void my_tasklet_function(unsigned long data)
{
    struct my_device *dev = (struct my_device *)data;

    // SAFE: the same tasklet never runs concurrently with itself
    // (still needs locks for data shared with other contexts)
    process_device_data(dev);
    update_device_statistics(dev);
    wakeup_waiting_processes(dev);
}

// Schedule the tasklet from the interrupt handler
irqreturn_t my_interrupt_handler(int irq, void *dev_id)
{
    // Do minimal essential work
    capture_hardware_state();
    acknowledge_interrupt();

    // Schedule tasklet for bulk processing
    tasklet_schedule(&my_tasklet);

    return IRQ_HANDLED;
}
```

Workqueues: Process Context Deferred Work
When bottom-half processing needs to sleep—for memory allocation, mutex acquisition, or blocking I/O—workqueues are the only option. Workqueues execute in the context of special kernel worker threads, providing a full process context.
```c
// Workqueue usage example

// Define the work structure (often embedded in the device struct)
struct my_device {
    struct work_struct deferred_work;
    struct mutex mutex;
    int device_id;
    int last_event_type;
    // ... other fields ...
};

// Work function - runs in process context, CAN SLEEP
void my_work_function(struct work_struct *work)
{
    struct my_device *dev = container_of(work, struct my_device,
                                         deferred_work);

    // These operations are ALLOWED here but NOT in interrupt handlers:

    // Allocate memory (may sleep waiting for pages)
    void *buffer = kmalloc(4096, GFP_KERNEL);

    // Acquire a mutex (may sleep waiting for the holder)
    mutex_lock(&dev->mutex);

    // Perform file I/O (sleeps for disk)
    write_to_log_file(dev->last_event_type);

    // Sleep deliberately
    msleep(10);   // Delay 10 ms

    mutex_unlock(&dev->mutex);
    kfree(buffer);
}

// Schedule work from the interrupt handler
irqreturn_t my_interrupt_handler(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;

    // Capture time-critical data
    dev->last_event_type = read_hardware_register();
    acknowledge_interrupt();

    // Schedule the workqueue item for complex processing
    schedule_work(&dev->deferred_work);

    return IRQ_HANDLED;
}

// Initialize during driver probe
int my_driver_probe(void)
{
    INIT_WORK(&my_device->deferred_work, my_work_function);
    return 0;
}
```

Use softirqs only for kernel subsystems with extreme performance requirements (networking, block I/O). Use tasklets for per-device work that doesn't need to sleep. Use workqueues whenever you need process context—it's the safest choice. When in doubt, use workqueues; premature optimization with softirqs often creates concurrency bugs.
Before a driver can handle interrupts, it must register its handler with the kernel. This process associates a callback function with a specific IRQ number, allowing the kernel to route interrupts to the correct code.
The Linux request_irq() Function:
```c
// Linux interrupt handler registration

/**
 * request_irq - Register an interrupt handler
 * @irq:     The interrupt number
 * @handler: The handler function
 * @flags:   Flags (sharing, trigger type, etc.)
 * @name:    Device name (for /proc/interrupts)
 * @dev_id:  Private data passed to the handler
 *
 * Returns: 0 on success, negative error on failure
 */
int request_irq(unsigned int irq,
                irq_handler_t handler,
                unsigned long flags,
                const char *name,
                void *dev_id);

// Example driver using request_irq
int my_device_probe(struct pci_dev *pdev)
{
    struct my_device *dev;
    int irq_num;
    int ret;

    dev = kzalloc(sizeof(*dev), GFP_KERNEL);
    if (!dev)
        return -ENOMEM;

    // Get the IRQ number from PCI configuration
    irq_num = pdev->irq;

    // Register the interrupt handler
    ret = request_irq(irq_num,
                      my_interrupt_handler,
                      IRQF_SHARED,     // Allow sharing with other devices
                      "my_device",     // Name shown in /proc/interrupts
                      dev);            // Passed to the handler as dev_id
    if (ret) {
        dev_err(&pdev->dev, "Failed to register IRQ %d: %d\n",
                irq_num, ret);
        kfree(dev);
        return ret;
    }

    dev->irq = irq_num;
    pci_set_drvdata(pdev, dev);
    return 0;
}

// Cleanup - always free the IRQ before releasing device memory!
void my_device_remove(struct pci_dev *pdev)
{
    struct my_device *dev = pci_get_drvdata(pdev);

    // CRITICAL: Free the IRQ before freeing device memory.
    // Otherwise, an interrupt could reference freed memory.
    free_irq(dev->irq, dev);
    kfree(dev);
}
```

IRQ Flags:
The flags parameter controls how the interrupt is configured:
| Flag | Meaning |
|---|---|
| IRQF_SHARED | Allow other devices to share this IRQ line |
| IRQF_ONESHOT | Keep IRQ disabled after handler runs (for threaded handlers) |
| IRQF_NO_SUSPEND | Keep active during system suspend |
| IRQF_TRIGGER_RISING | Trigger on rising edge |
| IRQF_TRIGGER_FALLING | Trigger on falling edge |
| IRQF_TRIGGER_HIGH | Trigger when level is high |
| IRQF_TRIGGER_LOW | Trigger when level is low |
| IRQF_NO_THREAD | Never thread this handler, even when forced IRQ threading (e.g., PREEMPT_RT) is in effect |
When IRQF_SHARED is used, the dev_id parameter MUST be unique and non-NULL. When freeing the IRQ, the kernel uses dev_id to identify which handler to remove from the shared IRQ chain. Passing NULL for shared IRQs will cause undefined behavior or kernel panics.
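As a sketch of that rule, here is how two hypothetical devices might register handlers on the same line. The handler functions and device structures (dev_a_handler, dev_b_handler, struct dev_a, struct dev_b) are illustrative names:

```c
// Sketch: two hypothetical devices sharing one IRQ line. Each registration
// passes a unique, non-NULL dev_id; free_irq() later uses that same pointer
// to remove the right handler from the shared chain.
static int register_shared_handlers(int shared_irq,
                                    struct dev_a *a, struct dev_b *b)
{
    int ret;

    ret = request_irq(shared_irq, dev_a_handler, IRQF_SHARED, "dev_a", a);
    if (ret)
        return ret;

    ret = request_irq(shared_irq, dev_b_handler, IRQF_SHARED, "dev_b", b);
    if (ret) {
        free_irq(shared_irq, a);   // Unwind: dev_id identifies which handler
        return ret;
    }

    return 0;
}
```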
Modern Linux (since 2.6.30) supports threaded interrupt handlers—handlers that run in the context of a dedicated kernel thread rather than in hard interrupt context. This approach, pioneered by the PREEMPT_RT (real-time) patches, simplifies driver development and improves system determinism.
The request_threaded_irq() Interface:
```c
// Threaded interrupt handler registration

/**
 * request_threaded_irq - Register a threaded interrupt handler
 * @irq:       The interrupt number
 * @handler:   Primary handler (runs in hard IRQ context)
 * @thread_fn: Threaded handler (runs in process context)
 * @flags:     IRQ flags
 * @name:      Device name
 * @dev_id:    Private data
 *
 * The primary handler runs immediately in hard IRQ context.
 * If it returns IRQ_WAKE_THREAD, the thread_fn is woken.
 */
int request_threaded_irq(unsigned int irq,
                         irq_handler_t handler,    // Hard IRQ part
                         irq_handler_t thread_fn,  // Thread part
                         unsigned long flags,
                         const char *name,
                         void *dev_id);

// Example: Splitting work between hard and threaded handlers
irqreturn_t my_hard_handler(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    u32 status = ioread32(dev->status_reg);

    // Check whether we caused this interrupt
    if (!(status & MY_INTERRUPT_PENDING))
        return IRQ_NONE;

    // Acknowledge the interrupt (MUST happen quickly)
    iowrite32(MY_ACK, dev->status_reg);

    // Store status for the threaded handler
    dev->interrupt_status = status;

    // Wake the thread for heavy processing
    return IRQ_WAKE_THREAD;
}

irqreturn_t my_thread_handler(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;

    // We're in process context here - we can sleep!

    // Acquire a mutex (would crash in hard IRQ context)
    mutex_lock(&dev->big_mutex);

    // Do extensive processing
    if (dev->interrupt_status & DATA_READY) {
        void *buf = kmalloc(dev->data_size, GFP_KERNEL);  // Can sleep
        process_incoming_data(dev, buf);
        kfree(buf);
    }

    mutex_unlock(&dev->big_mutex);
    return IRQ_HANDLED;
}

// Registration
request_threaded_irq(irq_num,
                     my_hard_handler,
                     my_thread_handler,
                     IRQF_ONESHOT | IRQF_SHARED,  // ONESHOT keeps the IRQ masked
                     "my_device",
                     dev);
```

Advantages of Threaded Handlers:

- The threaded part can sleep, so it may use mutexes, GFP_KERNEL allocations, and blocking I/O.
- Only a minimal primary handler runs with interrupts disabled, reducing interrupt-off time and latency for other devices.
- The handler thread is scheduled and prioritized like any other task (visible as irq/<nr>-<name> in the process list), which improves determinism and debuggability.
When to Use Threaded Handlers:

- When servicing the device requires sleeping: mutexes, GFP_KERNEL allocations, or blocking bus transactions (I2C, SPI).
- When the per-interrupt work is too heavy to run entirely in hard IRQ context without hurting system latency.
- On real-time (PREEMPT_RT) systems, where handlers are force-threaded by default to keep hard IRQ time minimal.
When using threaded handlers with level-triggered interrupts, IRQF_ONESHOT is typically required. This flag keeps the IRQ line disabled until the threaded handler completes. Without it, the interrupt could re-fire before the thread handles the current event, potentially causing an infinite loop of hard IRQ → thread wake → hard IRQ...
Interrupt handler performance directly impacts system responsiveness and throughput. Understanding the metrics and optimization techniques is crucial for high-performance drivers.
Key Performance Metrics:
| Metric | Definition | Typical Target |
|---|---|---|
| Handler Latency | Time from IRQ assertion to first handler instruction | < 10 μs |
| Handler Duration | Time spent executing the handler | < 50 μs for hard IRQ |
| Interrupt Rate | Number of interrupts per second the system can sustain | 1M+ IRQs/sec per core |
| Interrupt-Off Time | Duration interrupts are disabled | < 10 μs ideally |
| Jitter | Variance in latency/duration | < 1 μs for real-time |
Optimization Techniques:
- Do the minimum in the hard handler and defer everything else to a bottom half.
- Avoid runtime allocation: draw from pre-allocated, per-CPU buffer pools instead of calling the allocator.
- Batch work: drain every pending event in a single invocation and acknowledge the device once, rather than taking one interrupt per event.
- Minimize device register accesses (MMIO reads are slow) and prefer lockless or per-CPU data structures.
- Keep the handler's code and data footprint small and use __always_inline for critical helpers. Cache misses in handlers are expensive.
```c
// High-performance interrupt handler techniques

struct buffer {
    struct buffer *next;          // Free-list link
    struct llist_node llist;      // Lockless pending-queue link
    u8 data[DATA_SIZE];
};

// Pre-allocated buffer pool - avoid runtime allocation
struct buffer_pool {
    struct buffer *free_list;
    spinlock_t lock;
} ____cacheline_aligned_in_smp;   // Avoid false sharing

static DEFINE_PER_CPU(struct buffer_pool, buffer_pools);

static inline struct buffer *fast_alloc_buffer(void)
{
    struct buffer_pool *pool = this_cpu_ptr(&buffer_pools);
    struct buffer *buf;

    // Per-CPU pool avoids cross-CPU contention
    spin_lock(&pool->lock);
    buf = pool->free_list;
    if (buf)
        pool->free_list = buf->next;
    spin_unlock(&pool->lock);

    return buf;   // May be NULL - caller handles
}

// Batched interrupt processing
irqreturn_t high_perf_handler(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    int events_processed = 0;
    u32 status;

    // Read status once up front - MMIO reads are expensive
    status = ioread32(dev->status_reg);

    // Process ALL pending events in one invocation
    while (status & EVENTS_PENDING) {
        struct buffer *buf = fast_alloc_buffer();
        if (!buf) {
            // Can't process more - come back later
            dev->stats.alloc_failures++;
            break;
        }

        // Copy data from hardware into the buffer
        memcpy_fromio(buf->data, dev->data_reg, DATA_SIZE);

        // Add to the processing queue (lockless enqueue)
        llist_add(&buf->llist, &dev->pending);

        events_processed++;
        status = ioread32(dev->status_reg);   // Recheck for more events
    }

    // Single acknowledgment for all processed events
    iowrite32(ACK_ALL, dev->status_reg);

    // Per-CPU statistics - no locking needed
    this_cpu_add(dev->stats.irq_count, events_processed);

    if (events_processed)
        tasklet_schedule(&dev->process_tasklet);

    return events_processed ? IRQ_HANDLED : IRQ_NONE;
}
```

Use Linux's trace-cmd and ftrace to measure actual handler durations: trace-cmd record -e irq_handler_entry -e irq_handler_exit. The perf tool can also profile interrupt handlers: perf top -e irq:*. For fine-grained timing, use ktime_get() for nanosecond timestamps.
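For the fine-grained timing mentioned above, a handler can timestamp itself with ktime_get(). A minimal sketch, where do_real_handler_work() and the max_handler_ns field are hypothetical names:

```c
// Sketch: measuring handler duration with ktime_get().
// do_real_handler_work() and dev->max_handler_ns are hypothetical.
static irqreturn_t timed_handler(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    ktime_t start = ktime_get();
    irqreturn_t ret;
    s64 delta_ns;

    ret = do_real_handler_work(dev);   // The actual handler body

    delta_ns = ktime_to_ns(ktime_sub(ktime_get(), start));
    if (delta_ns > dev->max_handler_ns)
        dev->max_handler_ns = delta_ns;   // Track worst-case duration

    return ret;
}
```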
We've explored the complete interrupt handling lifecycle. Let's consolidate the key takeaways:
- Interrupt handling moves through four phases: source identification, acknowledgment, event processing, and return.
- Handlers run in a severely constrained context: they must never sleep, so no mutexes, GFP_KERNEL allocations, user-memory access, or blocking I/O.
- The top-half/bottom-half split keeps hard IRQ work minimal; softirqs, tasklets, workqueues, and threaded handlers carry the deferred load.
- request_irq() and request_threaded_irq() set up the interrupt-to-handler mapping.
- Performance comes from short handlers, pre-allocated per-CPU data, batched event processing, and minimal interrupt-off time.

What's Next:
So far, we've discussed single interrupts in isolation. But real systems have dozens of devices, all potentially interrupting simultaneously. The next page explores the Interrupt Vector Table (IVT) and Interrupt Descriptor Table (IDT)—the data structures that map interrupt numbers to handler addresses, enabling the CPU to quickly dispatch to the correct code.
You now understand how operating systems handle interrupts—from the moment the CPU saves state through handler registration, execution, and the various deferred-work mechanisms. This knowledge is essential for anyone writing device drivers or debugging system-level performance issues.