Every keystroke you type, every network packet that arrives, every disk sector that's read—all these events begin with a hardware interrupt. An interrupt is a signal from a device to the CPU saying, "Stop what you're doing; something important happened." The code that responds to these signals is the interrupt handler—among the most time-critical software in any operating system.
Interrupt handlers operate under extreme constraints. They must execute in microseconds, cannot block or wait, and must correctly handle situations where multiple interrupts occur simultaneously. Understanding these handlers is essential for anyone who wants to truly understand how computers interact with the physical world.
By completing this page, you will understand: the mechanics of hardware interrupts, how the CPU transfers control to interrupt handlers, the structure and constraints of interrupt service routines (ISRs), the interrupt vector table and interrupt routing, nested interrupts and priority schemes, the relationship between interrupts and the I/O software layers above, and modern interrupt technologies like MSI/MSI-X.
An interrupt is a hardware-initiated control transfer. Unlike software-triggered events (exceptions, system calls), interrupts are truly asynchronous—they can occur at any point during program execution, between any two instructions.
Why Interrupts Exist:
Without interrupts, the CPU would have to constantly poll devices to check if they need attention. This wastes CPU cycles and limits responsiveness. Interrupts invert the model: devices signal the CPU only when they have something to report.
| Aspect | Polling | Interrupt-Driven |
|---|---|---|
| CPU usage while idle | 100% (constantly checking) | ~0% (CPU runs other tasks) |
| Response latency | Depends on poll interval | Near-immediate (µs) |
| Suitable for | Very fast devices, predictable loads | Most devices, variable loads |
| Programming complexity | Simple loop | More complex (handlers, synchronization) |
| Power consumption | High (CPU always active) | Lower (CPU can sleep) |
The Interrupt Lifecycle:
When a device triggers an interrupt, a precise sequence of events occurs:
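Interrupts are not serviced in the middle of an instruction. The CPU completes the current instruction before checking for pending interrupts. (On x86, long-running repeated string instructions such as REP MOVSB are the notable exception: they can be interrupted between iterations and resume afterward.) This "instruction boundary" guarantee simplifies both hardware design and software reasoning: on a given CPU, a single instruction executes atomically with respect to that CPU's interrupts.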
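The instruction-boundary rule can be illustrated with a toy model. The sketch below is not how any real CPU is implemented; `toy_cpu`, `toy_run`, and `toy_isr` are hypothetical names invented for this example. Each simulated "instruction" increments an accumulator in two steps, and the device raises its line between those steps, yet the handler only ever observes fully committed state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy CPU used only to illustrate instruction boundaries. */
struct toy_cpu {
    int  acc;             /* architectural state changed by instructions */
    bool irq_pending;     /* latched by the "device" at an arbitrary time */
    int  acc_seen_by_isr; /* value the handler observed when it ran */
    int  isr_runs;        /* how many times the handler was entered */
};

static void toy_isr(struct toy_cpu *cpu)
{
    cpu->acc_seen_by_isr = cpu->acc;
    cpu->isr_runs++;
}

/*
 * Execute n instructions. The device raises its interrupt line in the
 * middle of instruction number raise_during (0-based).
 */
static void toy_run(struct toy_cpu *cpu, int n, int raise_during)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int tmp = cpu->acc + 1;        /* first half of the instruction */
        if (i == raise_during)
            cpu->irq_pending = true;   /* signal arrives mid-instruction */
        cpu->acc = tmp;                /* second half: commit the result */

        /* Instruction boundary: only here is the pending line sampled. */
        if (cpu->irq_pending) {
            cpu->irq_pending = false;
            toy_isr(cpu);
        }
    }
}
```

If the signal arrives during the third instruction, the handler runs once, after that instruction has committed, and sees an accumulator value of 3 rather than a half-updated state.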
The CPU needs to know where to jump when an interrupt occurs. This information is stored in the Interrupt Descriptor Table (IDT)—a table of 256 entries on x86 systems, each containing the address of a handler for one interrupt vector.
IDT Structure (x86-64):
Each IDT entry is 16 bytes containing:
| Field | Size | Purpose |
|---|---|---|
| Offset (low) | 16 bits | Handler address bits 0-15 |
| Segment Selector | 16 bits | Code segment for handler |
| IST | 3 bits | Interrupt Stack Table entry (0 = none) |
| Type | 4 bits | Gate type (0xE = interrupt gate, 0xF = trap gate; task gates do not exist in 64-bit mode) |
| DPL | 2 bits | Required privilege level to invoke |
| Present | 1 bit | Entry is valid |
| Offset (mid) | 16 bits | Handler address bits 16-31 |
| Offset (high) | 32 bits | Handler address bits 32-63 |
| Reserved | 32 bits | Must be zero |
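To make the field layout concrete, here is a minimal sketch of how the Present, DPL, and Type fields combine into the single attribute byte, and how a 64-bit handler address is split across the three offset fields. The helper names (`idt_type_attr`, `idt_split_offset`, `idt_join_offset`) are made up for this illustration and do not come from any kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Gate type codes for 64-bit mode. */
enum { GATE_INTERRUPT = 0xE, GATE_TRAP = 0xF };

/* Attribute byte layout: bit 7 = Present, bits 6-5 = DPL, bits 3-0 = type. */
static uint8_t idt_type_attr(unsigned present, unsigned dpl, unsigned gate_type)
{
    assert(present <= 1 && dpl <= 3 && gate_type <= 0xF);
    return (uint8_t)((present << 7) | (dpl << 5) | gate_type);
}

/* The three pieces of a handler address as stored in an IDT entry. */
struct idt_offset_parts {
    uint16_t low;   /* bits 0-15  */
    uint16_t mid;   /* bits 16-31 */
    uint32_t high;  /* bits 32-63 */
};

static struct idt_offset_parts idt_split_offset(uint64_t handler)
{
    struct idt_offset_parts p;
    p.low  = (uint16_t)(handler & 0xFFFF);
    p.mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    p.high = (uint32_t)(handler >> 32);
    return p;
}

static uint64_t idt_join_offset(struct idt_offset_parts p)
{
    return ((uint64_t)p.high << 32) | ((uint64_t)p.mid << 16) | p.low;
}
```

With these helpers, a present ring-0 interrupt gate encodes to 0x8E and a present ring-0 trap gate to 0x8F; a trap gate callable from ring 3 would encode to 0xEF.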
Standard Interrupt Vector Assignments:
The first 32 vectors (0-31) are reserved for CPU exceptions. Hardware interrupts typically start at vector 32 and above:
| Vector Range | Purpose | Examples |
|---|---|---|
| 0-31 | CPU Exceptions | Divide by zero (0), Page fault (14), Double fault (8) |
| 32-47 | Legacy PIC (IRQ 0-15) | Timer (32), Keyboard (33), COM1 (36) |
| 48-255 | Available for APIC/MSI | APIC timer, IPI, device MSI vectors |
| 128 (0x80) | Linux syscall (32-bit) | Legacy system call entry |
| 0xFE | APIC Error | Local APIC error handling |
| 0xFF | APIC Spurious | Spurious interrupt vector |
```c
/*
 * Simplified IDT setup (conceptual, not production code)
 */
#include <stdint.h>

/* IDT entry structure (x86-64) */
struct idt_entry {
    uint16_t offset_low;   // Handler address bits 0-15
    uint16_t selector;     // Kernel code segment
    uint8_t  ist;          // Interrupt Stack Table (3 bits)
    uint8_t  type_attr;    // Gate type, DPL, present bit
    uint16_t offset_mid;   // Handler address bits 16-31
    uint32_t offset_high;  // Handler address bits 32-63
    uint32_t reserved;     // Must be zero
} __attribute__((packed));

/* IDT descriptor for LIDT instruction */
struct idt_descriptor {
    uint16_t limit;        // Size of IDT - 1
    uint64_t base;         // Address of IDT
} __attribute__((packed));

#define IDT_ENTRIES 256
struct idt_entry idt[IDT_ENTRIES];
struct idt_descriptor idtp;

/* Type/attribute byte values */
#define IDT_INTERRUPT_GATE_64 0x8E  /* Present, Ring 0, 64-bit interrupt gate */
#define IDT_TRAP_GATE_64      0x8F  /* Present, Ring 0, 64-bit trap gate */

/*
 * Set an IDT entry
 * Interrupt gates disable interrupts on entry; trap gates don't
 */
void idt_set_gate(int vector, uint64_t handler, uint8_t type) {
    idt[vector].offset_low  = (uint16_t)(handler & 0xFFFF);
    idt[vector].selector    = 0x08;  // Kernel code segment
    idt[vector].ist         = 0;     // No IST
    idt[vector].type_attr   = type;
    idt[vector].offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    idt[vector].offset_high = (uint32_t)((handler >> 32) & 0xFFFFFFFF);
    idt[vector].reserved    = 0;
}

/* Handler function prototypes (assembly stubs) */
extern void isr_timer(void);
extern void isr_keyboard(void);
extern void isr_page_fault(void);

/*
 * Initialize the IDT
 */
void idt_init(void) {
    /* Set up exception handlers (vectors 0-31) */
    idt_set_gate(14, (uint64_t)isr_page_fault, IDT_INTERRUPT_GATE_64);

    /* Set up hardware interrupt handlers */
    idt_set_gate(32, (uint64_t)isr_timer, IDT_INTERRUPT_GATE_64);
    idt_set_gate(33, (uint64_t)isr_keyboard, IDT_INTERRUPT_GATE_64);

    /* Load IDT */
    idtp.limit = sizeof(idt) - 1;
    idtp.base  = (uint64_t)&idt;

    /* Load IDT register (assembly instruction) */
    __asm__ __volatile__("lidt %0" : : "m"(idtp));
}
```

Interrupt gates automatically disable interrupts (clear IF flag) when the handler is entered, preventing nested interrupts. Trap gates leave interrupts enabled. Hardware interrupt handlers use interrupt gates to prevent re-entrancy; some exceptions (like breakpoints) use trap gates to allow debugging interrupts. Linux uses interrupt gates for all hardware IRQs.
An interrupt handler (Interrupt Service Routine or ISR) must follow a strict structure. The handler is entered with interrupts disabled and must handle the interrupt quickly before re-enabling interrupts and returning control.
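One common way to keep a handler short is to have it record the event and leave the real work to code that runs later with interrupts enabled. The following is a minimal sketch of that split using a single-producer, single-consumer ring buffer; `event_queue`, `evq_push_from_isr`, and `evq_drain` are hypothetical names, and a real multi-core kernel would additionally need memory barriers that this single-threaded illustration omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EVQ_SIZE 8u   /* number of slots; kept a power of two */

/* Hypothetical event queue shared between an ISR and deferred code. */
struct event_queue {
    volatile uint32_t head;   /* advanced only by the ISR (producer) */
    volatile uint32_t tail;   /* advanced only by deferred code (consumer) */
    uint8_t  slots[EVQ_SIZE];
    uint32_t dropped;         /* events lost because the queue was full */
};

/* "Top half": called from the ISR. Never blocks; drops on overflow. */
static bool evq_push_from_isr(struct event_queue *q, uint8_t event)
{
    if (q->head - q->tail == EVQ_SIZE) {
        q->dropped++;
        return false;
    }
    q->slots[q->head % EVQ_SIZE] = event;
    q->head++;
    return true;
}

/* "Bottom half": called later, outside the ISR, to do the slow work. */
static unsigned evq_drain(struct event_queue *q, uint8_t *out, unsigned max)
{
    assert(max == 0 || out != 0);
    unsigned n = 0;
    while (n < max && q->tail != q->head) {
        out[n++] = q->slots[q->tail % EVQ_SIZE];
        q->tail++;
    }
    return n;
}
```

The handler's cost is bounded by a compare, a store, and an increment, regardless of how expensive the eventual processing is. Dropping on overflow rather than waiting reflects the rule that a handler cannot block.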
Handler Entry (Assembly Stub):
The CPU saves only minimal state when entering an interrupt: in 64-bit mode it pushes SS, RSP, RFLAGS, CS, and RIP (plus an error code for some exceptions). The handler must save any additional registers it uses:
```asm
# x86-64 interrupt stub (simplified)

.global isr_common_stub
.global isr_keyboard_entry

# Entry point for keyboard interrupt (vector 33)
isr_keyboard_entry:
    # CPU already pushed SS, RSP, RFLAGS, CS, RIP
    # Push error code placeholder (keyboard has none)
    pushq $0
    # Push interrupt number
    pushq $33
    jmp isr_common_stub

# Common stub for all interrupts
isr_common_stub:
    # Save all general-purpose registers
    pushq %rax
    pushq %rbx
    pushq %rcx
    pushq %rdx
    pushq %rsi
    pushq %rdi
    pushq %rbp
    pushq %r8
    pushq %r9
    pushq %r10
    pushq %r11
    pushq %r12
    pushq %r13
    pushq %r14
    pushq %r15

    # Save segment registers (if necessary)
    # Not typically needed in 64-bit mode

    # The System V ABI expects the direction flag to be clear on
    # function entry; the interrupted code may have set it
    cld

    # Pass pointer to saved state as argument
    movq %rsp, %rdi

    # Call C handler
    # void irq_handler(struct interrupt_frame *frame);
    call irq_handler

    # Restore registers in reverse order
    popq %r15
    popq %r14
    popq %r13
    popq %r12
    popq %r11
    popq %r10
    popq %r9
    popq %r8
    popq %rbp
    popq %rdi
    popq %rsi
    popq %rdx
    popq %rcx
    popq %rbx
    popq %rax

    # Remove error code and interrupt number
    addq $16, %rsp

    # Return from interrupt (restores RIP, CS, RFLAGS, RSP, SS)
    iretq
```

The C Handler:
After the assembly stub saves state, it calls a C function that dispatches to the appropriate device handler:
```c
#include <stdint.h>

/*
 * Helpers assumed to be provided elsewhere in the kernel.
 * Note: outb() here takes (port, value); Linux's outb() takes (value, port).
 */
extern void outb(uint16_t port, uint8_t value);
extern void lapic_write(uint32_t reg, uint32_t value);
extern int printk(const char *fmt, ...);

#define LAPIC_EOI 0xB0  /* Local APIC EOI register offset */

/*
 * Saved register state passed from assembly stub
 */
struct interrupt_frame {
    /* General registers (pushed by stub) */
    uint64_t r15, r14, r13, r12, r11, r10, r9, r8;
    uint64_t rbp, rdi, rsi, rdx, rcx, rbx, rax;

    /* Pushed by stub */
    uint64_t int_no;      /* Interrupt number */
    uint64_t error_code;  /* Error code (or 0) */

    /* Pushed by CPU */
    uint64_t rip;         /* Instruction pointer */
    uint64_t cs;          /* Code segment */
    uint64_t rflags;      /* CPU flags */
    uint64_t rsp;         /* Stack pointer */
    uint64_t ss;          /* Stack segment */
};

/* Device-specific handlers registered by drivers */
typedef void (*irq_handler_fn)(struct interrupt_frame *);
static irq_handler_fn irq_handlers[256];

/* Per-vector interrupt counters */
static uint64_t irq_count[256];

/*
 * Register a handler for a vector (out-of-range vectors are ignored)
 */
void register_irq_handler(int irq, irq_handler_fn handler) {
    if (irq >= 0 && irq < 256)
        irq_handlers[irq] = handler;
}

/*
 * Main interrupt dispatcher - called from assembly
 */
void irq_handler(struct interrupt_frame *frame) {
    uint64_t vector = frame->int_no & 0xFF;

    /* Statistics */
    irq_count[vector]++;

    /* Call registered handler */
    if (irq_handlers[vector]) {
        irq_handlers[vector](frame);
    } else {
        /* No handler registered - spurious interrupt? */
        printk("Unhandled interrupt: %lu\n", (unsigned long)vector);
    }

    /* Send End-Of-Interrupt to interrupt controller */
    if (vector >= 32 && vector < 48) {
        /* Legacy PIC: Send EOI to master (and slave if vector >= 40) */
        if (vector >= 40) {
            outb(0xA0, 0x20);  /* EOI to slave PIC */
        }
        outb(0x20, 0x20);      /* EOI to master PIC */
    } else if (vector >= 48) {
        /* APIC: Write to EOI register */
        lapic_write(LAPIC_EOI, 0);
    }
}
```

Every hardware interrupt must be acknowledged with an End-Of-Interrupt (EOI) signal. If you forget the EOI, the interrupt controller will not deliver any more interrupts at that priority level or below, and your system will appear to hang as devices stop responding. Always verify EOI is sent on every handler path, including error paths.
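Why a missing EOI stalls lower-priority interrupts can be seen in a small model of an 8259-style controller. This is a simplified sketch, assuming fixed priority (IRQ 0 highest) and a non-specific EOI; `toy_pic` and its functions are hypothetical names, and real controllers have additional modes that are not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 8-input interrupt controller. */
struct toy_pic {
    uint8_t irr;  /* Interrupt Request Register: raised, not yet delivered */
    uint8_t isr;  /* In-Service Register: delivered, still awaiting EOI */
};

static void toy_pic_raise(struct toy_pic *p, unsigned irq)
{
    assert(irq < 8);
    p->irr |= (uint8_t)(1u << irq);
}

/*
 * Try to deliver the highest-priority pending IRQ.
 * Returns the IRQ number, or -1 if nothing can be delivered because an
 * equal or higher priority interrupt is still in service (or none pending).
 */
static int toy_pic_deliver(struct toy_pic *p)
{
    for (unsigned irq = 0; irq < 8; irq++) {
        uint8_t bit = (uint8_t)(1u << irq);
        if (p->isr & bit)
            return -1;              /* blocks this level and everything below */
        if (p->irr & bit) {
            p->irr &= (uint8_t)~bit;
            p->isr |= bit;
            return (int)irq;
        }
    }
    return -1;
}

/* Non-specific EOI: clear the highest-priority in-service bit. */
static void toy_pic_eoi(struct toy_pic *p)
{
    for (unsigned irq = 0; irq < 8; irq++) {
        uint8_t bit = (uint8_t)(1u << irq);
        if (p->isr & bit) {
            p->isr &= (uint8_t)~bit;
            return;
        }
    }
}
```

In this model, once IRQ 1 has been delivered, a pending IRQ 4 stays stuck until the handler for IRQ 1 issues its EOI, while a higher-priority IRQ 0 can still get through.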
The CPU has only a few interrupt pins, but systems have many devices. Interrupt controllers multiplex device interrupts onto the CPU's limited inputs and provide prioritization, masking, and routing capabilities.
Legacy 8259 PIC:
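The IBM PC/AT used two cascaded 8259 PICs, providing 16 inputs (IRQ 0-15), of which 15 are usable because IRQ 2 carries the cascade from the slave PIC. While obsolete, the PIC interface is still emulated for compatibility. The vectors below assume the common remapping of IRQ 0-15 to vectors 32-47: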
| IRQ | Traditional Device | Vector |
|---|---|---|
| 0 | System timer | 32 |
| 1 | Keyboard | 33 |
| 2 | Cascade to slave PIC | 34 |
| 3 | COM2 / COM4 | 35 |
| 4 | COM1 / COM3 | 36 |
| 6 | Floppy disk | 38 |
| 7 | LPT1 / Spurious | 39 |
| 8 | CMOS RTC | 40 |
| 12 | PS/2 Mouse | 44 |
| 14 | Primary IDE | 46 |
| 15 | Secondary IDE | 47 |
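The table's IRQ-to-vector relationship is simple arithmetic once the PICs are remapped to a base of 32, as this document assumes. The helpers below are an illustrative sketch with made-up names (`pic_irq_to_vector`, `pic_vector_to_irq`, `pic_irq_needs_slave_eoi`); they also capture which IRQs arrive through the slave PIC and therefore need an EOI sent to both controllers.

```c
#include <assert.h>
#include <stdbool.h>

#define PIC_VECTOR_BASE 32u  /* first vector used for IRQ 0 after remapping */
#define PIC_IRQ_COUNT   16u  /* two cascaded 8259s, 8 inputs each */

static unsigned pic_irq_to_vector(unsigned irq)
{
    assert(irq < PIC_IRQ_COUNT);
    return PIC_VECTOR_BASE + irq;
}

static bool vector_is_pic_irq(unsigned vector)
{
    return vector >= PIC_VECTOR_BASE &&
           vector < PIC_VECTOR_BASE + PIC_IRQ_COUNT;
}

static unsigned pic_vector_to_irq(unsigned vector)
{
    assert(vector_is_pic_irq(vector));
    return vector - PIC_VECTOR_BASE;
}

/* IRQ 8-15 come from the slave PIC, so both PICs must receive an EOI. */
static bool pic_irq_needs_slave_eoi(unsigned irq)
{
    assert(irq < PIC_IRQ_COUNT);
    return irq >= 8;
}
```

For example, the PS/2 mouse on IRQ 12 maps to vector 44 and needs an EOI at both the slave and the master, whereas COM1 on IRQ 4 maps to vector 36 and needs only the master EOI.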
Advanced Programmable Interrupt Controller (APIC):
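Modern systems use the APIC architecture, consisting of a Local APIC in each CPU core, which receives interrupts and also provides a per-core timer and inter-processor interrupts, and one or more I/O APICs, which collect device interrupt lines and route them to Local APICs through a programmable redirection table: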
```c
/*
 * I/O APIC configuration example
 * (assumes the I/O APIC registers are mapped at their physical address)
 */
#include <stdint.h>

#define IOAPIC_BASE   0xFEC00000  /* Standard I/O APIC address */
#define IOAPIC_REGSEL 0x00        /* Register select */
#define IOAPIC_WINDOW 0x10        /* Register value */

#define IOAPIC_ID     0x00        /* I/O APIC ID register */
#define IOAPIC_VER    0x01        /* Version register */
#define IOAPIC_REDTBL 0x10        /* Redirection table base */

/*
 * Redirection table entry format (64 bits)
 */
struct ioapic_redir_entry {
    uint64_t vector:8;           /* Interrupt vector (32-255) */
    uint64_t delivery_mode:3;    /* 0=Fixed, 1=Lowest Priority, etc */
    uint64_t dest_mode:1;        /* 0=Physical, 1=Logical */
    uint64_t delivery_status:1;  /* Read-only: pending delivery */
    uint64_t polarity:1;         /* 0=Active high, 1=Active low */
    uint64_t remote_irr:1;       /* Read-only: for level-triggered */
    uint64_t trigger_mode:1;     /* 0=Edge, 1=Level */
    uint64_t mask:1;             /* 1=Masked (disabled) */
    uint64_t reserved:39;        /* Bits 17-55 */
    uint64_t destination:8;      /* APIC ID or logical destination */
} __attribute__((packed));

/*
 * Program an I/O APIC redirection entry
 */
void ioapic_set_irq(int irq, uint8_t vector, uint8_t dest_apic_id) {
    volatile uint32_t *regsel = (volatile uint32_t *)(IOAPIC_BASE + IOAPIC_REGSEL);
    volatile uint32_t *window = (volatile uint32_t *)(IOAPIC_BASE + IOAPIC_WINDOW);
    uint32_t low, high;

    /* Each redirection entry uses two 32-bit registers */
    low  = vector;      /* Vector number */
    low |= (0 << 8);    /* Fixed delivery mode */
    low |= (0 << 11);   /* Physical destination */
    low |= (0 << 13);   /* Active high */
    low |= (0 << 15);   /* Edge triggered */
    low |= (0 << 16);   /* Not masked */

    high = ((uint32_t)dest_apic_id) << 24;  /* Destination APIC ID */

    /* Write to I/O APIC */
    *regsel = IOAPIC_REDTBL + (irq * 2);
    *window = low;
    *regsel = IOAPIC_REDTBL + (irq * 2) + 1;
    *window = high;
}

/*
 * Mask (disable) an I/O APIC interrupt
 */
void ioapic_mask_irq(int irq) {
    volatile uint32_t *regsel = (volatile uint32_t *)(IOAPIC_BASE + IOAPIC_REGSEL);
    volatile uint32_t *window = (volatile uint32_t *)(IOAPIC_BASE + IOAPIC_WINDOW);

    *regsel = IOAPIC_REDTBL + (irq * 2);
    uint32_t entry = *window;
    entry |= (1u << 16);  /* Set mask bit */
    *window = entry;
}
```

On multi-core systems, you can configure which CPUs receive which interrupts. This interrupt affinity can improve cache locality (process data on same CPU that received packet) or balance load. Linux exposes this at /proc/irq/<irq>/smp_affinity. High-performance systems carefully tune interrupt routing.
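The affinity value is a hexadecimal bitmask with one bit per CPU. As a rough sketch of the arithmetic, the helpers below build such a mask from a list of CPU numbers and format it as hex; `affinity_mask` and `affinity_format` are hypothetical names, and the sketch handles only CPUs 0-63 as a single hex number (on systems with more than 32 CPUs the kernel prints comma-separated 32-bit groups, which is not reproduced here).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Build a CPU bitmask: bit n set means CPU n may receive the interrupt. */
static uint64_t affinity_mask(const unsigned *cpus, unsigned count)
{
    uint64_t mask = 0;
    for (unsigned i = 0; i < count; i++) {
        assert(cpus[i] < 64);
        mask |= UINT64_C(1) << cpus[i];
    }
    return mask;
}

/* Format the mask as lowercase hex; returns the number of characters. */
static int affinity_format(uint64_t mask, char *buf, size_t len)
{
    return snprintf(buf, len, "%llx", (unsigned long long)mask);
}
```

CPU 2 alone gives a mask of 4, CPUs 0 and 1 together give 3, and CPUs 2 and 3 together give the hex string `c`.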
Modern PCIe devices often use Message Signaled Interrupts (MSI) or MSI-X instead of traditional pin-based interrupts. MSI uses normal memory writes to signal interrupts, offering significant advantages.
How MSI Works:
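Instead of asserting a physical interrupt line, the device performs a memory write to a special address. On x86 that address falls in the range claimed by the Local APICs (0xFEExxxxx), and the data written carries the vector number. The write is recognized as an interrupt message rather than a store to RAM, and the target CPU dispatches the corresponding vector.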
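On x86, the address and data of an MSI message follow a fixed layout: address bits 31-20 are 0xFEE, bits 19-12 hold the destination APIC ID, and the low byte of the data is the vector. The sketch below composes and decodes the simplest case (fixed delivery, physical destination, edge-triggered, so all other bits are zero). The function names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSI_ADDR_BASE 0xFEE00000u  /* upper 12 bits of every x86 MSI address */

/* Address a device would write to in order to interrupt the given APIC. */
static uint32_t msi_compose_address(uint8_t dest_apic_id)
{
    return MSI_ADDR_BASE | ((uint32_t)dest_apic_id << 12);
}

/* Data value: vector in bits 7-0; delivery mode and trigger bits left 0. */
static uint16_t msi_compose_data(uint8_t vector)
{
    assert(vector >= 32);  /* vectors 0-31 are reserved for exceptions */
    return vector;
}

/* What the platform does on the receiving side, in miniature. */
static bool msi_is_interrupt_write(uint32_t addr)
{
    return (addr & 0xFFF00000u) == MSI_ADDR_BASE;
}

static uint8_t msi_dest_from_address(uint32_t addr)
{
    assert(msi_is_interrupt_write(addr));
    return (uint8_t)((addr >> 12) & 0xFF);
}

static uint8_t msi_vector_from_data(uint16_t data)
{
    return (uint8_t)(data & 0xFF);
}
```

A device programmed with address 0xFEE03000 and data 0x41 therefore raises vector 0x41 on the CPU whose APIC ID is 3, using nothing but an ordinary bus write.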
MSI vs MSI-X:
| Feature | MSI | MSI-X |
|---|---|---|
| Maximum vectors | 32 (most use 1) | 2048 |
| Vector allocation | Consecutive | Any |
| Per-vector masking | No | Yes |
| Address/data | One address shared by all vectors | Separate address and data per vector |
| Typical usage | Simple devices | High-performance NICs, NVMe, GPUs |
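Per-vector masking and per-vector addressing both follow from how MSI-X stores its configuration: a table in device memory with one 16-byte entry per vector, where bit 0 of the last word is that vector's mask bit. The following is a sketch of that layout with hypothetical names (`msix_table_entry`, `msix_mask`, `msix_unmask`); a real driver would reach the table through a mapped PCI BAR rather than a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One MSI-X table entry: 16 bytes, one per vector. */
struct msix_table_entry {
    uint32_t addr_lo;      /* message address, low 32 bits */
    uint32_t addr_hi;      /* message address, high 32 bits */
    uint32_t data;         /* message data (carries the vector on x86) */
    uint32_t vector_ctrl;  /* bit 0 = mask for this vector only */
};

#define MSIX_CTRL_MASK 0x1u

static void msix_mask(volatile struct msix_table_entry *e)
{
    e->vector_ctrl |= MSIX_CTRL_MASK;
}

static void msix_unmask(volatile struct msix_table_entry *e)
{
    e->vector_ctrl &= ~MSIX_CTRL_MASK;
}

static bool msix_is_masked(const volatile struct msix_table_entry *e)
{
    return (e->vector_ctrl & MSIX_CTRL_MASK) != 0;
}
```

Because each entry has its own address, data, and mask bit, a driver can silence one queue's interrupt or retarget it to another CPU without touching any other vector.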
```c
#include <linux/pci.h>
#include <linux/interrupt.h>
#include <linux/slab.h>   /* kcalloc, kfree */

/*
 * Setting up MSI-X interrupts in a Linux driver
 *
 * struct mydev_queue and mydev_process_queue() are assumed to be
 * defined elsewhere in the driver.
 */

struct mydev_data {
    struct pci_dev *pdev;
    int num_vectors;
    struct msix_entry *msix_entries;

    /* Per-queue data for each interrupt vector */
    struct mydev_queue *queues;
};

/*
 * MSI-X interrupt handler for a specific queue
 */
static irqreturn_t mydev_msix_handler(int irq, void *data)
{
    struct mydev_queue *queue = data;

    /* Process events for this specific queue */
    /* No need to check which device/queue - we know from the vector */
    mydev_process_queue(queue);

    return IRQ_HANDLED;
}

/*
 * Allocate and request MSI-X vectors
 */
int mydev_setup_msix(struct mydev_data *dev, int num_queues)
{
    int i, ret;

    /* Allocate msix_entry array */
    dev->msix_entries = kcalloc(num_queues, sizeof(struct msix_entry),
                                GFP_KERNEL);
    if (!dev->msix_entries)
        return -ENOMEM;

    /* Initialize entry numbers (which MSI-X table entries to use) */
    for (i = 0; i < num_queues; i++)
        dev->msix_entries[i].entry = i;

    /* Request MSI-X vectors from the kernel
     * This allocates interrupt numbers and configures hardware */
    ret = pci_enable_msix_exact(dev->pdev, dev->msix_entries, num_queues);
    if (ret) {
        /* Couldn't get all vectors; could fall back to fewer */
        dev_err(&dev->pdev->dev, "Failed to enable MSI-X: %d\n", ret);
        goto err_free;
    }

    dev->num_vectors = num_queues;

    /* Register handlers for each vector */
    for (i = 0; i < num_queues; i++) {
        ret = request_irq(dev->msix_entries[i].vector,
                          mydev_msix_handler,
                          0,  /* No flags needed - MSI-X is never shared */
                          "mydev-queue",
                          &dev->queues[i]);
        if (ret) {
            /* Free already-registered vectors */
            while (--i >= 0)
                free_irq(dev->msix_entries[i].vector, &dev->queues[i]);
            goto err_disable;
        }
    }

    dev_info(&dev->pdev->dev, "Enabled %d MSI-X vectors\n", num_queues);
    return 0;

err_disable:
    pci_disable_msix(dev->pdev);
err_free:
    kfree(dev->msix_entries);
    return ret;
}
```

High-performance devices like NVMe SSDs and 10/40/100 Gbps NICs use MSI-X to provide one interrupt vector per hardware queue. When queue 5 completes an operation, it triggers vector 5, which is handled by the CPU assigned to queue 5. This eliminates interrupt sharing, reduces latency, and lets each core service its own queue largely independently of the others.
What happens if an interrupt occurs while another interrupt is being processed? The answer involves interrupt priority and nested interrupt handling.
Interrupt Nesting Scenarios:
Linux Interrupt Context:
Linux distinguishes between "soft" and "hard" interrupt context:
Hard IRQ context: Executing a hardware interrupt handler. All interrupts disabled (on that CPU). Cannot sleep, cannot acquire sleeping locks.
Soft IRQ context: Executing softirqs, tasklets, or timer callbacks. Interrupts enabled, but cannot sleep. Preemptible by hard IRQs.
Process context: Normal kernel code. Can sleep, can be preempted by both hard and soft IRQs.
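The three contexts can be modeled as nesting counters packed into one word, which is loosely how Linux tracks them in its per-task preempt count. The sketch below is a simplified illustration only: the bit positions, the `toy_` function names, and the single global counter are assumptions for this example, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for this sketch: separate nesting counts per context. */
#define SOFTIRQ_SHIFT 8
#define HARDIRQ_SHIFT 16
#define SOFTIRQ_MASK  (0xFFu << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK  (0xFu  << HARDIRQ_SHIFT)

static uint32_t ctx_count;  /* one per CPU in a real system */

static void toy_softirq_enter(void) { ctx_count += 1u << SOFTIRQ_SHIFT; }
static void toy_hardirq_enter(void) { ctx_count += 1u << HARDIRQ_SHIFT; }

static void toy_softirq_exit(void)
{
    assert(ctx_count & SOFTIRQ_MASK);
    ctx_count -= 1u << SOFTIRQ_SHIFT;
}

static void toy_hardirq_exit(void)
{
    assert(ctx_count & HARDIRQ_MASK);
    ctx_count -= 1u << HARDIRQ_SHIFT;
}

static bool toy_in_hardirq(void) { return (ctx_count & HARDIRQ_MASK) != 0; }
static bool toy_in_softirq(void) { return (ctx_count & SOFTIRQ_MASK) != 0; }

/* Sleeping is only legal in process context: no IRQ nesting of any kind. */
static bool toy_may_sleep(void)
{
    return (ctx_count & (HARDIRQ_MASK | SOFTIRQ_MASK)) == 0;
}
```

A hard IRQ arriving while a softirq runs simply increments its own field, so both conditions are visible at once, and sleeping becomes legal again only after both have exited.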
Priority Inversion:
A dangerous scenario occurs when a high-priority interrupt handler waits for a resource held by a lower-priority handler. In poorly designed systems, this priority inversion can cause deadlocks or severe latency. Solutions include:
For real-time systems, interrupt latency is critical. The time from hardware signal to handler execution must be bounded. Linux's PREEMPT_RT work, developed for years as an out-of-tree patch set, reduces worst-case latency by making almost all interrupt handlers threaded (running as kernel threads that can be preempted), but this trades average performance for predictability.
Interrupt behavior is often critical for diagnosing performance problems. Linux provides several interfaces for monitoring interrupts:
The /proc/interrupts File:
```shell
# View interrupt counts per CPU
$ cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3
   0:         22          0          0          0  IR-IO-APIC    2-edge      timer
   1:          0          2          0          0  IR-IO-APIC    1-edge      i8042
   8:          0          0          0          1  IR-IO-APIC    8-edge      rtc0
   9:          0          0          0         45  IR-IO-APIC    9-fasteoi   acpi
  16:         23          0         12          0  IR-IO-APIC   16-fasteoi   ehci_hcd:usb1
  23:         42      11203          0          0  IR-IO-APIC   23-fasteoi   ehci_hcd:usb2
 120:          0          0          0          0  DMAR-MSI      0-edge      dmar0
 121:          0          0          0          0  IR-PCI-MSI 327680-edge    nvme0q0
 122:      23412          0          0          0  IR-PCI-MSI 327681-edge    nvme0q1
 123:          0      19823          0          0  IR-PCI-MSI 327682-edge    nvme0q2
 124:          0          0      22145          0  IR-PCI-MSI 327683-edge    nvme0q3
 125:          0          0          0      21004  IR-PCI-MSI 327684-edge    nvme0q4
 NMI:         23         24         21         22  Non-maskable interrupts
 LOC:     892341     891022     893102     890234  Local timer interrupts
 SPU:          0          0          0          0  Spurious interrupts
 PMI:         23         24         21         22  Performance monitoring interrupts
 IWI:          0          0          0          0  IRQ work interrupts
 RES:      45123      44892      45234      44981  Rescheduling interrupts
 CAL:       2341       2245       2312       2289  Function call interrupts
 TLB:       1234       1198       1245       1223  TLB shootdowns

# Reading the output:
# - Each row is an interrupt source
# - Columns show count per CPU
# - Right side shows type and device name
# - Note NVMe queues spread across CPUs (good!)

# Watch interrupt rates in real-time
$ watch -n1 'cat /proc/interrupts | head -30'

# Or use vmstat
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 1234567 123456 7890123    0    0     5    10 1234  234  2  1 97  0  0
                                                          ^^^^ interrupts/sec

# Detailed interrupt stats
$ cat /proc/stat | grep intr
intr 123456789 22 3 0 0 0 0 0 0 1 45 0 0 23 0 ...

# Check IRQ affinity
$ cat /proc/irq/122/smp_affinity
1
$ cat /proc/irq/123/smp_affinity
2
# Affinity is a bitmask: 1=CPU0, 2=CPU1, 4=CPU2, etc.

# Change affinity (balance load)
$ echo 4 > /proc/irq/122/smp_affinity   # Move to CPU2
```

Diagnosing Interrupt Problems:
| Symptom | Possible Cause | Diagnostic |
|---|---|---|
| High CPU in interrupt handler | Driver doing too much work in ISR | Profile with perf top |
| Interrupts all on one CPU | IRQ affinity not configured | Check /proc/irq/*/smp_affinity |
| Missing interrupts | EOI not sent, IRQ masked | Check dmesg, interrupt counts |
| `irq X: nobody cared` | Faulty hardware or shared IRQ issue | Check driver, try different slot |
| Interrupt storms | Hardware failure or configuration | Check counts, mask problematic IRQ |
| High latency | Long interrupt handlers, nesting | Enable irqsoff tracer |
The irqsoff and hwlat tracers in ftrace can identify when interrupts are disabled for too long. Enable with: echo irqsoff > /sys/kernel/debug/tracing/current_tracer. The trace shows the longest period where interrupts were disabled and the code path responsible.
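Checks such as "are all interrupts landing on one CPU?" are easy to automate, because each row of /proc/interrupts is a label, a colon, and one count per CPU. The following is a minimal parsing sketch, assuming the caller already knows the CPU count (for example from the header row); `irq_line_counts` and `irq_busiest_cpu` are hypothetical helper names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the per-CPU counts from one /proc/interrupts row.
 * Returns the number of counts read, or -1 if the row has no "label:" part.
 */
static int irq_line_counts(const char *line, unsigned long *counts, int max_cpus)
{
    assert(line && counts && max_cpus >= 0);

    const char *p = strchr(line, ':');
    if (!p)
        return -1;
    p++;

    int n = 0;
    while (n < max_cpus) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;                     /* reached the controller/device text */
        char *end;
        counts[n++] = strtoul(p, &end, 10);
        p = end;
    }
    return n;
}

/* Index of the CPU with the highest count, or -1 if there are none. */
static int irq_busiest_cpu(const unsigned long *counts, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (best < 0 || counts[i] > counts[best])
            best = i;
    return best;
}
```

Sampling the file twice and subtracting the counts gives per-CPU interrupt rates, which is usually more informative than the totals accumulated since boot.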
Interrupt handlers are the lowest layer of I/O software—the code that directly responds to hardware signals and enables the asynchronous operation we take for granted. They operate under the most stringent constraints of any software, yet their correct operation is essential for system stability and performance. Let's consolidate the key concepts:
What's Next:
We've now explored the complete I/O software stack from user-level code through device-independent software, device drivers, and interrupt handlers. The final page will zoom out to examine how all these layers work together—the hardware layer and how it interfaces with the software we've studied.
You now understand how interrupt handlers work: from the hardware signal through the IDT lookup, register saving, handler execution, and EOI transmission. This knowledge is fundamental for kernel development, driver writing, and diagnosing low-level system issues.