In the previous page, we established that disabling interrupts provides an elegant and absolute guarantee of mutual exclusion. A process running with interrupts disabled cannot be preempted, and therefore no other process can execute during its critical section.
But there's a critical assumption hidden in this guarantee.
The entire argument rests on the premise that there is only one processor. In a uniprocessor system, if the CPU isn't executing my code, it must be executing someone else's code—and vice versa. Mutual exclusion follows directly from exclusive use of the single execution resource.
In the modern world of multi-core processors, this assumption is fundamentally broken. Your laptop likely has 4-16 cores. Data center servers routinely have 64-256 cores. Even smartphones have 8 cores.
In a multiprocessor system, disabling interrupts on one CPU has zero effect on any other CPU.
This page examines why the interrupt disable approach fails in multiprocessor environments, what happens when engineers mistakenly rely on it, and how this limitation shaped the development of more sophisticated synchronization mechanisms.
Disabling interrupts is a LOCAL operation. Each processor in an SMP (Symmetric Multiprocessing) system has its own interrupt flag. When CPU₀ executes 'cli', it only affects CPU₀. CPUs 1, 2, 3... continue operating normally, potentially accessing the same shared data simultaneously. This is the fundamental reason interrupt disabling alone cannot provide mutual exclusion on multiprocessor systems.
Let's first precisely characterize why interrupt disabling provides mutual exclusion on a uniprocessor system. This understanding forms the foundation for seeing why the approach fails on multiprocessors.
The Uniprocessor Execution Model:
In a uniprocessor system, execution follows a simple model: a single CPU executes one instruction stream, and the scheduler multiplexes processes onto it by switching contexts on timer and I/O interrupts.
The key insight is that only one thread of execution exists at any moment. Concurrent execution is an illusion created by rapid context switching.
Concurrency vs Parallelism on Uniprocessors:
On a uniprocessor, we have concurrency but not parallelism:
Interleaved execution means that at any given instant, only one process is running. The others are suspended, waiting for their turn. When we disable interrupts, we prevent the transition that would suspend our process and resume another. Since there's only one CPU and it's running our code, no one else can run.
Mathematical Formalization:
Let's formalize this. Define:
T = the set of all time instants
P = the set of all processes
running(t) = the set of processes executing at time t

For a uniprocessor: ∀t ∈ T: |running(t)| = 1
This means at every instant, exactly one process is running. If process A is in its critical section at time t, the mutual exclusion condition trivially holds because no other process exists in any section at time t—they're all suspended.
On a uniprocessor, the CPU is the exclusive resource. If I have it, you don't. If I prevent the mechanism that takes it away from me (interrupts), I have it for as long as I want. Mutual exclusion is a natural consequence of resource exclusivity.
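On that model, the uniprocessor pattern is simply a bracket of disable/enable around the critical section. Here is a minimal sketch, assuming the disable_interrupts()/enable_interrupts() wrappers for cli/sti used elsewhere on this page:

```c
/* Uniprocessor-only: correct ONLY because there is exactly one CPU. */

volatile int balance = 0;

void deposit(int amount)
{
    disable_interrupts();   /* cli: no timer tick, no context switch */

    /* Critical section: nothing else in the system can execute,
     * because the only CPU is busy running this code. */
    balance = balance + amount;

    enable_interrupts();    /* sti: preemption is possible again */
}
```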
Now let's examine what changes in a multiprocessor system and why the interrupt disable approach completely breaks down.
The Multiprocessor Execution Model:
In a symmetric multiprocessor (SMP) system, N identical CPUs execute independent instruction streams at the same time, all connected to a single shared physical memory.
The presence of shared memory is precisely what creates the synchronization problem. Multiple CPUs can read and write the same memory locations simultaneously.
| Characteristic | Uniprocessor | Multiprocessor |
|---|---|---|
| Active execution units | 1 | N (typically 2-256) |
| Simultaneous processes | 1 (illusion of many) | Up to N truly parallel |
| Interrupt flag scope | Global (one CPU) | Per-CPU (local only) |
| Context switch source | Timer/I/O interrupts only | Timer interrupts + true parallelism |
| Race condition source | Only during interleaving | True simultaneous access |
| CLI effect | Blocks ALL other execution | Blocks only local CPU preemption |
| Memory visibility | Trivial (one cache) | Complex (coherence protocols) |
The Fatal Flaw:
When CPU₀ executes cli, it sets its local IF = 0. This prevents timer and device interrupts from being serviced on CPU₀, and therefore prevents preemption of the code running there.
But CPU₁, CPU₂, ... CPUₙ₋₁ all have their own interrupt flags, which remain set to 1. They continue operating normally: scheduling processes, handling their own interrupts, and accessing shared memory.
Scenario: The Broken Synchronization
```c
/* BROKEN: This does NOT work on multiprocessor systems! */
/* Shared counter accessed by processes on multiple CPUs */

volatile int shared_counter = 0;

void increment_counter(void)
{
    disable_interrupts();    /* Only affects THIS CPU! */

    /*
     * DANGEROUS: We think we have mutual exclusion,
     * but another CPU could be executing this exact
     * same code AT THE SAME INSTANT
     */
    int temp = shared_counter;
    temp = temp + 1;
    shared_counter = temp;

    enable_interrupts();
}

/*
 * RACE CONDITION EXAMPLE:
 *
 * Time    CPU 0                          CPU 1
 * ----    -----                          -----
 * t0      disable_interrupts()           disable_interrupts()
 * t1      temp₀ = shared_counter (0)     temp₁ = shared_counter (0)
 * t2      temp₀ = temp₀ + 1 (1)          temp₁ = temp₁ + 1 (1)
 * t3      shared_counter = temp₀ (1)     shared_counter = temp₁ (1)
 * t4      enable_interrupts()            enable_interrupts()
 *
 * RESULT: Counter should be 2, but is 1!
 * Both CPUs had their interrupts disabled.
 * Neither could be preempted.
 * But they ran SIMULTANEOUSLY and corrupted the data.
 */
```

The fundamental issue is that disabling interrupts is a LOCAL operation. It affects only the CPU executing the instruction. Other CPUs are entirely unaware that you've disabled your interrupts. They don't care. They will continue executing their code, including accesses to shared memory, regardless of what you do.
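For contrast, here is a minimal sketch of a counter increment that actually is safe on SMP, using C11 atomics instead of interrupt control (the function name and the separate atomic counter are illustrative, not part of the example above):

```c
#include <stdatomic.h>

/* Safe on any number of CPUs: the read-modify-write is a single
 * atomic operation, so two CPUs incrementing at the same instant
 * cannot lose an update. No interrupt manipulation is involved. */
atomic_int shared_counter_atomic = 0;

void increment_counter_safe(void)
{
    atomic_fetch_add(&shared_counter_atomic, 1);
}
```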
To fully understand why interrupt disabling is local, let's examine the hardware architecture of interrupt handling in multiprocessor systems.
The Local APIC (Advanced Programmable Interrupt Controller):
In modern x86 multiprocessor systems, each CPU has a Local APIC (LAPIC) integrated into the processor core. This is a crucial piece of hardware that accepts interrupts destined for that specific CPU, prioritizes and queues them, drives a per-CPU timer, and sends and receives inter-processor interrupts (IPIs).
There's also an I/O APIC (a separate chip or integrated into the chipset) that collects interrupt lines from external devices and routes each interrupt to the Local APIC of a selected CPU.
Per-CPU State:
Each CPU maintains its own set of registers, including its own EFLAGS register (which contains the interrupt flag IF), instruction pointer, general-purpose registers, and Local APIC state.
When you execute cli on CPU₀, it modifies CPU₀'s FLAGS register. CPU₁'s FLAGS register is completely separate and unaffected.
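As an illustration of the per-CPU flag, the following user-space sketch reads the RFLAGS register of whichever CPU the program happens to be running on (x86-64, GCC/Clang inline assembly; the helper name is ours):

```c
#include <stdint.h>
#include <stdio.h>

/* Read this CPU's RFLAGS register (x86-64, GCC/Clang inline asm). */
static inline uint64_t read_rflags(void)
{
    uint64_t flags;
    __asm__ volatile("pushfq; popq %0" : "=r"(flags));
    return flags;
}

int main(void)
{
    /* Bit 9 of RFLAGS is IF, the interrupt flag. Every CPU (and every
     * hardware thread) has its own copy; this reads only the copy
     * belonging to the CPU executing this code right now. */
    uint64_t flags = read_rflags();
    printf("IF on this CPU: %llu\n",
           (unsigned long long)((flags >> 9) & 1));
    return 0;
}
```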
The Interrupt Delivery Path:
Let's trace an interrupt from device to handler: the device raises its interrupt line, the I/O APIC routes it to the target CPU's Local APIC, the LAPIC records it as pending (in the IRR), and the CPU vectors to the handler only when its own EFLAGS.IF is set.
```c
#include <stdint.h>

/* Conceptual Local APIC register structure */
/* Each CPU has its OWN copy of these registers */

struct local_apic {
    /* Identification */
    uint32_t id;                  /* Unique LAPIC ID for this CPU */
    uint32_t version;             /* LAPIC version */

    /* Priority management */
    uint32_t task_priority;       /* TPR: below this, interrupts masked */
    uint32_t processor_priority;  /* PPR: effective priority */

    /* Interrupt state */
    uint32_t in_service[8];       /* ISR: currently servicing */
    uint32_t trigger_mode[8];     /* TMR: edge vs level */
    uint32_t request[8];          /* IRR: pending interrupts */

    /* Control */
    uint32_t spurious;            /* Spurious interrupt vector + enable */

    /* Inter-processor interrupts */
    uint32_t icr_low;             /* IPI destination + mode */
    uint32_t icr_high;            /* IPI target processor */

    /* Timer */
    uint32_t timer_initial;       /* Timer initial count */
    uint32_t timer_current;       /* Timer current count */
    uint32_t timer_divide;        /* Timer divisor */
};

/*
 * KEY POINT: The IF (Interrupt Flag) in EFLAGS is separate
 * from the LAPIC. CLI clears IF in the CPU's EFLAGS register.
 * The LAPIC can still receive and queue interrupts.
 * But the CPU won't service them until IF = 1.
 *
 * EACH CPU HAS ITS OWN:
 *  - EFLAGS.IF
 *  - LAPIC registers
 *  - Pending interrupt queue
 *
 * Disabling interrupts on CPU 0 has NO EFFECT on CPU 1's
 * interrupt handling capability.
 */
```

The per-CPU nature of interrupt handling is not a design flaw—it's a deliberate architectural choice that enables scalability. If all CPUs shared a single interrupt enable/disable flag, that flag would become a massive bottleneck. Every interrupt-related operation would require global coordination, destroying parallelism.
The synchronization problem on multiprocessors is fundamentally about memory access races, not about preemption. Even if you could magically disable preemption on all CPUs simultaneously, you'd still have race conditions because multiple CPUs can access memory at the same time.
The Memory Hierarchy Problem:
Modern multiprocessor systems have complex memory hierarchies:
```
       CPU 0                 CPU 1
         │                     │
       ┌─┴─┐                 ┌─┴─┐
       │L1$│                 │L1$│    ← Private per-CPU (32-64KB)
       └─┬─┘                 └─┬─┘
       ┌─┴─┐                 ┌─┴─┐
       │L2$│                 │L2$│    ← Private or shared per-core (256KB-1MB)
       └─┬─┘                 └─┬─┘
         └──────────┬──────────┘
                  ┌─┴─┐
                  │L3$│               ← Shared last-level cache (8-64MB)
                  └─┬─┘
                  ┌─┴─┐
                  │RAM│               ← Main memory (gigabytes)
                  └───┘
```
When CPU 0 writes to memory, the write goes to CPU 0's L1 cache first. CPU 1 doesn't immediately see this change—it might have an old copy in its own cache. Cache coherence protocols (like MESI) eventually propagate the change, but there's a window where CPUs disagree about memory contents.
Simultaneous Memory Operations:
Consider two CPUs executing the following at the exact same time:
```
CPU 0: mov eax, [counter]        CPU 1: mov eax, [counter]
CPU 0: add eax, 1                CPU 1: add eax, 1
CPU 0: mov [counter], eax        CPU 1: mov [counter], eax
```
Even though each CPU has interrupts disabled (so neither can be preempted), both are executing truly simultaneously. The memory operations race: both CPUs load the same old value of counter, increment their private copies, and write back the same result, so one increment is lost.
The problem isn't preemption—it's true parallelism.
| Race Type | Cause | Uniprocessor | Multiprocessor |
|---|---|---|---|
| Read-Modify-Write | Interleaved operations | CLI prevents (no interleaving) | CLI doesn't help (true parallelism) |
| Check-Then-Act | State changes between check and action | CLI prevents (atomic execution) | CLI doesn't help (parallel checks) |
| Write Ordering | Compile/hardware reordering | Usually not visible | Critical (memory barriers needed) |
| Cache Visibility | Stale cache values | N/A (one cache) | Requires coherence protocol awareness |
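To make the "Check-Then-Act" row concrete, here is a minimal sketch (the names are ours) of a claim-if-free operation done correctly on SMP: the check and the action are fused into one atomic compare-and-swap, so no interrupt manipulation is needed or relevant:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* -1 means "free"; otherwise the ID of the claiming CPU or thread. */
atomic_int owner = -1;

bool try_claim(int my_id)
{
    int expected = -1;
    /* Atomically: if owner is still -1, set it to my_id.
     * Two CPUs racing here cannot both succeed, regardless of
     * whether their local interrupts are enabled or disabled. */
    return atomic_compare_exchange_strong(&owner, &expected, my_id);
}
```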
```c
/* Memory access races on multiprocessor systems */

volatile int flag = 0;
volatile int data = 0;

/* Producer (CPU 0) */
void producer(void)
{
    disable_interrupts();    /* Has no effect on CPU 1! */
    data = 42;               /* Write data first */
    flag = 1;                /* Then signal ready */
    enable_interrupts();
}

/* Consumer (CPU 1) */
void consumer(void)
{
    disable_interrupts();    /* Has no effect on CPU 0! */
    while (flag == 0) {
        /* Spin waiting for producer */
    }
    int value = data;        /* Read the data */
    /* value might be 0 or 42! */
    enable_interrupts();
}

/*
 * PROBLEM 1: Visibility
 * Even after producer writes flag = 1, consumer might
 * still see flag == 0 due to cache coherence delays.
 *
 * PROBLEM 2: Reordering
 * Compiler or CPU might reorder the writes in producer:
 *     flag = 1;    // Moved before data = 42!
 *     data = 42;
 *
 * Now consumer sees flag == 1 but data == 0.
 *
 * SOLUTION: Memory barriers + proper synchronization
 * Disabling interrupts does NOTHING for these issues.
 */

/* CORRECT: Using atomic operations and barriers */
#include <stdatomic.h>

atomic_int flag_atomic = 0;
int data_protected = 0;

void producer_correct(void)
{
    data_protected = 42;
    atomic_store_explicit(&flag_atomic, 1, memory_order_release);
    /* Release ensures data write is visible before flag write */
}

void consumer_correct(void)
{
    while (atomic_load_explicit(&flag_atomic, memory_order_acquire) == 0) {
        /* Spin */
    }
    /* Acquire ensures we see all writes before the flag was set */
    int value = data_protected;    /* Guaranteed to be 42 */
}
```

On multiprocessors, the synchronization problem shifts from preventing interleaving (which interrupts control) to preventing simultaneous access (which requires atomic operations, locks, or lock-free algorithms). Disabling interrupts addresses the wrong problem entirely.
While interrupt disabling doesn't provide mutual exclusion on multiprocessor systems, it's not useless. It provides specific guarantees that remain important:
1. Prevention of Local Preemption:
Disabling interrupts ensures that the current CPU won't switch to a different process or handle an interrupt while you're in a critical section. This matters when manipulating per-CPU data, or when an interrupt handler on the same CPU could touch the state you're updating.
```c
/*
 * PER-CPU DATA: Interrupt disabling IS sufficient
 * Each CPU has its own copy, so no cross-CPU races
 */

struct per_cpu_stats {
    unsigned long interrupts_handled;
    unsigned long context_switches;
    unsigned long syscalls;
};

DEFINE_PER_CPU(struct per_cpu_stats, cpu_stats);

/* Safe even on SMP: accessing only THIS CPU's data */
void update_cpu_stats(void)
{
    unsigned long flags;

    local_irq_save(flags);
    /* No other CPU ever accesses OUR per-CPU stats */
    this_cpu_inc(cpu_stats.interrupts_handled);
    local_irq_restore(flags);
}

/*
 * COMBINED PROTECTION: Spinlock + interrupt disable
 * Standard pattern for SMP kernel code
 */

spinlock_t global_lock;

void update_global_data(void)
{
    unsigned long flags;

    /* spin_lock_irqsave:
     * 1. Saves interrupt state
     * 2. Disables local interrupts
     * 3. Acquires spinlock (spins if another CPU holds it)
     */
    spin_lock_irqsave(&global_lock, flags);

    /*
     * Now we have BOTH:
     *  - Mutual exclusion across CPUs (spinlock)
     *  - Local preemption disabled (CLI)
     *
     * Why both? An interrupt handler on OUR CPU might try
     * to acquire the same lock, causing deadlock if we
     * didn't disable interrupts.
     */
    modify_global_data();

    spin_unlock_irqrestore(&global_lock, flags);
}
```

2. Deadlock Prevention:
A critical use case for interrupt disabling on SMP is preventing deadlocks between thread context and interrupt context:
Scenario without interrupt disable: a thread on CPU₀ acquires a spinlock with interrupts enabled; a device interrupt arrives on CPU₀ and its handler tries to acquire the same spinlock; the handler spins forever, because the lock holder is the very thread it interrupted and that thread cannot resume until the handler returns. Deadlock.
Scenario with interrupt disable: the thread disables local interrupts while taking the spinlock; the device interrupt is held pending on CPU₀ until the lock is released and interrupts are re-enabled, so the handler can never spin on a lock held by its own CPU. (See the sketch below.)
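Here is a minimal sketch of the two scenarios, using Linux-style names (spin_lock, spin_lock_irqsave) and a hypothetical dev_lock; this is illustrative pseudocode, not a complete driver:

```c
spinlock_t dev_lock;    /* hypothetical lock protecting a device queue */

/* BROKEN on the lock-holder's own CPU */
void thread_context_broken(void)
{
    spin_lock(&dev_lock);          /* interrupts still enabled */
    /* <-- device interrupt fires HERE, on this same CPU */
    update_device_queue();
    spin_unlock(&dev_lock);
}

void device_irq_handler(void)
{
    spin_lock(&dev_lock);          /* spins forever: the holder is the
                                    * thread we just interrupted, and it
                                    * cannot run until we return */
    handle_device_work();
    spin_unlock(&dev_lock);
}

/* CORRECT: the interrupt is deferred until the lock is dropped */
void thread_context_correct(void)
{
    unsigned long flags;

    spin_lock_irqsave(&dev_lock, flags);        /* CLI + lock */
    update_device_queue();
    spin_unlock_irqrestore(&dev_lock, flags);   /* unlock + restore IF */
}
```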
On SMP systems, the standard pattern is: disable local interrupts, THEN acquire a spinlock. This provides both local protection (no preemption/interrupt during critical section) and global protection (no other CPU can enter). This is why spin_lock_irqsave() is so common in kernel code.
The transition from uniprocessor to multiprocessor computing created one of the most significant shifts in operating system design. Understanding this history helps explain why we have the synchronization primitives we do today.
The Uniprocessor Era (1960s-1980s):
During this era, operating systems were designed with the uniprocessor assumption: disabling interrupts was a complete and cheap mutual exclusion mechanism, and kernel data structures could be protected by nothing more than cli/sti pairs.
| Year | System/Event | Significance |
|---|---|---|
| 1962 | Burroughs D825 | First commercial multiprocessor |
| 1969 | CDC 7600 | Multiple functional units (proto-SMP) |
| 1986 | Sequent Balance | Commercial SMP Unix systems |
| 1989 | Intel i860 | Microprocessor designed for multiprocessor |
| 1991 | Linux 0.01 | Born as uniprocessor-only |
| 1996 | Linux 2.0 | First SMP support (BKL - Big Kernel Lock) |
| 2001 | Linux 2.4 | Finer-grained locking begins |
| 2003 | Linux 2.6 | Preemptible kernel, better SMP |
| 2005 | First dual-core x86 CPUs | Multicore goes mainstream |
| 2008+ | Many-core systems | Modern fine-grained synchronization |
The SMP Retrofit Challenge:
When operating systems were adapted for SMP, engineers faced a fundamental challenge: immense codebases written with the assumption that cli meant "no one else runs."
The Big Kernel Lock (BKL):
Early SMP kernels used a simple, brute-force solution: a single global lock for the entire kernel. Before entering the kernel (from a system call or interrupt), acquire this lock. Before leaving, release it.
This was correct but catastrophic for performance: only one CPU could execute kernel code at a time, so kernel-heavy workloads gained almost nothing from additional processors.
```c
/*
 * STAGE 1: Big Kernel Lock (BKL)
 * Simple but poor performance
 */
spinlock_t big_kernel_lock;

void syscall_entry(void)
{
    spin_lock(&big_kernel_lock);    /* All syscalls serialize here */
    do_syscall();
    spin_unlock(&big_kernel_lock);
}

/*
 * STAGE 2: Subsystem locks
 * Better parallelism, more complexity
 */
spinlock_t fs_lock;      /* Filesystem operations */
spinlock_t net_lock;     /* Network stack */
spinlock_t mm_lock;      /* Memory management */
spinlock_t sched_lock;   /* Scheduler */

void read_file(void)
{
    spin_lock(&fs_lock);            /* Only serializes filesystem ops */
    do_read();
    spin_unlock(&fs_lock);
}

void send_packet(void)
{
    spin_lock(&net_lock);           /* Filesystem can run in parallel! */
    do_send();
    spin_unlock(&net_lock);
}

/*
 * STAGE 3: Fine-grained per-object locks
 * Maximum parallelism, maximum complexity
 */
struct inode {
    spinlock_t lock;                /* This specific file's lock */
    /* ... other fields ... */
};

void read_inode(struct inode *inode)
{
    spin_lock(&inode->lock);        /* Only this inode is locked */
    do_read_inode(inode);
    spin_unlock(&inode->lock);
}

/*
 * Different files can now be read TRULY in parallel,
 * even on operations within the same subsystem!
 */
```

The Lesson:
The transition from UP to SMP taught the systems programming community that correctness arguments built on "no one else can run while I hold the CPU" do not survive true parallelism: mutual exclusion has to be enforced explicitly around shared data, not assumed from the absence of preemption.
Modern operating systems are designed from the ground up with SMP in mind, using a rich toolkit of spinlocks, mutexes, RCU, atomic operations, and lock-free data structures—all of which exist because simple interrupt disabling doesn't work.
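As a taste of that toolkit, here is a minimal test-and-set spinlock sketched with C11 atomics (the type and function names are ours, not the kernel's): unlike cli, the atomic exchange excludes every CPU in the machine.

```c
#include <stdatomic.h>

/* A minimal test-and-set spinlock sketch (not the kernel's spinlock_t).
 * Initialize with: struct my_spinlock l = { ATOMIC_FLAG_INIT }; */
struct my_spinlock {
    atomic_flag locked;
};

static void my_spin_lock(struct my_spinlock *lock)
{
    /* Atomically set the flag; if it was already set, another CPU holds
     * the lock, so keep retrying. Acquire ordering ensures this critical
     * section sees the previous holder's writes. */
    while (atomic_flag_test_and_set_explicit(&lock->locked,
                                             memory_order_acquire)) {
        /* spin */
    }
}

static void my_spin_unlock(struct my_spinlock *lock)
{
    /* Release ordering publishes our writes before the lock is freed. */
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```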
You may encounter old kernel code or documentation that uses interrupt disabling for mutual exclusion without spinlocks. This code was written for uniprocessor systems and may not be correct on SMP. Always verify synchronization assumptions in legacy code before running on modern multicore systems.
We've thoroughly examined why interrupt disabling provides mutual exclusion only on uniprocessor systems. The key insights: interrupt disabling is a per-CPU (local) operation; multiprocessors exhibit true parallelism, so races arise from simultaneous memory access rather than interleaving; and on SMP, interrupt disabling remains useful only for local protection, typically combined with a spinlock (spin_lock_irqsave).
What's Next:
Now that we understand the scope of interrupt disabling, we'll examine another critical aspect: it's a privileged operation. User-space programs cannot disable interrupts—this is a kernel-only capability, and for very good reasons.
You now understand why the interrupt disable approach works only on single-processor systems. The per-CPU nature of interrupt control, the reality of true parallelism on SMP systems, and the different nature of multiprocessor race conditions all explain this limitation. Next, we'll explore why this capability is restricted to privileged kernel code.