In the previous page, we established that disabling interrupts provides an elegant and absolute guarantee of mutual exclusion. A process running with interrupts disabled cannot be preempted, and therefore no other process can execute during its critical section.
But there's a critical assumption hidden in this guarantee.
The entire argument rests on the premise that there is only one processor. In a uniprocessor system, if the CPU isn't executing my code, it must be executing someone else's code—and vice versa. Mutual exclusion follows directly from exclusive use of the single execution resource.
In the modern world of multi-core processors, this assumption is fundamentally broken. Your laptop likely has 4-16 cores. Data center servers routinely have 64-256 cores. Even smartphones have 8 cores.
In a multiprocessor system, disabling interrupts on one CPU has zero effect on any other CPU.
This page examines why the interrupt disable approach fails in multiprocessor environments, what happens when engineers mistakenly rely on it, and how this limitation shaped the development of more sophisticated synchronization mechanisms.
Disabling interrupts is a LOCAL operation. Each processor in an SMP (Symmetric Multiprocessing) system has its own interrupt flag. When CPU₀ executes 'cli', it only affects CPU₀. CPUs 1, 2, 3... continue operating normally, potentially accessing the same shared data simultaneously. This is the fundamental reason interrupt disabling alone cannot provide mutual exclusion on multiprocessor systems.
Let's first precisely characterize why interrupt disabling provides mutual exclusion on a uniprocessor system. This understanding forms the foundation for seeing why the approach fails on multiprocessors.
The Uniprocessor Execution Model:
In a uniprocessor system, execution follows a simple model: a single CPU executes one instruction stream, and the scheduler multiplexes processes onto it by switching contexts on timer and I/O interrupts.
The key insight is that only one thread of execution exists at any moment. Concurrent execution is an illusion created by rapid context switching.
Concurrency vs Parallelism on Uniprocessors:
On a uniprocessor, we have concurrency but not parallelism:
Interleaved execution means that at any given instant, only one process is running. The others are suspended, waiting for their turn. When we disable interrupts, we prevent the transition that would suspend our process and resume another. Since there's only one CPU and it's running our code, no one else can run.
Mathematical Formalization:
Let's formalize this. Define:
T = the set of all time instants
P = the set of all processes
running(t) = the set of processes executing at time t

For a uniprocessor: ∀t ∈ T: |running(t)| = 1
This means at every instant, exactly one process is running. If process A is in its critical section at time t, the mutual exclusion condition trivially holds because no other process exists in any section at time t—they're all suspended.
On a uniprocessor, the CPU is the exclusive resource. If I have it, you don't. If I prevent the mechanism that takes it away from me (interrupts), I have it for as long as I want. Mutual exclusion is a natural consequence of resource exclusivity.
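On that model, the uniprocessor pattern is simply a bracket of disable/enable around the critical section. Here is a minimal sketch, assuming the disable_interrupts()/enable_interrupts() wrappers for cli/sti used elsewhere on this page:

```c
/* Uniprocessor-only: correct ONLY because there is exactly one CPU. */

volatile int balance = 0;

void deposit(int amount)
{
    disable_interrupts();   /* cli: no timer tick, no context switch */

    /* Critical section: nothing else in the system can execute,
     * because the only CPU is busy running this code. */
    balance = balance + amount;

    enable_interrupts();    /* sti: preemption is possible again */
}
```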
Now let's examine what changes in a multiprocessor system and why the interrupt disable approach completely breaks down.
The Multiprocessor Execution Model:
In a symmetric multiprocessor (SMP) system, N identical CPUs execute independent instruction streams at the same time, all connected to a single shared physical memory.
The presence of shared memory is precisely what creates the synchronization problem. Multiple CPUs can read and write the same memory locations simultaneously.
| Characteristic | Uniprocessor | Multiprocessor |
|---|---|---|
| Active execution units | 1 | N (typically 2-256) |
| Simultaneous processes | 1 (illusion of many) | Up to N truly parallel |
| Interrupt flag scope | Global (one CPU) | Per-CPU (local only) |
| Context switch source | Timer/I/O interrupts only | Timer interrupts + true parallelism |
| Race condition source | Only during interleaving | True simultaneous access |
| CLI effect | Blocks ALL other execution | Blocks only local CPU preemption |
| Memory visibility | Trivial (one cache) | Complex (coherence protocols) |
The Fatal Flaw:
When CPU₀ executes cli, it sets its local IF = 0. This prevents timer and device interrupts from being serviced on CPU₀, and therefore prevents preemption of the code running there.
But CPU₁, CPU₂, ... CPUₙ₋₁ all have their own interrupt flags, which remain set to 1. They continue operating normally: scheduling processes, handling their own interrupts, and accessing shared memory.
Scenario: The Broken Synchronization
```c
/* BROKEN: This does NOT work on multiprocessor systems! */
/* Shared counter accessed by processes on multiple CPUs */

volatile int shared_counter = 0;

void increment_counter(void)
{
    disable_interrupts();    /* Only affects THIS CPU! */

    /*
     * DANGEROUS: We think we have mutual exclusion,
     * but another CPU could be executing this exact
     * same code AT THE SAME INSTANT
     */
    int temp = shared_counter;
    temp = temp + 1;
    shared_counter = temp;

    enable_interrupts();
}

/*
 * RACE CONDITION EXAMPLE:
 *
 * Time    CPU 0                          CPU 1
 * ----    -----                          -----
 * t0      disable_interrupts()           disable_interrupts()
 * t1      temp₀ = shared_counter (0)     temp₁ = shared_counter (0)
 * t2      temp₀ = temp₀ + 1 (1)          temp₁ = temp₁ + 1 (1)
 * t3      shared_counter = temp₀ (1)     shared_counter = temp₁ (1)
 * t4      enable_interrupts()            enable_interrupts()
 *
 * RESULT: Counter should be 2, but is 1!
 * Both CPUs had their interrupts disabled.
 * Neither could be preempted.
 * But they ran SIMULTANEOUSLY and corrupted the data.
 */
```

The fundamental issue is that disabling interrupts is a LOCAL operation. It affects only the CPU executing the instruction. Other CPUs are entirely unaware that you've disabled your interrupts. They don't care. They will continue executing their code, including accesses to shared memory, regardless of what you do.
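For contrast, here is a minimal sketch of a counter increment that actually is safe on SMP, using C11 atomics instead of interrupt control (the function name and the separate atomic counter are illustrative, not part of the example above):

```c
#include <stdatomic.h>

/* Safe on any number of CPUs: the read-modify-write is a single
 * atomic operation, so two CPUs incrementing at the same instant
 * cannot lose an update. No interrupt manipulation is involved. */
atomic_int shared_counter_atomic = 0;

void increment_counter_safe(void)
{
    atomic_fetch_add(&shared_counter_atomic, 1);
}
```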
To fully understand why interrupt disabling is local, let's examine the hardware architecture of interrupt handling in multiprocessor systems.
The Local APIC (Advanced Programmable Interrupt Controller):
In modern x86 multiprocessor systems, each CPU has a Local APIC (LAPIC) integrated into the processor core. This is a crucial piece of hardware that accepts interrupts destined for that specific CPU, prioritizes and queues them, drives a per-CPU timer, and sends and receives inter-processor interrupts (IPIs).
There's also an I/O APIC (a separate chip or integrated into the chipset) that collects interrupt lines from external devices and routes each interrupt to the Local APIC of a selected CPU.
Per-CPU State:
Each CPU maintains its own set of registers, including its own EFLAGS register (which contains the interrupt flag IF), instruction pointer, general-purpose registers, and Local APIC state.
When you execute cli on CPU₀, it modifies CPU₀'s FLAGS register. CPU₁'s FLAGS register is completely separate and unaffected.
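As an illustration of the per-CPU flag, the following user-space sketch reads the RFLAGS register of whichever CPU the program happens to be running on (x86-64, GCC/Clang inline assembly; the helper name is ours):

```c
#include <stdint.h>
#include <stdio.h>

/* Read this CPU's RFLAGS register (x86-64, GCC/Clang inline asm). */
static inline uint64_t read_rflags(void)
{
    uint64_t flags;
    __asm__ volatile("pushfq; popq %0" : "=r"(flags));
    return flags;
}

int main(void)
{
    /* Bit 9 of RFLAGS is IF, the interrupt flag. Every CPU (and every
     * hardware thread) has its own copy; this reads only the copy
     * belonging to the CPU executing this code right now. */
    uint64_t flags = read_rflags();
    printf("IF on this CPU: %llu\n",
           (unsigned long long)((flags >> 9) & 1));
    return 0;
}
```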
The Interrupt Delivery Path:
Let's trace an interrupt from device to handler: the device raises its interrupt line, the I/O APIC routes it to the target CPU's Local APIC, the LAPIC records it as pending (in the IRR), and the CPU vectors to the handler only when its own EFLAGS.IF is set.
```c
#include <stdint.h>

/* Conceptual Local APIC register structure */
/* Each CPU has its OWN copy of these registers */

struct local_apic {
    /* Identification */
    uint32_t id;                  /* Unique LAPIC ID for this CPU */
    uint32_t version;             /* LAPIC version */

    /* Priority management */
    uint32_t task_priority;       /* TPR: below this, interrupts masked */
    uint32_t processor_priority;  /* PPR: effective priority */

    /* Interrupt state */
    uint32_t in_service[8];       /* ISR: currently servicing */
    uint32_t trigger_mode[8];     /* TMR: edge vs level */
    uint32_t request[8];          /* IRR: pending interrupts */

    /* Control */
    uint32_t spurious;            /* Spurious interrupt vector + enable */

    /* Inter-processor interrupts */
    uint32_t icr_low;             /* IPI destination + mode */
    uint32_t icr_high;            /* IPI target processor */

    /* Timer */
    uint32_t timer_initial;       /* Timer initial count */
    uint32_t timer_current;       /* Timer current count */
    uint32_t timer_divide;        /* Timer divisor */
};

/*
 * KEY POINT: The IF (Interrupt Flag) in EFLAGS is separate
 * from the LAPIC. CLI clears IF in the CPU's EFLAGS register.
 * The LAPIC can still receive and queue interrupts.
 * But the CPU won't service them until IF = 1.
 *
 * EACH CPU HAS ITS OWN:
 *  - EFLAGS.IF
 *  - LAPIC registers
 *  - Pending interrupt queue
 *
 * Disabling interrupts on CPU 0 has NO EFFECT on CPU 1's
 * interrupt handling capability.
 */
```

The per-CPU nature of interrupt handling is not a design flaw—it's a deliberate architectural choice that enables scalability. If all CPUs shared a single interrupt enable/disable flag, that flag would become a massive bottleneck. Every interrupt-related operation would require global coordination, destroying parallelism.
The synchronization problem on multiprocessors is fundamentally about memory access races, not about preemption. Even if you could magically disable preemption on all CPUs simultaneously, you'd still have race conditions because multiple CPUs can access memory at the same time.
The Memory Hierarchy Problem:
Modern multiprocessor systems have complex memory hierarchies:
```
       CPU 0                 CPU 1
         │                     │
       ┌─┴─┐                 ┌─┴─┐
       │L1$│                 │L1$│    ← Private per-CPU (32-64KB)
       └─┬─┘                 └─┬─┘
       ┌─┴─┐                 ┌─┴─┐
       │L2$│                 │L2$│    ← Private or shared per-core (256KB-1MB)
       └─┬─┘                 └─┬─┘
         └──────────┬──────────┘
                  ┌─┴─┐
                  │L3$│               ← Shared last-level cache (8-64MB)
                  └─┬─┘
                  ┌─┴─┐
                  │RAM│               ← Main memory (gigabytes)
                  └───┘
```
When CPU 0 writes to memory, the write goes to CPU 0's L1 cache first. CPU 1 doesn't immediately see this change—it might have an old copy in its own cache. Cache coherence protocols (like MESI) eventually propagate the change, but there's a window where CPUs disagree about memory contents.
Simultaneous Memory Operations:
Consider two CPUs executing the following at the exact same time:
```
CPU 0: mov eax, [counter]        CPU 1: mov eax, [counter]
CPU 0: add eax, 1                CPU 1: add eax, 1
CPU 0: mov [counter], eax        CPU 1: mov [counter], eax
```
Even though each CPU has interrupts disabled (so neither can be preempted), both are executing truly simultaneously. The memory operations race: both CPUs load the same old value of counter, increment their private copies, and write back the same result, so one increment is lost.
The problem isn't preemption—it's true parallelism.
| Race Type | Cause | Uniprocessor | Multiprocessor |
|---|---|---|---|
| Read-Modify-Write | Interleaved operations | CLI prevents (no interleaving) | CLI doesn't help (true parallelism) |
| Check-Then-Act | State changes between check and action | CLI prevents (atomic execution) | CLI doesn't help (parallel checks) |
| Write Ordering | Compile/hardware reordering | Usually not visible | Critical (memory barriers needed) |
| Cache Visibility | Stale cache values | N/A (one cache) | Requires coherence protocol awareness |
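To make the "Check-Then-Act" row concrete, here is a minimal sketch (the names are ours) of a claim-if-free operation done correctly on SMP: the check and the action are fused into one atomic compare-and-swap, so no interrupt manipulation is needed or relevant:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* -1 means "free"; otherwise the ID of the claiming CPU or thread. */
atomic_int owner = -1;

bool try_claim(int my_id)
{
    int expected = -1;
    /* Atomically: if owner is still -1, set it to my_id.
     * Two CPUs racing here cannot both succeed, regardless of
     * whether their local interrupts are enabled or disabled. */
    return atomic_compare_exchange_strong(&owner, &expected, my_id);
}
```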
```c
/* Memory access races on multiprocessor systems */

volatile int flag = 0;
volatile int data = 0;

/* Producer (CPU 0) */
void producer(void)
{
    disable_interrupts();    /* Has no effect on CPU 1! */
    data = 42;               /* Write data first */
    flag = 1;                /* Then signal ready */
    enable_interrupts();
}

/* Consumer (CPU 1) */
void consumer(void)
{
    disable_interrupts();    /* Has no effect on CPU 0! */
    while (flag == 0) {
        /* Spin waiting for producer */
    }
    int value = data;        /* Read the data */
    /* value might be 0 or 42! */
    enable_interrupts();
}

/*
 * PROBLEM 1: Visibility
 * Even after producer writes flag = 1, consumer might
 * still see flag == 0 due to cache coherence delays.
 *
 * PROBLEM 2: Reordering
 * Compiler or CPU might reorder the writes in producer:
 *     flag = 1;    // Moved before data = 42!
 *     data = 42;
 *
 * Now consumer sees flag == 1 but data == 0.
 *
 * SOLUTION: Memory barriers + proper synchronization
 * Disabling interrupts does NOTHING for these issues.
 */

/* CORRECT: Using atomic operations and barriers */
#include <stdatomic.h>

atomic_int flag_atomic = 0;
int data_protected = 0;

void producer_correct(void)
{
    data_protected = 42;
    atomic_store_explicit(&flag_atomic, 1, memory_order_release);
    /* Release ensures data write is visible before flag write */
}

void consumer_correct(void)
{
    while (atomic_load_explicit(&flag_atomic, memory_order_acquire) == 0) {
        /* Spin */
    }
    /* Acquire ensures we see all writes before the flag was set */
    int value = data_protected;    /* Guaranteed to be 42 */
}
```

On multiprocessors, the synchronization problem shifts from preventing interleaving (which interrupts control) to preventing simultaneous access (which requires atomic operations, locks, or lock-free algorithms). Disabling interrupts addresses the wrong problem entirely.
While interrupt disabling doesn't provide mutual exclusion on multiprocessor systems, it's not useless. It provides specific guarantees that remain important:
1. Prevention of Local Preemption:
Disabling interrupts ensures that the current CPU won't switch to a different process or handle an interrupt while you're in a critical section. This matters when manipulating per-CPU data, or when an interrupt handler on the same CPU could touch the state you're updating.
```c
/*
 * PER-CPU DATA: Interrupt disabling IS sufficient
 * Each CPU has its own copy, so no cross-CPU races
 */

struct per_cpu_stats {
    unsigned long interrupts_handled;
    unsigned long context_switches;
    unsigned long syscalls;
};

DEFINE_PER_CPU(struct per_cpu_stats, cpu_stats);

/* Safe even on SMP: accessing only THIS CPU's data */
void update_cpu_stats(void)
{
    unsigned long flags;

    local_irq_save(flags);
    /* No other CPU ever accesses OUR per-CPU stats */
    this_cpu_inc(cpu_stats.interrupts_handled);
    local_irq_restore(flags);
}

/*
 * COMBINED PROTECTION: Spinlock + interrupt disable
 * Standard pattern for SMP kernel code
 */

spinlock_t global_lock;

void update_global_data(void)
{
    unsigned long flags;

    /* spin_lock_irqsave:
     * 1. Saves interrupt state
     * 2. Disables local interrupts
     * 3. Acquires spinlock (spins if another CPU holds it)
     */
    spin_lock_irqsave(&global_lock, flags);

    /*
     * Now we have BOTH:
     *  - Mutual exclusion across CPUs (spinlock)
     *  - Local preemption disabled (CLI)
     *
     * Why both? An interrupt handler on OUR CPU might try
     * to acquire the same lock, causing deadlock if we
     * didn't disable interrupts.
     */
    modify_global_data();

    spin_unlock_irqrestore(&global_lock, flags);
}
```

2. Deadlock Prevention:
A critical use case for interrupt disabling on SMP is preventing deadlocks between thread context and interrupt context:
Scenario without interrupt disable: a thread on CPU₀ acquires a spinlock with interrupts enabled; a device interrupt arrives on CPU₀ and its handler tries to acquire the same spinlock; the handler spins forever, because the lock holder is the very thread it interrupted and that thread cannot resume until the handler returns. Deadlock.
Scenario with interrupt disable: the thread disables local interrupts while taking the spinlock; the device interrupt is held pending on CPU₀ until the lock is released and interrupts are re-enabled, so the handler can never spin on a lock held by its own CPU. (See the sketch below.)
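Here is a minimal sketch of the two scenarios, using Linux-style names (spin_lock, spin_lock_irqsave) and a hypothetical dev_lock; this is illustrative pseudocode, not a complete driver:

```c
spinlock_t dev_lock;    /* hypothetical lock protecting a device queue */

/* BROKEN on the lock-holder's own CPU */
void thread_context_broken(void)
{
    spin_lock(&dev_lock);          /* interrupts still enabled */
    /* <-- device interrupt fires HERE, on this same CPU */
    update_device_queue();
    spin_unlock(&dev_lock);
}

void device_irq_handler(void)
{
    spin_lock(&dev_lock);          /* spins forever: the holder is the
                                    * thread we just interrupted, and it
                                    * cannot run until we return */
    handle_device_work();
    spin_unlock(&dev_lock);
}

/* CORRECT: the interrupt is deferred until the lock is dropped */
void thread_context_correct(void)
{
    unsigned long flags;

    spin_lock_irqsave(&dev_lock, flags);        /* CLI + lock */
    update_device_queue();
    spin_unlock_irqrestore(&dev_lock, flags);   /* unlock + restore IF */
}
```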
On SMP systems, the standard pattern is: disable local interrupts, THEN acquire a spinlock. This provides both local protection (no preemption/interrupt during critical section) and global protection (no other CPU can enter). This is why spin_lock_irqsave() is so common in kernel code.
The transition from uniprocessor to multiprocessor computing created one of the most significant shifts in operating system design. Understanding this history helps explain why we have the synchronization primitives we do today.
The Uniprocessor Era (1960s-1980s):
During this era, operating systems were designed with the uniprocessor assumption: disabling interrupts was a complete and cheap mutual exclusion mechanism, and kernel data structures could be protected by nothing more than cli/sti pairs.
| Year | System/Event | Significance |
|---|---|---|
| 1962 | Burroughs D825 | First commercial multiprocessor |
| 1969 | CDC 7600 | Multiple functional units (proto-SMP) |
| 1986 | Sequent Balance | Commercial SMP Unix systems |
| 1989 | Intel i860 | Microprocessor designed for multiprocessor |
| 1991 | Linux 0.01 | Born as uniprocessor-only |
| 1996 | Linux 2.0 | First SMP support (BKL - Big Kernel Lock) |
| 2001 | Linux 2.4 | Finer-grained locking begins |
| 2003 | Linux 2.6 | Preemptible kernel, better SMP |
| 2005 | First dual-core x86 CPUs | Multicore goes mainstream |
| 2008+ | Many-core systems | Modern fine-grained synchronization |
The SMP Retrofit Challenge:
When operating systems were adapted for SMP, engineers faced a fundamental challenge: immense codebases written with the assumption that cli meant "no one else runs."
The Big Kernel Lock (BKL):
Early SMP kernels used a simple, brute-force solution: a single global lock for the entire kernel. Before entering the kernel (from a system call or interrupt), acquire this lock. Before leaving, release it.
This was correct but catastrophic for performance: only one CPU could execute kernel code at a time, so kernel-heavy workloads gained almost nothing from additional processors.
```c
/*
 * STAGE 1: Big Kernel Lock (BKL)
 * Simple but poor performance
 */
spinlock_t big_kernel_lock;

void syscall_entry(void)
{
    spin_lock(&big_kernel_lock);    /* All syscalls serialize here */
    do_syscall();
    spin_unlock(&big_kernel_lock);
}

/*
 * STAGE 2: Subsystem locks
 * Better parallelism, more complexity
 */
spinlock_t fs_lock;      /* Filesystem operations */
spinlock_t net_lock;     /* Network stack */
spinlock_t mm_lock;      /* Memory management */
spinlock_t sched_lock;   /* Scheduler */

void read_file(void)
{
    spin_lock(&fs_lock);            /* Only serializes filesystem ops */
    do_read();
    spin_unlock(&fs_lock);
}

void send_packet(void)
{
    spin_lock(&net_lock);           /* Filesystem can run in parallel! */
    do_send();
    spin_unlock(&net_lock);
}

/*
 * STAGE 3: Fine-grained per-object locks
 * Maximum parallelism, maximum complexity
 */
struct inode {
    spinlock_t lock;                /* This specific file's lock */
    /* ... other fields ... */
};

void read_inode(struct inode *inode)
{
    spin_lock(&inode->lock);        /* Only this inode is locked */
    do_read_inode(inode);
    spin_unlock(&inode->lock);
}

/*
 * Different files can now be read TRULY in parallel,
 * even on operations within the same subsystem!
 */
```

The Lesson:
The transition from UP to SMP taught the systems programming community that correctness arguments built on "no one else can run while I hold the CPU" do not survive true parallelism: mutual exclusion has to be enforced explicitly around shared data, not assumed from the absence of preemption.
Modern operating systems are designed from the ground up with SMP in mind, using a rich toolkit of spinlocks, mutexes, RCU, atomic operations, and lock-free data structures—all of which exist because simple interrupt disabling doesn't work.
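As a taste of that toolkit, here is a minimal test-and-set spinlock sketched with C11 atomics (the type and function names are ours, not the kernel's): unlike cli, the atomic exchange excludes every CPU in the machine.

```c
#include <stdatomic.h>

/* A minimal test-and-set spinlock sketch (not the kernel's spinlock_t).
 * Initialize with: struct my_spinlock l = { ATOMIC_FLAG_INIT }; */
struct my_spinlock {
    atomic_flag locked;
};

static void my_spin_lock(struct my_spinlock *lock)
{
    /* Atomically set the flag; if it was already set, another CPU holds
     * the lock, so keep retrying. Acquire ordering ensures this critical
     * section sees the previous holder's writes. */
    while (atomic_flag_test_and_set_explicit(&lock->locked,
                                             memory_order_acquire)) {
        /* spin */
    }
}

static void my_spin_unlock(struct my_spinlock *lock)
{
    /* Release ordering publishes our writes before the lock is freed. */
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```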
You may encounter old kernel code or documentation that uses interrupt disabling for mutual exclusion without spinlocks. This code was written for uniprocessor systems and may not be correct on SMP. Always verify synchronization assumptions in legacy code before running on modern multicore systems.
We've thoroughly examined why interrupt disabling provides mutual exclusion only on uniprocessor systems. The key insights: interrupt disabling is a per-CPU (local) operation; multiprocessors exhibit true parallelism, so races arise from simultaneous memory access rather than interleaving; and on SMP, interrupt disabling remains useful only for local protection, typically combined with a spinlock (spin_lock_irqsave).
What's Next:
Now that we understand the scope of interrupt disabling, we'll examine another critical aspect: it's a privileged operation. User-space programs cannot disable interrupts—this is a kernel-only capability, and for very good reasons.
You now understand why the interrupt disable approach works only on single-processor systems. The per-CPU nature of interrupt control, the reality of true parallelism on SMP systems, and the different nature of multiprocessor race conditions all explain this limitation. Next, we'll explore why this capability is restricted to privileged kernel code.