Symmetric Multiprocessing (SMP) dominates modern general-purpose computing, but it's not the only multiprocessor architecture, nor is it always the best choice. Asymmetric Multiprocessing (AMP), which predates SMP's ubiquity and persists in specialized domains today, offers an alternative model in which processors have differentiated, specialized roles.
Understanding AMP is essential for several reasons: it illuminates why SMP design decisions were made by contrasting with alternatives; it explains architectures still prevalent in embedded systems, real-time computing, and heterogeneous platforms; and it provides context for emerging hybrid architectures that blend symmetric and asymmetric characteristics.
By the end of this page, you will understand Asymmetric Multiprocessing architectures—their design principles, the master-slave scheduling model, advantages in specialized scenarios, and why general-purpose computing moved toward symmetry. You'll also see how modern heterogeneous processors represent a renaissance of asymmetric concepts within nominally symmetric systems.
Asymmetric Multiprocessing (AMP) describes multiprocessor architectures where processors have different roles, capabilities, or access rights. The "asymmetry" can manifest in several forms (a short code sketch after this list shows how a system might encode the distinctions):
1. Role Asymmetry (Master-Slave):
One processor (the "master") runs the operating system kernel and makes all scheduling decisions. Other processors ("slaves") execute user applications as directed by the master but do not directly handle system services.
2. Capability Asymmetry:
Processors have different instruction sets, performance characteristics, or functional capabilities. Some processors might handle floating-point operations while others cannot; some might have access to specific I/O devices while others do not.
3. Access Asymmetry:
Processors have different access rights to memory regions, I/O devices, or system resources. Even with identical hardware, the operating system may restrict what each processor can do.
4. Execution Asymmetry:
Certain code (kernel, interrupt handlers) runs only on designated processors, while other code (user applications) may run on a different set of processors.
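To make these categories concrete, here is a small, purely hypothetical sketch of how an operating system might record them in a per-CPU descriptor. The structure and field names are illustrative assumptions, not taken from any real kernel.

```c
/* Hypothetical per-CPU descriptor encoding the four kinds of asymmetry
 * described above. Field names are illustrative only. */
#include <stdbool.h>
#include <stdint.h>

struct cpu_descriptor {
    int      cpu_id;
    bool     is_master;          /* role asymmetry: runs the kernel and scheduler */
    bool     has_fpu;            /* capability asymmetry: feature differences */
    uint32_t accessible_regions; /* access asymmetry: bitmask of memory/I/O regions */
    bool     may_run_kernel;     /* execution asymmetry: where kernel code may run */
    bool     handles_interrupts; /* execution asymmetry: where IRQ handlers run */
};

/* Example configuration for a small master-slave system. */
static const struct cpu_descriptor cpus[] = {
    { 0, true,  true,  0xFFFFFFFF, true,  true  },  /* master: full rights */
    { 1, false, true,  0x0000000F, false, false },  /* slave: user code only */
    { 2, false, false, 0x0000000F, false, false },  /* slave without an FPU */
};
```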
Real systems exist on a spectrum between pure SMP and pure AMP. Many "SMP" systems have subtle asymmetries (CPU 0 often handles more interrupts, bootstrap always starts on a specific processor). Conversely, "AMP" systems may have processors that are hardware-identical but software-differentiated. The label reflects the predominant design philosophy rather than absolute purity.
Historical Context:
Asymmetric multiprocessing predates symmetric multiprocessing. Early multiprocessor systems in the 1960s and 1970s often used asymmetric designs for practical reasons:
The master-slave model provided a pragmatic path to multiprocessing without requiring the full complexity of symmetric operation. The master processor ran the single-threaded OS kernel safely, while slave processors provided additional compute capacity.
The most common form of asymmetric multiprocessing is the master-slave (also called master-worker or boss-worker) architecture. This model creates a clear division of responsibilities between processors.
Master Processor Responsibilities:
- Runs the operating system kernel: scheduler, memory management, I/O subsystems, and system call handling
- Receives and services all interrupts, including the timer interrupt
- Assigns processes to slave processors and preempts them when their time slices expire
Slave Processor Responsibilities:
- Execute user processes as assigned by the master
- Trap to the master (via inter-processor interrupt) for system calls and other kernel services
- Report back to the master when a process blocks, exhausts its time slice, or exits
Master-Slave AMP System Architecture:

```
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ MASTER (CPU 0)  │   │ SLAVE (CPU 1)   │   │ SLAVE (CPU 2)   │
│ ┌─────────────┐ │   │ ┌─────────────┐ │   │ ┌─────────────┐ │
│ │ Kernel      │ │   │ │ User        │ │   │ │ User        │ │
│ │ - Scheduler │ │   │ │ Process A   │ │   │ │ Process B   │ │
│ │ - Memory    │ │   │ │             │ │   │ │             │ │
│ │ - I/O       │ │   │ │             │ │   │ │             │ │
│ │ - Syscalls  │ │   │ │             │ │   │ │             │ │
│ └─────────────┘ │   │ └─────────────┘ │   │ └─────────────┘ │
│ [IRQ Handler]   │   │ [Trap to Master]│   │ [Trap to Master]│
│ [Timer IRQ]     │   │ [for syscalls]  │   │ [for syscalls]  │
└────────┬────────┘   └────────┬────────┘   └────────┬────────┘
         │                     │                     │
         ▼                     ▼                     ▼
┌─────────────────────────────────────────────────────────────┐
│                      SHARED MEMORY BUS                      │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                         MAIN MEMORY                         │
│ ┌──────────────┐  ┌──────────────┐  ┌─────────────────────┐ │
│ │ Kernel Code  │  │ User Process │  │ User Process        │ │
│ │ & Data       │  │ A Memory     │  │ B Memory            │ │
│ │ (Master only)│  │              │  │                     │ │
│ └──────────────┘  └──────────────┘  └─────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```

Control Flow for a System Call from a Slave:
1. Process on Slave 1 calls read()
2. Slave 1 generates an inter-processor interrupt to the Master
3. Master is interrupted and saves context
4. Master handles the read() syscall (schedules I/O, blocks the process)
5. Master selects the next process for Slave 1
6. Master signals Slave 1 with the new assignment
7. Slave 1 resumes with the new process

System Call Handling in Master-Slave Systems:
One of the most significant differences from SMP is how system calls work. In an SMP system, each processor can execute kernel code directly—a system call runs to completion on the processor where it was invoked. In master-slave AMP, a slave cannot execute kernel code: it must interrupt the master, wait while the master performs the system call on its behalf, and then resume with whatever work the master assigns next.
This round-trip to the master for every system call creates significant overhead, especially for system-call-intensive workloads.
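To make that round trip concrete, the following sketch shows what the slave side of the exchange might look like over a shared-memory mailbox plus an inter-processor interrupt. The mailbox layout and the ipi_send()/wait_for_ipi() primitives are assumptions for illustration, not a real platform API.

```c
/* Hypothetical sketch: a slave forwarding a system call to the master
 * over a shared-memory mailbox plus an inter-processor interrupt. */
#include <stdint.h>
#include <stdatomic.h>

struct syscall_mailbox {
    atomic_int state;        /* 0 = empty, 1 = request posted, 2 = done */
    int        syscall_nr;   /* which system call the slave wants */
    uintptr_t  args[6];      /* argument registers captured at the trap */
    long       retval;       /* filled in by the master */
};

/* One mailbox per slave, living in the shared memory region. */
extern struct syscall_mailbox mailboxes[];
extern void ipi_send(int target_cpu);   /* assumed platform primitive */
extern void wait_for_ipi(void);         /* assumed platform primitive */

/* Runs on the slave when a user process traps for a system call. */
long slave_forward_syscall(int my_cpu, int nr, const uintptr_t args[6])
{
    struct syscall_mailbox *mb = &mailboxes[my_cpu];

    mb->syscall_nr = nr;
    for (int i = 0; i < 6; i++)
        mb->args[i] = args[i];

    /* Publish the request, then poke the master (CPU 0). */
    atomic_store_explicit(&mb->state, 1, memory_order_release);
    ipi_send(0);

    /* Block until the master marks the request complete. */
    while (atomic_load_explicit(&mb->state, memory_order_acquire) != 2)
        wait_for_ipi();

    atomic_store_explicit(&mb->state, 0, memory_order_relaxed);
    return mb->retval;
}
```

Every one of these steps is pure overhead compared with SMP, where the same processor would simply switch to kernel mode and run the system call locally.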
The master processor becomes a critical bottleneck in AMP systems. Every system call, every interrupt, every scheduling decision flows through it. As slave count increases, the master can become saturated, limiting system scalability. This fundamental limitation drove the transition to SMP for general-purpose computing, where any processor can handle any kernel operation.
Scheduling in AMP systems is fundamentally different from SMP scheduling because all scheduling decisions are centralized in the master processor. This creates both simplifications and limitations.
Centralized Scheduling Advantages:
- The master has a complete, global view of every processor and every runnable task
- No scheduling races between processors, so no locking of scheduler data structures is needed
- Behavior is deterministic and easier to analyze, which matters for real-time systems
Centralized Scheduling Disadvantages:
- Every scheduling decision is serialized through the master, which becomes a bottleneck
- Slaves sit idle while waiting for the master to assign new work
- The master is a single point of failure for the entire system
The AMP Scheduling Loop:
The master processor runs a scheduling loop that manages all slaves:
```c
/* Conceptual AMP Master Scheduler */
void master_scheduler_loop(void)
{
    while (true) {
        /* Check for events from slaves */
        for (int slave = 1; slave < num_cpus; slave++) {
            if (slave_event_pending(slave)) {
                handle_slave_event(slave);
            }
        }

        /* Handle timer expiration - check time slice exhaustion */
        check_timers_and_preempt();

        /* Handle I/O completions */
        handle_io_completions();

        /* Assign processes to idle slaves */
        for (int slave = 1; slave < num_cpus; slave++) {
            if (slave_is_idle(slave)) {
                struct task *next = pick_next_task();
                if (next != NULL) {
                    assign_task_to_slave(slave, next);
                }
            }
        }

        /* Run any master-only tasks */
        run_master_tasks();
    }
}

void handle_slave_event(int slave)
{
    enum event_type event = get_slave_event(slave);

    switch (event) {
    case SYSCALL_REQUEST:
        /* Execute syscall on behalf of slave's process */
        execute_syscall_for_slave(slave);
        break;

    case TIME_SLICE_EXPIRED:
        /* Slave's process used its quantum */
        preempt_slave_process(slave);
        break;

    case PROCESS_EXITED:
        /* Clean up and find new work for slave */
        cleanup_exited_process(slave);
        assign_task_to_slave(slave, pick_next_task());
        break;

    case BLOCKING_EVENT:
        /* Process blocked (waiting for I/O, lock, etc.) */
        block_process_on_slave(slave);
        assign_task_to_slave(slave, pick_next_task());
        break;
    }
}
```

Despite its general-purpose limitations, the centralized nature of AMP scheduling provides advantages for real-time systems. Timing analysis is simpler when all scheduling decisions funnel through one point. Worst-Case Execution Time (WCET) calculations don't need to account for inter-processor scheduling races. This predictability is why AMP persists in safety-critical embedded systems where certification requires demonstrable timing guarantees.
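For completeness, here is a hypothetical sketch of the loop each slave might run as the counterpart to the master scheduler above. It reuses struct task and enum event_type from that sketch; the helper functions are assumed primitives, not a real API.

```c
/* Hypothetical slave-side loop complementing the master scheduler above. */
extern struct task *wait_for_assignment(int cpu);               /* block until the master assigns work */
extern enum event_type run_until_trap(struct task *t);          /* run user code until it traps */
extern void post_slave_event(int cpu, enum event_type reason);  /* notify the master */

void slave_main_loop(int my_cpu)
{
    for (;;) {
        /* Idle until the master hands this CPU a process to run. */
        struct task *task = wait_for_assignment(my_cpu);

        /* Execute the process in user mode until it traps: a system call,
         * an exhausted time slice, a blocking event, or an exit. */
        enum event_type reason = run_until_trap(task);

        /* Report the trap; the master's handle_slave_event() takes over
         * and eventually supplies new work via wait_for_assignment(). */
        post_slave_event(my_cpu, reason);
    }
}
```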
Understanding the trade-offs between AMP and SMP illuminates why modern general-purpose systems overwhelmingly use SMP, while AMP persists in specialized domains.
| Characteristic | Asymmetric (AMP) | Symmetric (SMP) |
|---|---|---|
| Kernel execution | Master processor only | Any/all processors |
| Scheduling decisions | Centralized, serialized | Distributed, potentially concurrent |
| System call latency | High (round-trip to master) | Low (local execution) |
| Interrupt handling | Master processor only | Distributed across processors |
| Scalability | Limited by master capacity | Limited by synchronization overhead |
| Implementation complexity | Lower (simpler kernel) | Higher (concurrent kernel) |
| Timing predictability | High (deterministic) | Lower (concurrent interactions) |
| Fault tolerance | Lower (SPOF at master) | Higher (graceful degradation possible) |
| Cache efficiency | Variable (master overhead) | Good (local execution) |
| Load balancing | Centralized, optimal knowledge | Distributed, heuristic-based |
Scalability Deep Dive:
The scalability characteristics of AMP and SMP differ fundamentally:
AMP Scalability Limit:
In AMP, the master processor handles:
- Every system call made by every slave's process
- Every device interrupt
- Every scheduling decision
- All I/O management and completion processing
As slave count increases, master load grows linearly. Eventually, the master becomes 100% utilized handling slave requests, creating a hard ceiling. Practical AMP systems rarely exceed 4-8 slaves.
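A back-of-envelope model makes the ceiling concrete: if each slave generates r kernel requests per second and each request costs the master s seconds of service, master utilization is roughly N × r × s and saturates when that product reaches 1. The numbers below are invented for illustration only.

```c
/* Illustrative back-of-envelope model of master saturation.
 * The request rate and service time are made-up example numbers. */
#include <stdio.h>

int main(void)
{
    double requests_per_slave = 5000.0;  /* kernel requests/second per slave (assumed) */
    double service_time       = 20e-6;   /* master time per request, 20 us (assumed) */

    for (int slaves = 1; slaves <= 16; slaves++) {
        double master_util = slaves * requests_per_slave * service_time;
        printf("%2d slave(s) -> master utilization %5.1f%%%s\n",
               slaves, master_util * 100.0,
               master_util >= 1.0 ? "  (saturated)" : "");
    }
    return 0;
}
```

With these example figures the master saturates at around ten slaves before accounting for any of its own bookkeeping work, which is consistent with the practical 4-8 slave ceiling noted above.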
SMP Scalability Limit:
In SMP, scalability is limited by:
- Contention for shared kernel locks
- Cache-coherence traffic on shared data structures
- Finite memory and interconnect bandwidth shared by all processors
These limitations are less severe for well-designed workloads. SMP systems routinely scale to 64-128+ processors, with careful attention to lock granularity and data locality.
While SMP dominates general-purpose computing, asymmetric multiprocessing remains prevalent in specialized domains where its characteristics provide advantages.
1. Embedded Real-Time Systems:
Safety-critical systems in automotive, aerospace, and industrial control often use AMP because its centralized, deterministic scheduling keeps worst-case timing analysis tractable, which the certification of such systems demands.
2. Heterogeneous Processing:
Systems with different processor types are inherently asymmetric: a general-purpose CPU coordinating specialized processors such as DSPs or GPUs cannot treat all processors interchangeably, so work must be assigned explicitly according to each processor's capabilities.
3. AMP on Multi-Core Chips:
Some multi-core embedded systems deliberately use AMP on otherwise symmetric hardware: the cores are physically identical, but software gives them fixed, differentiated roles, for example one core running a real-time executive while another runs a general-purpose OS, each with its own software image and memory partition.
4. Boot and Initialization:
Even SMP systems often boot asymmetrically: a single bootstrap processor initializes the kernel and hardware, and only then brings the remaining processors online.
Even production SMP systems often have AMP-like behaviors. CPU 0 typically handles time-sensitive operations like timekeeping. Interrupt affinity settings may direct all interrupts to specific processors. The kernel's boot sequence is inherently asymmetric. Understanding AMP helps recognize and debug these asymmetric aspects within nominally symmetric systems.
Modern processors are experiencing a renaissance of asymmetric concepts within nominally symmetric packaging. Heterogeneous multi-core processors combine different core types on a single chip, challenging the pure SMP model.
ARM big.LITTLE and DynamIQ:
ARM pioneered heterogeneous multi-core with the big.LITTLE architecture: high-performance "big" cores are paired with energy-efficient "LITTLE" cores that implement the same instruction set, so the operating system can migrate threads between core types as demand changes.
DynamIQ extends this, allowing more flexible core mixing (e.g., 2 big + 6 LITTLE) and enabling big and LITTLE cores to share cache clusters.
Intel Performance and Efficiency Cores:
Intel's 12th generation (Alder Lake) and beyond use a similar approach, combining Performance (P) cores and Efficiency (E) cores on a single die, as summarized below:
| Characteristic | Performance Cores | Efficiency Cores |
|---|---|---|
| Microarchitecture | Complex out-of-order | Simpler in-order or narrow OoO |
| Clock speed | Higher (up to 5.8 GHz) | Lower (up to 4.3 GHz) |
| Power consumption | Higher per core | 1/4 to 1/2 of P-core |
| SMT/Hyper-Threading | Yes (2 threads/core) | No |
| Die area | Larger | ~1/4 P-core size |
| Cache | Larger L1/L2 | Smaller, often shared L2 |
| Best workloads | Latency-sensitive, bursty | Background, throughput |
Scheduling Implications:
Heterogeneous processors create scheduling challenges that echo AMP concerns: the scheduler must know which cores are fast and which are efficient, decide which threads deserve the high-performance cores, respect capability differences such as the absence of SMT on E-cores, and migrate work between core types without throwing away cache locality.
Linux has developed sophisticated mechanisms for heterogeneous scheduling, including capacity-aware task placement and Energy Aware Scheduling (EAS), which account for cores of unequal performance; more recently its default fair-class scheduler has moved from CFS to EEVDF (Earliest Eligible Virtual Deadline First).
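One user-visible piece of this picture is CPU affinity. The sketch below pins the calling thread to a subset of cores using the Linux sched_setaffinity() API; which CPU numbers correspond to P-cores or E-cores is entirely platform-specific, so the IDs used here are assumptions for illustration.

```c
/* Sketch: restricting the calling thread to a subset of cores with the
 * Linux CPU-affinity API. CPUs 0-3 being P-cores is an assumption. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);

    /* Assume CPUs 0-3 are the performance cores on this hypothetical machine. */
    for (int cpu = 0; cpu <= 3; cpu++)
        CPU_SET(cpu, &set);

    /* A pid of 0 means "the calling thread". */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("Thread restricted to CPUs 0-3\n");
    return 0;
}
```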
Heterogeneous processors signal a shift away from pure SMP toward hybrid architectures. The scheduling challenge now includes matching task characteristics to core capabilities—a form of resource-aware scheduling that borrows from both SMP (multiple cores running the same OS) and AMP (different core roles) traditions. Expect this trend to accelerate as power efficiency becomes ever more critical.
For system designers considering AMP architectures—common in embedded development—there are practical implementation considerations beyond the theoretical model.
Memory Layout Strategies:
AMP systems typically partition memory explicitly:
- A master-private region for kernel code, data, and stacks
- A shared region for inter-processor mailboxes, work queues, completion queues, and data buffers
- A private region per slave for its application code, stacks, and local data
This explicit partitioning simplifies memory protection but requires careful design of shared data structures and communication protocols.
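As one way to express such a partition in code, the sketch below describes the regions in a static table and programs a memory protection unit from it. The addresses, sizes, and the mpu_configure_region() primitive are hypothetical; the layout mirrors the diagram that follows.

```c
/* Hypothetical static description of an AMP memory partition. Addresses,
 * sizes, and mpu_configure_region() are illustrative assumptions. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mem_region {
    const char *name;
    uintptr_t   base;
    size_t      size;
    uint32_t    allowed_cpus;   /* bitmask: bit n set => CPU n may access */
};

#define CPU(n) (1u << (n))

static const struct mem_region layout[] = {
    { "master_private", 0x20000000, 0x00400000, CPU(0)                   },
    { "shared_ipc",     0x20400000, 0x00100000, CPU(0) | CPU(1) | CPU(2) },
    { "slave1_private", 0x20500000, 0x00200000, CPU(0) | CPU(1)          },
    { "slave2_private", 0x20700000, 0x00200000, CPU(0) | CPU(2)          },
};

/* Assumed platform primitive that programs one MPU region on one CPU. */
extern void mpu_configure_region(int cpu, uintptr_t base, size_t size,
                                 bool allow);

/* Program each CPU's MPU so that out-of-bounds accesses fault (and are
 * reported to the master), mirroring the protection rules below. */
void configure_mpu_for_cpu(int cpu)
{
    for (size_t i = 0; i < sizeof layout / sizeof layout[0]; i++) {
        bool allow = (layout[i].allowed_cpus & CPU(cpu)) != 0;
        mpu_configure_region(cpu, layout[i].base, layout[i].size, allow);
    }
}
```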
Typical AMP Memory Layout:

```
Address Space
┌─────────────────────────────────────────────────────────────┐
│ 0xFFFFFFFF │ Interrupt Vectors (Master only)                │
├─────────────────────────────────────────────────────────────┤
│            │ Master Private Memory                          │
│            │ - Kernel code and data                         │
│            │ - Kernel stacks                                │
│            │ - Master-only peripherals                      │
├─────────────────────────────────────────────────────────────┤
│            │ Shared Memory Region                           │
│            │ - Inter-processor mailboxes                    │
│            │ - Work queues (tasks for slaves)               │
│            │ - Completion queues (results from slaves)      │
│            │ - Shared data buffers                          │
├─────────────────────────────────────────────────────────────┤
│            │ Slave 1 Private Memory                         │
│            │ - Application code                             │
│            │ - Stacks and local data                        │
├─────────────────────────────────────────────────────────────┤
│            │ Slave 2 Private Memory                         │
│            │ - Application code                             │
│            │ - Stacks and local data                        │
├─────────────────────────────────────────────────────────────┤
│ 0x00000000 │ Boot/ROM region                                │
└─────────────────────────────────────────────────────────────┘
```

Memory Protection Unit (MPU) Configuration:
- Master: Full access to all regions
- Slave 1: Private + Shared only
- Slave 2: Private + Shared only
- Violations generate faults to the Master

Inter-Processor Communication Patterns:
AMP systems need reliable mechanisms for master-slave communication:
1. Hardware Mailboxes: Dedicated hardware registers for message passing. Writing to a mailbox can trigger an interrupt on the receiving processor.
2. Shared Memory Queues: Ring buffers in shared memory with careful synchronization. Lock-free designs using atomic operations are preferred.
3. Software IPIs: Generic inter-processor interrupts that signal attention needed, with details in shared memory.
4. Doorbell Registers: Simple signaling mechanism—one processor writes, another monitors and responds.
Even in AMP systems where the master makes all decisions, concurrent access to shared memory requires careful synchronization. The master might update a work queue while a slave reads it. Memory barriers and atomic operations remain essential. The simplification is in kernel internals, not in all inter-processor coordination.
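As a concrete example of the shared-memory queue pattern (item 2 above) and of the synchronization care just described, here is a minimal single-producer/single-consumer ring buffer built on C11 atomics; the queue size and message format are assumptions for the sketch.

```c
/* Minimal single-producer/single-consumer ring buffer over shared memory,
 * illustrating the lock-free shared-memory queue pattern. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SLOTS 16   /* assumed capacity; kept a power of two */

struct message {
    uint32_t type;
    uint32_t payload;
};

struct spsc_queue {
    _Atomic uint32_t head;              /* written only by the consumer */
    _Atomic uint32_t tail;              /* written only by the producer */
    struct message   slots[QUEUE_SLOTS];
};

/* Producer side (e.g., the master posting work to a slave). */
bool spsc_push(struct spsc_queue *q, const struct message *m)
{
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (tail - head == QUEUE_SLOTS)
        return false;                    /* queue full */

    q->slots[tail % QUEUE_SLOTS] = *m;
    /* Release ordering publishes the slot contents before the new tail. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side (e.g., a slave draining its work queue). */
bool spsc_pop(struct spsc_queue *q, struct message *out)
{
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head == tail)
        return false;                    /* queue empty */

    *out = q->slots[head % QUEUE_SLOTS];
    /* Release ordering lets the producer reuse the slot only after the copy. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

Because each queue has exactly one producer and one consumer, no locks are required; acquire/release ordering alone ensures the consumer never reads a slot before its contents have been published.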
We have explored Asymmetric Multiprocessing from its historical origins through its modern applications. This knowledge complements our SMP understanding and provides context for the full spectrum of multiprocessor architectures.
Consolidating Our Understanding:
- AMP differentiates processors by role, capability, access rights, or execution rights, in contrast to SMP's interchangeable processors
- The master-slave model centralizes kernel execution and scheduling on one processor, which simplifies the kernel but makes the master a bottleneck and a single point of failure
- AMP persists where determinism and simplicity matter: real-time, safety-critical, and heterogeneous embedded systems
- Modern heterogeneous processors (ARM big.LITTLE, Intel P/E cores) revive asymmetric ideas inside nominally symmetric systems
What's Next:
With both SMP and AMP architectures understood, we'll explore Processor Affinity—the mechanisms that bind processes to specific processors. Affinity bridges both paradigms: in SMP, it's an optimization to preserve cache locality; in AMP, it can be a necessity for correctness. Understanding affinity is essential for effective multi-processor scheduling.
You now understand both symmetric and asymmetric multiprocessing architectures—the fundamental design choices for multiprocessor systems. This dual perspective enables you to evaluate scheduling strategies in context: what works for SMP may not suit AMP, and vice versa. Modern hybrid architectures require understanding both traditions.