In multiprocessor systems, the fundamental promise of parallelism—multiple CPUs working simultaneously to achieve greater throughput—can be undermined by a deceptively simple problem: load imbalance. When some processors are overwhelmed with work while others sit idle, the system fails to realize its full potential, wasting both computational resources and power.
Push migration represents one of the most intuitive and widely deployed solutions to this problem. It embodies a proactive philosophy: when a processor detects that it is overloaded relative to its peers, it actively 'pushes' excess tasks to less-loaded processors, redistributing work before the imbalance becomes severe.
This page provides a comprehensive exploration of push migration—its mechanisms, implementation strategies, performance characteristics, and role in modern operating system schedulers. By the end, you will understand not just how push migration works, but when it is the optimal choice and why it behaves the way it does under various workload conditions.
By completing this page, you will understand: (1) The fundamental concept and motivation behind push migration, (2) The architectural components required for implementation, (3) Algorithms for detecting overload and selecting migration candidates, (4) Integration with SMP schedulers in production operating systems, and (5) The performance tradeoffs inherent in proactive load redistribution.
To understand push migration deeply, we must first appreciate the operating environment it addresses and the specific problem it solves.
The Multiprocessor Scheduling Context
In symmetric multiprocessing (SMP) systems, each CPU maintains its own run queue—a data structure containing processes ready to execute. The scheduler on each CPU independently selects the next process to run from its local queue. This per-CPU queue architecture is essential for scalability: it avoids contention on a single global queue lock, keeps scheduling decisions local and fast, and preserves cache affinity by tending to run tasks on the CPU where they last executed.
However, this distributed design introduces a fundamental challenge: run queues can become unbalanced. One CPU might have 20 runnable processes while another has none. Without intervention, tasks on the overloaded CPU suffer long queueing delays while the idle CPU's capacity goes unused.
Load imbalance arises naturally from multiple sources: processes are often created on specific CPUs (e.g., where the parent runs), processes exit unpredictably, I/O completions wake processes on arbitrary CPUs, and user-space thread pools may distribute work unevenly. Even with perfect initial placement, the system drifts toward imbalance over time.
The Push Migration Philosophy
Push migration addresses imbalance through a proactive, sender-initiated approach. The core concept is elegantly simple: an overloaded CPU periodically checks its own load, and when it finds itself carrying more than its fair share, it selects one or more tasks from its run queue and transfers them to less-loaded CPUs.
This 'push' terminology reflects the direction of agency: the overloaded processor initiates the migration, pushing work away from itself. This contrasts with 'pull' migration where idle processors request work from busy ones.
Formal Definition
We can define push migration formally as follows:
Push migration is a load balancing mechanism in which a processor P, upon detecting that its local load L(P) exceeds a threshold T, selects one or more tasks from its run queue and transfers them to processors whose load is below T, thereby reducing L(P) and increasing utilization of underloaded processors.
The elegance of this definition belies the complexity of implementation: What constitutes 'load'? How is the threshold determined? Which tasks should be migrated? How do we avoid thrashing? These questions drive the detailed design we explore next.
| Design Aspect | Question | Typical Approaches |
|---|---|---|
| Load Metric | How do we quantify CPU load? | Run queue length, weighted by priority/niceness |
| Threshold | When is a CPU 'overloaded'? | Absolute count, relative to average, percentage above mean |
| Target Selection | Which CPU receives migrated tasks? | Least loaded, round-robin among idle, NUMA-aware selection |
| Task Selection | Which tasks should migrate? | Lowest priority, most recently queued, cache-cold processes |
| Timing | When do we check for imbalance? | Periodic timer interrupt, after scheduling events |
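One way to see how these choices fit together is as a small policy description that a scheduler could consult. The structure, field names, and defaults below are illustrative assumptions for exposition, not an existing kernel interface:

```c
/* Illustrative bundle of the design choices from the table above. */
enum load_metric   { LOAD_NR_RUNNING, LOAD_WEIGHTED };
enum target_policy { TARGET_LEAST_LOADED, TARGET_IDLE_ROUND_ROBIN, TARGET_NUMA_AWARE };

struct push_policy {
    enum load_metric   metric;               /* How CPU load is quantified */
    unsigned int       threshold_pct;        /* Overloaded if this % above the mean */
    enum target_policy target;               /* Which CPU receives pushed tasks */
    bool               prefer_cache_cold;    /* Bias task selection toward cold tasks */
    unsigned int       min_push_interval_ms; /* Minimum gap between push attempts */
};

/* Defaults matching the constants used later on this page. */
static const struct push_policy default_push_policy = {
    .metric               = LOAD_WEIGHTED,
    .threshold_pct        = 125,
    .target               = TARGET_NUMA_AWARE,
    .prefer_cache_cold    = true,
    .min_push_interval_ms = 100,   /* PUSH_MIN_INTERVAL = HZ / 10 */
};
```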
Implementing push migration requires several coordinated components within the operating system's scheduler. Understanding these components is essential for grasping how the abstract concept becomes working code.
Per-CPU Run Queue Infrastructure
The foundation of push migration is the per-CPU run queue. Each processor maintains its own queue of runnable tasks, typically organized as a multi-level structure with different priority levels or scheduling classes.
In Linux, for example, the struct rq (run queue) structure exists for each CPU and contains:
```c
/* Simplified representation of per-CPU run queue components */
struct run_queue {
    /* Core scheduling structures */
    struct list_head tasks;          /* List of runnable tasks */
    unsigned int nr_running;         /* Number of runnable tasks */
    unsigned int nr_waiting;         /* Tasks waiting for I/O */

    /* Load tracking for balancing decisions */
    unsigned long load_weight;       /* Weighted load (priority-adjusted) */
    unsigned long avg_load;          /* Running average over time */
    unsigned long cpu_capacity;      /* This CPU's processing capacity */

    /* Push migration specific fields */
    int overloaded;                  /* Flag: is this CPU overloaded? */
    int push_count;                  /* Tasks pushed in current interval */
    unsigned long last_push_time;    /* Timestamp of last push attempt */

    /* Locking and synchronization */
    spinlock_t lock;                 /* Protects queue modifications */
    int migration_disabled;          /* Temporarily prevent migrations */

    /* Migration target tracking */
    cpumask_t idle_siblings;         /* Known idle CPUs in local domain */
    int busy_idx;                    /* Index for scanning busy CPUs */
};
```

Load Calculation Subsystem
Accurate load calculation is the cornerstone of effective push migration. The system must answer a deceptively complex question: How busy is this CPU compared to others?
Simple metrics like 'number of runnable tasks' are inadequate because tasks are not interchangeable: a queue of ten low-priority background jobs represents far less demand than a queue of three high-priority, CPU-bound tasks, and tasks that sleep frequently consume far less CPU than their presence in the queue suggests.
Weighted Load Calculation
Modern schedulers use weighted load calculations that account for task priority. Each task contributes a 'load weight' proportional to its scheduling priority:
```c
/* Priority-to-weight mapping (simplified from Linux CFS) */
static const int priority_to_weight[40] = {
/* -20 */ 88761, 71755, 56483, 46273, 36291,
/* -15 */ 29154, 23254, 18705, 14949, 11916,
/* -10 */  9548,  7620,  6100,  4904,  3906,
/*  -5 */  3121,  2501,  1991,  1586,  1277,
/*   0 */  1024,   820,   655,   526,   423,
/*   5 */   335,   272,   215,   172,   137,
/*  10 */   110,    87,    70,    56,    45,
/*  15 */    36,    29,    23,    18,    15,
};

/* Calculate weighted load for a run queue */
unsigned long calculate_weighted_load(struct run_queue *rq)
{
    unsigned long total_weight = 0;
    struct task_struct *task;

    list_for_each_entry(task, &rq->tasks, run_list) {
        int priority = task->static_prio - 100;  /* Normalize to 0-39 */
        if (priority < 0)
            priority = 0;
        if (priority > 39)
            priority = 39;
        total_weight += priority_to_weight[priority];
    }
    return total_weight;
}

/* Calculate exponential moving average for stability */
unsigned long update_load_average(struct run_queue *rq, unsigned long new_load)
{
    /* Classic EMA: new_avg = alpha * new + (1-alpha) * old */
    /* Using fixed-point: (new * 4 + old * 12) / 16 */
    rq->avg_load = (new_load * 4 + rq->avg_load * 12) >> 4;
    return rq->avg_load;
}
```

Using a running average rather than instantaneous load prevents 'migration thrashing'—where tasks bounce between CPUs responding to momentary fluctuations. The smoothing factor (alpha) balances responsiveness against stability. Too responsive: thrashing. Too stable: slow to correct imbalances.
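To see the smoothing behave, this small standalone program (a userspace sketch, not kernel code) pushes a one-off load spike through the same fixed-point EMA and prints how quickly the average settles:

```c
#include <stdio.h>

/* Same fixed-point EMA as update_load_average(): alpha = 4/16 */
static unsigned long ema(unsigned long avg, unsigned long sample)
{
    return (sample * 4 + avg * 12) >> 4;
}

int main(void)
{
    unsigned long avg = 1024;          /* steady-state load */
    unsigned long samples[] = { 4096,  /* brief spike */
                                1024, 1024, 1024, 1024, 1024 };

    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        avg = ema(avg, samples[i]);
        printf("update %u: avg_load = %lu\n", i, avg);
    }
    /* The spike lifts avg_load to 1792, but three quiet updates later
     * it is already back below 1350 - a single burst never looks like
     * sustained overload. */
    return 0;
}
```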
Overload Detection Mechanism
With load calculated, the next component determines when a CPU is sufficiently overloaded to trigger migration. Several strategies exist:
Absolute Threshold
if (rq->nr_running > PUSH_THRESHOLD) trigger_push();
Simple but inflexible—ignores system-wide load context.
Relative Threshold (Average-Based)
if (rq->avg_load > system_avg_load * 1.25) trigger_push();
Adapts to overall system load but requires cross-CPU coordination.
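The relative-threshold check above, and the quick-check routine later on this page, lean on a helper that returns the system-wide average load per CPU. A minimal sketch, assuming the same per_cpu(runqueues, cpu) accessor and for_each_online_cpu() iterator used by the other examples here:

```c
/* System-wide mean of the per-CPU load averages, read without locks
 * since the balancer only needs an approximate view. */
static unsigned long avg_load_per_cpu(void)
{
    unsigned long total_load = 0;
    unsigned int  nr_cpus = 0;
    int cpu;

    for_each_online_cpu(cpu) {
        total_load += per_cpu(runqueues, cpu).avg_load;
        nr_cpus++;
    }

    return nr_cpus ? total_load / nr_cpus : 0;
}
```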
Imbalance-Based
group_avg = domain_total_load / nr_cpus;   /* per-CPU mean across the domain */
imbalance = rq->avg_load - group_avg;
if (imbalance > min_migration_threshold) migrate(imbalance);
Most sophisticated—considers where work would go, not just local overload.
Linux's CFS scheduler uses a hybrid approach, computing 'imbalance' as the weighted difference between scheduling domains and triggering migration when this exceeds a configurable threshold.
With the foundational components in place, we can now examine the push migration algorithm itself. This section presents a production-quality algorithm with all the nuances required for robust operation.
High-Level Algorithm Flow
The push migration process follows a four-phase structure: (1) a trigger and quick feasibility check, (2) a detailed imbalance calculation, (3) selection of a destination CPU and a candidate task, and (4) execution of the migration itself.
Let's examine each phase in detail.
```c
/* Main push migration entry point - called from timer interrupt */
void try_to_push_tasks(struct run_queue *this_rq)
{
    int this_cpu = smp_processor_id();
    struct migration_context ctx;
    int nr_pushed = 0;

    /* Phase 1: Quick check - any point in pushing? */
    if (!should_attempt_push(this_rq)) {
        return;  /* Not overloaded or no migration needed */
    }

    /* Acquire local runqueue lock */
    spin_lock(&this_rq->lock);

    /* Phase 2: Detailed analysis */
    if (!calculate_push_imbalance(this_rq, &ctx)) {
        spin_unlock(&this_rq->lock);
        return;  /* No actionable imbalance */
    }

    /* Phase 3: Find destination CPUs and candidate tasks */
    while (ctx.imbalance > 0 && nr_pushed < MAX_PUSH_PER_CYCLE) {
        struct task_struct *task;
        int dst_cpu;

        /* Find a suitable destination CPU */
        dst_cpu = find_push_destination(this_rq, &ctx);
        if (dst_cpu < 0) {
            break;  /* No suitable destinations available */
        }

        /* Select a task to migrate */
        task = select_task_for_push(this_rq, dst_cpu, &ctx);
        if (!task) {
            break;  /* No suitable tasks to migrate */
        }

        /* Phase 4: Execute the migration */
        if (migrate_task_to_cpu(task, dst_cpu)) {
            ctx.imbalance -= task_load_weight(task);
            nr_pushed++;
            this_rq->push_count++;
        }
    }

    spin_unlock(&this_rq->lock);

    /* Update statistics */
    update_push_statistics(this_cpu, nr_pushed);
}
```

Phase 1: Trigger and Quick Check
Push migration is typically triggered by a periodic timer interrupt (every few milliseconds) or after significant scheduling events. Before proceeding with expensive calculations, a quick check determines if migration is even plausible:
```c
/* Quick pre-check to avoid unnecessary work */
static bool should_attempt_push(struct run_queue *rq)
{
    /* Need at least 2 runnable tasks to push one away */
    if (rq->nr_running < 2) {
        return false;
    }

    /* Check if we've pushed recently (prevent thrashing) */
    if (time_before(jiffies, rq->last_push_time + PUSH_MIN_INTERVAL)) {
        return false;
    }

    /* Check if system has any idle CPUs worth pushing to */
    if (cpumask_empty(&rq->idle_siblings)) {
        /* No known idle CPUs - do detailed scan only periodically */
        if (!time_to_rescan_idle_cpus(rq)) {
            return false;
        }
    }

    /* Basic weighted load check */
    if (rq->load_weight <= avg_load_per_cpu() * PUSH_THRESHOLD_PCT / 100) {
        return false;  /* Not overloaded relative to average */
    }

    return true;  /* Worth doing detailed analysis */
}

/* Constants governing push behavior */
#define PUSH_MIN_INTERVAL  (HZ / 10)  /* Max 10 push attempts/second */
#define PUSH_THRESHOLD_PCT 125        /* 25% above average triggers push */
#define MAX_PUSH_PER_CYCLE 2          /* Limit work per cycle */
```

Phase 2: Imbalance Calculation
If the quick check passes, we compute the precise imbalance—the amount of load that should be migrated to achieve balance:
```c
/* Detailed imbalance calculation */
static bool calculate_push_imbalance(struct run_queue *rq,
                                     struct migration_context *ctx)
{
    unsigned long this_load = rq->avg_load;
    unsigned long target_load;
    unsigned long system_load = 0;
    int nr_online_cpus = 0;
    int cpu;

    /* Calculate system-wide load */
    for_each_online_cpu(cpu) {
        system_load += per_cpu(runqueues, cpu).avg_load;
        nr_online_cpus++;
    }

    /* Target: each CPU should have average load */
    target_load = system_load / nr_online_cpus;

    /* Imbalance is how much we exceed target */
    if (this_load <= target_load) {
        return false;  /* We're at or below average - no push needed */
    }

    ctx->imbalance = this_load - target_load;

    /* Apply minimum threshold to prevent trivial migrations */
    if (ctx->imbalance < MIN_PUSH_IMBALANCE) {
        return false;
    }

    /* Adjust for migration cost - don't push if benefit is marginal */
    if (ctx->imbalance < estimated_migration_cost()) {
        return false;
    }

    ctx->this_load = this_load;
    ctx->target_load = target_load;
    ctx->system_load = system_load;
    return true;
}
```

Every migration has a cost: cache invalidation, TLB flushes, memory bandwidth for moving task state. Push migration must only occur when the expected benefit (reduced imbalance) exceeds this cost. Without this check, aggressive pushing can decrease overall throughput despite achieving better balance on paper.
Phase 3: Destination and Task Selection
With imbalance quantified, we must select which task to push and where to push it. These decisions profoundly affect migration effectiveness:
```c
/* Find the best destination CPU for pushing a task */
static int find_push_destination(struct run_queue *src_rq,
                                 struct migration_context *ctx)
{
    int best_cpu = -1;
    unsigned long best_capacity = 0;
    int cpu;

    /* Priority 1: Check for any idle CPUs in same NUMA node */
    for_each_cpu(cpu, &src_rq->idle_siblings) {
        if (cpu_is_same_numa_node(src_rq, cpu)) {
            struct run_queue *dst_rq = &per_cpu(runqueues, cpu);
            if (dst_rq->nr_running == 0) {
                /* Perfect match: idle CPU on same NUMA node */
                return cpu;
            }
        }
    }

    /* Priority 2: Any idle CPU (cross-NUMA if necessary) */
    for_each_cpu(cpu, &src_rq->idle_siblings) {
        struct run_queue *dst_rq = &per_cpu(runqueues, cpu);
        if (dst_rq->nr_running == 0) {
            return cpu;
        }
    }

    /* Priority 3: Find least-loaded CPU that's below average */
    for_each_online_cpu(cpu) {
        struct run_queue *dst_rq = &per_cpu(runqueues, cpu);
        unsigned long available_capacity;

        if (cpu == smp_processor_id()) {
            continue;  /* Don't push to self */
        }

        /* Skip CPUs at or above average load */
        if (dst_rq->avg_load >= ctx->target_load) {
            continue;
        }

        /* Calculate how much load this CPU can accept */
        available_capacity = ctx->target_load - dst_rq->avg_load;
        if (available_capacity > best_capacity) {
            best_capacity = available_capacity;
            best_cpu = cpu;
        }
    }

    return best_cpu;
}

/* Select a task suitable for migration to given destination */
static struct task_struct *select_task_for_push(struct run_queue *rq,
                                                int dst_cpu,
                                                struct migration_context *ctx)
{
    struct task_struct *best_task = NULL;
    struct task_struct *task;
    int best_score = INT_MIN;

    list_for_each_entry(task, &rq->tasks, run_list) {
        int score = 0;

        /* Skip tasks that cannot migrate */
        if (!task_can_migrate(task, dst_cpu)) {
            continue;
        }

        /* Prefer cache-cold tasks (haven't run recently) */
        if (task_cache_cold(task)) {
            score += 100;
        }

        /* Prefer tasks with weak CPU affinity */
        if (task_has_weak_affinity(task)) {
            score += 50;
        }

        /* Prefer lower priority tasks (less latency-sensitive) */
        score += (MAX_PRIO - task->prio);

        /* Prefer tasks whose load contribution matches our needs */
        if (task_load_weight(task) <= ctx->imbalance * 2) {
            score += 25;  /* Right size for our imbalance */
        }

        if (score > best_score) {
            best_score = score;
            best_task = task;
        }
    }

    return best_task;
}

/* Check if a task can legally migrate to a given CPU */
static bool task_can_migrate(struct task_struct *task, int dst_cpu)
{
    /* Check CPU affinity mask */
    if (!cpumask_test_cpu(dst_cpu, &task->cpus_allowed)) {
        return false;
    }

    /* Check if task requested migration disabled */
    if (task->migration_disabled) {
        return false;
    }

    /* Currently running tasks cannot migrate */
    if (task_running(task)) {
        return false;
    }

    /* Kernel threads with CPU bindings */
    if (task_is_bound_kthread(task)) {
        return false;
    }

    return true;
}
```

Phase 4: Migration Execution
Once destination and task are selected, the actual migration transfers the task's scheduling state:
```c
/* Execute the migration of a task to a new CPU */
static bool migrate_task_to_cpu(struct task_struct *task, int dst_cpu)
{
    struct run_queue *src_rq = task_rq(task);
    struct run_queue *dst_rq = &per_cpu(runqueues, dst_cpu);

    /* The caller already holds the source lock; we also need the
     * destination lock before touching its queue.  To prevent deadlock
     * we honor a fixed ordering (lower-numbered CPU first): when the
     * destination is lower-numbered, taking its lock now would violate
     * that order, so only trylock and back off on failure. */
    if (dst_cpu > smp_processor_id()) {
        spin_lock(&dst_rq->lock);
    } else if (!spin_trylock(&dst_rq->lock)) {
        return false;  /* Retry on a later balancing pass */
    }

    /* Dequeue from source */
    dequeue_task(src_rq, task);

    /* Update task's CPU assignment */
    task->cpu = dst_cpu;

    /* Enqueue on destination */
    enqueue_task(dst_rq, task);

    /* Update load tracking */
    src_rq->load_weight -= task_load_weight(task);
    dst_rq->load_weight += task_load_weight(task);
    src_rq->nr_running--;
    dst_rq->nr_running++;

    /* Release the destination lock */
    spin_unlock(&dst_rq->lock);

    /* Record migration for statistics */
    record_migration(smp_processor_id(), dst_cpu, task);

    /* Send IPI to wake destination CPU if it was idle */
    if (need_resched_cpu(dst_cpu)) {
        send_reschedule_ipi(dst_cpu);
    }

    return true;
}
```

Modern multiprocessor systems are typically organized as Non-Uniform Memory Access (NUMA) architectures, where memory access latency varies based on the physical relationship between CPUs and memory controllers. This architectural reality profoundly impacts push migration strategies.
The NUMA Challenge
In NUMA systems, migrating a task to a distant CPU can significantly degrade that task's performance: its memory pages remain on the original node, so every access now crosses the inter-node interconnect at higher latency and lower bandwidth, and the task also loses whatever cache state it had built up on its old CPU.
Naive push migration that ignores NUMA topology can actually decrease system throughput despite achieving better load balance.
| Approach | Load Balance | Memory Locality | Overall Throughput |
|---|---|---|---|
| NUMA-Blind | Optimal | Severely degraded | May decrease 20-40% |
| Local-Only | Suboptimal | Preserved | Limited improvement |
| NUMA-Aware Hybrid | Near-optimal | Mostly preserved | Best overall results |
Hierarchical Domain Organization
To handle NUMA effectively, schedulers organize CPUs into hierarchical scheduling domains. Each domain represents a set of CPUs that share some architectural characteristic:
NUMA Node 0 NUMA Node 1
+-----------+ +-----------+
| Core 0-3 |<------>| Core 4-7 |
| L3 Cache | QPI | L3 Cache |
| DDR4 Bank | | DDR4 Bank |
+-----------+ +-----------+
| |
v v
SMT Domain SMT Domain
(Core siblings) (Core siblings)
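To make the hierarchy concrete, here is one plausible in-memory representation of a scheduling domain and its groups. It is a sketch whose field names are assumptions chosen to line up with the NUMA-aware balancing code below, not the exact Linux structures:

```c
/* Sketch of a per-CPU chain of scheduling domains, from the SMT level
 * up to the NUMA level. */
struct sched_group {
    struct sched_group *next;   /* Circular list of groups in the domain */
    cpumask_t           cpus;   /* CPUs belonging to this group */
};

#define sched_group_cpus(sg)  (&(sg)->cpus)

struct sched_domain {
    struct sched_domain *parent;                 /* Next larger (costlier) domain */
    struct sched_group  *groups;                 /* Groups of CPUs inside this domain */
    cpumask_t            span;                   /* All CPUs covered by this domain */
    int                  level;                  /* 0 = SMT, 1 = core, 2 = socket, 3 = NUMA */
    unsigned int         flags;                  /* e.g., SD_NUMA for cross-node domains */
    unsigned int         balance_interval;       /* Milliseconds between balance attempts */
    unsigned long        last_balance;           /* jiffies at the last balance attempt */
    unsigned int         migration_success_rate; /* Feedback statistic, in percent */
};
```

Walking for_each_domain(cpu, sd) then simply follows the parent pointers from the smallest, cheapest domain outward.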
NUMA-Aware Push Strategy
With domain hierarchy in place, push migration follows a domain-aware strategy:
```c
/* NUMA-aware destination selection for push migration */
static int find_numa_aware_destination(struct run_queue *src_rq,
                                       struct migration_context *ctx)
{
    int src_cpu = rq_cpu(src_rq);
    struct sched_domain *sd;
    int best_cpu = -1;

    /* Walk up the domain hierarchy, starting local */
    for_each_domain(src_cpu, sd) {
        struct sched_group *sg = sd->groups;
        bool found_in_domain = false;

        /* Scan scheduling groups within this domain */
        do {
            int cpu;

            for_each_cpu(cpu, sched_group_cpus(sg)) {
                struct run_queue *dst_rq = &per_cpu(runqueues, cpu);

                if (cpu == src_cpu)
                    continue;

                /* Check if this CPU can accept work */
                if (dst_rq->avg_load < ctx->target_load) {
                    /* At lower domains (closer), prefer any available */
                    /* At higher domains (NUMA), require significant benefit */
                    unsigned long domain_cost = domain_migration_cost(sd);
                    unsigned long benefit = ctx->imbalance;

                    if (benefit > domain_cost) {
                        best_cpu = cpu;
                        found_in_domain = true;

                        /* Prefer idle CPUs */
                        if (dst_rq->nr_running == 0) {
                            return cpu;  /* Perfect match at this level */
                        }
                    }
                }
            }
            sg = sg->next;
        } while (sg != sd->groups);

        /* If we found a candidate at this level, use it */
        /* Prevents unnecessary promotion to higher (costlier) domains */
        if (found_in_domain && best_cpu >= 0) {
            return best_cpu;
        }
    }

    return best_cpu;  /* May be -1 if no suitable destination */
}

/* Estimate cost of migrating across a scheduling domain */
static unsigned long domain_migration_cost(struct sched_domain *sd)
{
    /* Base cost increases with domain level */
    unsigned long base_cost = sd->level * 1000;

    /* NUMA domains have additional memory latency cost */
    if (sd->flags & SD_NUMA) {
        base_cost += numa_remote_access_penalty();
    }

    /* Factor in domain's historical migration success rate */
    base_cost = base_cost * sd->migration_success_rate / 100;

    return base_cost;
}
```

The scheduling domain hierarchy encodes migration costs implicitly. SMT siblings share L1 cache—nearly free migration. Same-socket cores share L3—cheap migration. Same-node CPUs share memory—moderate cost. Cross-node requires interconnect—expensive. Push migration uses this hierarchy to make cost-aware decisions.
Linux's Completely Fair Scheduler (CFS) provides a production-grade implementation of push migration that has been refined over two decades of deployment on systems ranging from smartphones to supercomputers. Examining its design reveals practical solutions to the theoretical challenges we've discussed.
CFS Load Balancing Overview
CFS integrates push migration into a broader load balancing framework triggered by timer interrupts. The run_rebalance_domains() function, invoked periodically, walks the scheduling domain hierarchy and calls load_balance() for domains requiring intervention.
Key CFS Concepts for Push
Load Weight — CFS uses PELT (Per-Entity Load Tracking) to compute temporally-decayed load for each task, accounting for both recent CPU usage and historical patterns (a decay sketch follows this list).
Scheduling Domains — Hierarchical CPU groupings (SMT → Core → Socket → NUMA) with per-domain balance intervals.
Busiest Queue Detection — Before pushing, CFS identifies the busiest CPU/group, ensuring push actions actually improve balance.
Migration Throttling — Rate limits on migrations prevent oscillation.
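The heart of PELT is a geometrically decaying sum: each accounting period (roughly 1 ms) contributes fully when fresh and about half as much 32 periods later. A minimal fixed-point sketch of that decay rule, with illustrative constants rather than the kernel's exact values:

```c
/* PELT-style decay: choose y so that y^32 ~= 1/2, i.e. a period's
 * contribution halves roughly every 32 ms.  Q10 fixed point:
 * y ~= 0.9786 ~= 1002/1024. */
#define PELT_Y_Q10   1002UL
#define PELT_SHIFT   10

/* Fold one accounting period into the running sum: decay the existing
 * history, then add this period's runnable contribution (0..1024). */
static unsigned long pelt_accumulate(unsigned long load_sum,
                                     unsigned long runnable_contrib)
{
    load_sum = (load_sum * PELT_Y_Q10) >> PELT_SHIFT;
    return load_sum + runnable_contrib;
}
```

A task's load_avg is this sum normalized by its maximum possible value, which is where the LOAD_AVG_MAX division in the CFS snippet below comes from.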
```c
/* Simplified representation of CFS load balancing logic */
/* See kernel/sched/fair.c for full implementation */

/* Main load balance function called for each scheduling domain */
static int load_balance(struct lb_env *env)
{
    struct run_queue *busiest;
    unsigned long imbalance;
    int ld_moved = 0;

    /* Find the busiest group in this scheduling domain */
    struct sched_group *busiest_group = find_busiest_group(env);
    if (!busiest_group) {
        return 0;  /* No imbalance at group level */
    }

    /* Find the busiest run queue within that group */
    busiest = find_busiest_queue(env, busiest_group);
    if (!busiest) {
        return 0;  /* No specific CPU to balance from */
    }

    /* Calculate how much load to move */
    imbalance = calculate_imbalance(env, busiest_group);
    if (imbalance == 0) {
        return 0;  /* Imbalance below threshold */
    }

    /* Attempt to migrate tasks from busiest to local CPU */
    env->src_rq = busiest;
    env->dst_rq = this_rq();
    env->imbalance = imbalance;

    /* The actual migration loop */
    while (env->imbalance > 0) {
        struct task_struct *p;

        /* Select a task that can migrate */
        p = detach_one_task(env);
        if (!p) {
            break;  /* No more migratable tasks */
        }

        /* Attach the task to local run queue */
        attach_one_task(env->dst_rq, p);
        ld_moved++;
        env->imbalance -= task_load(p);
    }

    return ld_moved;
}

/* PELT-based load calculation (simplified) */
static unsigned long task_load(struct task_struct *p)
{
    /* Load is decayed average of CPU demand */
    return p->se.load.weight * p->se.avg.load_avg / LOAD_AVG_MAX;
}
```

Balance Intervals by Domain Level
CFS adjusts push migration frequency based on domain level, balancing responsiveness against overhead:
| Domain Level | Balance Interval | Rationale |
|---|---|---|
| SMT (Hyperthreads) | 4ms | Same core—nearly free migration |
| MC (Multi-core) | 8ms | Same socket—low cost |
| DIE (Same die) | 16ms | Shared L3—moderate cost |
| NUMA (Cross-node) | 64ms | High latency—cautious balancing |
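These intervals are typically enforced with a per-domain timestamp: the periodic balancer walks a CPU's domain chain and skips any domain whose interval has not yet elapsed. A minimal sketch using the illustrative sched_domain fields from earlier, with a hypothetical load_balance_domain() standing in for the load_balance() routine sketched above:

```c
/* Periodic balancing pass for one CPU: balance each domain level only
 * when its own interval has elapsed, so cheap (SMT) domains are
 * rebalanced often and expensive (NUMA) domains only rarely. */
static void rebalance_domains_for_cpu(int cpu, struct sched_domain *sd_chain)
{
    struct sched_domain *sd;

    for (sd = sd_chain; sd; sd = sd->parent) {
        unsigned long interval = msecs_to_jiffies(sd->balance_interval);

        if (time_before(jiffies, sd->last_balance + interval)) {
            continue;  /* Too soon to rebalance at this level */
        }

        sd->last_balance = jiffies;
        load_balance_domain(cpu, sd);  /* Stand-in for load_balance() above */
    }
}
```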
CFS actually implements both push and pull, but in a unified framework. The 'push' perspective occurs when code running on CPU A considers sending tasks away. The 'pull' perspective occurs when idle CPU B looks for work to take. CFS's load_balance() can be viewed as 'push from busiest to local' or 'pull from busiest to local' depending on which CPU initiates.
Understanding when push migration helps—and when it hurts—requires analyzing its performance characteristics under different workload patterns.
Beneficial Scenarios
Push migration provides significant benefits in scenarios with sustained, uneven load: long-running CPU-bound tasks that keep some queues deep while others drain, bursty task creation concentrated on a few CPUs (such as fork-heavy servers), and workloads where imbalance persists long enough for migrated tasks to amortize the migration cost.
Detrimental Scenarios
Push migration can hurt performance when tasks are short-lived and complete before the rebalanced placement pays off, when working sets are cache-hot and migration destroys locality, when migrations cross NUMA boundaries without enough imbalance to justify the remote-memory penalty, or when load fluctuates so rapidly that the balancer ends up chasing noise.
| Overhead Source | Typical Cost | Mitigation Strategy |
|---|---|---|
| Run queue lock contention | 1-10 μs | Per-CPU queues, lock-free checking |
| Task state copying | 0.5-2 μs | Minimal state transfer |
| Cache invalidation (L1/L2) | 100-500 cycles | Prefer cache-cold tasks |
| TLB flush on new CPU | 50-200 cycles | Batch migrations |
| Memory controller switch (NUMA) | 100-300 ns additional latency | NUMA-aware selection |
| IPI for destination wakeup | 1-5 μs | Coalesce with other IPIs |
Quantifying the Benefit
The net benefit of push migration can be modeled as:
Benefit = (Imbalance_Reduction × Task_Throughput_Gain) - Migration_Cost
Where Imbalance_Reduction is the amount of load removed from the overloaded CPU, Task_Throughput_Gain is the reduction in queueing delay the migrated tasks experience on their new CPU, and Migration_Cost aggregates the direct and indirect overheads listed in the table above.
For a task that would wait 50ms in queue on an overloaded CPU: migration overhead measured in tens of microseconds is negligible next to the latency saved, so pushing is a clear win.
For a task that would wait 200μs: the saved wait is only an order of magnitude larger than the migration cost, and the loss of cache warmth can erase the difference, so the benefit is marginal.
For a task with 1ms remaining lifetime: the task will likely finish before the rebalanced placement pays off, so migration is pure overhead and should be skipped.
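The same reasoning can be written as a guard. The sketch below encodes the decision rule the three examples above illustrate; all quantities are in microseconds, and the function name and margin factor are assumptions for illustration:

```c
/* Rough cost/benefit guard for one candidate migration.  Values are
 * estimates: the cost comes from the overhead table above, the savings
 * from per-task queueing accounting. */
static bool migration_worthwhile(unsigned long queue_wait_saved_us,
                                 unsigned long expected_remaining_runtime_us,
                                 unsigned long migration_cost_us)
{
    /* A task about to finish cannot recoup the cost of moving. */
    if (expected_remaining_runtime_us < migration_cost_us)
        return false;

    /* Demand a clear margin so marginal cases stay where they are. */
    return queue_wait_saved_us > 2 * migration_cost_us;
}
```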
Production schedulers often track task 'cache footprint' or 'run time since last migration'. Tasks with warm caches or short expected remaining runtime are skipped for migration. This simple heuristic prevents the most common anti-pattern: migrating a task that was about to complete anyway.
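The candidate-selection code earlier calls a task_cache_cold() predicate without defining it. A minimal sketch of that heuristic, assuming the task records a timestamp of its last execution and using an illustrative 5 ms cache-hot horizon (the clock helper is hypothetical):

```c
/* A task is treated as cache-cold if it has not run recently enough
 * for its working set to plausibly survive in this CPU's caches. */
#define CACHE_HOT_TIME_NS  (5ULL * 1000 * 1000)   /* Illustrative tunable */

static bool task_cache_cold(struct task_struct *task)
{
    u64 now = sched_clock_now();                  /* Hypothetical clock helper */

    return (now - task->last_ran_ns) > CACHE_HOT_TIME_NS;
}
```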
Push migration represents a fundamental technique in the operating system's arsenal for extracting maximum performance from multiprocessor hardware. The key insights from this exploration: it is a proactive, sender-initiated mechanism; its effectiveness hinges on a meaningful load metric and thresholds that avoid thrashing; destination and task selection must respect cache and NUMA topology; and every migration must clear a cost/benefit bar, since better balance on paper is worthless if it lowers throughput.
Connecting to the Broader Picture
Push migration is one half of the load balancing equation. In the next page, we'll explore pull migration—the complementary approach where idle processors actively seek work from busy ones. Together, push and pull form a complete solution for maintaining balance across diverse workload patterns.
You now possess deep understanding of push migration—from conceptual foundations through production implementation. You can reason about when push migration helps, when it hurts, and how operating systems tune its behavior across different hardware topologies. Next, we examine the complementary pull migration mechanism.