Throughout this module, we've discussed 'load', 'imbalance', and 'overload' without precisely defining how these are measured. This final page addresses the foundational question: How do we quantify load and evaluate balance quality?
The choice of metrics profoundly affects scheduler behavior. A metric that emphasizes queue length produces different decisions than one emphasizing CPU utilization or weighted task priority. Understanding the available metrics—their strengths, limitations, and implementation—enables you to reason about scheduler behavior and make informed tuning decisions.
This page provides a comprehensive exploration of load balancing metrics: from simple counts to sophisticated weighted averages, from instantaneous measurements to temporally-decayed tracking, and from single-CPU metrics to system-wide balance indicators.
By completing this page, you will understand: (1) The fundamental metrics for quantifying CPU load, (2) Per-Entity Load Tracking (PELT) and other decay mechanisms, (3) Imbalance calculation and balance quality indicators, (4) NUMA-aware metrics and capacity-normalized load, and (5) Observability and monitoring of load balance effectiveness.
Before we can balance load, we must define what 'load' means. Several fundamental metrics capture different aspects of system demand.
Run Queue Length
The simplest metric: count the number of runnable tasks on each CPU's queue.
Load(CPU) = count of tasks in RUNNABLE state on CPU's queue
Advantages: Simple, fast to compute, intuitive. Disadvantages: Ignores task priority, overhead, and CPU utilization variations.
```c
/* Run queue length as load metric */

/* Simple count of runnable tasks */
unsigned int queue_length_load(struct run_queue *rq)
{
    return rq->nr_running;
}

/* Check for imbalance using queue length */
bool queue_length_imbalanced(void)
{
    int max_length = 0, min_length = INT_MAX;
    int cpu;

    for_each_online_cpu(cpu) {
        unsigned int length = queue_length_load(cpu_rq(cpu));
        max_length = max(max_length, (int)length);
        min_length = min(min_length, (int)length);
    }

    /* Imbalanced if max exceeds min by threshold */
    return (max_length - min_length) > IMBALANCE_THRESHOLD;
}

/* Limitation: A CPU with 10 nice +19 tasks appears the same as
 * a CPU with 10 nice -20 tasks, despite vastly different
 * actual processing demand */
```

Weighted Load
Account for task priority by weighting each task's contribution:
Load(CPU) = Σ weight(task) for each runnable task
Higher-priority tasks contribute more to load, reflecting their greater demand on CPU time.
```c
/* Priority-weighted load calculation */

/* Linux's nice-to-weight conversion table (simplified) */
/* nice 0 = weight 1024, each nice increment ~= 1.25x change */
static const int nice_to_weight[40] = {
 /* -20 */ 88761, 71755, 56483, 46273, 36291,
 /* -15 */ 29154, 23254, 18705, 14949, 11916,
 /* -10 */  9548,  7620,  6100,  4904,  3906,
 /*  -5 */  3121,  2501,  1991,  1586,  1277,
 /*   0 */  1024,   820,   655,   526,   423,
 /*   5 */   335,   272,   215,   172,   137,
 /*  10 */   110,    87,    70,    56,    45,
 /*  15 */    36,    29,    23,    18,    15,
};

/* Get weight for a task based on nice value */
unsigned long task_weight(struct task_struct *p)
{
    int nice = task_nice(p);  /* -20 to +19 */
    int idx = nice + 20;      /* Convert to 0-39 index */

    return nice_to_weight[idx];
}

/* Calculate weighted load for a run queue */
unsigned long weighted_load(struct run_queue *rq)
{
    unsigned long total = 0;
    struct task_struct *p;

    list_for_each_entry(p, &rq->tasks, run_list) {
        total += task_weight(p);
    }
    return total;
}

/* This is what CFS uses as the basis for load balancing */
/* A nice -20 task contributes ~6000x more than a nice +19 task */
```

| Metric | Accounts For | Complexity | Use Case |
|---|---|---|---|
| Queue Length | Task count only | O(1) | Simple systems, quick checks |
| Weighted Load | Priority differences | O(n) | CFS-style fair scheduling |
| CPU Utilization | Actual time consumed | N/A (sampled) | Performance monitoring |
| Runnable Time | Queue wait time | O(n) | Latency-focused scheduling |
CPU Utilization
Measure actual CPU consumption rather than queue state:
Utilization(CPU) = time_busy / (time_busy + time_idle) over period
Advantages: Captures actual demand, not just potential demand. Disadvantages: Lagging indicator (measures past, not current), doesn't predict future load.
Queue length is a leading indicator—it shows demand waiting to be served. Utilization is a lagging indicator—it shows demand that was served. For proactive balancing, leading indicators are more useful, but utilization helps validate that balancing is working.
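As a sketch of how such a utilization metric could be maintained (illustrative bookkeeping only, not a specific kernel API; the `cpu_stat` and `account_tick` names are made up for this example), each tick is attributed to either busy or idle time and the ratio is computed over a sampling window:

```c
/* Illustrative utilization tracking per the formula above. */
#include <stdint.h>
#include <stdbool.h>

struct cpu_stat {
    uint64_t busy_ns;   /* Time spent running tasks in this window */
    uint64_t idle_ns;   /* Time spent in the idle loop in this window */
};

/* Called from the periodic tick: attribute the elapsed time to busy or idle */
static void account_tick(struct cpu_stat *st, uint64_t tick_ns, bool was_idle)
{
    if (was_idle)
        st->idle_ns += tick_ns;
    else
        st->busy_ns += tick_ns;
}

/* Utilization in percent over the window, then reset for the next window */
static unsigned int utilization_pct(struct cpu_stat *st)
{
    uint64_t total = st->busy_ns + st->idle_ns;
    unsigned int pct = total ? (unsigned int)(st->busy_ns * 100 / total) : 0;

    st->busy_ns = st->idle_ns = 0;   /* Start a fresh window */
    return pct;
}
```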
Instantaneous load measurements are noisy—a task may run for 1ms then sleep for 99ms. Using instantaneous load would see the CPU as 'fully loaded' during that 1ms. Temporal averaging smooths these fluctuations.
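To make the smoothing effect concrete, here is a small self-contained simulation of the 1ms-run/99ms-sleep task described above. It uses a plain exponentially weighted moving average with a 32ms half-life (the same half-life used by the kernel mechanism described next), not the kernel's actual algorithm:

```c
/* Illustrative only: exponential smoothing of a bursty busy/idle signal.
 * Compile with -lm. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Per-millisecond decay factor for a 32ms half-life: 2^(-1/32) */
    const double decay = pow(0.5, 1.0 / 32.0);
    double smoothed = 0.0;

    for (int ms = 0; ms < 1000; ms++) {
        /* Instantaneous signal: the CPU is busy for 1ms out of every 100ms */
        double instantaneous = (ms % 100 == 0) ? 1.0 : 0.0;

        smoothed = smoothed * decay + instantaneous * (1.0 - decay);
        if (ms % 100 == 50)
            printf("t=%3dms  smoothed load = %.1f%%  (instantaneous swings 0%%..100%%)\n",
                   ms, smoothed * 100.0);
    }
    /* The smoothed value settles in the low single digits, close to the
     * task's true ~1% demand, instead of flapping between idle and busy. */
    return 0;
}
```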
Per-Entity Load Tracking (PELT)
Linux's CFS uses PELT—a sophisticated exponential decay tracking mechanism that computes temporally-weighted averages for each scheduling entity (task or group).
```c
/* PELT: Per-Entity Load Tracking */

/* PELT decays load with a half-life of approximately 32ms */
/* This means: after 32ms, a contribution decays to 50%     */
/*             after 64ms, a contribution decays to 25%     */
/*             after 96ms, a contribution decays to 12.5%   */

#define LOAD_AVG_PERIOD     32                              /* ms, decay half-life */
#define LOAD_AVG_PERIOD_NS  (LOAD_AVG_PERIOD * 1000000ULL)  /* same period in ns */
#define LOAD_AVG_MAX        47742                           /* Maximum load_avg value */

struct sched_avg {
    /* Running average of runnable time */
    unsigned long load_avg;
    /* Running average of running time */
    unsigned long runnable_avg;
    /* Running average of utilization */
    unsigned long util_avg;

    /* Period tracking for decay */
    u64 last_update_time;
    u32 period_contrib;
};

/* Update PELT averages - called on each scheduler tick */
void update_load_avg(struct sched_entity *se, struct run_queue *rq, int running)
{
    struct sched_avg *sa = &se->avg;
    u64 now = rq_clock(rq);
    u64 delta_time = now - sa->last_update_time;
    u32 periods, contrib;

    if (delta_time == 0)
        return;

    /* Core formula: new_avg = old_avg * decay + contribution * (1 - decay) */

    /* Step 1: Decay existing averages for each full period elapsed */
    periods = delta_time / LOAD_AVG_PERIOD_NS;
    if (periods > 0) {
        sa->load_avg     = decay_load(sa->load_avg, periods);
        sa->runnable_avg = decay_load(sa->runnable_avg, periods);
        sa->util_avg     = decay_load(sa->util_avg, periods);
    }

    /* Step 2: Add contribution from the current period */
    contrib = calculate_contribution(delta_time, running);
    if (se->on_rq) {
        sa->load_avg += contrib * se->load.weight / LOAD_AVG_MAX;
        sa->runnable_avg += contrib;
    }
    if (running) {
        sa->util_avg += contrib;
    }

    sa->last_update_time = now;
}

/* Geometric decay: load * (1/2)^(periods/32) */
static inline unsigned long decay_load(unsigned long load, int periods)
{
    /* Pre-computed fractional decay values for efficiency:
     * decay_table[i] ~= (1/2)^(i/32) * 2^32 (entry 0 saturates at 2^32 - 1) */
    static const u32 decay_table[32] = {
        4294967295UL, 4264570326UL, 4234504929UL, /* ... */
    };

    if (periods >= 2016)
        return 0;               /* Fully decayed */

    /* Each full block of 32 periods halves the load */
    while (periods >= 32) {
        load >>= 1;
        periods -= 32;
    }
    /* Fractional decay for the remaining periods */
    if (periods > 0)
        load = ((u64)load * decay_table[periods]) >> 32;

    return load;
}
```

Why Exponential Decay?
Exponential decay provides several desirable properties: recent behavior dominates the average, older contributions fade smoothly rather than dropping off abruptly, and the average can be updated incrementally in O(1) time without storing a history of samples.
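To see where the LOAD_AVG_MAX constant used above comes from, the sketch below (a standalone illustration, not kernel code) sums the geometric series produced by a 32-period half-life: an entity that is runnable in every period accumulates at most about 1024 / (1 - y) with y = 2^(-1/32), which lands near the 47742 figure; the kernel's fixed-point arithmetic accounts for the small remaining difference.

```c
/* Illustrative only: derive the PELT saturation value from the 32-period
 * half-life. Compile with -lm. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double y = pow(0.5, 1.0 / 32.0);   /* per-period decay, ~0.97857 */
    double sum = 0.0;

    /* A task runnable in every period contributes 1024 * y^n for the period
     * n steps in the past; the total is a geometric series. */
    for (int n = 0; n < 2000; n++)
        sum += 1024.0 * pow(y, n);

    printf("per-period decay y      = %.5f\n", y);
    printf("accumulated maximum     = %.0f\n", sum);
    printf("closed form 1024/(1-y)  = %.0f\n", 1024.0 / (1.0 - y));
    return 0;
}
```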
The Three PELT Metrics
PELT tracks three distinct averages for each entity:
| Metric | What It Tracks | Used For |
|---|---|---|
| load_avg | Priority-weighted runnable time | Load balancing decisions |
| runnable_avg | Total runnable time (unweighted) | Capacity planning |
| util_avg | Actual running time | DVFS (CPU frequency scaling) |
```c
/* How PELT metrics guide scheduler decisions */

/* Load balancing uses load_avg */
unsigned long task_load_for_balancing(struct task_struct *p)
{
    return p->se.avg.load_avg;
}

/* CPU frequency scaling uses util_avg */
unsigned long cpu_util_for_dvfs(int cpu)
{
    return cpu_rq(cpu)->cfs.avg.util_avg;
}

/* Example: balance decision based on PELT */
bool should_migrate_task(struct task_struct *p,
                         struct run_queue *src_rq,
                         struct run_queue *dst_rq)
{
    unsigned long task_load = task_load_for_balancing(p);
    unsigned long src_load = src_rq->cfs.avg.load_avg;
    unsigned long dst_load = dst_rq->cfs.avg.load_avg;

    /* Migration improves balance if:
     * (src_load - task_load) is closer to (dst_load + task_load)
     * than src_load is to dst_load */
    unsigned long current_imbalance = abs_diff(src_load, dst_load);
    unsigned long new_imbalance = abs_diff(src_load - task_load,
                                           dst_load + task_load);

    /* The move must actually reduce imbalance, and by a minimum
     * amount that justifies the migration cost */
    return new_imbalance < current_imbalance &&
           (current_imbalance - new_imbalance) > MIN_BALANCE_IMPROVEMENT;
}
```

The 32ms half-life is a compromise. A shorter half-life is more responsive but noisier; a longer one is more stable but slower to adapt. Some researchers have proposed an adaptive half-life that shortens during high activity and lengthens during stability, and some kernel builds expose the decay period as a tunable.
With per-CPU load defined, we can quantify system-wide imbalance. Several formulations capture different aspects of load distribution quality.
Simple Imbalance: Max - Min
The most intuitive definition:
Imbalance = max(Load(CPU)) - min(Load(CPU))
Simple, but ignores intermediate CPUs—doesn't distinguish between 'one CPU overloaded' vs. 'several CPUs overloaded'.
```c
/* Various imbalance metrics */

/* Simple max-min imbalance */
unsigned long max_min_imbalance(void)
{
    unsigned long max_load = 0, min_load = ULONG_MAX;
    int cpu;

    for_each_online_cpu(cpu) {
        unsigned long load = cpu_load(cpu);
        max_load = max(max_load, load);
        min_load = min(min_load, load);
    }
    return max_load - min_load;
}

/* Variance-based imbalance - accounts for all CPUs */
unsigned long variance_imbalance(void)
{
    unsigned long total = 0, sq_total = 0;
    int count = 0;
    int cpu;

    for_each_online_cpu(cpu) {
        unsigned long load = cpu_load(cpu);
        total += load;
        sq_total += load * load;
        count++;
    }

    unsigned long mean = total / count;
    unsigned long variance = (sq_total / count) - (mean * mean);

    return int_sqrt(variance);  /* Standard deviation */
}

/* What CFS actually uses: group-based imbalance */
unsigned long cfs_imbalance(struct sched_domain *sd)
{
    struct sched_group *busiest = find_busiest_group(sd);
    struct sched_group *local = sd->groups;  /* Local group */

    if (!busiest || busiest == local) {
        return 0;  /* No imbalance */
    }

    unsigned long busiest_avg = busiest->load_avg / busiest->nr_cpus;
    unsigned long local_avg = local->load_avg / local->nr_cpus;

    if (busiest_avg <= local_avg) {
        return 0;  /* Local group is busier or equal */
    }

    /* Imbalance is the excess that should move to local */
    return (busiest_avg - local_avg) * local->nr_cpus;
}
```

Group-Based Imbalance
CFS organizes CPUs into scheduling groups (reflecting NUMA nodes, sockets, etc.) and computes imbalance between groups rather than individual CPUs. This reduces noise from individual CPU fluctuations and matches the migration cost structure—balancing within a group is cheaper than across groups.
Imbalance Threshold
Not every imbalance warrants action. The scheduler defines minimum thresholds below which imbalance is tolerated:
```c
/* Imbalance thresholds and action triggers */

/* CFS uses imbalance_pct per scheduling domain */
struct sched_domain {
    /* ... */
    unsigned int imbalance_pct;  /* Percentage, with 100 = perfectly balanced */
    /* imbalance_pct = 117 means a 17% imbalance triggers action */
};

/* Check if imbalance exceeds threshold */
bool imbalance_exceeds_threshold(struct sched_domain *sd,
                                 unsigned long imbalance,
                                 unsigned long avg_load)
{
    /* Threshold is the percentage above a balanced load */
    unsigned long threshold = avg_load * (sd->imbalance_pct - 100) / 100;

    /* Also enforce a minimum absolute threshold */
    threshold = max(threshold, sd->min_imbalance);

    return imbalance > threshold;
}

/* Typical imbalance_pct values by domain level */
/* SMT:  110 (10% imbalance triggers) - cheap migration     */
/* MC:   125 (25% imbalance triggers) - moderate cost       */
/* NUMA: 133 (33% imbalance triggers) - expensive migration */

/* The threshold prevents thrashing on minor fluctuations */
```

Imbalance thresholds implicitly provide statistical significance for balance decisions. Small imbalances could be noise (random fluctuation in task behavior). Only when the imbalance exceeds the threshold—typically the equivalent of 2-3 standard deviations—do we act. This is similar to hypothesis testing in statistics.
So far, we've assumed all CPUs have equal capacity. Modern systems increasingly feature heterogeneous CPUs (different speeds, capabilities) requiring capacity-adjusted metrics.
CPU Capacity
Capacity represents a CPU's processing power relative to a baseline:
Capacity(CPU) = (CPU's max throughput) / (reference CPU's max throughput)
A 'big' core in ARM big.LITTLE might have capacity 1024, while a 'LITTLE' core has capacity 512.
```c
/* CPU capacity and capacity-adjusted load */

/* Per-CPU capacity (normalized, 1024 = standard core) */
DEFINE_PER_CPU(unsigned long, cpu_capacity);

/* Sources of capacity variation:
 * 1. Heterogeneous cores (big.LITTLE)
 * 2. Current CPU frequency (DVFS)
 * 3. Thermal throttling
 * 4. Architecture differences
 */

/* Initialize capacity at boot based on hardware */
void init_cpu_capacity(int cpu)
{
    struct cpuinfo *info = &cpu_data(cpu);
    unsigned long capacity = SCHED_CAPACITY_SCALE;  /* 1024 = baseline */

    /* Adjust for max frequency relative to the fastest CPU */
    capacity = capacity * info->max_freq / reference_max_freq;

    /* Adjust for IPC (instructions per cycle) differences */
    if (info->core_type == CORE_LITTLE) {
        capacity = capacity * LITTLE_IPC_RATIO / 100;
    }

    per_cpu(cpu_capacity, cpu) = capacity;
}

/* Runtime capacity update for frequency changes */
void update_cpu_capacity_for_freq(int cpu, unsigned long new_freq)
{
    unsigned long base_capacity = per_cpu(cpu_base_capacity, cpu);
    unsigned long max_freq = per_cpu(cpu_max_freq, cpu);

    per_cpu(cpu_capacity, cpu) = base_capacity * new_freq / max_freq;

    /* Trigger balance reconsideration - capacities changed */
    set_balance_needed(cpu);
}

/* Capacity-normalized load: what fraction of capacity is used? */
unsigned long capacity_normalized_load(int cpu)
{
    unsigned long load = cpu_load(cpu);
    unsigned long capacity = per_cpu(cpu_capacity, cpu);

    /* normalized_load = load * SCHED_CAPACITY_SCALE / capacity */
    return (load << SCHED_CAPACITY_SHIFT) / capacity;
}
```

Capacity-Aware Balancing
With capacity known, balancing compares utilized fraction rather than absolute load:
```c
/* Capacity-aware load balancing */

/* Compare CPUs by utilized fraction, not absolute load */
bool cpu_overloaded_for_capacity(int cpu)
{
    unsigned long load = cpu_load(cpu);
    unsigned long capacity = cpu_capacity_of(cpu);

    /* Overloaded if load exceeds 80% of capacity */
    return (load * 100 / capacity) > CAPACITY_OVERLOAD_PCT;
}

/* Find best destination accounting for capacity */
int find_best_destination_capacity_aware(struct task_struct *p)
{
    int best_cpu = -1;
    unsigned long best_spare_capacity = 0;
    int cpu;

    for_each_cpu(cpu, &p->cpus_allowed) {
        unsigned long load = cpu_load(cpu);
        unsigned long capacity = cpu_capacity_of(cpu);
        unsigned long task_load = task_load_for_balancing(p);

        /* Would adding this task overload the CPU? */
        if (load + task_load > capacity) {
            continue;  /* Would exceed capacity */
        }

        /* Calculate spare capacity after adding the task */
        unsigned long spare = capacity - (load + task_load);

        if (spare > best_spare_capacity) {
            best_spare_capacity = spare;
            best_cpu = cpu;
        }
    }
    return best_cpu;
}

/* Energy-aware scheduling: prefer efficient CPUs */
int find_energy_efficient_cpu(struct task_struct *p)
{
    int best_cpu = -1;
    unsigned long best_energy = ULONG_MAX;
    int cpu;

    for_each_cpu(cpu, &p->cpus_allowed) {
        unsigned long energy = estimate_cpu_energy(cpu, p);

        if (energy < best_energy && cpu_has_capacity(cpu, p)) {
            best_energy = energy;
            best_cpu = cpu;
        }
    }
    return best_cpu;
}
```

ARM's Energy Aware Scheduler (EAS) integrates with capacity tracking to make energy-optimal placement decisions. Small tasks go to 'LITTLE' cores (lower power). Large tasks go to 'big' cores (faster). This capacity-aware approach extends to mobile and embedded systems where power matters as much as performance.
Non-Uniform Memory Access (NUMA) architectures add another dimension: memory locality. Effective NUMA-aware scheduling requires metrics that capture both CPU load and memory placement.
Memory Placement Score
Track what fraction of a task's memory accesses are local vs. remote:
```c
/* NUMA-aware load and placement metrics */

struct task_numa_stats {
    /* Page access tracking */
    unsigned long local_faults;   /* Pages accessed on local node */
    unsigned long remote_faults;  /* Pages accessed on remote node */

    /* Per-node access counts */
    unsigned long faults[MAX_NUMA_NODES];

    /* Preferred node based on memory access pattern */
    int preferred_node;

    /* NUMA scanning period */
    unsigned long scan_period;
};

/* Calculate locality score for a task on a given node */
unsigned long task_locality_score(struct task_struct *p, int node)
{
    struct task_numa_stats *numa = &p->numa_stats;
    unsigned long total_faults = numa->local_faults + numa->remote_faults;

    if (total_faults == 0) {
        return 0;  /* No data yet */
    }

    unsigned long node_faults = numa->faults[node];

    /* Score = fraction of accesses that would be local if on this node */
    return (node_faults * NUMA_LOCALITY_MAX) / total_faults;
}

/* Find best node for a task based on memory access pattern */
int find_best_numa_node(struct task_struct *p)
{
    int best_node = numa_node_id();  /* Default to current */
    unsigned long best_score = 0;
    int node;

    for_each_online_node(node) {
        unsigned long score = task_locality_score(p, node);

        if (score > best_score) {
            best_score = score;
            best_node = node;
        }
    }

    /* Only prefer a different node if it is significantly better */
    if (best_score > task_locality_score(p, numa_node_id()) * 130 / 100) {
        return best_node;  /* 30% improvement threshold */
    }
    return numa_node_id();
}
```

Combined CPU + NUMA Metrics
The scheduler must balance CPU load against memory locality. Sometimes the optimal choice is a busier CPU on the right NUMA node:
```c
/* Combined CPU load and NUMA locality scoring */

struct migration_score {
    unsigned long cpu_score;       /* Higher is better (more spare capacity) */
    unsigned long numa_score;      /* Higher is better (more local) */
    unsigned long combined_score;  /* Overall score */
};

/* Calculate combined migration score */
void calculate_migration_score(struct task_struct *p, int dst_cpu,
                               struct migration_score *score)
{
    int dst_node = cpu_to_node(dst_cpu);

    /* CPU score: percentage of spare capacity (less loaded = higher score) */
    unsigned long load = cpu_load(dst_cpu);
    unsigned long capacity = cpu_capacity_of(dst_cpu);
    score->cpu_score = (capacity - load) * 100 / capacity;

    /* NUMA score: locality percentage */
    score->numa_score = task_locality_score(p, dst_node);

    /* Combined score: weighted average.
     * The weighting depends on the task's memory intensity. */
    unsigned long mem_weight = task_memory_intensity(p);  /* 0-100 */
    unsigned long cpu_weight = 100 - mem_weight;

    score->combined_score = (score->cpu_score * cpu_weight +
                             score->numa_score * mem_weight) / 100;
}

/* Find best CPU considering both load and NUMA */
int find_best_cpu_numa_aware(struct task_struct *p)
{
    int best_cpu = task_cpu(p);  /* Default: stay put */
    struct migration_score best_score = { 0 };
    int cpu;

    calculate_migration_score(p, best_cpu, &best_score);

    for_each_cpu(cpu, &p->cpus_allowed) {
        struct migration_score score;

        calculate_migration_score(p, cpu, &score);

        /* Need a significant improvement (15% here) to justify migration */
        if (score.combined_score > best_score.combined_score * 115 / 100) {
            best_score = score;
            best_cpu = cpu;
        }
    }
    return best_cpu;
}
```

Tracking per-node page faults relies on the kernel's NUMA-hinting faults (page table modifications that make the next access trap) and regular scanning. This overhead is worthwhile for memory-bound workloads but wasteful for CPU-bound ones. Linux's automatic NUMA balancing can be disabled for workloads where it hurts more than it helps.
Metrics aren't just for the scheduler—they're essential for operators and developers to understand system behavior. Linux exposes rich scheduling statistics through several interfaces.
/proc/schedstat
Per-CPU and per-domain scheduling statistics:
```bash
#!/bin/bash
# Reading and interpreting /proc/schedstat
#
# Simplified field layout used in this example (the exact layout depends on
# the schedstat version; see Documentation/scheduler/sched-stats.rst):
#   cpu<N> <yld_count> <sched_count> <sched_goidle> \
#          <ttwu_count> <ttwu_local> <sum_exec_runtime> \
#          <sum_sleep_runtime>
#   domain<N> <balance_count> <balance_failed> \
#             <push_count> <push_failed> <pull_count> \
#             <pull_failed> <alb_count> <alb_failed>

# View raw schedstat
cat /proc/schedstat

# Parse key metrics for each CPU
awk '/^cpu/ {
    cpu = $1
    sched_count = $3
    goidle = $4
    ttwu = $5
    local_ttwu = $6
    local_pct = (ttwu > 0) ? (local_ttwu * 100 / ttwu) : 0
    printf "%s: scheduled %d times, went idle %d times, "\
           "wakeups: %d (%.1f%% local)\n",
           cpu, sched_count, goidle, ttwu, local_pct
}' /proc/schedstat

# Domain-level balance statistics
awk '/^domain/ {
    domain = $1
    balance_count = $2
    balance_fail = $3
    push = $4
    push_fail = $5
    pull = $6
    pull_fail = $7
    if (balance_count > 0) {
        fail_pct = (balance_fail * 100 / balance_count)
        printf "%s: %d balance attempts (%.1f%% failed), "\
               "push: %d/%d, pull: %d/%d\n",
               domain, balance_count, fail_pct,
               push - push_fail, push,
               pull - pull_fail, pull
    }
}' /proc/schedstat
```

perf sched
The perf tool provides detailed scheduler tracing and analysis:
```bash
#!/bin/bash
# Using perf sched for scheduler analysis

# Record scheduler events for 10 seconds
perf sched record -- sleep 10

# Analyze scheduler latencies
perf sched latency
# Output: Task latencies (how long tasks wait in the queue)
#         Per-task breakdown with max and average delay

# Show per-CPU scheduling activity
perf sched map
# Output: Visual timeline of task-to-CPU mapping
#         Shows migrations as tasks move between CPU columns

# Migration analysis
perf sched timehist --migrations
# Output: Per-event timeline including migration events
#         (which tasks moved, and between which CPUs)

# Detailed scheduler trace
perf sched script
# Output: Raw scheduler events with timestamps
#         sched:sched_switch, sched:sched_wakeup, etc.

# Example: Find excessive migrations
perf sched record -g -- ./my_application
perf sched timehist --migrations
# Alternatively, count migrations directly:
#   perf stat -e sched:sched_migrate_task -- ./my_application
# A high migration count may indicate poor affinity settings
```

Key Health Indicators
Monitor these metrics to assess scheduling health:
| Metric | Healthy Range | Problem Indication |
|---|---|---|
| Balance success rate | > 80% | Low = excessive failed attempts |
| Local wakeup % | > 70% | Low = poor affinity, cache loss |
| Migrations/sec | < 100/sec typical | High = thrashing |
| Run queue latency | < 10ms p99 | High = overload or imbalance |
| Idle time imbalance | < 10% difference | High = load imbalance |
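As one concrete example, the 'Idle time imbalance' row can be checked from userspace using only /proc/stat. The sketch below is illustrative (the 5-second sampling interval is an arbitrary choice, and the ~10-point spread is the rule of thumb from the table, not a kernel-defined threshold): it samples per-CPU idle counters twice and reports the spread in idle percentage.

```c
/* Illustrative idle-imbalance check: sample per-CPU idle time from /proc/stat
 * twice and compare the idle fraction across CPUs. */
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_CPUS 256

/* Fill total[] and idle[] with cumulative jiffies per CPU; return CPU count */
static int read_cpu_times(unsigned long long total[], unsigned long long idle[])
{
    FILE *f = fopen("/proc/stat", "r");
    char line[512];
    int n = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f) && n < MAX_CPUS) {
        /* Only per-CPU lines ("cpu0 ...", "cpu1 ..."), not the aggregate "cpu" */
        if (strncmp(line, "cpu", 3) != 0 || !isdigit((unsigned char)line[3]))
            continue;

        unsigned long long v[10] = { 0 };
        int cpu;
        /* Fields: user nice system idle iowait irq softirq steal guest guest_nice */
        if (sscanf(line, "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &cpu, &v[0], &v[1], &v[2], &v[3], &v[4],
                   &v[5], &v[6], &v[7], &v[8], &v[9]) < 5)
            continue;

        unsigned long long sum = 0;
        for (int i = 0; i < 10; i++)
            sum += v[i];
        total[n] = sum;
        idle[n] = v[3] + v[4];   /* idle + iowait */
        n++;
    }
    fclose(f);
    return n;
}

int main(void)
{
    unsigned long long t1[MAX_CPUS], i1[MAX_CPUS], t2[MAX_CPUS], i2[MAX_CPUS];
    int n = read_cpu_times(t1, i1);

    if (n <= 0)
        return 1;
    sleep(5);                       /* Sampling interval */
    if (read_cpu_times(t2, i2) != n)
        return 1;

    double max_idle = 0.0, min_idle = 100.0;
    for (int c = 0; c < n; c++) {
        double idle_pct = 100.0 * (double)(i2[c] - i1[c]) / (double)(t2[c] - t1[c]);
        if (idle_pct > max_idle) max_idle = idle_pct;
        if (idle_pct < min_idle) min_idle = idle_pct;
    }

    /* Per the table above, a spread of more than ~10 percentage points between
     * the most and least idle CPU suggests a load imbalance. */
    printf("idle%% spread: %.1f (max %.1f, min %.1f)\n",
           max_idle - min_idle, max_idle, min_idle);
    return 0;
}
```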
```python
#!/usr/bin/env python3
"""Scheduler health monitoring script.

Assumes the simplified /proc/schedstat field layout shown earlier on this
page; the exact layout depends on the schedstat version of your kernel.
"""

from collections import defaultdict


def parse_schedstat():
    """Parse /proc/schedstat for key metrics"""
    metrics = defaultdict(dict)

    with open('/proc/schedstat', 'r') as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            if parts[0].startswith('cpu'):
                cpu = parts[0]
                metrics[cpu] = {
                    'schedules': int(parts[2]),
                    'idle_entries': int(parts[3]),
                    'wakeups': int(parts[4]),
                    'local_wakeups': int(parts[5]),
                }
            elif parts[0].startswith('domain'):
                domain = parts[0]
                metrics[domain] = {
                    'balance_attempts': int(parts[1]),
                    'balance_failed': int(parts[2]),
                    'migrations': int(parts[3]) - int(parts[4]),
                }
    return metrics


def check_health(metrics):
    """Evaluate scheduler health from metrics"""
    issues = []

    # Check local wakeup percentage
    total_wakeups = sum(m.get('wakeups', 0)
                        for m in metrics.values() if 'wakeups' in m)
    local_wakeups = sum(m.get('local_wakeups', 0)
                        for m in metrics.values() if 'local_wakeups' in m)
    if total_wakeups > 0:
        local_pct = local_wakeups * 100 / total_wakeups
        if local_pct < 70:
            issues.append(f"Low local wakeup rate: {local_pct:.1f}%")

    # Check balance success rate
    for domain, stats in metrics.items():
        if 'balance_attempts' in stats and stats['balance_attempts'] > 100:
            success_rate = 100 - (stats['balance_failed'] * 100
                                  / stats['balance_attempts'])
            if success_rate < 80:
                issues.append(
                    f"{domain}: Low balance success rate: {success_rate:.1f}%"
                )

    return issues or ["All metrics healthy"]


if __name__ == "__main__":
    metrics = parse_schedstat()
    for issue in check_health(metrics):
        print(issue)
```

Integrate scheduler metrics into your monitoring stack (Prometheus, Grafana, etc.). Alert on anomalies like sudden migration spikes or degraded local wakeup rates. These often indicate workload changes or configuration issues that need attention.
Metrics form the foundation of intelligent load balancing—you cannot optimize what you cannot measure. The key insights from this page: load can be quantified as queue length, weighted load, or utilization, each with different trade-offs; PELT smooths these signals with exponential decay; imbalance is acted upon only when it exceeds per-domain thresholds; capacity and NUMA awareness adjust raw load for heterogeneous and multi-socket hardware; and interfaces like /proc/schedstat and perf sched make balance quality observable.
Module Complete: Load Balancing
With this page, we've completed our exploration of load balancing in multiprocessor systems. You now understand the mechanisms that move work between CPUs (push migration, pull migration, and work stealing), when and how often balancing should run, and the metrics used to drive and evaluate those decisions.
This knowledge equips you to reason about scheduler behavior, diagnose performance issues, and make informed tuning decisions across diverse multiprocessor systems.
Congratulations! You've mastered load balancing in multiprocessor operating systems. From mechanism (push/pull/work stealing) through timing (frequency) to measurement (metrics), you possess the conceptual foundation to understand, analyze, and optimize scheduling on modern hardware.