We have proven that SJF is optimal for minimizing average waiting time—if we know the CPU burst lengths. But here lies the fundamental challenge: the operating system cannot know the future.
When a process enters the ready queue, the scheduler doesn't know how long it will use the CPU before voluntarily yielding or blocking for I/O. This isn't a technical limitation we can engineer around—it's a fundamental property of computation. A process might execute 3 milliseconds or 3 seconds; we won't know until it finishes.
This transforms SJF from a theoretical guarantee to a practical approximation. The scheduler must predict the next CPU burst based on available information—typically the process's past behavior. The quality of these predictions directly determines how close we get to SJF's theoretical optimality.
By the end of this page, you will understand why burst prediction is challenging, how the exponential averaging technique used in real schedulers works, what tradeoffs the prediction parameters involve, and where prediction accuracy reaches its practical limits in production systems.
Before diving into prediction techniques, we must understand why this problem resists simple solutions.
At its theoretical extreme, predicting exact burst times is related to the Halting Problem—one of computer science's fundamental impossibilities. If we could perfectly predict how long any code would run, we could solve the Halting Problem by checking if the predicted runtime is finite.
Of course, practical scheduling doesn't need perfect prediction. But this connection explains why prediction must rely on heuristics rather than exact computation.
Even for the same process, CPU bursts vary dramatically:
| Variability Source | Example | Impact |
|---|---|---|
| Input-dependent branches | Loop iterations vary with data size | 10x - 1000x variation |
| Cache effects | First access vs cached access | 10x - 100x difference |
| Paging/swapping | Memory in RAM vs on disk | 100,000x difference |
| Contention | Shared resources, locks | Unpredictable delays |
| Workload phases | Initialization vs steady state | Orders of magnitude |
| User interaction | Typing speed, think time | Arbitrary variation |
Ironically, processes that would benefit most from SJF (high variance in burst times) are also the hardest to predict. If all bursts were similar, prediction would be trivial—but SJF would offer no advantage over FCFS. The value of SJF comes precisely in scenarios where prediction is most difficult.
The scheduler has only limited information to base predictions on: essentially the process's past burst lengths, its I/O behavior, and static attributes such as its priority or program identity.
Most schedulers rely primarily on process history—the assumption that future behavior resembles past behavior. This is often (but not always) true.
Early operating systems explored various prediction strategies before settling on exponential averaging.
The most intuitive approach: predict the next burst as the average of all previous bursts.
τₙ₊₁ = (1/n) · Σᵢ₌₁ⁿ tᵢ
Where tᵢ is the i-th observed burst.
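A minimal sketch of this approach in C, keeping a running sum so each update costs O(1); the `SimpleAverage` type and function names are illustrative, not from any particular kernel:

```c
#include <stdio.h>

/* Simple-average predictor: a running sum avoids storing every burst. */
typedef struct {
    double sum;   // Sum of all observed bursts
    int    count; // Number of bursts observed
} SimpleAverage;

/* Predict the next burst as the mean of all bursts so far. */
double simple_average_predict(const SimpleAverage *sa, double initial_guess) {
    return sa->count == 0 ? initial_guess : sa->sum / sa->count;
}

/* Record an observed burst. */
void simple_average_update(SimpleAverage *sa, double actual_burst) {
    sa->sum += actual_burst;
    sa->count++;
}

int main(void) {
    SimpleAverage sa = {0.0, 0};
    double bursts[] = {10, 12, 50, 11};  // Note how one outlier barely moves the mean
    for (int i = 0; i < 4; i++) {
        printf("predicted %.2f, actual %.2f\n",
               simple_average_predict(&sa, 10.0), bursts[i]);
        simple_average_update(&sa, bursts[i]);
    }
    return 0;
}
```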
Problems:

- Every burst is weighted equally, so ancient history counts as much as the most recent behavior.
- Adaptation becomes very slow as n grows: after hundreds of bursts, a phase change barely moves the average.
- Without a running sum, it requires storing every observed burst.
Consider only the last k bursts:
τₙ₊₁ = (1/k) · Σᵢ₌ₙ₋ₖ₊₁ⁿ tᵢ
Improvements:

- Bounded memory: only the last k bursts are stored.
- Faster adaptation: old bursts age out of the window entirely.
Remaining problems (a circular-buffer sketch follows this list):

- All k bursts in the window still count equally; a burst drops from full weight to zero the instant it leaves the window.
- It still requires storing k values per process.
- The choice of k is another tuning knob with no universally right value.
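A sketch of a moving-average predictor over the last k bursts, using a circular buffer; the window size `K` and the names here are assumptions for illustration:

```c
#include <stdio.h>

#define K 5  // Window size (illustrative choice)

/* Moving-average predictor over the last K bursts (circular buffer). */
typedef struct {
    double window[K]; // Last K observed bursts
    int    count;     // Total bursts observed
} MovingAverage;

/* Predict the next burst as the mean of the bursts currently in the window. */
double moving_average_predict(const MovingAverage *ma, double initial_guess) {
    int n = ma->count < K ? ma->count : K;
    if (n == 0) return initial_guess;
    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += ma->window[i];
    return sum / n;
}

/* Record an observed burst, overwriting the oldest entry once the window is full. */
void moving_average_update(MovingAverage *ma, double actual_burst) {
    ma->window[ma->count % K] = actual_burst;
    ma->count++;
}

int main(void) {
    MovingAverage ma = {{0}, 0};
    double bursts[] = {10, 12, 11, 25, 26, 27};  // Phase shift midway
    for (int i = 0; i < 6; i++) {
        printf("predicted %.2f, actual %.2f\n",
               moving_average_predict(&ma, 10.0), bursts[i]);
        moving_average_update(&ma, bursts[i]);
    }
    return 0;
}
```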
| Method | Memory | Adaptation Speed | Smoothness | Suitability |
|---|---|---|---|---|
| Last Value | O(1) | Instant | Very poor (noisy) | Rarely used alone |
| Simple Average | O(n) or running sum | Very slow | Very smooth | Static workloads |
| Moving Average (k=5) | O(k) | Moderate | Moderate | Regular patterns |
| Weighted Average | O(n) | Configurable | Configurable | Rarely implemented |
| Exponential Average | O(1) | Configurable | Very smooth | Standard choice |
Using only the last burst as the prediction (τₙ₊₁ = tₙ) is maximally responsive but produces noisy, oscillating predictions. A single anomalous burst completely determines the prediction. This is why smoothing—combining history with recent values—is essential.
Exponential averaging (also called exponential smoothing or exponentially weighted moving average - EWMA) is the standard technique for CPU burst prediction. It elegantly balances recent observations against historical patterns.
τₙ₊₁ = α · tₙ + (1 - α) · τₙ
Where:

- tₙ = the actual length of the n-th (most recent) CPU burst
- τₙ = the previous prediction, which already summarizes all earlier history
- τₙ₊₁ = the prediction for the next burst
- α = the smoothing factor, 0 ≤ α ≤ 1
Each new prediction is a weighted blend of:

- the most recent observed burst tₙ, weighted by α
- the previous prediction τₙ, weighted by (1 - α)
If α = 0.5: the new burst and the accumulated history count equally, a balanced default.

If α = 0.2: history dominates; predictions are smooth but slow to adapt.

If α = 0.8: the latest burst dominates; predictions adapt quickly but track noise.
Expand the recurrence: τₙ₊₁ = α·tₙ + (1-α)·α·tₙ₋₁ + (1-α)²·α·tₙ₋₂ + ... The weight on the k-th past burst is α(1-α)ᵏ, which decreases exponentially with age. Hence the name exponential averaging.
Let's trace how historical bursts contribute to the current prediction:
τₙ₊₁ = α·tₙ + (1-α)·τₙ

τₙ₊₁ = α·tₙ + (1-α)·[α·tₙ₋₁ + (1-α)·τₙ₋₁]

τₙ₊₁ = α·tₙ + α(1-α)·tₙ₋₁ + (1-α)²·τₙ₋₁
Continuing: τₙ₊₁ = α·tₙ + α(1-α)·tₙ₋₁ + α(1-α)²·tₙ₋₂ + ... + α(1-α)ⁿ⁻¹·t₁ + (1-α)ⁿ·τ₁
The weight on the burst from k steps ago, tₙ₋ₖ, is α(1-α)ᵏ.
With α = 0.5, the weights are 0.5, 0.25, 0.125, 0.0625, 0.03125, ...: each step back in history halves the contribution.
After ~7 bursts, weights are nearly negligible (< 1%).
```c
#include <stdio.h>

/**
 * Exponential Averaging CPU Burst Predictor
 * Demonstrates the core prediction algorithm used in SJF implementations
 */

typedef struct {
    double alpha;          // Smoothing factor (0 < alpha <= 1)
    double prediction;     // Current predicted burst time
    double initial_guess;  // Used for first prediction
    int burst_count;       // Number of bursts observed
} BurstPredictor;

/**
 * Initialize a new predictor
 *
 * @param alpha - Smoothing factor. Higher = more responsive to recent values.
 *                Typical values: 0.2 (smooth), 0.5 (balanced), 0.8 (responsive)
 * @param initial_guess - Prediction for first burst (no history available)
 *                        Often set to system average or typical burst length
 */
void predictor_init(BurstPredictor *pred, double alpha, double initial_guess) {
    pred->alpha = alpha;
    pred->prediction = initial_guess;
    pred->initial_guess = initial_guess;
    pred->burst_count = 0;
}

/**
 * Get the current prediction for next burst
 */
double predictor_get(BurstPredictor *pred) {
    return pred->prediction;
}

/**
 * Update prediction after observing an actual burst
 *
 * @param actual_burst - The actual CPU burst time just observed
 *
 * Formula: τₙ₊₁ = α × tₙ + (1 - α) × τₙ
 */
void predictor_update(BurstPredictor *pred, double actual_burst) {
    pred->burst_count++;
    // Exponential averaging formula
    pred->prediction = pred->alpha * actual_burst +
                       (1.0 - pred->alpha) * pred->prediction;
}

/**
 * Calculate prediction error
 */
double prediction_error(double predicted, double actual) {
    double error = predicted - actual;
    return error < 0 ? -error : error;  // Absolute error
}

/**
 * Demonstration: Trace prediction through a sequence of bursts
 */
void demonstrate_exponential_averaging() {
    // Actual burst sequence (observe the pattern shift around burst 5)
    double bursts[] = {10, 12, 11, 13, 10,   // Phase 1: ~10-13ms
                       25, 28, 24, 26, 27,   // Phase 2: ~24-28ms (shift!)
                       24, 25, 26, 25, 24};  // Phase 3: stable ~25ms
    int n = sizeof(bursts) / sizeof(bursts[0]);

    printf("Exponential Averaging Demonstration\n");
    printf("====================================\n\n");

    // Compare different alpha values
    double alphas[] = {0.2, 0.5, 0.8};

    for (int a = 0; a < 3; a++) {
        BurstPredictor pred;
        predictor_init(&pred, alphas[a], 10.0);  // Initial guess: 10ms
        double total_error = 0;

        printf("\nα = %.1f (", alphas[a]);
        if (alphas[a] < 0.4)      printf("smooth/slow adaptation):\n");
        else if (alphas[a] > 0.6) printf("responsive/fast adaptation):\n");
        else                      printf("balanced):\n");

        printf("%-8s %-12s %-12s %-12s\n", "Burst#", "Actual", "Predicted", "Error");
        printf("─────────────────────────────────────────\n");

        for (int i = 0; i < n; i++) {
            double predicted = predictor_get(&pred);
            double actual = bursts[i];
            double error = prediction_error(predicted, actual);
            total_error += error;

            printf("%-8d %-12.2f %-12.2f %-12.2f\n", i + 1, actual, predicted, error);
            predictor_update(&pred, actual);
        }

        printf("─────────────────────────────────────────\n");
        printf("Total Error: %.2f | Avg Error: %.2f\n", total_error, total_error / n);
    }
}

int main() {
    demonstrate_exponential_averaging();
    return 0;
}
```

The choice of α fundamentally affects prediction behavior. There is no universally optimal value—it depends on workload characteristics.
We can characterize α by its effective history length—how many past bursts significantly influence the prediction.
The weight on a burst k steps ago is α(1-α)ᵏ, so the combined weight of all bursts at least k steps old is (1-α)ᵏ. Once that combined weight drops below about 1%, the older history is effectively ignored.
| α | Combined weight beyond k=5 | Combined weight beyond k=10 | Effective Length |
|---|---|---|---|
| 0.1 | 0.59 (59%) | 0.35 (35%) | ~30+ bursts |
| 0.3 | 0.17 (17%) | 0.03 (3%) | ~10 bursts |
| 0.5 | 0.03 (3%) | 0.001 (0.1%) | ~7 bursts |
| 0.7 | 0.002 (0.2%) | ~0 | ~5 bursts |
| 0.9 | ~0 | ~0 | ~2 bursts |
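The "Effective Length" column can be reproduced by finding the smallest k at which the combined weight (1-α)ᵏ of everything at least k bursts old falls below 1%. A quick sketch (the exact cutoffs it prints differ slightly from the rounded values above):

```c
#include <stdio.h>

/* Smallest k such that the combined weight (1 - alpha)^k of all bursts
 * at least k steps old falls below the given threshold. */
int effective_history_length(double alpha, double threshold) {
    double residual = 1.0;  // Combined weight of all bursts at least k steps old
    int k = 0;
    while (residual >= threshold) {
        residual *= (1.0 - alpha);
        k++;
    }
    return k;
}

int main(void) {
    double alphas[] = {0.1, 0.3, 0.5, 0.7, 0.9};
    for (int i = 0; i < 5; i++)
        printf("alpha = %.1f -> ~%d bursts of effective history\n",
               alphas[i], effective_history_length(alphas[i], 0.01));
    return 0;
}
```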
Most systems use α ≈ 0.5 as a reasonable default. This:

- weights the most recent burst and the entire accumulated history equally;
- forgets old behavior within roughly seven bursts (see the table above);
- is cheap to compute: in fixed-point form the update reduces to an addition and a bit shift.
Some systems use adaptive α that changes based on prediction error—increasing α when predictions are poor (faster adaptation needed).
If α = 0: τₙ₊₁ = τₙ (prediction never changes—just uses initial guess forever). If α = 1: τₙ₊₁ = tₙ (prediction is always the last burst—no smoothing). Neither extreme is useful; real systems use 0.1 ≤ α ≤ 0.9.
When a process first arrives, we have no history to base predictions on. This cold start problem requires special handling.
1. Fixed Initial Guess
Use the same constant value for every new process, such as the recent system-wide average burst length.
Pros: Simple, predictable

Cons: May be very wrong for specific processes
2. Process Type Classification
Different initial guesses based on process characteristics: an interactive shell might start with a short guess, for example, while a batch compiler starts with a long one.
Pros: Better accuracy for classified processes

Cons: Requires classification mechanism
3. User/Program Hints
Allow programs to declare expected behavior, for example through a job-submission parameter that labels the work as short or long running.
Pros: Potentially accurate if hints are honest

Cons: Users may game the system
| Strategy | Initial τ₁ | Accuracy | Fairness | Complexity |
|---|---|---|---|---|
| System average | ~10-50ms | Variable | Neutral | Low |
| Conservative short | ~5ms | Good for interactive | Favors new processes | Low |
| Conservative long | ~100ms | Good for batch | Punishes new processes | Low |
| Process classification | Varies by class | Good | Varies | Medium |
| Adaptive (learn quickly) | Short + high α initially | Improves rapidly | Neutral | Medium |
| User-specified | From job submission | High if honest | Gameable | Low |
Many modern systems use a short initial guess combined with temporarily high α for new processes. This gives new processes a chance to run (short guess) and quickly converges to accurate predictions (high initial α). After a few bursts, α returns to its normal value.
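A sketch of that hybrid, assuming a warm-up window of five bursts with α = 0.8 before settling at α = 0.5; all constants here are illustrative rather than taken from a particular system:

```c
#include <stdio.h>

#define WARMUP_BURSTS 5    // Bursts with elevated alpha (assumed value)
#define ALPHA_WARMUP  0.8  // High alpha: converge quickly from the initial guess
#define ALPHA_NORMAL  0.5  // Steady-state alpha

typedef struct {
    double prediction;   // Current predicted burst length
    int    burst_count;  // Bursts observed so far
} WarmupPredictor;

void warmup_init(WarmupPredictor *p, double short_initial_guess) {
    p->prediction = short_initial_guess;  // Short guess: new process gets to run soon
    p->burst_count = 0;
}

void warmup_update(WarmupPredictor *p, double actual_burst) {
    // Elevated alpha while the process is young, normal alpha afterwards
    double alpha = (p->burst_count < WARMUP_BURSTS) ? ALPHA_WARMUP : ALPHA_NORMAL;
    p->prediction = alpha * actual_burst + (1.0 - alpha) * p->prediction;
    p->burst_count++;
}

int main(void) {
    WarmupPredictor p;
    warmup_init(&p, 5.0);  // Deliberately short initial guess
    double bursts[] = {40, 42, 41, 40, 43, 41, 42};  // Actual behavior: ~40ms
    for (int i = 0; i < 7; i++) {
        warmup_update(&p, bursts[i]);
        printf("after burst %d: prediction %.1f\n", i + 1, p.prediction);
    }
    return 0;
}
```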
How accurate are CPU burst predictions in practice? And how much accuracy do we actually need?
Common error metrics:
Mean Absolute Error (MAE): $$MAE = \frac{1}{n}\sum_{i=1}^{n}|\tau_i - t_i|$$
Mean Squared Error (MSE): $$MSE = \frac{1}{n}\sum_{i=1}^{n}(\tau_i - t_i)^2$$
Mean Absolute Percentage Error (MAPE): $$MAPE = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\tau_i - t_i}{t_i}\right|$$
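A sketch computing all three metrics over paired prediction/actual sequences; the helper name is illustrative:

```c
#include <stdio.h>
#include <math.h>

/* Compute MAE, MSE, and MAPE for n prediction/actual pairs. */
void error_metrics(const double *pred, const double *actual, int n,
                   double *mae, double *mse, double *mape) {
    double abs_sum = 0, sq_sum = 0, pct_sum = 0;
    for (int i = 0; i < n; i++) {
        double err = pred[i] - actual[i];
        abs_sum += fabs(err);
        sq_sum  += err * err;
        pct_sum += fabs(err / actual[i]);  // Assumes actual[i] != 0
    }
    *mae  = abs_sum / n;
    *mse  = sq_sum / n;
    *mape = 100.0 * pct_sum / n;
}

int main(void) {
    double pred[]   = {10, 30, 20};  // Predicted bursts
    double actual[] = { 8, 45, 17};  // Actual bursts
    double mae, mse, mape;
    error_metrics(pred, actual, 3, &mae, &mse, &mape);
    printf("MAE %.2f  MSE %.2f  MAPE %.1f%%\n", mae, mse, mape);
    return 0;
}
```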
Studies on real workloads show:
| Workload Type | MAPE | MAE (typical) | Notes |
|---|---|---|---|
| Stable batch | 5-15% | 1-3ms | Predictable, regular |
| Interactive | 20-40% | 2-10ms | Variable but short |
| Mixed workload | 30-50% | 5-20ms | Phase transitions hurt |
| Highly dynamic | 50-100%+ | Varies widely | Prediction nearly useless |
If prediction errors are too large, SJF may order processes incorrectly—scheduling a long job before a short one. Research suggests that MAPE > 50% often degrades SJF performance below FCFS. The prediction must be "good enough" to usually get the ordering right, even if magnitudes are wrong.
Interestingly, SJF doesn't need exact predictions—it only needs the relative ordering to be correct. If we predict (10, 30, 20) but actuals are (8, 45, 17), the prediction correctly orders P1 < P3 < P2.
Key insight: SJF works well if predictions preserve the ranking of burst times, not their exact values.
This is why exponential averaging works despite significant magnitude errors. As long as "short" processes are predicted shorter than "long" processes, SJF schedules correctly.
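One way to see this concretely: count the pairwise inversions between the predicted and actual orderings; zero inversions means SJF schedules exactly as it would with perfect knowledge. A sketch using the numbers from above:

```c
#include <stdio.h>

/* Count pairs (i, j) whose predicted ordering disagrees with the actual
 * ordering. Zero inversions means prediction preserves the SJF schedule. */
int count_inversions(const double *pred, const double *actual, int n) {
    int inversions = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if ((pred[i] < pred[j]) != (actual[i] < actual[j]))
                inversions++;
    return inversions;
}

int main(void) {
    double pred[]   = {10, 30, 20};  // Predicted bursts for P1, P2, P3
    double actual[] = { 8, 45, 17};  // Actual bursts
    printf("inversions: %d\n", count_inversions(pred, actual, 3));  // Prints 0
    return 0;
}
```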
Prediction fundamentally fails when:

- the workload changes phase abruptly (initialization ends, a new input arrives);
- burst lengths depend on events the scheduler cannot observe, such as user input or lock contention;
- a process is brand new and has no history at all.
In these cases, mispredictions are unavoidable. Robust schedulers include mechanisms (like aging) to recover from prediction failures.
Beyond basic exponential averaging, researchers have explored more sophisticated prediction methods.
Adjust α based on prediction accuracy:
```
if prediction_error > threshold:
    α = min(α + 0.1, 0.9)   // More responsive
else:
    α = max(α - 0.05, 0.1)  // More stable
```
Instead of averaging, use the median of recent bursts. The median is robust to outliers: a single anomalous burst cannot drag the prediction far, at the cost of keeping the last few bursts in memory (see the sketch below).
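A sketch of a median-of-recent-bursts predictor; the window size and the insertion sort are illustrative choices for a small, fixed window:

```c
#include <stdio.h>
#include <string.h>

#define WINDOW 5  // Number of recent bursts considered (assumed)

/* Median of the last WINDOW bursts: robust to a single outlier burst
 * that would drag an average-based prediction off course. */
double running_median(const double *recent, int n) {
    double sorted[WINDOW];
    memcpy(sorted, recent, n * sizeof(double));
    // Insertion sort: fine for a small, fixed-size window
    for (int i = 1; i < n; i++) {
        double key = sorted[i];
        int j = i - 1;
        while (j >= 0 && sorted[j] > key) { sorted[j + 1] = sorted[j]; j--; }
        sorted[j + 1] = key;
    }
    return (n % 2) ? sorted[n / 2]
                   : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
}

int main(void) {
    double recent[WINDOW] = {10, 11, 300, 12, 10};  // One huge outlier
    printf("median prediction: %.1f\n", running_median(recent, WINDOW));  // 11.0
    return 0;
}
```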
Some processes exhibit predictable patterns, such as alternating short and long bursts or behavior that repeats with a regular period.
Advanced predictors detect and exploit these patterns.
| Technique | Complexity | Best For | Overhead |
|---|---|---|---|
| Exponential Averaging | O(1) time, O(1) space | General purpose | Minimal |
| Adaptive α | O(1) time, O(1) space | Phase-changing workloads | Low |
| Running Median | O(k) time, O(k) space | Outlier-heavy workloads | Moderate |
| Bayesian Prediction | O(1) time, O(k) space | Multi-modal distributions | Moderate |
| Machine Learning | Varies | Complex patterns (research) | High |
| Pattern Matching | O(k) time, O(k) space | Periodic workloads | Moderate |
Despite decades of research, most production schedulers use simple exponential averaging or don't implement SJF at all. The overhead of sophisticated prediction rarely justifies the marginal improvement. Modern schedulers like Linux CFS use a fundamentally different approach—measuring actual runtime rather than predicting future bursts.
Implementing burst prediction in a real scheduler involves several practical considerations.
Each process needs:

- the current predicted burst length (τ);
- the smoothing factor α, if it adapts per process;
- a timestamp marking when the current burst started;
- a burst counter and, optionally, recent-error statistics for adaptive α.
This data resides in the Process Control Block (PCB), adding ~16-32 bytes per process.
Prediction updates occur when a process:

- blocks for I/O, ending its CPU burst;
- voluntarily yields the CPU;
- terminates.
The kernel must accurately measure CPU burst duration, typically by reading a high-resolution timestamp counter when the process is dispatched and again when the burst ends, then converting the elapsed ticks into time units.
With floating-point predictions, the kernel would have to save and restore FPU state around every update, which is expensive in kernel context. Practical implementations therefore use fixed-point integer arithmetic, as the sketch below does.
```c
/**
 * PCB extension for SJF with exponential averaging
 * Demonstrates practical implementation structure
 */

#include <stdint.h>
#include <limits.h>

#define ALPHA_SHIFT        8                 // Fixed-point: α = alpha_fixed / 256
#define DEFAULT_ALPHA      (256 * 50 / 100)  // 0.5 in fixed point = 128
#define MIN_PREDICTION_US  100               // 100 microseconds minimum
#define MAX_PREDICTION_US  10000000          // 10 seconds maximum

typedef struct {
    // Standard PCB fields...
    int pid;
    int state;
    // ...

    // Burst prediction fields
    uint32_t predicted_burst_us;  // τₙ in microseconds (fixed-point avoids floats)
    uint32_t alpha_fixed;         // α × 256 (fixed-point)
    uint64_t burst_start_tsc;     // TSC when current burst started
    uint32_t burst_count;         // Number of bursts observed

    // For adaptive α
    uint32_t recent_error_sum;    // Sum of recent |prediction - actual|
    uint32_t recent_burst_count;  // Count for averaging error
} ProcessControlBlock;

/**
 * Initialize prediction for new process
 */
void init_burst_prediction(ProcessControlBlock *pcb, uint32_t initial_guess_us) {
    pcb->predicted_burst_us = initial_guess_us;
    pcb->alpha_fixed = DEFAULT_ALPHA;
    pcb->burst_count = 0;
    pcb->recent_error_sum = 0;
    pcb->recent_burst_count = 0;
}

/**
 * Record burst start (when process is dispatched)
 */
void burst_started(ProcessControlBlock *pcb, uint64_t current_tsc) {
    pcb->burst_start_tsc = current_tsc;
}

/**
 * Update prediction when burst ends
 * Uses fixed-point arithmetic for efficiency
 */
void burst_ended(ProcessControlBlock *pcb, uint64_t current_tsc, uint64_t tsc_freq) {
    // Calculate actual burst length in microseconds
    uint64_t tsc_elapsed = current_tsc - pcb->burst_start_tsc;
    uint32_t actual_us = (uint32_t)((tsc_elapsed * 1000000) / tsc_freq);

    // Clamp to valid range
    if (actual_us < MIN_PREDICTION_US) actual_us = MIN_PREDICTION_US;
    if (actual_us > MAX_PREDICTION_US) actual_us = MAX_PREDICTION_US;

    // Track prediction error for adaptive α
    uint32_t error = (actual_us > pcb->predicted_burst_us)
                         ? (actual_us - pcb->predicted_burst_us)
                         : (pcb->predicted_burst_us - actual_us);
    pcb->recent_error_sum += error;
    pcb->recent_burst_count++;

    // Exponential averaging: τ = α·actual + (1-α)·τ
    // Fixed-point: τ = (α_fixed × actual + (256 - α_fixed) × τ) / 256
    uint64_t weighted_actual = (uint64_t)pcb->alpha_fixed * actual_us;
    uint64_t weighted_prev   = (uint64_t)(256 - pcb->alpha_fixed) * pcb->predicted_burst_us;
    pcb->predicted_burst_us  = (uint32_t)((weighted_actual + weighted_prev) >> 8);

    // Clamp result
    if (pcb->predicted_burst_us < MIN_PREDICTION_US) pcb->predicted_burst_us = MIN_PREDICTION_US;
    if (pcb->predicted_burst_us > MAX_PREDICTION_US) pcb->predicted_burst_us = MAX_PREDICTION_US;

    pcb->burst_count++;

    // Optionally adapt α based on error rate
    // (simplified; real implementation would be more sophisticated)
    if (pcb->recent_burst_count >= 10) {
        uint32_t avg_error = pcb->recent_error_sum / pcb->recent_burst_count;
        if (avg_error > pcb->predicted_burst_us / 2) {
            // High error: increase α for faster adaptation
            if (pcb->alpha_fixed < 230) pcb->alpha_fixed += 10;
        } else {
            // Low error: decrease α for more stability
            if (pcb->alpha_fixed > 50) pcb->alpha_fixed -= 5;
        }
        pcb->recent_error_sum = 0;
        pcb->recent_burst_count = 0;
    }
}

/**
 * Get predicted burst for scheduling decisions
 */
uint32_t get_predicted_burst(ProcessControlBlock *pcb) {
    return pcb->predicted_burst_us;
}
```

We have explored CPU burst prediction in depth—the essential bridge between SJF's theoretical optimality and practical implementation.
Looking Ahead:
The next page introduces SRTF (Shortest Remaining Time First)—the preemptive variant of SJF. Unlike non-preemptive SJF, SRTF re-evaluates scheduling decisions when new processes arrive, achieving even better average waiting times at the cost of increased context switching.
You now understand the prediction challenge that transforms SJF from a theoretical guarantee to a practical heuristic. Exponential averaging, despite its simplicity, remains remarkably effective—a testament to the power of well-designed algorithms over complex alternatives.