We have proven that SJF is optimal for minimizing average waiting time—if we know the CPU burst lengths. But here lies the fundamental challenge: the operating system cannot know the future.
When a process enters the ready queue, the scheduler doesn't know how long it will use the CPU before voluntarily yielding or blocking for I/O. This isn't a technical limitation we can engineer around—it's a fundamental property of computation. A process might execute 3 milliseconds or 3 seconds; we won't know until it finishes.
This transforms SJF from a theoretical guarantee to a practical approximation. The scheduler must predict the next CPU burst based on available information—typically the process's past behavior. The quality of these predictions directly determines how close we get to SJF's theoretical optimality.
By the end of this page, you will understand why burst prediction is challenging, how the exponential averaging technique used in real schedulers works, what tradeoffs the prediction parameters involve, and where prediction accuracy reaches its practical limits in production systems.
Before diving into prediction techniques, we must understand why this problem resists simple solutions.
At its theoretical extreme, predicting exact burst times is related to the Halting Problem—one of computer science's fundamental impossibilities. If we could perfectly predict how long any code would run, we could solve the Halting Problem by checking if the predicted runtime is finite.
Of course, practical scheduling doesn't need perfect prediction. But this connection explains why prediction must rely on heuristics rather than exact computation.
Even for the same process, CPU bursts vary dramatically:
| Variability Source | Example | Impact |
|---|---|---|
| Input-dependent branches | Loop iterations vary with data size | 10x - 1000x variation |
| Cache effects | First access vs cached access | 10x - 100x difference |
| Paging/swapping | Memory in RAM vs on disk | 100,000x difference |
| Contention | Shared resources, locks | Unpredictable delays |
| Workload phases | Initialization vs steady state | Orders of magnitude |
| User interaction | Typing speed, think time | Arbitrary variation |
Ironically, processes that would benefit most from SJF (high variance in burst times) are also the hardest to predict. If all bursts were similar, prediction would be trivial—but SJF would offer no advantage over FCFS. The value of SJF comes precisely in scenarios where prediction is most difficult.
The scheduler has only limited information to base predictions on: essentially the process's past burst lengths, its I/O behavior, and static attributes such as its priority or program identity.
Most schedulers rely primarily on process history—the assumption that future behavior resembles past behavior. This is often (but not always) true.
Early operating systems explored various prediction strategies before settling on exponential averaging.
The most intuitive approach: predict the next burst as the average of all previous bursts.
τₙ₊₁ = (1/n) · Σᵢ₌₁ⁿ tᵢ
Where tᵢ is the i-th observed burst.
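A minimal sketch of this approach in C, keeping a running sum so each update costs O(1); the `SimpleAverage` type and function names are illustrative, not from any particular kernel:

```c
#include <stdio.h>

/* Simple-average predictor: a running sum avoids storing every burst. */
typedef struct {
    double sum;   // Sum of all observed bursts
    int    count; // Number of bursts observed
} SimpleAverage;

/* Predict the next burst as the mean of all bursts so far. */
double simple_average_predict(const SimpleAverage *sa, double initial_guess) {
    return sa->count == 0 ? initial_guess : sa->sum / sa->count;
}

/* Record an observed burst. */
void simple_average_update(SimpleAverage *sa, double actual_burst) {
    sa->sum += actual_burst;
    sa->count++;
}

int main(void) {
    SimpleAverage sa = {0.0, 0};
    double bursts[] = {10, 12, 50, 11};  // Note how one outlier barely moves the mean
    for (int i = 0; i < 4; i++) {
        printf("predicted %.2f, actual %.2f\n",
               simple_average_predict(&sa, 10.0), bursts[i]);
        simple_average_update(&sa, bursts[i]);
    }
    return 0;
}
```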
Problems:

- Every burst is weighted equally, so ancient history counts as much as the most recent behavior.
- Adaptation becomes very slow as n grows: after hundreds of bursts, a phase change barely moves the average.
- Without a running sum, it requires storing every observed burst.
Consider only the last k bursts:
τₙ₊₁ = (1/k) · Σᵢ₌ₙ₋ₖ₊₁ⁿ tᵢ
Improvements:

- Bounded memory: only the last k bursts are stored.
- Faster adaptation: old bursts age out of the window entirely.
Remaining problems (a circular-buffer sketch follows this list):

- All k bursts in the window still count equally; a burst drops from full weight to zero the instant it leaves the window.
- It still requires storing k values per process.
- The choice of k is another tuning knob with no universally right value.
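A sketch of a moving-average predictor over the last k bursts, using a circular buffer; the window size `K` and the names here are assumptions for illustration:

```c
#include <stdio.h>

#define K 5  // Window size (illustrative choice)

/* Moving-average predictor over the last K bursts (circular buffer). */
typedef struct {
    double window[K]; // Last K observed bursts
    int    count;     // Total bursts observed
} MovingAverage;

/* Predict the next burst as the mean of the bursts currently in the window. */
double moving_average_predict(const MovingAverage *ma, double initial_guess) {
    int n = ma->count < K ? ma->count : K;
    if (n == 0) return initial_guess;
    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += ma->window[i];
    return sum / n;
}

/* Record an observed burst, overwriting the oldest entry once the window is full. */
void moving_average_update(MovingAverage *ma, double actual_burst) {
    ma->window[ma->count % K] = actual_burst;
    ma->count++;
}

int main(void) {
    MovingAverage ma = {{0}, 0};
    double bursts[] = {10, 12, 11, 25, 26, 27};  // Phase shift midway
    for (int i = 0; i < 6; i++) {
        printf("predicted %.2f, actual %.2f\n",
               moving_average_predict(&ma, 10.0), bursts[i]);
        moving_average_update(&ma, bursts[i]);
    }
    return 0;
}
```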
| Method | Memory | Adaptation Speed | Smoothness | Suitability |
|---|---|---|---|---|
| Last Value | O(1) | Instant | Very poor (noisy) | Rarely used alone |
| Simple Average | O(n) or running sum | Very slow | Very smooth | Static workloads |
| Moving Average (k=5) | O(k) | Moderate | Moderate | Regular patterns |
| Weighted Average | O(n) | Configurable | Configurable | Rarely implemented |
| Exponential Average | O(1) | Configurable | Very smooth | Standard choice |
Using only the last burst as the prediction (τₙ₊₁ = tₙ) is maximally responsive but produces noisy, oscillating predictions. A single anomalous burst completely determines the prediction. This is why smoothing—combining history with recent values—is essential.
Exponential averaging (also called exponential smoothing or exponentially weighted moving average - EWMA) is the standard technique for CPU burst prediction. It elegantly balances recent observations against historical patterns.
τₙ₊₁ = α · tₙ + (1 - α) · τₙ
Where:

- tₙ = the actual length of the n-th (most recent) CPU burst
- τₙ = the previous prediction, which already summarizes all earlier history
- τₙ₊₁ = the prediction for the next burst
- α = the smoothing factor, 0 ≤ α ≤ 1
Each new prediction is a weighted blend of:

- the most recent observed burst tₙ, weighted by α
- the previous prediction τₙ, weighted by (1 - α)
If α = 0.5: the new burst and the accumulated history count equally, a balanced default.

If α = 0.2: history dominates; predictions are smooth but slow to adapt.

If α = 0.8: the latest burst dominates; predictions adapt quickly but track noise.
Expand the recurrence: τₙ₊₁ = α·tₙ + (1-α)·α·tₙ₋₁ + (1-α)²·α·tₙ₋₂ + ... The weight on the k-th past burst is α(1-α)ᵏ, which decreases exponentially with age. Hence the name exponential averaging.
Let's trace how historical bursts contribute to the current prediction:
τₙ₊₁ = α·tₙ + (1-α)·τₙ

τₙ₊₁ = α·tₙ + (1-α)·[α·tₙ₋₁ + (1-α)·τₙ₋₁]

τₙ₊₁ = α·tₙ + α(1-α)·tₙ₋₁ + (1-α)²·τₙ₋₁
Continuing: τₙ₊₁ = α·tₙ + α(1-α)·tₙ₋₁ + α(1-α)²·tₙ₋₂ + ... + α(1-α)ⁿ⁻¹·t₁ + (1-α)ⁿ·τ₁
The weight on the burst from k steps ago, tₙ₋ₖ, is α(1-α)ᵏ.
With α = 0.5, the weights are 0.5, 0.25, 0.125, 0.0625, 0.03125, ...: each step back in history halves the contribution.
After ~7 bursts, weights are nearly negligible (< 1%).
```c
#include <stdio.h>

/**
 * Exponential Averaging CPU Burst Predictor
 * Demonstrates the core prediction algorithm used in SJF implementations
 */

typedef struct {
    double alpha;          // Smoothing factor (0 < alpha <= 1)
    double prediction;     // Current predicted burst time
    double initial_guess;  // Used for first prediction
    int burst_count;       // Number of bursts observed
} BurstPredictor;

/**
 * Initialize a new predictor
 *
 * @param alpha - Smoothing factor. Higher = more responsive to recent values.
 *                Typical values: 0.2 (smooth), 0.5 (balanced), 0.8 (responsive)
 * @param initial_guess - Prediction for first burst (no history available)
 *                        Often set to system average or typical burst length
 */
void predictor_init(BurstPredictor *pred, double alpha, double initial_guess) {
    pred->alpha = alpha;
    pred->prediction = initial_guess;
    pred->initial_guess = initial_guess;
    pred->burst_count = 0;
}

/**
 * Get the current prediction for next burst
 */
double predictor_get(BurstPredictor *pred) {
    return pred->prediction;
}

/**
 * Update prediction after observing an actual burst
 *
 * @param actual_burst - The actual CPU burst time just observed
 *
 * Formula: τₙ₊₁ = α × tₙ + (1 - α) × τₙ
 */
void predictor_update(BurstPredictor *pred, double actual_burst) {
    pred->burst_count++;
    // Exponential averaging formula
    pred->prediction = pred->alpha * actual_burst +
                       (1.0 - pred->alpha) * pred->prediction;
}

/**
 * Calculate prediction error
 */
double prediction_error(double predicted, double actual) {
    double error = predicted - actual;
    return error < 0 ? -error : error;  // Absolute error
}

/**
 * Demonstration: Trace prediction through a sequence of bursts
 */
void demonstrate_exponential_averaging() {
    // Actual burst sequence (observe the pattern shift around burst 5)
    double bursts[] = {10, 12, 11, 13, 10,   // Phase 1: ~10-13ms
                       25, 28, 24, 26, 27,   // Phase 2: ~24-28ms (shift!)
                       24, 25, 26, 25, 24};  // Phase 3: stable ~25ms
    int n = sizeof(bursts) / sizeof(bursts[0]);

    printf("Exponential Averaging Demonstration\n");
    printf("====================================\n\n");

    // Compare different alpha values
    double alphas[] = {0.2, 0.5, 0.8};

    for (int a = 0; a < 3; a++) {
        BurstPredictor pred;
        predictor_init(&pred, alphas[a], 10.0);  // Initial guess: 10ms
        double total_error = 0;

        printf("\nα = %.1f (", alphas[a]);
        if (alphas[a] < 0.4)      printf("smooth/slow adaptation):\n");
        else if (alphas[a] > 0.6) printf("responsive/fast adaptation):\n");
        else                      printf("balanced):\n");

        printf("%-8s %-12s %-12s %-12s\n", "Burst#", "Actual", "Predicted", "Error");
        printf("─────────────────────────────────────────\n");

        for (int i = 0; i < n; i++) {
            double predicted = predictor_get(&pred);
            double actual = bursts[i];
            double error = prediction_error(predicted, actual);
            total_error += error;

            printf("%-8d %-12.2f %-12.2f %-12.2f\n", i + 1, actual, predicted, error);
            predictor_update(&pred, actual);
        }

        printf("─────────────────────────────────────────\n");
        printf("Total Error: %.2f | Avg Error: %.2f\n", total_error, total_error / n);
    }
}

int main() {
    demonstrate_exponential_averaging();
    return 0;
}
```

The choice of α fundamentally affects prediction behavior. There is no universally optimal value—it depends on workload characteristics.
We can characterize α by its effective history length—how many past bursts significantly influence the prediction.
The weight on a burst k steps ago is α(1-α)ᵏ, so the combined weight of all bursts at least k steps old is (1-α)ᵏ. Once that combined weight drops below about 1%, the older history is effectively ignored.
| α | Combined weight beyond k=5 | Combined weight beyond k=10 | Effective Length |
|---|---|---|---|
| 0.1 | 0.59 (59%) | 0.35 (35%) | ~30+ bursts |
| 0.3 | 0.17 (17%) | 0.03 (3%) | ~10 bursts |
| 0.5 | 0.03 (3%) | 0.001 (0.1%) | ~7 bursts |
| 0.7 | 0.002 (0.2%) | ~0 | ~5 bursts |
| 0.9 | ~0 | ~0 | ~2 bursts |
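The "Effective Length" column can be reproduced by finding the smallest k at which the combined weight (1-α)ᵏ of everything at least k bursts old falls below 1%. A quick sketch (the exact cutoffs it prints differ slightly from the rounded values above):

```c
#include <stdio.h>

/* Smallest k such that the combined weight (1 - alpha)^k of all bursts
 * at least k steps old falls below the given threshold. */
int effective_history_length(double alpha, double threshold) {
    double residual = 1.0;  // Combined weight of all bursts at least k steps old
    int k = 0;
    while (residual >= threshold) {
        residual *= (1.0 - alpha);
        k++;
    }
    return k;
}

int main(void) {
    double alphas[] = {0.1, 0.3, 0.5, 0.7, 0.9};
    for (int i = 0; i < 5; i++)
        printf("alpha = %.1f -> ~%d bursts of effective history\n",
               alphas[i], effective_history_length(alphas[i], 0.01));
    return 0;
}
```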
Most systems use α ≈ 0.5 as a reasonable default. This:

- weights the most recent burst and the entire accumulated history equally;
- forgets old behavior within roughly seven bursts (see the table above);
- is cheap to compute: in fixed-point form the update reduces to an addition and a bit shift.
Some systems use adaptive α that changes based on prediction error—increasing α when predictions are poor (faster adaptation needed).
If α = 0: τₙ₊₁ = τₙ (prediction never changes—just uses initial guess forever). If α = 1: τₙ₊₁ = tₙ (prediction is always the last burst—no smoothing). Neither extreme is useful; real systems use 0.1 ≤ α ≤ 0.9.
When a process first arrives, we have no history to base predictions on. This cold start problem requires special handling.
1. Fixed Initial Guess
Use the same constant value for every new process, such as the recent system-wide average burst length.
Pros: Simple, predictable

Cons: May be very wrong for specific processes
2. Process Type Classification
Different initial guesses based on process characteristics: an interactive shell might start with a short guess, for example, while a batch compiler starts with a long one.
Pros: Better accuracy for classified processes

Cons: Requires classification mechanism
3. User/Program Hints
Allow programs to declare expected behavior, for example through a job-submission parameter that labels the work as short or long running.
Pros: Potentially accurate if hints are honest

Cons: Users may game the system
| Strategy | Initial τ₁ | Accuracy | Fairness | Complexity |
|---|---|---|---|---|
| System average | ~10-50ms | Variable | Neutral | Low |
| Conservative short | ~5ms | Good for interactive | Favors new processes | Low |
| Conservative long | ~100ms | Good for batch | Punishes new processes | Low |
| Process classification | Varies by class | Good | Varies | Medium |
| Adaptive (learn quickly) | Short + high α initially | Improves rapidly | Neutral | Medium |
| User-specified | From job submission | High if honest | Gameable | Low |
Many modern systems use a short initial guess combined with temporarily high α for new processes. This gives new processes a chance to run (short guess) and quickly converges to accurate predictions (high initial α). After a few bursts, α returns to its normal value.
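A sketch of that hybrid, assuming a warm-up window of five bursts with α = 0.8 before settling at α = 0.5; all constants here are illustrative rather than taken from a particular system:

```c
#include <stdio.h>

#define WARMUP_BURSTS 5    // Bursts with elevated alpha (assumed value)
#define ALPHA_WARMUP  0.8  // High alpha: converge quickly from the initial guess
#define ALPHA_NORMAL  0.5  // Steady-state alpha

typedef struct {
    double prediction;   // Current predicted burst length
    int    burst_count;  // Bursts observed so far
} WarmupPredictor;

void warmup_init(WarmupPredictor *p, double short_initial_guess) {
    p->prediction = short_initial_guess;  // Short guess: new process gets to run soon
    p->burst_count = 0;
}

void warmup_update(WarmupPredictor *p, double actual_burst) {
    // Elevated alpha while the process is young, normal alpha afterwards
    double alpha = (p->burst_count < WARMUP_BURSTS) ? ALPHA_WARMUP : ALPHA_NORMAL;
    p->prediction = alpha * actual_burst + (1.0 - alpha) * p->prediction;
    p->burst_count++;
}

int main(void) {
    WarmupPredictor p;
    warmup_init(&p, 5.0);  // Deliberately short initial guess
    double bursts[] = {40, 42, 41, 40, 43, 41, 42};  // Actual behavior: ~40ms
    for (int i = 0; i < 7; i++) {
        warmup_update(&p, bursts[i]);
        printf("after burst %d: prediction %.1f\n", i + 1, p.prediction);
    }
    return 0;
}
```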
How accurate are CPU burst predictions in practice? And how much accuracy do we actually need?
Common error metrics:
Mean Absolute Error (MAE): $$MAE = \frac{1}{n}\sum_{i=1}^{n}|\tau_i - t_i|$$
Mean Squared Error (MSE): $$MSE = \frac{1}{n}\sum_{i=1}^{n}(\tau_i - t_i)^2$$
Mean Absolute Percentage Error (MAPE): $$MAPE = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\tau_i - t_i}{t_i}\right|$$
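A sketch computing all three metrics over paired prediction/actual sequences; the helper name is illustrative:

```c
#include <stdio.h>
#include <math.h>

/* Compute MAE, MSE, and MAPE for n prediction/actual pairs. */
void error_metrics(const double *pred, const double *actual, int n,
                   double *mae, double *mse, double *mape) {
    double abs_sum = 0, sq_sum = 0, pct_sum = 0;
    for (int i = 0; i < n; i++) {
        double err = pred[i] - actual[i];
        abs_sum += fabs(err);
        sq_sum  += err * err;
        pct_sum += fabs(err / actual[i]);  // Assumes actual[i] != 0
    }
    *mae  = abs_sum / n;
    *mse  = sq_sum / n;
    *mape = 100.0 * pct_sum / n;
}

int main(void) {
    double pred[]   = {10, 30, 20};  // Predicted bursts
    double actual[] = { 8, 45, 17};  // Actual bursts
    double mae, mse, mape;
    error_metrics(pred, actual, 3, &mae, &mse, &mape);
    printf("MAE %.2f  MSE %.2f  MAPE %.1f%%\n", mae, mse, mape);
    return 0;
}
```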
Studies on real workloads show:
| Workload Type | MAPE | MAE (typical) | Notes |
|---|---|---|---|
| Stable batch | 5-15% | 1-3ms | Predictable, regular |
| Interactive | 20-40% | 2-10ms | Variable but short |
| Mixed workload | 30-50% | 5-20ms | Phase transitions hurt |
| Highly dynamic | 50-100%+ | Varies widely | Prediction nearly useless |
If prediction errors are too large, SJF may order processes incorrectly—scheduling a long job before a short one. Research suggests that MAPE > 50% often degrades SJF performance below FCFS. The prediction must be "good enough" to usually get the ordering right, even if magnitudes are wrong.
Interestingly, SJF doesn't need exact predictions—it only needs the relative ordering to be correct. If we predict (10, 30, 20) but actuals are (8, 45, 17), the prediction correctly orders P1 < P3 < P2.
Key insight: SJF works well if predictions preserve the ranking of burst times, not their exact values.
This is why exponential averaging works despite significant magnitude errors. As long as "short" processes are predicted shorter than "long" processes, SJF schedules correctly.
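One way to see this concretely: count the pairwise inversions between the predicted and actual orderings; zero inversions means SJF schedules exactly as it would with perfect knowledge. A sketch using the numbers from above:

```c
#include <stdio.h>

/* Count pairs (i, j) whose predicted ordering disagrees with the actual
 * ordering. Zero inversions means prediction preserves the SJF schedule. */
int count_inversions(const double *pred, const double *actual, int n) {
    int inversions = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if ((pred[i] < pred[j]) != (actual[i] < actual[j]))
                inversions++;
    return inversions;
}

int main(void) {
    double pred[]   = {10, 30, 20};  // Predicted bursts for P1, P2, P3
    double actual[] = { 8, 45, 17};  // Actual bursts
    printf("inversions: %d\n", count_inversions(pred, actual, 3));  // Prints 0
    return 0;
}
```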
Prediction fundamentally fails when:

- the workload changes phase abruptly (initialization ends, a new input arrives);
- burst lengths depend on events the scheduler cannot observe, such as user input or lock contention;
- a process is brand new and has no history at all.
In these cases, mispredictions are unavoidable. Robust schedulers include mechanisms (like aging) to recover from prediction failures.
Beyond basic exponential averaging, researchers have explored more sophisticated prediction methods.
Adjust α based on prediction accuracy:
```
if prediction_error > threshold:
    α = min(α + 0.1, 0.9)   // More responsive
else:
    α = max(α - 0.05, 0.1)  // More stable
```
Instead of averaging, use the median of recent bursts. The median is robust to outliers: a single anomalous burst cannot drag the prediction far, at the cost of keeping the last few bursts in memory (see the sketch below).
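A sketch of a median-of-recent-bursts predictor; the window size and the insertion sort are illustrative choices for a small, fixed window:

```c
#include <stdio.h>
#include <string.h>

#define WINDOW 5  // Number of recent bursts considered (assumed)

/* Median of the last WINDOW bursts: robust to a single outlier burst
 * that would drag an average-based prediction off course. */
double running_median(const double *recent, int n) {
    double sorted[WINDOW];
    memcpy(sorted, recent, n * sizeof(double));
    // Insertion sort: fine for a small, fixed-size window
    for (int i = 1; i < n; i++) {
        double key = sorted[i];
        int j = i - 1;
        while (j >= 0 && sorted[j] > key) { sorted[j + 1] = sorted[j]; j--; }
        sorted[j + 1] = key;
    }
    return (n % 2) ? sorted[n / 2]
                   : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
}

int main(void) {
    double recent[WINDOW] = {10, 11, 300, 12, 10};  // One huge outlier
    printf("median prediction: %.1f\n", running_median(recent, WINDOW));  // 11.0
    return 0;
}
```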
Some processes exhibit predictable patterns, such as alternating short and long bursts or behavior that repeats with a regular period.
Advanced predictors detect and exploit these patterns.
| Technique | Complexity | Best For | Overhead |
|---|---|---|---|
| Exponential Averaging | O(1) time, O(1) space | General purpose | Minimal |
| Adaptive α | O(1) time, O(1) space | Phase-changing workloads | Low |
| Running Median | O(k) time, O(k) space | Outlier-heavy workloads | Moderate |
| Bayesian Prediction | O(1) time, O(k) space | Multi-modal distributions | Moderate |
| Machine Learning | Varies | Complex patterns (research) | High |
| Pattern Matching | O(k) time, O(k) space | Periodic workloads | Moderate |
Despite decades of research, most production schedulers use simple exponential averaging or don't implement SJF at all. The overhead of sophisticated prediction rarely justifies the marginal improvement. Modern schedulers like Linux CFS use a fundamentally different approach—measuring actual runtime rather than predicting future bursts.
Implementing burst prediction in a real scheduler involves several practical considerations.
Each process needs:

- the current predicted burst length (τ);
- the smoothing factor α, if it adapts per process;
- a timestamp marking when the current burst started;
- a burst counter and, optionally, recent-error statistics for adaptive α.
This data resides in the Process Control Block (PCB), adding ~16-32 bytes per process.
Prediction updates occur when a process:

- blocks for I/O, ending its CPU burst;
- voluntarily yields the CPU;
- terminates.
The kernel must accurately measure CPU burst duration, typically by reading a high-resolution timestamp counter when the process is dispatched and again when the burst ends, then converting the elapsed ticks into time units.
With floating-point predictions, the kernel would have to save and restore FPU state around every update, which is expensive in kernel context. Practical implementations therefore use fixed-point integer arithmetic, as the sketch below does.
```c
/**
 * PCB extension for SJF with exponential averaging
 * Demonstrates practical implementation structure
 */

#include <stdint.h>
#include <limits.h>

#define ALPHA_SHIFT        8                 // Fixed-point: α = alpha_fixed / 256
#define DEFAULT_ALPHA      (256 * 50 / 100)  // 0.5 in fixed point = 128
#define MIN_PREDICTION_US  100               // 100 microseconds minimum
#define MAX_PREDICTION_US  10000000          // 10 seconds maximum

typedef struct {
    // Standard PCB fields...
    int pid;
    int state;
    // ...

    // Burst prediction fields
    uint32_t predicted_burst_us;  // τₙ in microseconds (fixed-point avoids floats)
    uint32_t alpha_fixed;         // α × 256 (fixed-point)
    uint64_t burst_start_tsc;     // TSC when current burst started
    uint32_t burst_count;         // Number of bursts observed

    // For adaptive α
    uint32_t recent_error_sum;    // Sum of recent |prediction - actual|
    uint32_t recent_burst_count;  // Count for averaging error
} ProcessControlBlock;

/**
 * Initialize prediction for new process
 */
void init_burst_prediction(ProcessControlBlock *pcb, uint32_t initial_guess_us) {
    pcb->predicted_burst_us = initial_guess_us;
    pcb->alpha_fixed = DEFAULT_ALPHA;
    pcb->burst_count = 0;
    pcb->recent_error_sum = 0;
    pcb->recent_burst_count = 0;
}

/**
 * Record burst start (when process is dispatched)
 */
void burst_started(ProcessControlBlock *pcb, uint64_t current_tsc) {
    pcb->burst_start_tsc = current_tsc;
}

/**
 * Update prediction when burst ends
 * Uses fixed-point arithmetic for efficiency
 */
void burst_ended(ProcessControlBlock *pcb, uint64_t current_tsc, uint64_t tsc_freq) {
    // Calculate actual burst length in microseconds
    uint64_t tsc_elapsed = current_tsc - pcb->burst_start_tsc;
    uint32_t actual_us = (uint32_t)((tsc_elapsed * 1000000) / tsc_freq);

    // Clamp to valid range
    if (actual_us < MIN_PREDICTION_US) actual_us = MIN_PREDICTION_US;
    if (actual_us > MAX_PREDICTION_US) actual_us = MAX_PREDICTION_US;

    // Track prediction error for adaptive α
    uint32_t error = (actual_us > pcb->predicted_burst_us)
                         ? (actual_us - pcb->predicted_burst_us)
                         : (pcb->predicted_burst_us - actual_us);
    pcb->recent_error_sum += error;
    pcb->recent_burst_count++;

    // Exponential averaging: τ = α·actual + (1-α)·τ
    // Fixed-point: τ = (α_fixed × actual + (256 - α_fixed) × τ) / 256
    uint64_t weighted_actual = (uint64_t)pcb->alpha_fixed * actual_us;
    uint64_t weighted_prev   = (uint64_t)(256 - pcb->alpha_fixed) * pcb->predicted_burst_us;
    pcb->predicted_burst_us  = (uint32_t)((weighted_actual + weighted_prev) >> 8);

    // Clamp result
    if (pcb->predicted_burst_us < MIN_PREDICTION_US) pcb->predicted_burst_us = MIN_PREDICTION_US;
    if (pcb->predicted_burst_us > MAX_PREDICTION_US) pcb->predicted_burst_us = MAX_PREDICTION_US;

    pcb->burst_count++;

    // Optionally adapt α based on error rate
    // (simplified; real implementation would be more sophisticated)
    if (pcb->recent_burst_count >= 10) {
        uint32_t avg_error = pcb->recent_error_sum / pcb->recent_burst_count;
        if (avg_error > pcb->predicted_burst_us / 2) {
            // High error: increase α for faster adaptation
            if (pcb->alpha_fixed < 230) pcb->alpha_fixed += 10;
        } else {
            // Low error: decrease α for more stability
            if (pcb->alpha_fixed > 50) pcb->alpha_fixed -= 5;
        }
        pcb->recent_error_sum = 0;
        pcb->recent_burst_count = 0;
    }
}

/**
 * Get predicted burst for scheduling decisions
 */
uint32_t get_predicted_burst(ProcessControlBlock *pcb) {
    return pcb->predicted_burst_us;
}
```

We have explored CPU burst prediction in depth—the essential bridge between SJF's theoretical optimality and practical implementation.
Looking Ahead:
The next page introduces SRTF (Shortest Remaining Time First)—the preemptive variant of SJF. Unlike non-preemptive SJF, SRTF re-evaluates scheduling decisions when new processes arrive, achieving even better average waiting times at the cost of increased context switching.
You now understand the prediction challenge that transforms SJF from a theoretical guarantee to a practical heuristic. Exponential averaging, despite its simplicity, remains remarkably effective—a testament to the power of well-designed algorithms over complex alternatives.