We have established that SJF and SRTF achieve optimal average waiting time. They are mathematically beautiful algorithms with provable properties. Yet neither algorithm sees widespread use in general-purpose operating systems. Why?
The answer lies in a critical flaw: starvation.
Imagine a batch processing job with a 10-minute CPU burst. Every time it's about to run, a short interactive process arrives—a keystroke, a mouse click, a network packet. Under SJF/SRTF, the short process runs first. If short processes arrive frequently enough, the long job never runs. It waits indefinitely, starved of CPU time.
This isn't a theoretical edge case. In real systems with mixed workloads—interactive users alongside batch jobs—starvation is practically guaranteed. The very property that makes SJF optimal (favoring short jobs) creates systematic unfairness against long jobs.
By completing this page, you will understand the starvation problem's causes and conditions, be able to analyze when starvation will occur, master detection techniques, and learn prevention strategies—particularly the aging mechanism that transforms SJF into a fair algorithm.
Starvation (also called indefinite blocking) occurs when a process waits indefinitely for a resource it needs, despite that resource being repeatedly granted to other processes.
A scheduling algorithm causes starvation if there exists a valid input (set of processes with arrival times and burst lengths) such that at least one process has unbounded waiting time—it never receives CPU service.
Starvation and deadlock are distinct problems. Deadlock: a circular wait where processes block each other (e.g., P1 waits for P2, P2 waits for P1). Starvation: a process waits while others proceed—there's no circular dependency. A starving process could run if selected; a deadlocked process cannot run until the cycle breaks.
| Aspect | Starvation | Deadlock |
|---|---|---|
| Cause | Scheduling policy bias | Circular resource wait |
| State of victim | Ready (could run) | Blocked (cannot run) |
| System progress | Other processes proceed | All involved processes blocked |
| Detection | Timeout-based | Cycle detection in wait graph |
| Resolution | Priority boost / aging | Preemption / rollback |
| Algorithm type | Scheduling algorithms | Resource allocation algorithms |
Starvation in SJF/SRTF requires specific conditions. Understanding these conditions helps predict when starvation will (and won't) occur.
Condition 1: Long process exists
At least one process has a burst time significantly longer than others. This process is vulnerable to starvation.
Condition 2: Continuous arrival of short processes
Short processes arrive frequently enough that whenever the CPU becomes free, at least one shorter job is already waiting, so the long process is never selected.
Condition 3: Arrival rate exceeds service rate (for short jobs)
Let λ_short be the arrival rate of short processes and let the average short burst be their mean service time. If λ_short × (average short burst) ≈ 1, the CPU is fully utilized by short jobs, leaving no capacity for long jobs.
Consider: Long process L with burst 100ms. Short processes S1, S2, S3, ... each with 5ms burst, arriving every 4ms. The CPU is always busy with short processes (arriving faster than they complete). L never runs. This pattern is common in interactive systems with background batch jobs.
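To see the pattern concretely, here is a minimal SRTF simulation sketch using the numbers from the example above (the 1000 ms cutoff and the millisecond-step loop are illustrative choices, not part of the example):

```c
/*
 * Minimal SRTF starvation demo (a sketch, not a full scheduler).
 * L has a 100 ms burst; 5 ms short jobs arrive every 4 ms.
 */
#include <stdio.h>

#define SIM_MS       1000   /* illustrative simulation length */
#define SHORT_BURST  5
#define SHORT_PERIOD 4
#define MAX_SHORT    (SIM_MS / SHORT_PERIOD + 1)

int main(void) {
    int long_remaining = 100;           /* L's remaining burst */
    int short_remaining[MAX_SHORT];     /* remaining time of each short job */
    int n_short = 0;

    for (int t = 0; t < SIM_MS; t++) {
        if (t % SHORT_PERIOD == 0)      /* a new short job arrives */
            short_remaining[n_short++] = SHORT_BURST;

        /* SRTF: pick the arrived job with the smallest remaining time */
        int best = -1, best_rem = long_remaining;
        for (int i = 0; i < n_short; i++)
            if (short_remaining[i] > 0 && short_remaining[i] <= best_rem) {
                best = i;
                best_rem = short_remaining[i];
            }

        if (best >= 0) short_remaining[best]--;   /* a short job always wins */
        else if (long_remaining > 0) long_remaining--;
    }

    printf("After %d ms, L still has %d of 100 ms left\n", SIM_MS, long_remaining);
    return 0;
}
```

Because the short jobs arrive faster than they complete, the final line reports that L has made no progress at all.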
Starvation is impossible under SJF/SRTF if the stream of shorter jobs eventually pauses: whenever no shorter job is waiting, the long job is selected. In a queuing model with Poisson arrivals, this happens whenever utilization stays strictly below 1, since the ready queue then empties infinitely often; as utilization approaches 1, idle periods become rare and long-job waits grow without bound. This is why heavily loaded systems are particularly vulnerable.
We can formally analyze starvation probability under various workload models.
Consider two classes of jobs: Class 1 (short jobs) with arrival rate λ₁ and service rate μ₁, and Class 2 (long jobs).
Under strict priority (short always first):
Utilization by short jobs: ρ₁ = λ₁/μ₁
Class 2 (long jobs) expected wait time: $$W_2 = \frac{W_0}{(1-\rho_1)^2}$$
Where W₀ is the wait time without priorities.
Key insight: As ρ₁ → 1, W₂ → ∞
When short jobs consume close to 100% of CPU capacity, long job wait time diverges to infinity—starvation.
| Short Job Utilization (ρ₁) | Wait Time Multiplier for Long Jobs | Practical Impact |
|---|---|---|
| 50% | 4x | Tolerable |
| 70% | 11x | Noticeable delays |
| 80% | 25x | Serious delays |
| 90% | 100x | Near starvation |
| 95% | 400x | Effective starvation |
| 99% | 10,000x | Complete starvation |
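As a quick check, the multipliers in the table follow directly from $1/(1-\rho_1)^2$; a few lines of C reproduce them (the ρ₁ values are simply those from the table):

```c
/* Reproduce the wait-time multipliers: W2 / W0 = 1 / (1 - rho1)^2 */
#include <stdio.h>

int main(void) {
    double rho1[] = {0.50, 0.70, 0.80, 0.90, 0.95, 0.99};
    for (int i = 0; i < 6; i++) {
        double multiplier = 1.0 / ((1.0 - rho1[i]) * (1.0 - rho1[i]));
        printf("rho1 = %.2f  ->  W2 is about %.0fx W0\n", rho1[i], multiplier);
    }
    return 0;
}
```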
System administrators often target 70-80% CPU utilization for interactive systems. The mathematical analysis shows why: above 80%, delays for the lower-priority (long) jobs become severe. This isn't arbitrary—it's a consequence of queuing theory.
In the worst case—an adversarial workload designed to cause starvation:
Construction: a long job L with burst B arrives at time 0. Short jobs, each with burst ε, arrive back-to-back (one every ε time units), so the CPU is never idle and every new arrival is shorter than L's remaining burst.
As ε → 0: each individual short job becomes negligible, yet L's waiting time grows without bound.
Theorem: SJF and SRTF do not provide bounded waiting time guarantees.
No matter how long L has waited, there exists a workload where it waits longer. This is the formal sense in which starvation is "unbounded."
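Written out, the claim is that no single bound T covers all workloads (here $W_L(\sigma)$ denotes L's waiting time under workload $\sigma$, notation introduced just for this statement):

$$\forall T > 0,\ \exists\ \text{workload } \sigma:\quad W_L(\sigma) > T, \qquad \text{equivalently} \qquad \sup_{\sigma} W_L(\sigma) = \infty.$$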
Detecting starvation requires monitoring process wait times and identifying anomalies.
The simplest approach: flag processes that have waited beyond a threshold.
```c
#define STARVATION_THRESHOLD_MS 10000  // 10 seconds

void check_starvation(ReadyQueue* queue, int current_time) {
    // Scan every process waiting in the ready queue (assumes a singly
    // linked list with a head pointer and per-node next links).
    for (Process* p = queue->head; p != NULL; p = p->next) {
        int wait_time = current_time - p->arrival_time;
        if (wait_time > STARVATION_THRESHOLD_MS) {
            printf("WARNING: P%d starving (waited %d ms)\n",
                   p->pid, wait_time);
            handle_starvation(p);  // Apply remediation
        }
    }
}
```
Choosing the threshold involves a tradeoff: set it too low and ordinary queuing delays under load trigger false alarms; set it too high and a starving process suffers for a long time before remediation. A common heuristic is several multiples of the longest expected burst.
More sophisticated: detect when wait times deviate from expected distributions.
```c
/**
 * Advanced Starvation Detection using Wait Time Statistics
 */
#include <stdio.h>
#include <math.h>

typedef struct {
    double sum_wait;          // Sum of wait times seen
    double sum_wait_squared;  // Sum of squared wait times
    int count;                // Number of processes completed
    double mean;              // Running mean
    double variance;          // Running variance
} WaitTimeStats;

void update_stats(WaitTimeStats* stats, double wait_time) {
    stats->count++;
    stats->sum_wait += wait_time;
    stats->sum_wait_squared += wait_time * wait_time;
    stats->mean = stats->sum_wait / stats->count;
    if (stats->count > 1) {
        stats->variance = (stats->sum_wait_squared -
            (stats->sum_wait * stats->sum_wait) / stats->count) /
            (stats->count - 1);
    }
}

/**
 * Check if a process's current wait time is anomalous
 * Uses z-score: how many standard deviations from mean
 *
 * @return 1 if likely starving, 0 otherwise
 */
int is_starving(WaitTimeStats* stats, double current_wait) {
    if (stats->count < 10) {
        // Not enough data; use absolute threshold
        return current_wait > 10000;  // 10 seconds
    }
    double std_dev = sqrt(stats->variance);
    if (std_dev < 1.0) std_dev = 1.0;  // Avoid division issues
    double z_score = (current_wait - stats->mean) / std_dev;
    // Starving if > 3 standard deviations above mean
    return z_score > 3.0;
}

/**
 * Alternative: Percentile-based detection
 * Maintain histogram of wait times, flag if > 99th percentile
 */
#define HISTOGRAM_BUCKETS 100
#define MAX_WAIT_MS 60000  // 1 minute max for histogram

typedef struct {
    int buckets[HISTOGRAM_BUCKETS];
    int total;
} WaitTimeHistogram;

void histogram_add(WaitTimeHistogram* hist, int wait_ms) {
    int bucket = (wait_ms * HISTOGRAM_BUCKETS) / MAX_WAIT_MS;
    if (bucket >= HISTOGRAM_BUCKETS) bucket = HISTOGRAM_BUCKETS - 1;
    hist->buckets[bucket]++;
    hist->total++;
}

int get_percentile(WaitTimeHistogram* hist, int percentile) {
    int target = (hist->total * percentile) / 100;
    int cumulative = 0;
    for (int i = 0; i < HISTOGRAM_BUCKETS; i++) {
        cumulative += hist->buckets[i];
        if (cumulative >= target) {
            return (i * MAX_WAIT_MS) / HISTOGRAM_BUCKETS;
        }
    }
    return MAX_WAIT_MS;
}

int is_starving_percentile(WaitTimeHistogram* hist, int current_wait) {
    int p99 = get_percentile(hist, 99);
    return current_wait > p99;
}
```

Production operating systems typically don't run explicit starvation detection because they use algorithms that prevent starvation by design (like aging in MLFQ or guaranteed CPU share in CFS). Detection is more useful in research systems or when analyzing scheduler behavior.
Aging is the primary technique for preventing starvation in priority-based schedulers. The idea is elegant: as a process waits, its priority gradually increases. Eventually, even the lowest-priority process becomes the highest priority.
In standard SJF, priority = burst length (lower is higher priority).
With aging:
Effective Priority = Burst Length - (Aging Factor × Wait Time)
Or equivalently:
Effective Priority = Burst Length - Age Bonus
Where Age Bonus increases with wait time.
Consider a long process L with burst 100ms and an aging factor of 1 (1 unit of age bonus per millisecond of waiting):
No matter how short new arrivals are, L will eventually accumulate enough age bonus to be scheduled.
| Time (ms) | L's Wait | L's Age Bonus | L's Effective Priority | Scheduled? |
|---|---|---|---|---|
| 0 | 0 | 0 | 100 | No (short jobs arriving) |
| 20 | 20 | 20 | 80 | No |
| 50 | 50 | 50 | 50 | Maybe (depends on arrivals) |
| 80 | 80 | 80 | 20 | Likely |
| 100 | 100 | 100 | 0 | Yes (highest priority) |
```c
/**
 * SJF with Aging - Starvation-Free Variant
 */
#include <stdio.h>

typedef struct {
    int pid;
    int arrival_time;
    int burst_time;
    int remaining_time;
    int waiting_since;  // Time when process entered ready queue
} ProcessWithAging;

// Aging configuration
#define AGING_FACTOR 1.0    // 1ms of age bonus per 1ms of waiting
#define MAX_AGE_BONUS 1000  // Cap the bonus to prevent overflow

/**
 * Calculate effective priority with aging
 * Lower value = higher priority
 */
double calculate_priority(ProcessWithAging* p, int current_time) {
    int wait_time = current_time - p->waiting_since;
    double age_bonus = wait_time * AGING_FACTOR;
    if (age_bonus > MAX_AGE_BONUS) {
        age_bonus = MAX_AGE_BONUS;
    }
    // Effective priority: burst adjusted by age bonus
    double effective = p->remaining_time - age_bonus;
    // Ensure priority doesn't go negative (optional)
    if (effective < 0) effective = 0;
    return effective;
}

/**
 * Find process with best (lowest) effective priority
 */
int find_best_aged_process(ProcessWithAging procs[], int n,
                           int current_time, int completed[]) {
    int best_idx = -1;
    double best_priority = 1e9;
    for (int i = 0; i < n; i++) {
        if (procs[i].arrival_time <= current_time && !completed[i]) {
            double priority = calculate_priority(&procs[i], current_time);
            if (priority < best_priority) {
                best_priority = priority;
                best_idx = i;
            }
        }
    }
    return best_idx;
}

/**
 * Guarantee: With aging factor α > 0, maximum wait time is bounded by:
 *
 *   Max Wait ≤ (Burst Length) / α
 *
 * With α = 1, a process with burst B waits at most B milliseconds
 * before its priority exceeds any new arrival with burst ≤ B.
 *
 * This provides a formal starvation-freedom guarantee.
 */
```

With aging factor α, the maximum wait time for any process is bounded by (MaxBurst / α). This is a formal guarantee: no process can wait indefinitely. The tradeoff is that SJF's optimality is weakened—we may schedule a long job before a short one to prevent starvation.
The aging factor α controls the balance between SJF's efficiency and fairness.
Guidelines for selecting the aging factor:
Based on SLA (Service Level Agreement):
If maximum wait time must be ≤ W_max:
α ≥ max_burst / W_max
Example: If longest job is 60 seconds and max wait must be ≤ 5 minutes:
α ≥ 60,000 / 300,000 = 0.2 (a small helper encoding this rule is sketched after these guidelines)
Based on workload analysis: measure the longest burst actually observed and the longest wait the workload can tolerate, then apply the same ratio; re-check periodically, since an α that is too small re-admits starvation while one that is too large hurts interactive response.
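As a rough sketch, the SLA rule above can be packaged as a helper (the function name and millisecond units are assumptions for illustration):

```c
#include <stdio.h>

/* Smallest aging factor that keeps the worst-case wait within the SLA:
 * alpha >= max_burst / W_max (both in the same unit, e.g. milliseconds). */
double aging_factor_for_sla(double max_burst_ms, double max_wait_ms) {
    return max_burst_ms / max_wait_ms;
}

int main(void) {
    /* The example from the text: 60 s longest job, 5 min maximum wait. */
    printf("alpha >= %.2f\n", aging_factor_for_sla(60000.0, 300000.0));  /* 0.20 */
    return 0;
}
```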
Linear aging (bonus grows linearly with wait) is simplest, but alternatives exist (compared in the sketch after this list):
Exponential aging: bonus = k × e^(α × wait)
Quadratic aging: bonus = α × wait²
Step aging: bonus jumps at thresholds
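Below is a small sketch comparing how these bonus curves grow with wait time (the constants and step thresholds are illustrative picks, not tuning recommendations):

```c
/* Compare aging-bonus curves at a few wait times (values are illustrative). */
#include <stdio.h>
#include <math.h>

double linear_bonus(double wait)      { return 1.0 * wait; }             /* alpha = 1    */
double exponential_bonus(double wait) { return 1.0 * exp(0.05 * wait); } /* k = 1, a=.05 */
double quadratic_bonus(double wait)   { return 0.01 * wait * wait; }     /* alpha = 0.01 */
double step_bonus(double wait) {
    if (wait >= 100.0) return 200.0;   /* large jump after 100 ms */
    if (wait >= 50.0)  return 50.0;    /* moderate jump after 50 ms */
    return 0.0;
}

int main(void) {
    double waits[] = {10, 50, 100, 200};
    printf("%8s %10s %12s %10s %8s\n",
           "wait", "linear", "exponential", "quadratic", "step");
    for (int i = 0; i < 4; i++) {
        double w = waits[i];
        printf("%8.0f %10.1f %12.1f %10.1f %8.1f\n",
               w, linear_bonus(w), exponential_bonus(w),
               quadratic_bonus(w), step_bonus(w));
    }
    return 0;
}
```

The output makes the design tradeoff visible: linear bonuses grow steadily, quadratic and exponential bonuses escalate sharply for long waits, and step bonuses change nothing until a threshold is crossed.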
Start with α = 1.0 (1ms bonus per 1ms wait). This ensures a process with burst B waits at most B additional milliseconds beyond its "fair" share. Adjust based on observed behavior: increase α if long jobs still experience unacceptable delays; decrease if interactive response suffers.
Besides aging, several other techniques prevent or mitigate starvation.
Set a maximum priority level. No process can have priority better than the ceiling:
effective_priority = max(actual_priority, PRIORITY_CEILING);
This prevents any process from permanently dominating. Limitation: doesn't guarantee progress—just limits advantage.
Allocate CPU time proportionally rather than by priority:
Example: Linux CFS approximates this with vruntime: processes that have run less get scheduling preference.
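To illustrate the idea (a toy sketch only; the real Linux CFS uses a red-black tree, nice-level weight tables, and many refinements), a fair-share scheduler can always pick the task with the least weighted runtime so far:

```c
/* Toy vruntime-style selection: starvation-free because the most
 * neglected task always has the smallest vruntime. */
#include <stdio.h>

typedef struct {
    int pid;
    double vruntime;  /* virtual runtime consumed so far */
    double weight;    /* larger weight = entitled to more CPU */
} Task;

/* Pick the runnable task with the smallest vruntime. */
int pick_next(Task tasks[], int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (tasks[i].vruntime < tasks[best].vruntime)
            best = i;
    return best;
}

/* Charge a slice of real runtime, scaled inversely by weight, so
 * heavier tasks accumulate vruntime more slowly and run more often. */
void account(Task* t, double ran_ms) {
    t->vruntime += ran_ms / t->weight;
}

int main(void) {
    Task tasks[] = { {1, 0.0, 1.0}, {2, 0.0, 2.0} };  /* pid 2 has double weight */
    for (int slice = 0; slice < 6; slice++) {
        int next = pick_next(tasks, 2);
        printf("slice %d: run P%d\n", slice, tasks[next].pid);
        account(&tasks[next], 10.0);  /* 10 ms time slice */
    }
    return 0;
}
```

Running this shows P2 receiving roughly twice as many slices as P1 while P1 still runs regularly: proportional sharing without starvation.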
If a process waits beyond threshold, move it to a guaranteed execution queue:
```c
if (wait_time > THRESHOLD) {
    move_to_guaranteed_queue(process);
    // Guaranteed queue uses round-robin
}
```
Combines SJF efficiency with RR fairness as a safety net.
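A minimal selection routine for such a hybrid might look like the sketch below (the data layout, threshold, and round-robin cursor are illustrative assumptions, not a specific system's design):

```c
/* Sketch: serve the guaranteed (starved) processes round-robin first;
 * fall back to plain SJF when none have crossed the threshold. */
#include <stdio.h>

#define THRESHOLD_MS 10000

typedef struct {
    int pid;
    int burst;
    int wait_time;
    int guaranteed;  /* set once the process crosses the threshold */
} Proc;

int pick_next(Proc procs[], int n, int* rr_cursor) {
    /* Promote anyone who has waited too long. */
    for (int i = 0; i < n; i++)
        if (procs[i].wait_time > THRESHOLD_MS)
            procs[i].guaranteed = 1;

    /* Guaranteed processes are served in circular (round-robin) order. */
    for (int k = 0; k < n; k++) {
        int i = (*rr_cursor + k) % n;
        if (procs[i].guaranteed) {
            *rr_cursor = (i + 1) % n;
            return i;
        }
    }

    /* Otherwise, shortest job first. */
    int best = 0;
    for (int i = 1; i < n; i++)
        if (procs[i].burst < procs[best].burst)
            best = i;
    return best;
}

int main(void) {
    Proc procs[] = { {1, 100, 12000, 0}, {2, 5, 200, 0}, {3, 5, 100, 0} };
    int cursor = 0;
    int next = pick_next(procs, 3, &cursor);
    printf("Next to run: P%d\n", procs[next].pid);  /* P1: crossed the threshold */
    return 0;
}
```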
| Strategy | Mechanism | Pros | Cons |
|---|---|---|---|
| Aging | Increase priority with wait time | Smooth, tunable | Harder to analyze bounds |
| Priority Ceiling | Cap maximum priority | Simple | Doesn't guarantee progress |
| Fair Queuing | Proportional CPU shares | Strong fairness | Suboptimal for pure wait time |
| RR Fallback | Switch algorithm after threshold | Clear guarantee | Abrupt behavior change |
| Lottery Scheduling | Random selection weighted by priority | Probabilistic fairness | Non-deterministic |
| MLFQ | Multiple queues with aging | Comprehensive | Complex to configure |
Multi-Level Feedback Queue (MLFQ) combines many of these ideas: multiple priority queues, priority adjustment based on behavior (aging up, promotion for I/O wait), and guaranteed minimum service at each level. Most modern OS schedulers are variants of MLFQ.
Starvation isn't just theoretical—it causes real problems in production systems.
Scenario: a database server handles short OLTP transactions (milliseconds each) and long OLAP analytics queries (minutes each) on the same machine.
Problem: With priority scheduling favoring OLTP, OLAP queries starve. Reports that should take 5 minutes take hours—or never complete.
Solution: Databases like PostgreSQL use query scheduling with resource pools. OLAP queries get guaranteed resources regardless of OLTP load.
Scenario: a router carries latency-sensitive VoIP traffic alongside bulk file transfers on the same links.
Problem: Strict priority for VoIP can starve bulk transfers indefinitely during calls.
Solution: Network QoS uses weighted fair queuing: VoIP gets priority but bulk transfers are guaranteed minimum bandwidth.
Scenario: a Linux system runs interactive processes alongside long-running batch jobs such as compilation, backups, and data processing.
Problem: Under heavy interactive load, batch jobs could starve for minutes.
Solution: CFS (introduced in Linux 2.6.23) uses virtual runtime instead of priorities. Every process makes progress proportional to its weight. Starvation is impossible by design.
In 1997, the Mars Pathfinder mission experienced system resets caused by priority inversion—a related but distinct problem. A low-priority task held a lock needed by a high-priority task, while medium-priority tasks ran. This wasn't starvation (the high-priority task was blocked, not just unpicked), but it illustrates how priority-based scheduling can fail dramatically.
We have thoroughly examined the starvation problem—the critical flaw that limits SJF/SRTF's practical applicability.
Module Conclusion:
Across this module, we've journeyed from SJF's elegant optimality to its practical limitations. We've seen how SRTF improves on SJF through preemption, how burst prediction makes these algorithms usable, and finally, how starvation threatens their applicability.
The lesson is profound: optimality for one metric (average wait time) can cause failure on another (fairness). Real schedulers must balance multiple objectives, which is why production systems use hybrid approaches like MLFQ or CFS rather than pure SJF/SRTF.
Armed with this understanding, you can now analyze any scheduling algorithm through multiple lenses: efficiency, fairness, predictability, and implementation complexity.
Congratulations! You have completed Module 3: Shortest Job First (SJF) & SRTF. You now possess deep understanding of these foundational scheduling algorithms—their optimality, their approximation through prediction, their preemptive variants, and their critical limitation: starvation. This knowledge forms the foundation for understanding more sophisticated schedulers covered in later modules.