We have established that SJF and SRTF achieve optimal average waiting time. They are mathematically beautiful algorithms with provable properties. Yet neither algorithm sees widespread use in general-purpose operating systems. Why?
The answer lies in a critical flaw: starvation.
Imagine a batch processing job with a 10-minute CPU burst. Every time it's about to run, a short interactive process arrives—a keystroke, a mouse click, a network packet. Under SJF/SRTF, the short process runs first. If short processes arrive frequently enough, the long job never runs. It waits indefinitely, starved of CPU time.
This isn't a theoretical edge case. In real systems with mixed workloads—interactive users alongside batch jobs—starvation is practically guaranteed. The very property that makes SJF optimal (favoring short jobs) creates systematic unfairness against long jobs.
By completing this page, you will understand the starvation problem's causes and conditions, be able to analyze when starvation will occur, master detection techniques, and learn prevention strategies—particularly the aging mechanism that transforms SJF into a fair algorithm.
Starvation (also called indefinite blocking) occurs when a process waits indefinitely for a resource it needs, despite that resource being repeatedly granted to other processes.
A scheduling algorithm causes starvation if there exists a valid input (set of processes with arrival times and burst lengths) such that at least one process has unbounded waiting time—it never receives CPU service.
Starvation and deadlock are distinct problems. Deadlock: a circular wait where processes block each other (e.g., P1 waits for P2, P2 waits for P1). Starvation: a process waits while others proceed—there's no circular dependency. A starving process could run if selected; a deadlocked process cannot run until the cycle breaks.
| Aspect | Starvation | Deadlock |
|---|---|---|
| Cause | Scheduling policy bias | Circular resource wait |
| State of victim | Ready (could run) | Blocked (cannot run) |
| System progress | Other processes proceed | All involved processes blocked |
| Detection | Timeout-based | Cycle detection in wait graph |
| Resolution | Priority boost / aging | Preemption / rollback |
| Algorithm type | Scheduling algorithms | Resource allocation algorithms |
Starvation in SJF/SRTF requires specific conditions. Understanding these conditions helps predict when starvation will (and won't) occur.
Condition 1: Long process exists
At least one process has a burst time significantly longer than others. This process is vulnerable to starvation.
Condition 2: Continuous arrival of short processes
Short processes arrive frequently enough that whenever the CPU becomes free, at least one shorter job is already waiting, so the long process is never selected.
Condition 3: Arrival rate exceeds service rate (for short jobs)
Let λ_short be the arrival rate of short processes and let the average short burst be their mean service time. If λ_short × (average short burst) ≈ 1, the CPU is fully utilized by short jobs, leaving no capacity for long jobs.
Consider: Long process L with burst 100ms. Short processes S1, S2, S3, ... each with 5ms burst, arriving every 4ms. The CPU is always busy with short processes (arriving faster than they complete). L never runs. This pattern is common in interactive systems with background batch jobs.
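To see the pattern concretely, here is a minimal SRTF simulation sketch using the numbers from the example above (the 1000 ms cutoff and the millisecond-step loop are illustrative choices, not part of the example):

```c
/*
 * Minimal SRTF starvation demo (a sketch, not a full scheduler).
 * L has a 100 ms burst; 5 ms short jobs arrive every 4 ms.
 */
#include <stdio.h>

#define SIM_MS       1000   /* illustrative simulation length */
#define SHORT_BURST  5
#define SHORT_PERIOD 4
#define MAX_SHORT    (SIM_MS / SHORT_PERIOD + 1)

int main(void) {
    int long_remaining = 100;           /* L's remaining burst */
    int short_remaining[MAX_SHORT];     /* remaining time of each short job */
    int n_short = 0;

    for (int t = 0; t < SIM_MS; t++) {
        if (t % SHORT_PERIOD == 0)      /* a new short job arrives */
            short_remaining[n_short++] = SHORT_BURST;

        /* SRTF: pick the arrived job with the smallest remaining time */
        int best = -1, best_rem = long_remaining;
        for (int i = 0; i < n_short; i++)
            if (short_remaining[i] > 0 && short_remaining[i] <= best_rem) {
                best = i;
                best_rem = short_remaining[i];
            }

        if (best >= 0) short_remaining[best]--;   /* a short job always wins */
        else if (long_remaining > 0) long_remaining--;
    }

    printf("After %d ms, L still has %d of 100 ms left\n", SIM_MS, long_remaining);
    return 0;
}
```

Because the short jobs arrive faster than they complete, the final line reports that L has made no progress at all.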
Starvation is impossible under SJF/SRTF if the stream of shorter jobs eventually pauses: whenever no shorter job is waiting, the long job is selected. In a queuing model with Poisson arrivals, this happens whenever utilization stays strictly below 1, since the ready queue then empties infinitely often; as utilization approaches 1, idle periods become rare and long-job waits grow without bound. This is why heavily loaded systems are particularly vulnerable.
We can formally analyze starvation probability under various workload models.
Consider two classes of jobs: Class 1 (short jobs) with arrival rate λ₁ and service rate μ₁, and Class 2 (long jobs).
Under strict priority (short always first):
Utilization by short jobs: ρ₁ = λ₁/μ₁
Class 2 (long jobs) expected wait time: $$W_2 = \frac{W_0}{(1-\rho_1)^2}$$
Where W₀ is the wait time without priorities.
Key insight: As ρ₁ → 1, W₂ → ∞
When short jobs consume close to 100% of CPU capacity, long job wait time diverges to infinity—starvation.
| Short Job Utilization (ρ₁) | Wait Time Multiplier for Long Jobs | Practical Impact |
|---|---|---|
| 50% | 4x | Tolerable |
| 70% | 11x | Noticeable delays |
| 80% | 25x | Serious delays |
| 90% | 100x | Near starvation |
| 95% | 400x | Effective starvation |
| 99% | 10,000x | Complete starvation |
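As a quick check, the multipliers in the table follow directly from $1/(1-\rho_1)^2$; a few lines of C reproduce them (the ρ₁ values are simply those from the table):

```c
/* Reproduce the wait-time multipliers: W2 / W0 = 1 / (1 - rho1)^2 */
#include <stdio.h>

int main(void) {
    double rho1[] = {0.50, 0.70, 0.80, 0.90, 0.95, 0.99};
    for (int i = 0; i < 6; i++) {
        double multiplier = 1.0 / ((1.0 - rho1[i]) * (1.0 - rho1[i]));
        printf("rho1 = %.2f  ->  W2 is about %.0fx W0\n", rho1[i], multiplier);
    }
    return 0;
}
```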
System administrators often target 70-80% CPU utilization for interactive systems. The mathematical analysis shows why: above 80%, delays for the lower-priority (long) jobs become severe. This isn't arbitrary—it's a consequence of queuing theory.
In the worst case—an adversarial workload designed to cause starvation:
Construction: a long job L with burst B arrives at time 0. Short jobs, each with burst ε, arrive back-to-back (one every ε time units), so the CPU is never idle and every new arrival is shorter than L's remaining burst.
As ε → 0: each individual short job becomes negligible, yet L's waiting time grows without bound.
Theorem: SJF and SRTF do not provide bounded waiting time guarantees.
No matter how long L has waited, there exists a workload where it waits longer. This is the formal sense in which starvation is "unbounded."
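Written out, the claim is that no single bound T covers all workloads (here $W_L(\sigma)$ denotes L's waiting time under workload $\sigma$, notation introduced just for this statement):

$$\forall T > 0,\ \exists\ \text{workload } \sigma:\quad W_L(\sigma) > T, \qquad \text{equivalently} \qquad \sup_{\sigma} W_L(\sigma) = \infty.$$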
Detecting starvation requires monitoring process wait times and identifying anomalies.
The simplest approach: flag processes that have waited beyond a threshold.
```c
#define STARVATION_THRESHOLD_MS 10000  // 10 seconds

void check_starvation(ReadyQueue* queue, int current_time) {
    // Scan every process waiting in the ready queue (assumes a singly
    // linked list with a head pointer and per-node next links).
    for (Process* p = queue->head; p != NULL; p = p->next) {
        int wait_time = current_time - p->arrival_time;
        if (wait_time > STARVATION_THRESHOLD_MS) {
            printf("WARNING: P%d starving (waited %d ms)\n",
                   p->pid, wait_time);
            handle_starvation(p);  // Apply remediation
        }
    }
}
```
Choosing the threshold involves a tradeoff: set it too low and ordinary queuing delays under load trigger false alarms; set it too high and a starving process suffers for a long time before remediation. A common heuristic is several multiples of the longest expected burst.
More sophisticated: detect when wait times deviate from expected distributions.
```c
/**
 * Advanced Starvation Detection using Wait Time Statistics
 */
#include <stdio.h>
#include <math.h>

typedef struct {
    double sum_wait;          // Sum of wait times seen
    double sum_wait_squared;  // Sum of squared wait times
    int count;                // Number of processes completed
    double mean;              // Running mean
    double variance;          // Running variance
} WaitTimeStats;

void update_stats(WaitTimeStats* stats, double wait_time) {
    stats->count++;
    stats->sum_wait += wait_time;
    stats->sum_wait_squared += wait_time * wait_time;
    stats->mean = stats->sum_wait / stats->count;
    if (stats->count > 1) {
        stats->variance = (stats->sum_wait_squared -
            (stats->sum_wait * stats->sum_wait) / stats->count) /
            (stats->count - 1);
    }
}

/**
 * Check if a process's current wait time is anomalous
 * Uses z-score: how many standard deviations from mean
 *
 * @return 1 if likely starving, 0 otherwise
 */
int is_starving(WaitTimeStats* stats, double current_wait) {
    if (stats->count < 10) {
        // Not enough data; use absolute threshold
        return current_wait > 10000;  // 10 seconds
    }
    double std_dev = sqrt(stats->variance);
    if (std_dev < 1.0) std_dev = 1.0;  // Avoid division issues
    double z_score = (current_wait - stats->mean) / std_dev;
    // Starving if > 3 standard deviations above mean
    return z_score > 3.0;
}

/**
 * Alternative: Percentile-based detection
 * Maintain histogram of wait times, flag if > 99th percentile
 */
#define HISTOGRAM_BUCKETS 100
#define MAX_WAIT_MS 60000  // 1 minute max for histogram

typedef struct {
    int buckets[HISTOGRAM_BUCKETS];
    int total;
} WaitTimeHistogram;

void histogram_add(WaitTimeHistogram* hist, int wait_ms) {
    int bucket = (wait_ms * HISTOGRAM_BUCKETS) / MAX_WAIT_MS;
    if (bucket >= HISTOGRAM_BUCKETS) bucket = HISTOGRAM_BUCKETS - 1;
    hist->buckets[bucket]++;
    hist->total++;
}

int get_percentile(WaitTimeHistogram* hist, int percentile) {
    int target = (hist->total * percentile) / 100;
    int cumulative = 0;
    for (int i = 0; i < HISTOGRAM_BUCKETS; i++) {
        cumulative += hist->buckets[i];
        if (cumulative >= target) {
            return (i * MAX_WAIT_MS) / HISTOGRAM_BUCKETS;
        }
    }
    return MAX_WAIT_MS;
}

int is_starving_percentile(WaitTimeHistogram* hist, int current_wait) {
    int p99 = get_percentile(hist, 99);
    return current_wait > p99;
}
```

Production operating systems typically don't run explicit starvation detection because they use algorithms that prevent starvation by design (like aging in MLFQ or guaranteed CPU share in CFS). Detection is more useful in research systems or when analyzing scheduler behavior.
Aging is the primary technique for preventing starvation in priority-based schedulers. The idea is elegant: as a process waits, its priority gradually increases. Eventually, even the lowest-priority process becomes the highest priority.
In standard SJF, priority = burst length (lower is higher priority).
With aging:
Effective Priority = Burst Length - (Aging Factor × Wait Time)
Or equivalently:
Effective Priority = Burst Length - Age Bonus
Where Age Bonus increases with wait time.
Consider a long process L with burst 100ms and an aging factor of 1 (1 unit of age bonus per millisecond of waiting):
No matter how short new arrivals are, L will eventually accumulate enough age bonus to be scheduled.
| Time (ms) | L's Wait | L's Age Bonus | L's Effective Priority | Scheduled? |
|---|---|---|---|---|
| 0 | 0 | 0 | 100 | No (short jobs arriving) |
| 20 | 20 | 20 | 80 | No |
| 50 | 50 | 50 | 50 | Maybe (depends on arrivals) |
| 80 | 80 | 80 | 20 | Likely |
| 100 | 100 | 100 | 0 | Yes (highest priority) |
```c
/**
 * SJF with Aging - Starvation-Free Variant
 */
#include <stdio.h>

typedef struct {
    int pid;
    int arrival_time;
    int burst_time;
    int remaining_time;
    int waiting_since;  // Time when process entered ready queue
} ProcessWithAging;

// Aging configuration
#define AGING_FACTOR 1.0    // 1ms of age bonus per 1ms of waiting
#define MAX_AGE_BONUS 1000  // Cap the bonus to prevent overflow

/**
 * Calculate effective priority with aging
 * Lower value = higher priority
 */
double calculate_priority(ProcessWithAging* p, int current_time) {
    int wait_time = current_time - p->waiting_since;
    double age_bonus = wait_time * AGING_FACTOR;
    if (age_bonus > MAX_AGE_BONUS) {
        age_bonus = MAX_AGE_BONUS;
    }
    // Effective priority: burst adjusted by age bonus
    double effective = p->remaining_time - age_bonus;
    // Ensure priority doesn't go negative (optional)
    if (effective < 0) effective = 0;
    return effective;
}

/**
 * Find process with best (lowest) effective priority
 */
int find_best_aged_process(ProcessWithAging procs[], int n,
                           int current_time, int completed[]) {
    int best_idx = -1;
    double best_priority = 1e9;
    for (int i = 0; i < n; i++) {
        if (procs[i].arrival_time <= current_time && !completed[i]) {
            double priority = calculate_priority(&procs[i], current_time);
            if (priority < best_priority) {
                best_priority = priority;
                best_idx = i;
            }
        }
    }
    return best_idx;
}

/**
 * Guarantee: With aging factor α > 0, maximum wait time is bounded by:
 *
 *   Max Wait ≤ (Burst Length) / α
 *
 * With α = 1, a process with burst B waits at most B milliseconds
 * before its priority exceeds any new arrival with burst ≤ B.
 *
 * This provides a formal starvation-freedom guarantee.
 */
```

With aging factor α, the maximum wait time for any process is bounded by (MaxBurst / α). This is a formal guarantee: no process can wait indefinitely. The tradeoff is that SJF's optimality is weakened—we may schedule a long job before a short one to prevent starvation.
The aging factor α controls the balance between SJF's efficiency and fairness.
Guidelines for selecting the aging factor:
Based on SLA (Service Level Agreement):
If maximum wait time must be ≤ W_max:
α ≥ max_burst / W_max
Example: If longest job is 60 seconds and max wait must be ≤ 5 minutes:
α ≥ 60,000 / 300,000 = 0.2 (a small helper encoding this rule is sketched after these guidelines)
Based on workload analysis: measure the longest burst actually observed and the longest wait the workload can tolerate, then apply the same ratio; re-check periodically, since an α that is too small re-admits starvation while one that is too large hurts interactive response.
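As a rough sketch, the SLA rule above can be packaged as a helper (the function name and millisecond units are assumptions for illustration):

```c
#include <stdio.h>

/* Smallest aging factor that keeps the worst-case wait within the SLA:
 * alpha >= max_burst / W_max (both in the same unit, e.g. milliseconds). */
double aging_factor_for_sla(double max_burst_ms, double max_wait_ms) {
    return max_burst_ms / max_wait_ms;
}

int main(void) {
    /* The example from the text: 60 s longest job, 5 min maximum wait. */
    printf("alpha >= %.2f\n", aging_factor_for_sla(60000.0, 300000.0));  /* 0.20 */
    return 0;
}
```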
Linear aging (bonus grows linearly with wait) is simplest, but alternatives exist (compared in the sketch after this list):
Exponential aging: bonus = k × e^(α × wait)
Quadratic aging: bonus = α × wait²
Step aging: bonus jumps at thresholds
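Below is a small sketch comparing how these bonus curves grow with wait time (the constants and step thresholds are illustrative picks, not tuning recommendations):

```c
/* Compare aging-bonus curves at a few wait times (values are illustrative). */
#include <stdio.h>
#include <math.h>

double linear_bonus(double wait)      { return 1.0 * wait; }             /* alpha = 1    */
double exponential_bonus(double wait) { return 1.0 * exp(0.05 * wait); } /* k = 1, a=.05 */
double quadratic_bonus(double wait)   { return 0.01 * wait * wait; }     /* alpha = 0.01 */
double step_bonus(double wait) {
    if (wait >= 100.0) return 200.0;   /* large jump after 100 ms */
    if (wait >= 50.0)  return 50.0;    /* moderate jump after 50 ms */
    return 0.0;
}

int main(void) {
    double waits[] = {10, 50, 100, 200};
    printf("%8s %10s %12s %10s %8s\n",
           "wait", "linear", "exponential", "quadratic", "step");
    for (int i = 0; i < 4; i++) {
        double w = waits[i];
        printf("%8.0f %10.1f %12.1f %10.1f %8.1f\n",
               w, linear_bonus(w), exponential_bonus(w),
               quadratic_bonus(w), step_bonus(w));
    }
    return 0;
}
```

The output makes the design tradeoff visible: linear bonuses grow steadily, quadratic and exponential bonuses escalate sharply for long waits, and step bonuses change nothing until a threshold is crossed.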
Start with α = 1.0 (1ms bonus per 1ms wait). This ensures a process with burst B waits at most B additional milliseconds beyond its "fair" share. Adjust based on observed behavior: increase α if long jobs still experience unacceptable delays; decrease if interactive response suffers.
Besides aging, several other techniques prevent or mitigate starvation.
Set a maximum priority level. No process can have priority better than the ceiling:
effective_priority = max(actual_priority, PRIORITY_CEILING);
This prevents any process from permanently dominating. Limitation: doesn't guarantee progress—just limits advantage.
Allocate CPU time proportionally rather than by priority:
Example: Linux CFS approximates this with vruntime: processes that have run less get scheduling preference.
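To illustrate the idea (a toy sketch only; the real Linux CFS uses a red-black tree, nice-level weight tables, and many refinements), a fair-share scheduler can always pick the task with the least weighted runtime so far:

```c
/* Toy vruntime-style selection: starvation-free because the most
 * neglected task always has the smallest vruntime. */
#include <stdio.h>

typedef struct {
    int pid;
    double vruntime;  /* virtual runtime consumed so far */
    double weight;    /* larger weight = entitled to more CPU */
} Task;

/* Pick the runnable task with the smallest vruntime. */
int pick_next(Task tasks[], int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (tasks[i].vruntime < tasks[best].vruntime)
            best = i;
    return best;
}

/* Charge a slice of real runtime, scaled inversely by weight, so
 * heavier tasks accumulate vruntime more slowly and run more often. */
void account(Task* t, double ran_ms) {
    t->vruntime += ran_ms / t->weight;
}

int main(void) {
    Task tasks[] = { {1, 0.0, 1.0}, {2, 0.0, 2.0} };  /* pid 2 has double weight */
    for (int slice = 0; slice < 6; slice++) {
        int next = pick_next(tasks, 2);
        printf("slice %d: run P%d\n", slice, tasks[next].pid);
        account(&tasks[next], 10.0);  /* 10 ms time slice */
    }
    return 0;
}
```

Running this shows P2 receiving roughly twice as many slices as P1 while P1 still runs regularly: proportional sharing without starvation.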
If a process waits beyond threshold, move it to a guaranteed execution queue:
```c
if (wait_time > THRESHOLD) {
    move_to_guaranteed_queue(process);
    // Guaranteed queue uses round-robin
}
```
Combines SJF efficiency with RR fairness as a safety net.
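A minimal selection routine for such a hybrid might look like the sketch below (the data layout, threshold, and round-robin cursor are illustrative assumptions, not a specific system's design):

```c
/* Sketch: serve the guaranteed (starved) processes round-robin first;
 * fall back to plain SJF when none have crossed the threshold. */
#include <stdio.h>

#define THRESHOLD_MS 10000

typedef struct {
    int pid;
    int burst;
    int wait_time;
    int guaranteed;  /* set once the process crosses the threshold */
} Proc;

int pick_next(Proc procs[], int n, int* rr_cursor) {
    /* Promote anyone who has waited too long. */
    for (int i = 0; i < n; i++)
        if (procs[i].wait_time > THRESHOLD_MS)
            procs[i].guaranteed = 1;

    /* Guaranteed processes are served in circular (round-robin) order. */
    for (int k = 0; k < n; k++) {
        int i = (*rr_cursor + k) % n;
        if (procs[i].guaranteed) {
            *rr_cursor = (i + 1) % n;
            return i;
        }
    }

    /* Otherwise, shortest job first. */
    int best = 0;
    for (int i = 1; i < n; i++)
        if (procs[i].burst < procs[best].burst)
            best = i;
    return best;
}

int main(void) {
    Proc procs[] = { {1, 100, 12000, 0}, {2, 5, 200, 0}, {3, 5, 100, 0} };
    int cursor = 0;
    int next = pick_next(procs, 3, &cursor);
    printf("Next to run: P%d\n", procs[next].pid);  /* P1: crossed the threshold */
    return 0;
}
```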
| Strategy | Mechanism | Pros | Cons |
|---|---|---|---|
| Aging | Increase priority with wait time | Smooth, tunable | Harder to analyze bounds |
| Priority Ceiling | Cap maximum priority | Simple | Doesn't guarantee progress |
| Fair Queuing | Proportional CPU shares | Strong fairness | Suboptimal for pure wait time |
| RR Fallback | Switch algorithm after threshold | Clear guarantee | Abrupt behavior change |
| Lottery Scheduling | Random selection weighted by priority | Probabilistic fairness | Non-deterministic |
| MLFQ | Multiple queues with aging | Comprehensive | Complex to configure |
Multi-Level Feedback Queue (MLFQ) combines many of these ideas: multiple priority queues, priority adjustment based on behavior (aging up, promotion for I/O wait), and guaranteed minimum service at each level. Most modern OS schedulers are variants of MLFQ.
Starvation isn't just theoretical—it causes real problems in production systems.
Scenario: a database server handles short OLTP transactions (milliseconds each) and long OLAP analytics queries (minutes each) on the same machine.
Problem: With priority scheduling favoring OLTP, OLAP queries starve. Reports that should take 5 minutes take hours—or never complete.
Solution: Databases like PostgreSQL use query scheduling with resource pools. OLAP queries get guaranteed resources regardless of OLTP load.
Scenario: a router carries latency-sensitive VoIP traffic alongside bulk file transfers on the same links.
Problem: Strict priority for VoIP can starve bulk transfers indefinitely during calls.
Solution: Network QoS uses weighted fair queuing: VoIP gets priority but bulk transfers are guaranteed minimum bandwidth.
Scenario: a Linux system runs interactive processes alongside long-running batch jobs such as compilation, backups, and data processing.
Problem: Under heavy interactive load, batch jobs could starve for minutes.
Solution: CFS (introduced in Linux 2.6.23) uses virtual runtime instead of priorities. Every process makes progress proportional to its weight. Starvation is impossible by design.
In 1997, the Mars Pathfinder mission experienced system resets caused by priority inversion—a related but distinct problem. A low-priority task held a lock needed by a high-priority task, while medium-priority tasks ran. This wasn't starvation (the high-priority task was blocked, not just unpicked), but it illustrates how priority-based scheduling can fail dramatically.
We have thoroughly examined the starvation problem—the critical flaw that limits SJF/SRTF's practical applicability.
Module Conclusion:
Across this module, we've journeyed from SJF's elegant optimality to its practical limitations. We've seen how SRTF improves on SJF through preemption, how burst prediction makes these algorithms usable, and finally, how starvation threatens their applicability.
The lesson is profound: optimality for one metric (average wait time) can cause failure on another (fairness). Real schedulers must balance multiple objectives, which is why production systems use hybrid approaches like MLFQ or CFS rather than pure SJF/SRTF.
Armed with this understanding, you can now analyze any scheduling algorithm through multiple lenses: efficiency, fairness, predictability, and implementation complexity.
Congratulations! You have completed Module 3: Shortest Job First (SJF) & SRTF. You now possess deep understanding of these foundational scheduling algorithms—their optimality, their approximation through prediction, their preemptive variants, and their critical limitation: starvation. This knowledge forms the foundation for understanding more sophisticated schedulers covered in later modules.