In the previous page, we stated that Shortest Job First (SJF) provably minimizes average waiting time among all non-preemptive scheduling algorithms. This is a remarkable claim—not a heuristic observation, not a "usually works well" guideline, but a mathematical guarantee.
Understanding why SJF is optimal matters for several reasons: it elevates the claim from heuristic to guarantee, the proof techniques involved (notably the exchange argument) transfer to many other scheduling and greedy-algorithm problems, and the proof's assumptions pinpoint exactly when the guarantee applies.
This page presents multiple approaches to understanding and proving SJF's optimality, from intuitive arguments to formal mathematical proofs.
By completing this page, you will understand the intuition behind SJF's optimality, follow a complete mathematical proof, comprehend the exchange argument technique, and recognize the precise conditions under which SJF guarantees minimal average waiting time.
Before formal proofs, let's build intuition about why scheduling shorter jobs first minimizes average waiting time.
Consider what happens when a process runs. Every process behind it in the queue must wait for it to complete. If a process with burst time B runs first, every subsequent process waits at least B time units.
This creates a cascade effect: the first job's burst time is paid as waiting time by all n−1 jobs behind it, the second job's by the n−2 jobs behind it, and so on down to the last job, which delays no one.
Mathematically, if we schedule processes in order j₁, j₂, ..., jₙ with burst times b₁, b₂, ..., bₙ:
Total Waiting Time = (n-1)·b_{j₁} + (n-2)·b_{j₂} + ... + 1·b_{j_{n-1}} + 0·b_{jₙ}
Each burst time is weighted by how many processes must wait for it.
The first job's burst time is multiplied by (n-1) because all other processes wait. The last job's burst time is multiplied by 0 because no one waits after it. To minimize the weighted sum, we should assign the largest weights to the smallest burst times. This is exactly what SJF does!
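This weighted-sum view is easy to check in code. Below is a minimal Python sketch (the function name `total_waiting_time` is ours, purely illustrative) that evaluates the formula for jobs run in a given order:

```python
def total_waiting_time(bursts):
    """Total waiting time when jobs run in the given list order.

    The job at 0-indexed position i delays the n - 1 - i jobs
    behind it, so its burst is weighted by (n - 1 - i).
    """
    n = len(bursts)
    return sum((n - 1 - i) * b for i, b in enumerate(bursts))

print(total_waiting_time([1, 2, 3]))  # SJF order: 2*1 + 1*2 + 0*3 = 4
print(total_waiting_time([3, 2, 1]))  # reverse:   2*3 + 1*2 + 0*1 = 8
```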
Consider 3 processes with bursts: A=1, B=2, C=3
Schedule A→B→C (SJF order): A waits 0, B waits 1, C waits 1+2 = 3, for a total waiting time of 4.
Schedule C→B→A (reverse order): C waits 0, B waits 3, A waits 3+2 = 5, for a total waiting time of 8.
The SJF order produces half the total waiting time!
Using our formula: Total Waiting Time = 2×1 + 1×2 + 0×3 = 4 for the SJF order, versus 2×3 + 1×2 + 0×1 = 8 for the reverse.
By assigning large weights (2, 1, 0) to small values (1, 2, 3), SJF minimizes the weighted sum.
| Schedule | Weighted Sum Calculation | Total Wait | Average Wait |
|---|---|---|---|
| 1→2→3 (SJF) | 2×1 + 1×2 + 0×3 | 4 | 1.33 |
| 1→3→2 | 2×1 + 1×3 + 0×2 | 5 | 1.67 |
| 2→1→3 | 2×2 + 1×1 + 0×3 | 5 | 1.67 |
| 2→3→1 | 2×2 + 1×3 + 0×1 | 7 | 2.33 |
| 3→1→2 | 2×3 + 1×1 + 0×2 | 7 | 2.33 |
| 3→2→1 (Worst) | 2×3 + 1×2 + 0×1 | 8 | 2.67 |
The table confirms that the SJF order (1→2→3) achieves the minimum possible average waiting time of 1.33, while the reverse order achieves the maximum of 2.67. No permutation beats SJF.
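The table can be reproduced by brute force over all permutations; a short sketch:

```python
from itertools import permutations

bursts = [1, 2, 3]
n = len(bursts)
for order in permutations(bursts):
    # Weighted-sum formula: position i is waited on by n - 1 - i jobs.
    total = sum((n - 1 - i) * b for i, b in enumerate(order))
    print("→".join(map(str, order)), total, round(total / n, 2))
# (1, 2, 3) yields the minimum total of 4; (3, 2, 1) the maximum of 8.
```

For larger n this enumeration is infeasible (n! schedules), which is exactly why we want a proof rather than a table.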
Before proving optimality, we must precisely state what we're proving.
Given: n processes p₁, p₂, ..., pₙ, all arriving at time 0, with known burst times b₁, b₂, ..., bₙ, to be run to completion (non-preemptively) on a single CPU.
Define: a schedule as a permutation π of the processes; the waiting time of a process as the sum of the burst times of all processes scheduled before it; and the average waiting time as the total waiting time divided by n.
To Prove: The schedule S* that orders processes by non-decreasing burst times (SJF order) minimizes average waiting time over all possible schedules.
Let π* be the SJF permutation where b_{π*(1)} ≤ b_{π*(2)} ≤ ... ≤ b_{π*(n)}.
For any permutation π:
Total Waiting Time(π) = Σᵢ₌₁ⁿ [Σⱼ₌₁^{i-1} b_{π(j)}]
This says: the waiting time for the i-th process in schedule order is the sum of burst times of all processes before it.
Equivalently:
Total Waiting Time(π) = Σᵢ₌₁ⁿ (n - i) · b_{π(i)}
The i-th scheduled process's burst time is weighted by how many processes come after it.
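The two forms are equivalent because each burst b_{π(i)} appears once in the waiting time of every later process. A quick numerical check of the equivalence (helper names are ours):

```python
def total_wait_prefix(bursts):
    """First form: each job waits for the sum of bursts before it."""
    total, elapsed = 0, 0
    for b in bursts:
        total += elapsed   # this job's waiting time
        elapsed += b
    return total

def total_wait_weighted(bursts):
    """Second form: the i-th burst counted once per later job."""
    n = len(bursts)
    return sum((n - 1 - i) * b for i, b in enumerate(bursts))

order = [4, 1, 3]
assert total_wait_prefix(order) == total_wait_weighted(order) == 9
```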
This proof assumes all processes arrive at time 0. With different arrival times, we can extend the analysis to show that SJF remains optimal at each decision point, selecting the shortest available job. The core insight—minimize the impact of early jobs on later jobs—still holds.
The exchange argument is a powerful proof technique. We show that any schedule not in SJF order can be improved by swapping adjacent out-of-order elements. Since every non-SJF schedule can be incrementally improved toward SJF, no non-SJF schedule can be optimal.
Lemma: If schedule S has two adjacent processes pᵢ and pⱼ where pᵢ precedes pⱼ but bᵢ > bⱼ (longer job before shorter), then swapping them produces schedule S' with strictly lower total waiting time.
Proof of Lemma:
Consider processes at positions k and k+1 in the schedule: pᵢ at position k with burst bᵢ and pⱼ at position k+1 with burst bⱼ, where bᵢ > bⱼ. Let T denote the total burst time of the processes in positions 1 through k−1.
Let W_before and W_after denote total waiting times.
Key observation: the swap affects only the waiting times of pᵢ and pⱼ themselves.
All other processes are unaffected—their relative positions and the total burst time before them remain unchanged.
Before swap: pᵢ waits T and pⱼ waits T + bᵢ, together contributing 2T + bᵢ to the total.
After swap: pⱼ waits T and pᵢ waits T + bⱼ, together contributing 2T + bⱼ.
Change in total waiting time: W_after - W_before = bⱼ - bᵢ < 0 (since bⱼ < bᵢ)
The total waiting time decreases by (bᵢ - bⱼ) > 0.
∎
When we swap a longer job with a shorter job that follows it, the longer job now waits for the shorter one (small penalty), while the shorter job no longer waits for the longer one (big benefit). The net effect is always positive because we're trading a large waiting penalty for a small one.
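The lemma is easy to verify numerically; a sketch (helper names are ours):

```python
def total_wait(order):
    n = len(order)
    return sum((n - 1 - i) * b for i, b in enumerate(order))

def swap_effect(bursts, k):
    """Change in total waiting time from swapping positions k and k+1."""
    swapped = list(bursts)
    swapped[k], swapped[k + 1] = swapped[k + 1], swapped[k]
    return total_wait(swapped) - total_wait(bursts)

# A longer job (5) directly before a shorter one (2):
print(swap_effect([5, 2, 7], 0))  # -3, exactly b_j - b_i = 2 - 5
```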
Theorem: The SJF schedule (processes ordered by non-decreasing burst times) achieves minimum total waiting time.
Proof:
Let S be any schedule that is not SJF order. Since S is not sorted by burst times, there must exist at least one pair of adjacent processes where a longer process precedes a shorter one.
Step 1: By the Lemma, we can swap these adjacent processes to reduce total waiting time, producing schedule S₁ with W(S₁) < W(S).
Step 2: If S₁ is still not in SJF order, repeat the swap process. Each swap reduces total waiting time.
Step 3: Since there are finitely many possible swaps (at most n(n-1)/2 inversions to fix), and each swap strictly decreases waiting time, this process terminates.
Step 4: When no more improving swaps are possible, the schedule must be sorted by burst times—the SJF order.
Step 5: Therefore, from any non-SJF schedule S, a chain of strictly improving swaps leads to the SJF schedule, so W(SJF) < W(S). No non-SJF schedule can match, let alone beat, SJF.
Conclusion: SJF achieves the minimum total (and hence average) waiting time.
∎
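The proof is constructive: it is, in effect, bubble sort, with total waiting time strictly decreasing at every swap. A sketch that traces the process:

```python
def improve_to_sjf(bursts):
    """Repeatedly fix adjacent inversions; each swap strictly lowers
    the total waiting time until the sorted (SJF) order remains."""
    def total_wait(order):
        n = len(order)
        return sum((n - 1 - i) * b for i, b in enumerate(order))

    order = list(bursts)
    print(order, total_wait(order))
    changed = True
    while changed:
        changed = False
        for k in range(len(order) - 1):
            if order[k] > order[k + 1]:   # longer job before shorter
                order[k], order[k + 1] = order[k + 1], order[k]
                print(order, total_wait(order))  # strictly smaller
                changed = True
    return order

improve_to_sjf([3, 1, 2])  # prints totals 7, 5, 4, ending at [1, 2, 3]
```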
A more elegant proof uses the Rearrangement Inequality, a fundamental result from mathematical analysis.
Theorem (Rearrangement Inequality): Let a₁ ≤ a₂ ≤ ... ≤ aₙ and b₁ ≤ b₂ ≤ ... ≤ bₙ be sorted sequences of real numbers. For any permutation π:
aₙb₁ + aₙ₋₁b₂ + ... + a₁bₙ ≤ a_{π(1)}b₁ + a_{π(2)}b₂ + ... + a_{π(n)}bₙ ≤ a₁b₁ + a₂b₂ + ... + aₙbₙ
In words: pairing the two sequences in opposite orders (largest with smallest) minimizes the sum, while pairing them in the same order maximizes it.
Recall our total waiting time formula:
Total Waiting Time = Σᵢ₌₁ⁿ (n - i) · b_{π(i)}
We can view this as a dot product of two sequences: the fixed, descending weight sequence (n−1, n−2, ..., 1, 0) and the burst-time sequence b_{π(1)}, b_{π(2)}, ..., b_{π(n)}, whose order we are free to choose.
To minimize the sum, by the Rearrangement Inequality, we should pair the sequences in opposite orders: the largest weight (n−1) with the smallest burst time, the next weight (n−2) with the second smallest, and so on.
This means burst times should be in ascending order — exactly the SJF order!
∎
The Rearrangement Inequality proof reveals that SJF's optimality is an instance of a deeper mathematical principle. Any problem that reduces to minimizing a weighted sum with fixed descending weights is solved by ordering items in ascending order of their values.
| Position | Weight (n-i) | SJF: Pairs With | Reverse: Pairs With |
|---|---|---|---|
| 1st | (n-1) = largest | Smallest burst | Largest burst |
| 2nd | (n-2) | 2nd smallest burst | 2nd largest burst |
| ... | ... | ... | ... |
| n-th | 0 = smallest | Largest burst | Smallest burst |
With 3 bursts [1, 2, 3] and weights [2, 1, 0]:
SJF (ascending bursts): 2×1 + 1×2 + 0×3 = 2 + 2 + 0 = 4
Reverse (descending bursts): 2×3 + 1×2 + 0×1 = 6 + 2 + 0 = 8
The Rearrangement Inequality guarantees 4 ≤ any permutation ≤ 8.
All 6 permutations produce values in [4, 8], confirming the theory.
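That confirmation is a few lines of Python away; a sketch:

```python
from itertools import permutations

weights = [2, 1, 0]   # fixed descending weights (n - i)
bursts = [1, 2, 3]

values = sorted(sum(w * b for w, b in zip(weights, p))
                for p in permutations(bursts))
print(values)  # [4, 5, 5, 7, 7, 8]: every pairing lies in [4, 8]
```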
SJF's optimality is proven under specific conditions. Understanding these conditions clarifies when the guarantee applies and when it doesn't.
| Condition Violated | Result | Mitigation |
|---|---|---|
| Burst times unknown | Only approximate optimality possible | Use prediction (exponential averaging) |
| Different arrival times | SJF at each decision point is still optimal for that point | Apply SJF incrementally as processes arrive |
| Preemption required | SJF's non-preemptive guarantee no longer applies | Use SRTF, which is optimal among preemptive algorithms |
| Multiple CPUs | SJF is not necessarily optimal | Use load-balancing heuristics |
| Fairness required | SJF causes starvation | Use aging to limit waiting times |
SJF is optimal for average waiting time, but that doesn't make it the best scheduler. If fairness matters, SJF's starvation problem is a fatal flaw. If response time matters, preemptive SRTF outperforms non-preemptive SJF. Always consider the complete set of system requirements, not just one metric.
The proof above assumes all processes arrive at time 0. When processes arrive at different times, the analysis requires extension.
With non-simultaneous arrivals, SJF becomes:
At each scheduling decision point, select the shortest job among all currently ready (arrived and waiting) processes.
This is still SJF—applied dynamically to the available process set.
Local Optimality: At each decision point, selecting the shortest available job is optimal for that decision, given the processes present.
Global Optimality: However, SJF is not necessarily globally optimal with different arrival times. Consider:
| Process | Arrival | Burst |
|---|---|---|
| P1 | 0 | 7 |
| P2 | 2 | 1 |
SJF Execution: at t=0 only P1 is ready, so P1 runs from 0 to 7; P2 arrives at t=2 and must wait until t=7, then runs from 7 to 8. Waiting times: P1 = 0, P2 = 5; average = 2.5.
If we could see the future (and wait for P2): idle from 0 to 2, run P2 from 2 to 3, then P1 from 3 to 10. Waiting times: P2 = 0, P1 = 3; average = 1.5.
Waiting for P2 produces lower average waiting time!
Optimal scheduling with different arrivals would require knowing the future—when processes will arrive and their burst times. Real schedulers can't wait for processes that haven't arrived yet. SJF among current processes is the best we can do without future knowledge, but it's not globally optimal.
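A small simulator makes the gap concrete. The sketch below (the function name is ours; process tuples match the example above) implements non-preemptive SJF over arrivals:

```python
def sjf_nonpreemptive(procs):
    """procs: list of (name, arrival, burst).
    At each decision point, run the shortest job that has already
    arrived; if nothing is ready, idle until the next arrival."""
    pending = sorted(procs, key=lambda p: p[1])   # by arrival time
    time, total_wait = 0, 0
    while pending:
        ready = [p for p in pending if p[1] <= time]
        if not ready:
            time = pending[0][1]      # CPU idles until next arrival
            continue
        job = min(ready, key=lambda p: p[2])      # shortest available
        total_wait += time - job[1]   # waited since its arrival
        time += job[2]
        pending.remove(job)
    return total_wait / len(procs)

print(sjf_nonpreemptive([("P1", 0, 7), ("P2", 2, 1)]))  # 2.5
# The clairvoyant order (idle to t=2, P2, then P1) averages 1.5.
```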
With different arrival times, SJF still provides local optimality: at every decision point, it makes the best possible choice given the processes that have actually arrived.
For truly optimal scheduling with arrivals, preemptive SRTF (covered in Page 4) provides significant improvements by allowing re-evaluation when new processes arrive.
From an algorithmic standpoint, the SJF optimality result connects to broader complexity concepts.
Finding the SJF schedule is equivalent to sorting processes by burst time: O(n log n) with any comparison sort, or O(log n) per scheduling decision if the ready queue is maintained as a min-heap keyed on burst time.
This makes SJF a remarkably efficient algorithm to compute (when burst times are known).
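A sketch of the min-heap variant mentioned above:

```python
import heapq

def sjf_order(bursts):
    """Produce the SJF schedule: heapify in O(n), then n pops
    at O(log n) each, i.e. the cost of sorting."""
    heap = list(bursts)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in bursts]

print(sjf_order([7, 3, 5, 1]))  # [1, 3, 5, 7]
```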
Many scheduling problems are NP-hard—no efficient algorithm guarantees optimal solutions:
| Problem | Complexity | SJF Applicable? |
|---|---|---|
| Single machine, minimize total wait | P (SJF is optimal, O(n log n)) | Yes |
| Multiple machines, minimize makespan | NP-hard | No |
| Single machine with deadlines | NP-hard (in general) | No |
| Job shop scheduling | NP-hard | No |
| Preemptive single machine | P (SRTF) | Related |
SJF's polynomial-time optimality is special. Most realistic scheduling problems have no efficient optimal algorithm.
Understanding that SJF optimality falls into P (polynomial time) while most scheduling is NP-hard helps explain why operating systems use heuristics. When burst times are unknown and multiple objectives compete, we're in NP-hard territory. SJF's optimality is a rare island of tractability.
SJF is a greedy algorithm—at each step, it makes the locally optimal choice (shortest job) and never reconsiders.
Greedy algorithms are optimal when the problem has two properties: the greedy-choice property (some optimal solution begins with the greedy choice) and optimal substructure (after the greedy choice, what remains is a smaller instance of the same problem).
For SJF with simultaneous arrivals, both properties hold: the exchange argument shows that some optimal schedule starts with the shortest job (greedy choice), and once that job is scheduled, the remaining processes form an identical subproblem solved the same way (optimal substructure).
This connects SJF to the broader family of greedy scheduling algorithms, including variants like weighted job scheduling and activity selection.
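The greedy structure can be written out directly. The recursive sketch below is deliberately naive (O(n²); a real scheduler would just sort) but makes the greedy choice and the subproblem explicit:

```python
def greedy_sjf(jobs):
    """Run the shortest job (greedy choice), then solve the
    remaining jobs as the same, smaller problem (optimal
    substructure). Equivalent to sorting by burst time."""
    if not jobs:
        return []
    shortest = min(jobs)          # greedy choice
    rest = list(jobs)
    rest.remove(shortest)
    return [shortest] + greedy_sjf(rest)

print(greedy_sjf([4, 2, 9, 1]))  # [1, 2, 4, 9]
```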
We have rigorously established SJF's optimality property. The key insights: total waiting time is a weighted sum in which earlier positions carry larger weights; any adjacent pair scheduled longer-before-shorter can be swapped to strictly reduce that sum (the exchange argument); the Rearrangement Inequality yields the same conclusion directly; and the guarantee holds only under specific conditions, chiefly known burst times and simultaneous arrivals.
Looking Ahead:
The optimality proof assumes we know burst times exactly—which we don't in practice. The next page tackles the critical challenge of burst time prediction: How can a scheduler estimate future CPU burst lengths using past behavior? We'll explore exponential averaging and other prediction techniques that make SJF practical.
You now possess a rigorous understanding of why SJF minimizes average waiting time. The exchange argument and rearrangement inequality proofs equip you to reason about scheduling optimality in other contexts as well. This mathematical foundation is essential for critically evaluating scheduling algorithms.