Imagine watching a marathon where you can observe every runner at every checkpoint. If one runner is consistently at or ahead of all others at every single checkpoint, there's no mystery about who wins—the runner who never falls behind must finish first or tied for first.
This intuition captures the essence of the Greedy Stays Ahead proof technique. Instead of comparing the final results of different algorithms, we track their progress throughout execution. If we can demonstrate that greedy's partial solution is always at least as good as any alternative's partial solution at corresponding stages, we prove that greedy's final solution must be optimal.
This technique is particularly powerful because it sidesteps the need to analyze exponentially many alternative solutions. We don't enumerate alternatives—we simply prove that at every moment, we're doing at least as well as the best possible alternative could be doing.
By the end of this page, you will master the 'Greedy Stays Ahead' proof technique. You'll see its formal structure, understand when to apply it, and work through complete proofs for Activity Selection and other problems. This technique will become a core tool in your algorithm design toolkit.
The 'Greedy Stays Ahead' technique works by defining a measure of progress that we can compute at each step of the algorithm. This measure captures how 'good' the partial solution is at that point.
The Key Insight
If we can prove that the greedy algorithm's partial solution has a progress measure at least as good as any other algorithm's partial solution at the same step, then greedy is still ahead (or tied) when both algorithms finish. Since both algorithms solve the same problem, 'ahead or tied at the end' means greedy's solution is optimal.
The art of this technique lies in choosing the right progress measure. A good measure should: (1) increase monotonically as the algorithm progresses, (2) be directly related to solution quality, and (3) allow meaningful comparison between greedy and alternative solutions at the same step.
Formal Structure
Let G be the greedy algorithm and let A be any alternative algorithm that produces an optimal solution. Let g₁, ..., gₖ be the sequence of choices G makes, and let a₁, ..., aₘ be the sequence of choices A makes.
We prove a staying ahead lemma: For all i where both have made at least i choices, the greedy solution through step i has a progress measure at least as good as the alternative's.
Formally: For all i ≤ min(k, m): progress(g₁, ..., gᵢ) ≥ progress(a₁, ..., aᵢ)
Conclusion: If k ≥ m (greedy makes at least as many choices) and greedy stays ahead, greedy is optimal.
```
THEOREM: Greedy algorithm G produces an optimal solution for problem P.

PROOF (Greedy Stays Ahead):

1. DEFINE PROGRESS MEASURE
   - Let M(S, i) = progress measure of solution S after i steps
   - [Choose M so that higher M means "better" progress toward goal]

2. STAYING AHEAD LEMMA
   Let G produce sequence g₁, g₂, ..., gₖ
   Let OPT be any optimal solution with sequence a₁, a₂, ..., aₘ

   Claim: For all i ≤ min(k, m): M(G, i) ≥ M(OPT, i)

   Proof by induction on i:

   Base case (i = 1):
   - G's first choice g₁ is [greedy criterion, e.g., smallest, earliest]
   - By definition of greedy, g₁ is at least as good as a₁
   - Therefore M(G, 1) ≥ M(OPT, 1)

   Inductive step:
   - Assume M(G, i-1) ≥ M(OPT, i-1)
   - Show M(G, i) ≥ M(OPT, i)
   - Key argument: [Problem-specific reasoning showing greedy maintains lead]

3. CONCLUSION
   By staying ahead lemma: M(G, k) ≥ M(OPT, m)
   Show that k ≥ m (greedy picks at least as many items as optimal)
   Therefore |G's solution| ≥ |OPT's solution|
   Since OPT is optimal, G must also be optimal. ∎
```

The Activity Selection Problem is the canonical example for demonstrating the Greedy Stays Ahead technique. It's clean, intuitive, and the proof is particularly elegant.
Problem Statement
Given n activities, each with a start time sᵢ and finish time fᵢ, select the maximum number of non-overlapping activities.
Two activities are compatible (non-overlapping) if: fᵢ ≤ sⱼ or fⱼ ≤ sᵢ (one finishes before the other starts)
Goal: Maximize the number of selected activities
Greedy Strategy: Always select the activity that finishes earliest among remaining compatible activities.
```python
def activity_selection_greedy(activities):
    """
    Select maximum number of non-overlapping activities.

    Args:
        activities: List of (start_time, finish_time, name) tuples

    Returns:
        List of selected activities (maximum non-overlapping set)

    Greedy criterion: Always pick the activity that finishes earliest.
    """
    # Sort by finish time (earliest first)
    sorted_activities = sorted(activities, key=lambda x: x[1])

    selected = []
    last_finish_time = 0

    for start, finish, name in sorted_activities:
        # If this activity starts after our last one finished, it's compatible
        if start >= last_finish_time:
            selected.append((start, finish, name))
            last_finish_time = finish

    return selected

# Example
activities = [
    (1, 4, "A"),    # 1-4
    (3, 5, "B"),    # 3-5
    (0, 6, "C"),    # 0-6
    (5, 7, "D"),    # 5-7
    (3, 9, "E"),    # 3-9
    (5, 9, "F"),    # 5-9
    (6, 10, "G"),   # 6-10
    (8, 11, "H"),   # 8-11
    (8, 12, "I"),   # 8-12
    (2, 14, "J"),   # 2-14
    (12, 16, "K"),  # 12-16
]

result = activity_selection_greedy(activities)
print("Selected activities:")
for start, finish, name in result:
    print(f"  {name}: [{start}, {finish})")
print(f"Total: {len(result)} activities")

# Output:
# Selected activities:
#   A: [1, 4)
#   D: [5, 7)
#   H: [8, 11)
#   K: [12, 16)
# Total: 4 activities
```

Visualizing the Greedy Selection
Let's visualize why selecting by earliest finish time works:
```
Time:  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
       |--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
C:     ██████████████████  (0-6)  ← Long, blocks many
A:        █████████  (1-4)  ← Selected first ✓
B:              ██████  (3-5)
J:           ████████████████████████████████████  (2-14)  ← Very long
D:                    ██████  (5-7)  ← Selected second ✓
E:              ██████████████████  (3-9)
F:                    ████████████  (5-9)
G:                       ████████████  (6-10)
H:                             █████████  (8-11)  ← Selected third ✓
I:                             ████████████  (8-12)
K:                                     ████████████  (12-16)  ← Selected fourth ✓
```
By always choosing the activity that finishes earliest, we leave the maximum possible time for future activities.
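As a quick cross-check (an added experiment, not part of the original lesson), an exhaustive search over all 2¹¹ subsets of this example confirms that four activities really is the best possible:

```python
from itertools import combinations

# The 11 activities from the example above: (start, finish, name)
activities = [
    (1, 4, "A"), (3, 5, "B"), (0, 6, "C"), (5, 7, "D"), (3, 9, "E"),
    (5, 9, "F"), (6, 10, "G"), (8, 11, "H"), (8, 12, "I"), (2, 14, "J"),
    (12, 16, "K"),
]

def compatible(subset):
    # No two activities in the subset may overlap
    return all(a[1] <= b[0] or b[1] <= a[0]
               for a, b in combinations(subset, 2))

# Largest compatible subset, found by brute force (2^11 = 2048 subsets)
best = max((s for r in range(len(activities) + 1)
            for s in combinations(activities, r) if compatible(s)),
           key=len)

print(len(best))  # prints 4, matching greedy's selection A, D, H, K
```

Brute force is only viable for tiny inputs, but it makes a useful sanity check when you're testing a greedy criterion.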
Now we apply the Greedy Stays Ahead technique to rigorously prove that the earliest-finish-time greedy algorithm is optimal for Activity Selection.
Setup and Definitions
Let activities be sorted by finish time: f₁ ≤ f₂ ≤ ... ≤ fₙ
Let G = {g₁, g₂, ..., gₖ} be the greedy solution (activities in order selected)
Let O = {o₁, o₂, ..., oₘ} be any optimal solution (activities in finish time order)
Progress Measure: After selecting i activities, our measure is the finish time of the iᵗʰ selected activity. A smaller finish time is "better" because it leaves more room for future activities.
We want to show: For all i ≤ min(k, m), finish(gᵢ) ≤ finish(oᵢ)
This says: the iᵗʰ greedy activity finishes no later than the iᵗʰ optimal activity.
Finish time is the perfect progress measure here because: (1) it's what the greedy algorithm optimizes for, (2) earlier finish = more time remaining = more opportunity for future activities, (3) it allows direct comparison between solutions at the same stage.
```
THEOREM: The earliest-finish-time greedy algorithm produces an optimal
solution for Activity Selection.

PROOF:

STEP 1: DEFINITIONS
  Let activities be numbered 1 to n in order of finish time:
    f₁ ≤ f₂ ≤ ... ≤ fₙ
  Let G = {g₁, g₂, ..., gₖ} be greedy's solution (in selection order)
  Let O = {o₁, o₂, ..., oₘ} be an optimal solution (in finish time order)

STEP 2: STAYING AHEAD LEMMA
  Claim: For all i ≤ min(k, m): f(gᵢ) ≤ f(oᵢ)
  [Greedy's iᵗʰ activity finishes no later than optimal's iᵗʰ]

  Proof by induction on i:

  BASE CASE (i = 1):
    Greedy selects g₁, the activity with earliest finish time.
    By definition, f(g₁) ≤ f(a) for all activities a.
    In particular, f(g₁) ≤ f(o₁). ✓

  INDUCTIVE STEP (i → i+1):
    Assume f(gᵢ) ≤ f(oᵢ) for some i < min(k, m).
    We need to show f(gᵢ₊₁) ≤ f(oᵢ₊₁).

    Since O is a valid solution, oᵢ₊₁ is compatible with o₁, ..., oᵢ.
    This means: s(oᵢ₊₁) ≥ f(oᵢ)   [oᵢ₊₁ starts after oᵢ finishes]

    By inductive hypothesis: f(gᵢ) ≤ f(oᵢ)
    Therefore: s(oᵢ₊₁) ≥ f(oᵢ) ≥ f(gᵢ)

    This means oᵢ₊₁ is compatible with greedy's first i choices!
    In other words, oᵢ₊₁ is a candidate that greedy could select.

    But greedy selects gᵢ₊₁, the earliest-finishing compatible activity.
    Since oᵢ₊₁ is compatible and greedy chose gᵢ₊₁ over it:
      f(gᵢ₊₁) ≤ f(oᵢ₊₁) ✓

    This completes the inductive step.

STEP 3: GREEDY SELECTS AT LEAST AS MANY ACTIVITIES
  Suppose for contradiction that k < m (greedy selects fewer than optimal).

  By the staying ahead lemma: f(gₖ) ≤ f(oₖ)
  Since m > k, there exists oₖ₊₁ in the optimal solution.
  Since O is valid: s(oₖ₊₁) ≥ f(oₖ) ≥ f(gₖ)

  This means oₖ₊₁ is compatible with g₁, ..., gₖ.
  But greedy continues selecting while compatible activities exist!

  Contradiction: Greedy would have selected at least one more activity.
  Therefore k ≥ m.

STEP 4: CONCLUSION
  We've shown:
    - Greedy selects k activities
    - Optimal selects m activities
    - k ≥ m
  Since optimal is by definition maximum, we must have k = m.
  Therefore greedy's solution is also optimal. ∎
```

Understanding the Proof's Power
Notice what this proof accomplishes:
We never enumerated alternative solutions — We argued against an arbitrary optimal solution O, which represents all possible optimal solutions.
The staying ahead property cascades — Because gᵢ finishes no later than oᵢ, any activity compatible with O's first i is also compatible with G's first i. This is the key insight.
Greedy can't terminate early — The same staying-ahead property ensures greedy has opportunities whenever optimal does, so it can't stop before matching optimal's count.
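Both observations can be checked mechanically. The added sketch below (helper names `greedy` and `compatible` are ours, restating the earlier listing over bare `(start, finish)` pairs) brute-forces every maximum-size solution on small random instances and verifies the staying-ahead inequality against each one:

```python
import random
from itertools import combinations

def greedy(acts):
    """Earliest-finish-time greedy over (start, finish) pairs."""
    chosen, last = [], float("-inf")
    for s, f in sorted(acts, key=lambda a: a[1]):
        if s >= last:
            chosen.append((s, f))
            last = f
    return chosen

def compatible(subset):
    return all(a[1] <= b[0] or b[1] <= a[0]
               for a, b in combinations(subset, 2))

random.seed(1)
for _ in range(30):
    acts = [(s, s + random.randint(1, 8))
            for s in (random.randint(0, 20) for _ in range(8))]
    g = greedy(acts)
    # Enumerate every compatible subset (2^8 = 256 candidates)
    subsets = [c for r in range(len(acts) + 1)
               for c in combinations(acts, r) if compatible(c)]
    m = max(len(c) for c in subsets)
    assert len(g) == m  # greedy matches the optimum
    for opt in (c for c in subsets if len(c) == m):
        o = sorted(opt, key=lambda a: a[1])  # optimal, in finish-time order
        # Staying ahead: f(g_i) <= f(o_i) for every prefix
        assert all(g[i][1] <= o[i][1] for i in range(m))
print("staying-ahead verified against every optimal solution")
```

No random instance can violate the assertions, because the lemma was proven against an arbitrary optimal solution; the experiment just makes the cascade tangible.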
It's instructive to see why other seemingly reasonable greedy criteria fail for Activity Selection. This shows that the proof technique reveals deep truths about which criteria work.
Alternative Criterion 1: Earliest Start Time
Greedy by earliest start time: Always pick the activity that starts first.
```python
# Counterexample for greedy by earliest start time

activities = [
    (0, 10, "A"),  # Starts earliest but runs very long
    (1, 3, "B"),   # Short, starts at 1
    (4, 6, "C"),   # Short, starts at 4
    (7, 9, "D"),   # Short, starts at 7
]

def greedy_by_start(activities):
    sorted_by_start = sorted(activities, key=lambda x: x[0])
    selected = []
    last_finish = 0
    for start, finish, name in sorted_by_start:
        if start >= last_finish:
            selected.append((start, finish, name))
            last_finish = finish
    return selected

print("Greedy by earliest start:", greedy_by_start(activities))
# Output: [A] - only 1 activity!
# Because A (0-10) blocks all others

# Optimal: [B, C, D] - 3 activities!
# B(1-3), C(4-6), D(7-9) are all compatible
```

Why the proof fails for earliest start time:
The staying ahead lemma requires: after i selections, greedy leaves at least as much room as optimal.
With earliest start: we cannot prove greedy's iᵗʰ selection finishes before optimal's iᵗʰ. Starting early doesn't imply finishing early. The long early-starting activity can block everything else.
Alternative Criterion 2: Shortest Duration
Greedy by shortest duration: Always pick the shortest remaining compatible activity.
```python
# Greedy by shortest duration

def greedy_by_duration(activities):
    sorted_by_duration = sorted(activities, key=lambda x: x[1] - x[0])
    selected = []
    for start, finish, name in sorted_by_duration:
        # Selections are not in time order, so check compatibility
        # against every activity selected so far
        compatible = True
        for s_start, s_finish, s_name in selected:
            if not (finish <= s_start or start >= s_finish):
                compatible = False
                break
        if compatible:
            selected.append((start, finish, name))
    return selected

# Not every input breaks the heuristic:
activities = [
    (0, 2, "A"),  # Duration 2
    (3, 5, "B"),  # Duration 2
    (1, 4, "C"),  # Duration 3, overlaps both A and B
]
print("Greedy by shortest duration:", greedy_by_duration(activities))
# Output: [(0, 2, 'A'), (3, 5, 'B')] - 2 activities, which is optimal here

# A genuine counterexample:
activities2 = [
    (0, 4, "A"),  # Duration 4
    (3, 6, "B"),  # Duration 3 - SHORTEST!
    (5, 9, "C"),  # Duration 4
]
print("Greedy by shortest duration:", greedy_by_duration(activities2))
# Output: [(3, 6, 'B')] - 1 activity
# Optimal: [(0, 4, 'A'), (5, 9, 'C')] - 2 activities!
```

The shortest activity might be positioned in the middle of the optimal solution's activities, blocking multiple better choices. Duration alone doesn't account for where the activity falls on the timeline.
Alternative Criterion 3: Fewest Conflicts
Greedy by fewest conflicts: Always pick the activity that overlaps with the fewest remaining activities.
This seems sophisticated—surely picking the 'least disruptive' activity should work?
```python
# Counterexample for greedy by fewest conflicts
# (the classic 11-interval construction)
#
# Top row T1..T4: four disjoint activities -- the optimal solution.
# X bridges T2 and T3. The L and R padding activities inflate the
# conflict counts of the top row so that X has the FEWEST conflicts.

activities = [
    (0, 2, "T1"), (3, 5, "T2"), (6, 8, "T3"), (9, 11, "T4"),
    (4, 7, "X"),                                  # bridges T2 and T3
    (1, 4, "L1"), (1, 4, "L2"), (1, 4, "L3"),     # overlap T1 and T2
    (7, 10, "R1"), (7, 10, "R2"), (7, 10, "R3"),  # overlap T3 and T4
]

def count_conflicts(activity, pool):
    s1, f1, n1 = activity
    # Compare by name so duplicate intervals aren't mistaken for self
    return sum(1 for s2, f2, n2 in pool
               if n2 != n1 and not (f1 <= s2 or s1 >= f2))

for act in activities:
    print(f"{act[2]}: {count_conflicts(act, activities)} conflicts")
# X has only 2 conflicts (T2 and T3); every other activity has 3 or 4.

def greedy_by_fewest_conflicts(activities):
    remaining, selected = list(activities), []
    while remaining:
        # Pick the remaining activity with fewest conflicts among remaining
        best = min(remaining, key=lambda a: count_conflicts(a, remaining))
        selected.append(best)
        # Drop the chosen activity and everything overlapping it
        remaining = [a for a in remaining
                     if a[2] != best[2] and (a[1] <= best[0] or a[0] >= best[1])]
    return selected

print([a[2] for a in greedy_by_fewest_conflicts(activities)])
# Output: ['X', 'T1', 'T4'] - 3 activities.
# Greedy picks X first, which knocks out both T2 and T3, so no
# solution containing X can exceed 3 activities.
# Optimal: T1, T2, T3, T4 = 4 activities.
```

| Criterion | Intuition | Why It Fails | Proof Obstacle |
|---|---|---|---|
| Earliest Start | Get going early | Long activities block others | Can't maintain finish time invariant |
| Shortest Duration | Minimize each commitment | Short activity might bridge optimal choices | Position matters, not just length |
| Fewest Conflicts | Minimize disruption | Complex interactions; counterexamples exist | Conflicts change dynamically as we select |
| Latest Start | Wait for information | Misses early activities that end early | Same problem as earliest start, reversed |
| Earliest Finish ✓ | Leave room for more | WORKS! | Maintains staying-ahead invariant |
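The table's claims can be stress-tested numerically. This added sketch (all helper names are ours, not from the lesson) pits three criteria against an exhaustive search on small random instances:

```python
import random
from itertools import combinations

def compatible(subset):
    return all(a[1] <= b[0] or b[1] <= a[0]
               for a, b in combinations(subset, 2))

def optimum(acts):
    """Size of the largest compatible subset, by exhaustive search."""
    return max(len(c) for r in range(len(acts) + 1)
               for c in combinations(acts, r) if compatible(c))

def greedy_size(acts, key):
    """Generic greedy: scan in `key` order, keep whatever is still compatible."""
    chosen = []
    for a in sorted(acts, key=key):
        if all(a[1] <= b[0] or b[1] <= a[0] for b in chosen):
            chosen.append(a)
    return len(chosen)

criteria = {
    "earliest finish": lambda a: a[1],
    "earliest start": lambda a: a[0],
    "shortest duration": lambda a: a[1] - a[0],
}

random.seed(2)
failures = {name: 0 for name in criteria}
for _ in range(200):
    acts = [(s, s + random.randint(1, 6))
            for s in (random.randint(0, 15) for _ in range(7))]
    opt = optimum(acts)
    for name, key in criteria.items():
        if greedy_size(acts, key) < opt:
            failures[name] += 1

for name, count in failures.items():
    print(f"{name}: suboptimal on {count} of 200 random instances")
# Earliest finish is never suboptimal -- the theorem guarantees it.
```

Experiments like this can't prove a criterion correct, but they falsify bad criteria quickly, which is a useful first filter before attempting a proof.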
Activity Selection is one instance of a broader pattern. Many scheduling problems yield to Greedy Stays Ahead proofs with similar structure. Here's the general pattern:
Scheduling Problem Template
Given: a set of n tasks or jobs, each with time attributes (start, finish, duration, or deadline); a feasibility constraint on which combinations may be scheduled together; and an objective to optimize (count, lateness, throughput).
Greedy Stays Ahead Structure
Pick a time-based progress measure (finish time, completion time, accumulated lateness). Prove by induction that greedy's partial schedule is at least as good on that measure as any alternative's at the same step. Conclude that greedy's final schedule meets or beats every alternative.
Example: Interval Scheduling to Minimize Lateness
While Activity Selection maximizes count, consider a different objective:
Problem: Each job has a processing time pᵢ and a deadline dᵢ. Jobs run one at a time (no parallelism). Minimize maximum lateness, where lateness = finish_time - deadline (0 if finished early).
Greedy Strategy: Schedule jobs in order of earliest deadline first (EDF).
Proof Sketch (Greedy Stays Ahead):
Assume schedules have no idle time. If a schedule runs a job with a later deadline immediately before one with an earlier deadline (an inversion), swapping the two never increases the maximum lateness: the earlier-deadline job only moves earlier, and the later-deadline job now finishes exactly where the pair used to finish, so its new lateness is at most the other job's old lateness. Removing inversions one at a time turns any optimal schedule into the EDF schedule without increasing maximum lateness, so EDF is optimal.
This is another instance of Greedy Stays Ahead, though the proof uses exchange ideas as well.
```python
def minimize_lateness_greedy(jobs):
    """
    Schedule jobs to minimize maximum lateness.

    Args:
        jobs: List of (processing_time, deadline, name) tuples

    Returns:
        (schedule, max_lateness) where schedule is ordered list of jobs

    Greedy: Process in order of earliest deadline.
    """
    # Sort by deadline (earliest first)
    sorted_jobs = sorted(jobs, key=lambda x: x[1])

    schedule = []
    current_time = 0
    max_lateness = 0

    for proc_time, deadline, name in sorted_jobs:
        start_time = current_time
        finish_time = current_time + proc_time
        lateness = max(0, finish_time - deadline)
        max_lateness = max(max_lateness, lateness)
        schedule.append({
            'name': name,
            'start': start_time,
            'finish': finish_time,
            'deadline': deadline,
            'lateness': lateness
        })
        current_time = finish_time

    return schedule, max_lateness

# Example
jobs = [
    (3, 6, "A"),   # Takes 3 units, due at time 6
    (2, 8, "B"),   # Takes 2 units, due at time 8
    (1, 9, "C"),   # Takes 1 unit, due at time 9
    (4, 9, "D"),   # Takes 4 units, due at time 9
    (3, 14, "E"),  # Takes 3 units, due at time 14
    (2, 15, "F"),  # Takes 2 units, due at time 15
]

schedule, max_lateness = minimize_lateness_greedy(jobs)
print("Schedule (EDF greedy):")
for job in schedule:
    print(f"  {job['name']}: run [{job['start']}, {job['finish']}), "
          f"deadline={job['deadline']}, lateness={job['lateness']}")
print(f"Maximum lateness: {max_lateness}")
```

When you see a scheduling problem with a time-based objective (minimize tardiness, maximize throughput, minimize idle time), consider whether a Greedy Stays Ahead proof applies. The key question: can you define a progress measure where greedy provably stays ahead?
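EDF's optimality can also be verified empirically. This added check (the `edf` and `max_lateness` helpers are ours, mirroring the listing above but over bare `(processing_time, deadline)` pairs) compares EDF with a brute-force search over every ordering of small random instances:

```python
import random
from itertools import permutations

def max_lateness(order):
    """Max lateness of (processing_time, deadline) jobs run back to back."""
    t, worst = 0, 0  # worst starts at 0: finishing early counts as lateness 0
    for p, d in order:
        t += p
        worst = max(worst, t - d)
    return worst

def edf(jobs):
    # Earliest deadline first
    return sorted(jobs, key=lambda j: j[1])

random.seed(3)
for _ in range(100):
    jobs = [(random.randint(1, 5), random.randint(2, 20)) for _ in range(6)]
    # Brute force: try all 6! = 720 orderings
    best = min(max_lateness(p) for p in permutations(jobs))
    assert max_lateness(edf(jobs)) == best
print("EDF matched the brute-force optimum on 100 random instances")
```

The brute-force baseline grows factorially, so it only works for a handful of jobs, but it is exactly the kind of check worth running before investing in a full proof.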
Not every greedy algorithm can be proven correct via Greedy Stays Ahead. This technique works best for problems with specific characteristics.
Ideal Problem Characteristics
Sequential selection structure: The algorithm makes choices one at a time in a clear order
Monotonic progress measure: There's a natural measure that only improves as selections are made
Comparability at each stage: Greedy's partial solution can be meaningfully compared to any alternative's partial solution at the same stage
The greedy criterion directly optimizes the progress measure: If greedy picks 'earliest finish', the measure should involve finish times
Independence of future choices: The 'goodness' of current partial solution doesn't depend on what future selections will be
Some problems can be proven correct with either Greedy Stays Ahead or Exchange Argument. The techniques are complementary, not competing. Sometimes one is more intuitive than the other for a given problem.
Even when Greedy Stays Ahead is the right technique, proofs can go wrong. Here are common mistakes to avoid:
```
INCORRECT PROOF ATTEMPT for Activity Selection:

  Progress measure: number of activities selected so far

  Claim: At step i, greedy has selected i activities,
         optimal has selected i activities.
         Therefore they're tied!

PROBLEM: This is trivially true and proves nothing!
  Both algorithms have made i selections at step i by definition.
  The measure must capture QUALITY of selections, not just COUNT.

  We need: "greedy's i selections leave more room"
  not just "greedy made i selections"

CORRECT measure: finish time of iᵗʰ selection
  This captures quality because earlier finish = more room for future.
```

We've thoroughly explored the Greedy Stays Ahead proof technique. Let's consolidate the key insights:
Define a progress measure that captures the quality of the partial solution, not just how many steps have been taken.
Prove by induction that greedy's prefix is at least as good on that measure as any optimal solution's prefix.
Show greedy makes at least as many selections as optimal, so staying ahead at every step implies optimality at the end.
Choosing the right measure is the creative step; once it's right, the induction usually follows directly from the greedy rule.
What's next:
Now that you've mastered Greedy Stays Ahead, we'll explore the second major proof technique: the Exchange Argument. This technique works by showing that any optimal solution can be transformed into the greedy solution without losing optimality—a powerful approach for problems where staying-ahead measures aren't natural.
You now command the Greedy Stays Ahead proof technique. You can apply it to Activity Selection and similar scheduling problems, recognize when it's the right approach, and avoid common proof mistakes. Next, we'll add the Exchange Argument to your toolkit.