Before we can solve the Activity Selection Problem optimally, we must precisely define what we're optimizing. This isn't pedantic formalism—rigorous definitions prevent subtle errors, enable correctness proofs, and reveal the structure that makes greedy solutions possible.
In this page, we develop the formal machinery: definitions of activities, compatibility, and maximum subsets. This foundation makes subsequent algorithmic and proof discussions rigorous and unambiguous.
By the end of this page, you will understand the precise definitions underlying Activity Selection, multiple formulations of compatibility, the mathematical structure of maximum compatible subsets, and why 'maximum' here means number of activities, not total time or any other measure.
Let's establish the formal vocabulary for Activity Selection. These definitions are standard across textbooks and research literature.
Definition 1: Activity
An activity is a tuple (s, f) representing a task requiring exclusive use of a shared resource from time s (start) to time f (finish).
Formally, activity i is represented as: a_i = (s_i, f_i) where s_i ∈ ℝ, f_i ∈ ℝ, and s_i < f_i.
Definition 2: Activity Set
An activity set S = {a_1, a_2, ..., a_n} is a collection of n activities, each with associated start and finish times.
Definition 3: Compatibility
Two activities a_i = (s_i, f_i) and a_j = (s_j, f_j) are compatible if and only if their time intervals don't overlap:
f_i ≤ s_j OR f_j ≤ s_i
Equivalently, activities are compatible if one finishes before or exactly when the other starts.
The convention f_i ≤ s_j (rather than f_i < s_j) matters: if activity A ends at time 5 and activity B starts at time 5, they are compatible by our definition. This models scenarios where handoffs are instantaneous (like consecutive meeting slots). Some formulations use strict inequality—check problem statements carefully.
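Definition 3 with the non-strict convention translates directly into code. The sketch below is a minimal illustration (the function name `compatible` is ours, not from a library):

```python
def compatible(a, b):
    """True if activities a = (s, f) and b = (s, f) don't overlap.

    Uses the non-strict convention from Definition 3: touching
    endpoints (f_i == s_j) count as compatible.
    """
    return a[1] <= b[0] or b[1] <= a[0]

# Back-to-back meetings are compatible under this convention:
print(compatible((1, 5), (5, 8)))   # True: A ends exactly when B starts
print(compatible((1, 5), (4, 8)))   # False: they overlap on (4, 5)
```

Note the symmetry: `compatible(a, b) == compatible(b, a)`, as the OR of the two one-sided conditions requires.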
Definition 4: Compatible Set
A set of activities C ⊆ S is a compatible set (or independent set in scheduling terminology) if every pair of activities in C is mutually compatible:
∀ a_i, a_j ∈ C where i ≠ j: compatible(a_i, a_j) = true
Definition 5: Maximum Compatible Set
A maximum compatible set is a compatible set of maximum cardinality:
C* = argmax_{C ⊆ S, C is compatible} |C|
Note: There may be multiple maximum compatible sets of the same size. Our goal is to find any maximum compatible set, not all of them.
Compatibility is the core constraint in Activity Selection. Let's develop deep intuition by examining it from multiple perspectives.
Perspective 1: Interval Non-Overlap
Viewing each activity as a closed interval [s, f], compatibility means the intervals don't share interior points; touching at a single endpoint is allowed.
Perspective 2: Timeline Feasibility
On a single timeline representing our shared resource, compatible activities can be "drawn" without overlapping:
Time:  0  1  2  3  4  5  6  7  8  9
       |--|--|--|--|--|--|--|--|--|
A:     [=====]
D:              [=====]
H:                       [=====]
✓ Activities A, D, H are mutually compatible
Perspective 3: Graph Theoretic
We can construct an interval graph where each activity is a vertex, and an edge connects every pair of incompatible (overlapping) activities.
A compatible set is an independent set in this graph (no two vertices connected by an edge). Activity Selection becomes: find the maximum independent set in an interval graph.
Maximum independent set is NP-hard for general graphs, but polynomial for interval graphs! The special structure of intervals—their one-dimensional, ordered nature—enables greedy solutions. Activity Selection's tractability is fundamentally tied to this geometric structure.
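The graph-theoretic view can be made concrete. This sketch (helper names `conflict_graph` and `is_independent_set` are ours) builds the conflict graph and checks the independent-set condition:

```python
def conflict_graph(activities):
    """Build the conflict graph of an interval set: vertices are
    activity indices, edges join incompatible (overlapping) pairs."""
    n = len(activities)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            (si, fi), (sj, fj) = activities[i], activities[j]
            if not (fi <= sj or fj <= si):   # overlap => edge
                adj[i].add(j)
                adj[j].add(i)
    return adj

def is_independent_set(adj, chosen):
    """A compatible set corresponds to an independent set:
    no edge between any two chosen vertices."""
    return all(j not in adj[i] for i in chosen for j in chosen if i != j)

acts = [(1, 4), (3, 5), (5, 7)]
g = conflict_graph(acts)              # only (1, 4) and (3, 5) conflict
print(is_independent_set(g, {0, 2}))  # True
print(is_independent_set(g, {0, 1}))  # False
```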
A critical aspect of Activity Selection is its objective: we maximize the count of activities, not some weighted value. This distinction is fundamental to why greedy works.
| Objective | Description | Greedy Works? | Algorithm / Notes |
|---|---|---|---|
| Maximize count | Most activities selected | ✅ Yes | Sort by finish time, greedy |
| Maximize total time | Sum of (f_i - s_i) for selected | ❌ No | Needs DP; the optimum often has fewer, longer activities |
| Maximize value | Sum of values v_i for selected | ❌ No | Dynamic programming (weighted interval scheduling) |
| Minimize idle time | Total gaps between selected activities | ⚠️ Sometimes | Problem-dependent |
| Maximize resource utilization | Fraction of time covered | ❌ No | The optimum may prefer few long activities |
Why counting enables greedy:
When all activities are equally valued (count objective), there's no reason to prefer one activity over another except for compatibility implications. This means:
No regret from choosing any compatible activity — Every activity counts as "1" toward our objective.
Earliest finish maximizes future options — Finishing earlier can never reduce the number of future compatible activities.
Choices are independent — Selecting activity A instead of activity B (when both are current candidates) doesn't change the structure of remaining subproblems, only their timing.
When activities have different values (weighted variant), these properties break. Choosing a high-value activity that blocks many future activities might be suboptimal if those blocked activities have even higher combined value. This interdependence defeats greedy.
Weighted Activity Selection (also called Weighted Interval Scheduling) requires dynamic programming. A common interview mistake is applying the greedy approach to weighted problems. Always verify: is the objective pure count, or are activities valued differently?
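To see what the weighted variant demands instead, here is a sketch of the standard weighted interval scheduling DP (function name and the `(start, finish, value)` job format are our choices):

```python
from bisect import bisect_right

def weighted_interval_scheduling(jobs):
    """Max total value of mutually compatible jobs, where each job is
    (start, finish, value). Greedy-by-finish-time fails here, so we
    use the classic dynamic program."""
    jobs = sorted(jobs, key=lambda j: j[1])      # sort by finish time
    finishes = [f for _, f, _ in jobs]
    dp = [0] * (len(jobs) + 1)                   # dp[k] = best over first k jobs
    for k, (s, f, v) in enumerate(jobs, start=1):
        # p = number of earlier jobs finishing at or before this start
        # (finish <= start counts as compatible, per Definition 3)
        p = bisect_right(finishes, s, 0, k - 1)
        dp[k] = max(dp[k - 1], dp[p] + v)        # skip it, or take it
    return dp[-1]

# One long high-value job beats two short ones; counting greedy would
# pick the two short jobs and score only 2:
print(weighted_interval_scheduling([(0, 3, 1), (3, 6, 1), (0, 6, 5)]))  # 5
```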
There are several equivalent ways to represent Activity Selection input. Understanding these representations helps implement solutions efficiently and recognize the problem in different disguises.
Representation:
Two arrays of the same length:
start[i] = start time of activity i
finish[i] = finish time of activity i

Example:
start = [1, 3, 0, 5, 3, 5, 6, 8, 8, 12]
finish = [4, 5, 6, 7, 9, 9, 10, 11, 12, 16]
Pros: Simple, cache-friendly for sorting, common in competitive programming.
Cons: Easy to accidentally desynchronize arrays during sorting.
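A simple guard against that desynchronization is to zip the arrays into pairs before doing anything else (a small sketch using the example arrays above):

```python
start = [1, 3, 0, 5, 3, 5, 6, 8, 8, 12]
finish = [4, 5, 6, 7, 9, 9, 10, 11, 12, 16]

# Zip the parallel arrays into (start, finish) pairs BEFORE sorting,
# so the two arrays can never fall out of sync:
activities = sorted(zip(start, finish), key=lambda a: a[1])
print(activities[0])   # (1, 4) -- earliest finish time comes first
```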
Implementation recommendation:
For Activity Selection, the tuple/object representation is cleanest:
```python
from typing import List, Tuple

def activity_selection(activities: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    # activities is a list of (start, finish) tuples
    # Returns the list of selected activities
    ...
```
Sorting is straightforward (activities.sort(key=lambda x: x[1])) and the solution naturally returns selected activities.
Before appreciating why greedy is efficient, let's understand the scope of the problem: how many possible solutions exist?
Brute Force Analysis:
A brute force approach would enumerate every subset of activities, check each subset for pairwise compatibility, and return the largest compatible subset found.
For n activities, there are 2^n possible subsets, and checking one subset for compatibility takes up to O(n^2) pairwise tests.
- For 20 activities: over 1 million subsets
- For 40 activities: over 1 trillion subsets
- For 100 activities: more subsets than atoms in the universe
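The brute force idea can be sketched directly (the function name is ours; iterating from the largest subset size down lets us stop at the first compatible subset found):

```python
from itertools import combinations

def brute_force_max_compatible(activities):
    """Try subsets from largest to smallest and return the first
    compatible one found. Exponential: fine for tiny inputs,
    hopeless beyond roughly 20 activities."""
    def ok(subset):
        # Pairwise compatibility, non-strict convention (f <= s allowed)
        return all(f1 <= s2 or f2 <= s1
                   for (s1, f1), (s2, f2) in combinations(subset, 2))

    for r in range(len(activities), 0, -1):
        for subset in combinations(activities, r):
            if ok(subset):
                return list(subset)
    return []

acts = [(1, 4), (3, 5), (0, 6), (5, 7), (8, 11)]
print(len(brute_force_max_compatible(acts)))   # 3, e.g. (1,4), (5,7), (8,11)
```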
| n (activities) | Subsets (2^n) | Brute Force Time | Greedy Time |
|---|---|---|---|
| 10 | 1,024 | ~10,000 ops | ~10 ops |
| 20 | ~1 million | ~20 million ops | ~20 ops |
| 30 | ~1 billion | ~30 billion ops | ~30 ops |
| 50 | ~10^15 | ~10^18 ops | ~50 ops |
| 100 | ~10^30 | Infeasible | ~100 ops (after sort) |
The greedy insight:
Greedy transforms an exponential problem into a linear one (after sorting). Instead of exploring all 2^n subsets, we make n decisions, each in O(1) time.
Total: O(n log n) for sorting + O(n) for selection = O(n log n)
This dramatic improvement—from exponential to near-linear—is what makes greedy algorithms valuable. But it only works when the greedy choice property holds.
Greedy algorithms implicitly prune the search space. By proving that locally optimal choices lead to globally optimal solutions, we eliminate the need to explore alternatives. Activity Selection's greedy choice prunes 2^n subsets down to examining just n activities in order.
Before developing the algorithm, let's establish properties that any optimal solution must satisfy. These properties guide our greedy design.
Key Observation: Earliest Finish Time Dominance
Consider all optimal solutions. Claim: there exists an optimal solution containing the activity with the earliest finish time.
Informal proof: Let O be any optimal solution. Let a_e be the activity with earliest finish time among all activities. If a_e ∈ O, we're done. If a_e ∉ O, let a_f be the first activity in O (the one finishing earliest among O's activities).
Since a_e has the earliest finish time overall, f_e ≤ f_f. Replace a_f with a_e in O. Every other activity in O starts at or after f_f (it must be compatible with a_f, and it cannot finish before a_f, since a_f finishes earliest in O), and f_e ≤ f_f, so the swapped-in a_e conflicts with nothing. The resulting set is still compatible, has the same cardinality as O, and contains a_e, so it is also optimal.
This observation is the heart of the greedy choice property, which we'll prove rigorously in a later page.
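The exchange step in the informal proof can be acted out on a concrete instance. This is an illustrative sketch (helper names are ours), not the proof itself:

```python
def exchange_earliest_finisher(all_acts, optimal):
    """Exchange-argument sketch: if an optimal set omits the overall
    earliest finisher a_e, swap a_e in for the optimal set's own
    earliest finisher a_f. The result has the same size."""
    a_e = min(all_acts, key=lambda a: a[1])   # earliest finish overall
    if a_e in optimal:
        return list(optimal)
    a_f = min(optimal, key=lambda a: a[1])    # earliest finish within optimal
    return [a_e if a == a_f else a for a in optimal]

def all_compatible(acts):
    return all(f1 <= s2 or f2 <= s1
               for i, (s1, f1) in enumerate(acts)
               for (s2, f2) in acts[i + 1:])

acts = [(1, 3), (2, 5), (5, 8)]
opt = [(2, 5), (5, 8)]                 # optimal (size 2), but omits (1, 3)
new = exchange_earliest_finisher(acts, opt)
print(new, all_compatible(new))        # [(1, 3), (5, 8)] True
```

The swap preserves both compatibility and cardinality, which is exactly what the claim requires.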
The non-overlapping constraint can be modeled in multiple ways, each offering different insights.
Mathematical formulation:
We can express Activity Selection as an integer linear program:
Maximize: Σ x_i (total activities selected)
Subject to: x_i + x_j ≤ 1 for all incompatible pairs (i, j)
x_i ∈ {0, 1} for all i
Where x_i = 1 means activity i is selected.
The constraint x_i + x_j ≤ 1 ensures that at most one of any incompatible pair is selected.
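For tiny instances, the 0/1 program can be evaluated literally by trying every assignment, which mirrors the formulation term by term (the function name is ours; this is an exponential illustration, not a practical solver):

```python
from itertools import product

def ilp_brute_force(activities):
    """Evaluate the 0/1 program directly: try every x in {0,1}^n,
    keep feasible assignments (x_i + x_j <= 1 for each incompatible
    pair), and maximize sum(x_i)."""
    n = len(activities)
    incompatible = [(i, j) for i in range(n) for j in range(i + 1, n)
                    if not (activities[i][1] <= activities[j][0]
                            or activities[j][1] <= activities[i][0])]
    best = 0
    for x in product((0, 1), repeat=n):
        if all(x[i] + x[j] <= 1 for i, j in incompatible):
            best = max(best, sum(x))
    return best

print(ilp_brute_force([(1, 4), (3, 5), (5, 7)]))   # 2: e.g. x = (1, 0, 1)
```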
While this ILP formulation is NP-hard in general, the special structure of interval graphs makes it solvable in polynomial time via greedy—a remarkable example of how problem structure enables efficient solutions.
With formal definitions in place, we can now characterize what the greedy solution must achieve:
Goal: Find a maximum compatible set C* ⊆ S.
Greedy Approach Preview:
1. Sort the activities by some ordering criterion.
2. Scan them in order, selecting each activity that is compatible with the most recently selected one.
3. Return the selected set.
The Key Question: What order guarantees optimality?
We've hinted at the answer: sort by finish time. But other orderings might seem reasonable: earliest start time, shortest duration, or fewest conflicts with other activities.
The next page examines why sorting by finish time is a greedy order that guarantees optimal results, and why the other plausible orderings fail.
You now have the formal foundation for Activity Selection: precise definitions, multiple perspectives on compatibility, understanding of the objective, and properties of optimal solutions. This rigor enables the correctness proof we'll develop after examining the algorithm.
Key takeaways from this page:
- An activity is an interval (s, f) with s < f; two activities are compatible when their intervals don't overlap, with touching endpoints allowed under our convention.
- The objective is to maximize the count of selected activities, not total time or value; this is what makes a greedy solution possible.
- Activity Selection is maximum independent set on an interval graph, which is polynomial despite being NP-hard on general graphs.
- Some optimal solution always contains the activity with the earliest finish time.
What's next:
The next page develops the core algorithm: sort by end time, greedily select. We'll implement it carefully, analyze its complexity, and see it in action on examples.