Before we can solve the Activity Selection Problem optimally, we must precisely define what we're optimizing. This isn't pedantic formalism—rigorous definitions prevent subtle errors, enable correctness proofs, and reveal the structure that makes greedy solutions possible.
In this page, we develop the formal machinery: definitions of activities, compatibility, and maximum subsets. This foundation makes subsequent algorithmic and proof discussions rigorous and unambiguous.
By the end of this page, you will understand the precise definitions underlying Activity Selection, multiple formulations of compatibility, the mathematical structure of maximum compatible subsets, and why 'maximum' here means number of activities, not total time or any other measure.
Let's establish the formal vocabulary for Activity Selection. These definitions are standard across textbooks and research literature.
Definition 1: Activity
An activity is a tuple (s, f) representing a task requiring exclusive use of a shared resource from time s (start) to time f (finish).
Formally, activity i is represented as: a_i = (s_i, f_i) where s_i ∈ ℝ, f_i ∈ ℝ, and s_i < f_i.
Definition 2: Activity Set
An activity set S = {a_1, a_2, ..., a_n} is a collection of n activities, each with associated start and finish times.
Definition 3: Compatibility
Two activities a_i = (s_i, f_i) and a_j = (s_j, f_j) are compatible if and only if their time intervals don't overlap:
f_i ≤ s_j OR f_j ≤ s_i
Equivalently, activities are compatible if one finishes before or exactly when the other starts.
The convention f_i ≤ s_j (rather than f_i < s_j) matters: if activity A ends at time 5 and activity B starts at time 5, they are compatible by our definition. This models scenarios where handoffs are instantaneous (like consecutive meeting slots). Some formulations use strict inequality—check problem statements carefully.
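Definition 3 with the non-strict convention translates directly into code. The sketch below is a minimal illustration (the function name `compatible` is ours, not from a library):

```python
def compatible(a, b):
    """True if activities a = (s, f) and b = (s, f) don't overlap.

    Uses the non-strict convention from Definition 3: touching
    endpoints (f_i == s_j) count as compatible.
    """
    return a[1] <= b[0] or b[1] <= a[0]

# Back-to-back meetings are compatible under this convention:
print(compatible((1, 5), (5, 8)))   # True: A ends exactly when B starts
print(compatible((1, 5), (4, 8)))   # False: they overlap on (4, 5)
```

Note the symmetry: `compatible(a, b) == compatible(b, a)`, as the OR of the two one-sided conditions requires.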
Definition 4: Compatible Set
A set of activities C ⊆ S is a compatible set (or independent set in scheduling terminology) if every pair of activities in C is mutually compatible:
∀ a_i, a_j ∈ C where i ≠ j: compatible(a_i, a_j) = true
Definition 5: Maximum Compatible Set
A maximum compatible set is a compatible set of maximum cardinality:
C* = argmax_{C ⊆ S, C is compatible} |C|
Note: There may be multiple maximum compatible sets of the same size. Our goal is to find any maximum compatible set, not all of them.
Compatibility is the core constraint in Activity Selection. Let's develop deep intuition by examining it from multiple perspectives.
Perspective 1: Interval Non-Overlap
Viewing each activity as a closed interval [s, f], compatibility means the intervals don't share interior points; touching at a single endpoint is allowed.
Perspective 2: Timeline Feasibility
On a single timeline representing our shared resource, compatible activities can be "drawn" without overlapping:
Time:  0  1  2  3  4  5  6  7  8  9
       |--|--|--|--|--|--|--|--|--|
A:     [=====]
D:              [=====]
H:                       [=====]
✓ Activities A, D, H are mutually compatible
Perspective 3: Graph Theoretic
We can construct an interval graph where each activity is a vertex, and an edge connects every pair of incompatible (overlapping) activities.
A compatible set is an independent set in this graph (no two vertices connected by an edge). Activity Selection becomes: find the maximum independent set in an interval graph.
Maximum independent set is NP-hard for general graphs, but polynomial for interval graphs! The special structure of intervals—their one-dimensional, ordered nature—enables greedy solutions. Activity Selection's tractability is fundamentally tied to this geometric structure.
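The graph-theoretic view can be made concrete. This sketch (helper names `conflict_graph` and `is_independent_set` are ours) builds the conflict graph and checks the independent-set condition:

```python
def conflict_graph(activities):
    """Build the conflict graph of an interval set: vertices are
    activity indices, edges join incompatible (overlapping) pairs."""
    n = len(activities)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            (si, fi), (sj, fj) = activities[i], activities[j]
            if not (fi <= sj or fj <= si):   # overlap => edge
                adj[i].add(j)
                adj[j].add(i)
    return adj

def is_independent_set(adj, chosen):
    """A compatible set corresponds to an independent set:
    no edge between any two chosen vertices."""
    return all(j not in adj[i] for i in chosen for j in chosen if i != j)

acts = [(1, 4), (3, 5), (5, 7)]
g = conflict_graph(acts)              # only (1, 4) and (3, 5) conflict
print(is_independent_set(g, {0, 2}))  # True
print(is_independent_set(g, {0, 1}))  # False
```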
A critical aspect of Activity Selection is its objective: we maximize the count of activities, not some weighted value. This distinction is fundamental to why greedy works.
| Objective | Description | Greedy Works? | Algorithm / Notes |
|---|---|---|---|
| Maximize count | Most activities selected | ✅ Yes | Sort by finish time, greedy |
| Maximize total time | Sum of (f_i - s_i) for selected | ❌ No | Needs DP; the optimum often has fewer, longer activities |
| Maximize value | Sum of values v_i for selected | ❌ No | Dynamic programming (weighted interval scheduling) |
| Minimize idle time | Total gaps between selected activities | ⚠️ Sometimes | Problem-dependent |
| Maximize resource utilization | Fraction of time covered | ❌ No | The optimum may prefer few long activities |
Why counting enables greedy:
When all activities are equally valued (count objective), there's no reason to prefer one activity over another except for compatibility implications. This means:
No regret from choosing any compatible activity — Every activity counts as "1" toward our objective.
Earliest finish maximizes future options — Finishing earlier can never reduce the number of future compatible activities.
Choices are independent — Selecting activity A instead of activity B (when both are current candidates) doesn't change the structure of remaining subproblems, only their timing.
When activities have different values (weighted variant), these properties break. Choosing a high-value activity that blocks many future activities might be suboptimal if those blocked activities have even higher combined value. This interdependence defeats greedy.
Weighted Activity Selection (also called Weighted Interval Scheduling) requires dynamic programming. A common interview mistake is applying the greedy approach to weighted problems. Always verify: is the objective pure count, or are activities valued differently?
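To see what the weighted variant demands instead, here is a sketch of the standard weighted interval scheduling DP (function name and the `(start, finish, value)` job format are our choices):

```python
from bisect import bisect_right

def weighted_interval_scheduling(jobs):
    """Max total value of mutually compatible jobs, where each job is
    (start, finish, value). Greedy-by-finish-time fails here, so we
    use the classic dynamic program."""
    jobs = sorted(jobs, key=lambda j: j[1])      # sort by finish time
    finishes = [f for _, f, _ in jobs]
    dp = [0] * (len(jobs) + 1)                   # dp[k] = best over first k jobs
    for k, (s, f, v) in enumerate(jobs, start=1):
        # p = number of earlier jobs finishing at or before this start
        # (finish <= start counts as compatible, per Definition 3)
        p = bisect_right(finishes, s, 0, k - 1)
        dp[k] = max(dp[k - 1], dp[p] + v)        # skip it, or take it
    return dp[-1]

# One long high-value job beats two short ones; counting greedy would
# pick the two short jobs and score only 2:
print(weighted_interval_scheduling([(0, 3, 1), (3, 6, 1), (0, 6, 5)]))  # 5
```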
There are several equivalent ways to represent Activity Selection input. Understanding these representations helps implement solutions efficiently and recognize the problem in different disguises.
Representation:
Two arrays of the same length:
start[i] = start time of activity i
finish[i] = finish time of activity i

Example:
start = [1, 3, 0, 5, 3, 5, 6, 8, 8, 12]
finish = [4, 5, 6, 7, 9, 9, 10, 11, 12, 16]
Pros: Simple, cache-friendly for sorting, common in competitive programming.
Cons: Easy to accidentally desynchronize arrays during sorting.
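A simple guard against that desynchronization is to zip the arrays into pairs before doing anything else (a small sketch using the example arrays above):

```python
start = [1, 3, 0, 5, 3, 5, 6, 8, 8, 12]
finish = [4, 5, 6, 7, 9, 9, 10, 11, 12, 16]

# Zip the parallel arrays into (start, finish) pairs BEFORE sorting,
# so the two arrays can never fall out of sync:
activities = sorted(zip(start, finish), key=lambda a: a[1])
print(activities[0])   # (1, 4) -- earliest finish time comes first
```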
Implementation recommendation:
For Activity Selection, the tuple/object representation is cleanest:
```python
from typing import List, Tuple

def activity_selection(activities: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    # activities is a list of (start, finish) tuples
    # Returns the list of selected activities
    ...
```
Sorting is straightforward (activities.sort(key=lambda x: x[1])) and the solution naturally returns selected activities.
Before appreciating why greedy is efficient, let's understand the scope of the problem: how many possible solutions exist?
Brute Force Analysis:
A brute force approach would enumerate every subset of activities, check each subset for pairwise compatibility, and return the largest compatible subset found.
For n activities, there are 2^n possible subsets, and checking one subset for compatibility takes up to O(n^2) pairwise tests.
- For 20 activities: over 1 million subsets
- For 40 activities: over 1 trillion subsets
- For 100 activities: more subsets than atoms in the universe
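The brute force idea can be sketched directly (the function name is ours; iterating from the largest subset size down lets us stop at the first compatible subset found):

```python
from itertools import combinations

def brute_force_max_compatible(activities):
    """Try subsets from largest to smallest and return the first
    compatible one found. Exponential: fine for tiny inputs,
    hopeless beyond roughly 20 activities."""
    def ok(subset):
        # Pairwise compatibility, non-strict convention (f <= s allowed)
        return all(f1 <= s2 or f2 <= s1
                   for (s1, f1), (s2, f2) in combinations(subset, 2))

    for r in range(len(activities), 0, -1):
        for subset in combinations(activities, r):
            if ok(subset):
                return list(subset)
    return []

acts = [(1, 4), (3, 5), (0, 6), (5, 7), (8, 11)]
print(len(brute_force_max_compatible(acts)))   # 3, e.g. (1,4), (5,7), (8,11)
```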
| n (activities) | Subsets (2^n) | Brute Force Time | Greedy Time |
|---|---|---|---|
| 10 | 1,024 | ~10,000 ops | ~10 ops |
| 20 | ~1 million | ~20 million ops | ~20 ops |
| 30 | ~1 billion | ~30 billion ops | ~30 ops |
| 50 | ~10^15 | ~10^18 ops | ~50 ops |
| 100 | ~10^30 | Infeasible | ~100 ops (after sort) |
The greedy insight:
Greedy transforms an exponential problem into a linear one (after sorting). Instead of exploring all 2^n subsets, we make n decisions, each in O(1) time.
Total: O(n log n) for sorting + O(n) for selection = O(n log n)
This dramatic improvement—from exponential to near-linear—is what makes greedy algorithms valuable. But it only works when the greedy choice property holds.
Greedy algorithms implicitly prune the search space. By proving that locally optimal choices lead to globally optimal solutions, we eliminate the need to explore alternatives. Activity Selection's greedy choice prunes 2^n subsets down to examining just n activities in order.
Before developing the algorithm, let's establish properties that any optimal solution must satisfy. These properties guide our greedy design.
Key Observation: Earliest Finish Time Dominance
Consider all optimal solutions. Claim: there exists an optimal solution containing the activity with the earliest finish time.
Informal proof: Let O be any optimal solution. Let a_e be the activity with earliest finish time among all activities. If a_e ∈ O, we're done. If a_e ∉ O, let a_f be the first activity in O (the one finishing earliest among O's activities).
Since a_e has the earliest finish time overall, f_e ≤ f_f. Replace a_f with a_e in O. Every other activity in O starts at or after f_f (it must be compatible with a_f, and it cannot finish before a_f, since a_f finishes earliest in O), and f_e ≤ f_f, so the swapped-in a_e conflicts with nothing. The resulting set is still compatible, has the same cardinality as O, and contains a_e, so it is also optimal.
This observation is the heart of the greedy choice property, which we'll prove rigorously in a later page.
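The exchange step in the informal proof can be acted out on a concrete instance. This is an illustrative sketch (helper names are ours), not the proof itself:

```python
def exchange_earliest_finisher(all_acts, optimal):
    """Exchange-argument sketch: if an optimal set omits the overall
    earliest finisher a_e, swap a_e in for the optimal set's own
    earliest finisher a_f. The result has the same size."""
    a_e = min(all_acts, key=lambda a: a[1])   # earliest finish overall
    if a_e in optimal:
        return list(optimal)
    a_f = min(optimal, key=lambda a: a[1])    # earliest finish within optimal
    return [a_e if a == a_f else a for a in optimal]

def all_compatible(acts):
    return all(f1 <= s2 or f2 <= s1
               for i, (s1, f1) in enumerate(acts)
               for (s2, f2) in acts[i + 1:])

acts = [(1, 3), (2, 5), (5, 8)]
opt = [(2, 5), (5, 8)]                 # optimal (size 2), but omits (1, 3)
new = exchange_earliest_finisher(acts, opt)
print(new, all_compatible(new))        # [(1, 3), (5, 8)] True
```

The swap preserves both compatibility and cardinality, which is exactly what the claim requires.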
The non-overlapping constraint can be modeled in multiple ways, each offering different insights.
Mathematical formulation:
We can express Activity Selection as an integer linear program:
Maximize: Σ x_i (total activities selected)
Subject to: x_i + x_j ≤ 1 for all incompatible pairs (i, j)
x_i ∈ {0, 1} for all i
Where x_i = 1 means activity i is selected.
The constraint x_i + x_j ≤ 1 ensures that at most one of any incompatible pair is selected.
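For tiny instances, the 0/1 program can be evaluated literally by trying every assignment, which mirrors the formulation term by term (the function name is ours; this is an exponential illustration, not a practical solver):

```python
from itertools import product

def ilp_brute_force(activities):
    """Evaluate the 0/1 program directly: try every x in {0,1}^n,
    keep feasible assignments (x_i + x_j <= 1 for each incompatible
    pair), and maximize sum(x_i)."""
    n = len(activities)
    incompatible = [(i, j) for i in range(n) for j in range(i + 1, n)
                    if not (activities[i][1] <= activities[j][0]
                            or activities[j][1] <= activities[i][0])]
    best = 0
    for x in product((0, 1), repeat=n):
        if all(x[i] + x[j] <= 1 for i, j in incompatible):
            best = max(best, sum(x))
    return best

print(ilp_brute_force([(1, 4), (3, 5), (5, 7)]))   # 2: e.g. x = (1, 0, 1)
```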
While this ILP formulation is NP-hard in general, the special structure of interval graphs makes it solvable in polynomial time via greedy—a remarkable example of how problem structure enables efficient solutions.
With formal definitions in place, we can now characterize what the greedy solution must achieve:
Goal: Find a maximum compatible set C* ⊆ S.
Greedy Approach Preview:
1. Sort the activities by some ordering criterion.
2. Scan them in order, selecting each activity that is compatible with the most recently selected one.
3. Return the selected set.
The Key Question: What order guarantees optimality?
We've hinted at the answer: sort by finish time. But other orderings might seem reasonable: earliest start time, shortest duration, or fewest conflicts with other activities.
The next page examines why sorting by finish time is a greedy order that guarantees optimal results, and why the other plausible orderings fail.
You now have the formal foundation for Activity Selection: precise definitions, multiple perspectives on compatibility, understanding of the objective, and properties of optimal solutions. This rigor enables the correctness proof we'll develop after examining the algorithm.
Key takeaways from this page:
- An activity is an interval (s, f) with s < f; two activities are compatible when their intervals don't overlap, with touching endpoints allowed under our convention.
- The objective is to maximize the count of selected activities, not total time or value; this is what makes a greedy solution possible.
- Activity Selection is maximum independent set on an interval graph, which is polynomial despite being NP-hard on general graphs.
- Some optimal solution always contains the activity with the earliest finish time.
What's next:
The next page develops the core algorithm: sort by end time, greedily select. We'll implement it carefully, analyze its complexity, and see it in action on examples.