Loading problem...
In modern MLOps infrastructure, machine learning workflows are composed of numerous interdependent stages—from raw data ingestion and preprocessing, through feature engineering and model training, to validation, deployment, and monitoring. These workflows are naturally modeled as a Directed Acyclic Graph (DAG), where nodes represent computational tasks and edges encode precedence constraints.
Understanding the structure of these task dependencies is critical for:
Given a collection of pipeline tasks where each task has a unique identifier, an execution duration, and a list of predecessor tasks that must complete before it can begin, implement a function analyze_ml_pipeline(tasks) that performs a complete critical path analysis of the workflow.
Your function must compute the following metrics:
A valid linear ordering of tasks such that for every dependency edge (A → B), task A appears before task B. When multiple tasks are simultaneously available (all dependencies satisfied), they should be ordered alphabetically by their task ID.
For each task, calculate:
Tasks with no dependencies have an earliest start of 0.
Working backwards from the total project duration:
Terminal tasks (those with no successors) have their latest finish equal to the makespan.
For each task:
Slack represents how much a task can be delayed without impacting the total completion time.
The sequence of tasks with zero slack that forms the longest path through the DAG. This path determines the minimum possible pipeline duration. If there are multiple critical paths, return the one formed by following zero-slack tasks in topological order.
The total time required to complete all tasks in the pipeline (equal to the maximum earliest finish time).
tasks = [
{"id": "start", "duration": 2, "dependencies": []},
{"id": "left", "duration": 5, "dependencies": ["start"]},
{"id": "right", "duration": 10, "dependencies": ["start"]},
{"id": "end", "duration": 3, "dependencies": ["left", "right"]}
]{
"execution_order": ["start", "left", "right", "end"],
"earliest_start": {"start": 0, "left": 2, "right": 2, "end": 12},
"earliest_finish": {"start": 2, "left": 7, "right": 12, "end": 15},
"latest_start": {"start": 0, "left": 7, "right": 2, "end": 12},
"latest_finish": {"start": 2, "left": 12, "right": 12, "end": 15},
"slack": {"start": 0, "left": 5, "right": 0, "end": 0},
"critical_path": ["start", "right", "end"],
"makespan": 15
}This pipeline forms a classic diamond pattern: the initial 'start' task branches into two parallel paths ('left' and 'right'), which converge at the final 'end' task.
Forward Pass (Earliest Times): • start: ES=0, EF=0+2=2 (no dependencies) • left: ES=2 (after start), EF=2+5=7 • right: ES=2 (after start), EF=2+10=12 • end: ES=max(7,12)=12 (waits for both branches), EF=12+3=15
Backward Pass (Latest Times) from makespan=15: • end: LF=15, LS=15-3=12 (terminal task) • right: LF=12 (before end), LS=12-10=2 • left: LF=12 (before end), LS=12-5=7 • start: LF=min(2,7)=2, LS=2-2=0
Slack Calculation: • start: 0-0=0 (critical) • left: 7-2=5 (can be delayed 5 units) • right: 2-2=0 (critical) • end: 12-12=0 (critical)
The critical path follows zero-slack tasks: start→right→end, determining the minimum duration of 15 time units.
tasks = [
{"id": "a", "duration": 3, "dependencies": []},
{"id": "b", "duration": 4, "dependencies": ["a"]},
{"id": "c", "duration": 2, "dependencies": ["b"]}
]{
"execution_order": ["a", "b", "c"],
"earliest_start": {"a": 0, "b": 3, "c": 7},
"earliest_finish": {"a": 3, "b": 7, "c": 9},
"latest_start": {"a": 0, "b": 3, "c": 7},
"latest_finish": {"a": 3, "b": 7, "c": 9},
"slack": {"a": 0, "b": 0, "c": 0},
"critical_path": ["a", "b", "c"],
"makespan": 9
}This is a simple linear chain where tasks execute sequentially with no parallelism.
Forward Pass: • a: ES=0, EF=3 • b: ES=3 (after a), EF=7 • c: ES=7 (after b), EF=9
Backward Pass: • c: LF=9, LS=7 • b: LF=7, LS=3 • a: LF=3, LS=0
Slack: All tasks have zero slack since the chain forms a single path.
The entire chain is critical—any delay to a, b, or c directly increases the total pipeline duration. The makespan is the sum of all durations: 3+4+2=9.
tasks = [
{"id": "task_1", "duration": 5, "dependencies": []},
{"id": "task_2", "duration": 3, "dependencies": []},
{"id": "task_3", "duration": 7, "dependencies": []}
]{
"execution_order": ["task_1", "task_2", "task_3"],
"earliest_start": {"task_1": 0, "task_2": 0, "task_3": 0},
"earliest_finish": {"task_1": 5, "task_2": 3, "task_3": 7},
"latest_start": {"task_1": 2, "task_2": 4, "task_3": 0},
"latest_finish": {"task_1": 7, "task_2": 7, "task_3": 7},
"slack": {"task_1": 2, "task_2": 4, "task_3": 0},
"critical_path": ["task_3"],
"makespan": 7
}These are three independent tasks with no dependencies between them, representing fully parallelizable work.
Forward Pass: All tasks can start immediately at time 0. • task_1: ES=0, EF=5 • task_2: ES=0, EF=3 • task_3: ES=0, EF=7
Backward Pass from makespan=7: • task_1: LF=7, LS=7-5=2 (slack=2) • task_2: LF=7, LS=7-3=4 (slack=4) • task_3: LF=7, LS=7-7=0 (slack=0, critical)
The makespan is 7, determined solely by the longest task (task_3). With unlimited parallelism, all tasks would complete by time 7. The critical path contains only task_3 since it's the bottleneck determining total duration.
Constraints