ML development is an optimization problem conducted under resource constraints. You have limited time, compute budget, and team capacity—but an enormous space of possible experiments. Iteration strategy is the discipline of deciding what to try next.
Naive approaches—random exploration, following intuition, or exhaustive grid search—waste resources. Expert ML practitioners develop systematic iteration strategies that prioritize experiments by expected value, balance exploration against exploitation, and direct scarce resources where they yield the most learning.
This page provides frameworks for intelligent iteration that accelerate model development.
By completing this page, you will be able to: (1) Prioritize experiments based on expected value and information gain, (2) Navigate the exploration-exploitation tradeoff, (3) Apply systematic debugging strategies when models underperform, (4) Allocate resources efficiently across multiple improvement dimensions, and (5) Know when to pivot, persist, or stop.
Every experiment has an expected value—the product of its potential impact and its probability of success. Systematic prioritization ranks experiments by this expected value.
The Prioritization Matrix:
Evaluate each potential experiment on two dimensions:
| | Low Probability of Success | High Probability of Success |
|---|---|---|
| High Impact | Moonshots: Try selectively | Quick wins: Prioritize heavily |
| Low Impact | Avoid entirely | Incremental: Fill gaps |
| Category | Examples | Strategy |
|---|---|---|
| Quick Wins | Bug fixes, feature additions with known value | Do immediately |
| Incremental | Hyperparameter tuning, minor architecture changes | Systematic search |
| Moonshots | New architectures, novel features | Allocate 20% of budget |
| Avoid | Low-impact, low-probability changes | Document and skip |
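The expected-value calculation behind this matrix can be made mechanical. Below is a minimal sketch; the candidate experiments, impact estimates, and success probabilities are all illustrative, not prescribed values.

```python
def expected_value(impact, probability):
    """Expected value of an experiment: potential impact times success probability."""
    return impact * probability

# Illustrative candidates: (name, estimated impact in metric points, P(success))
candidates = [
    ("fix label noise bug", 2.0, 0.9),     # quick win
    ("new architecture", 5.0, 0.1),        # moonshot
    ("hyperparameter sweep", 0.5, 0.8),    # incremental
    ("exotic low-value tweak", 0.2, 0.1),  # avoid
]

# Rank by expected value, highest first
ranked = sorted(candidates, key=lambda c: expected_value(c[1], c[2]), reverse=True)
for name, impact, prob in ranked:
    print(f"{name}: EV = {expected_value(impact, prob):.2f}")
```

Note how the quick win dominates the moonshot even though the moonshot's raw impact is larger; ranking by the product rather than impact alone is what distinguishes the four quadrants.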
Information Value Prioritization:
Some experiments are valuable not for their direct improvement but for the information they provide. Consider diagnostic experiments such as learning-curve analysis, oracle-feature ceilings, and error analysis by segment.
These experiments may not improve the model directly but guide subsequent work more effectively.
80% of your improvement typically comes from 20% of your experiments. Prioritize ruthlessly. Before running an experiment, ask: "If this works, does it matter?" If the answer is no, skip it regardless of how intellectually interesting it seems.
The exploration-exploitation tradeoff is fundamental to ML iteration: exploration tries new approaches to discover what might work, while exploitation refines the approaches already known to work.
Phase-Dependent Balance:
The optimal balance shifts throughout a project:
| Phase | Exploration | Exploitation | Rationale |
|---|---|---|---|
| Early | 80% | 20% | Find promising directions |
| Middle | 50% | 50% | Balance breadth and depth |
| Late | 20% | 80% | Polish best approach |
| Deadline | 0% | 100% | Ship what works |
Portfolio Approach:
Maintain a portfolio of experiments across risk levels: devote most capacity to safe, incremental improvements, a moderate share to promising but unproven changes, and a small reserve (the roughly 20% moonshot budget above) to high-risk, high-reward ideas.
This ensures steady progress while maintaining upside potential.
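The phase-dependent split in the table above can be sketched as a small allocation helper. This is an illustrative sketch; the function name and the fixed fractions simply mirror the table, not any standard API.

```python
def portfolio_allocation(n_experiments, phase="middle"):
    """Split an experiment budget into exploration vs exploitation by project phase."""
    # Exploration share per phase, following the table above
    exploration = {"early": 0.8, "middle": 0.5, "late": 0.2, "deadline": 0.0}[phase]
    n_explore = round(n_experiments * exploration)
    return {"explore": n_explore, "exploit": n_experiments - n_explore}

print(portfolio_allocation(10, "early"))     # {'explore': 8, 'exploit': 2}
print(portfolio_allocation(10, "deadline"))  # {'explore': 0, 'exploit': 10}
```

Even a crude budget like this helps, because it forces the exploration reserve to be an explicit decision rather than whatever time is left over.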
Pure exploitation leads to local optima—you perfect an approach that's fundamentally limited. Reserve exploration budget even when exploitation is working. Many major ML improvements came from teams willing to try something completely different when incremental gains stalled.
When models underperform, random experimentation wastes resources. Systematic debugging identifies root causes and guides targeted fixes.
The Debugging Hierarchy:
Diagnose problems in order of fundamentality: verify the data pipeline first, then the training loop and optimization, then model capacity and architecture, and only then fine-grained hyperparameters.
Diagnostic Experiments:
| Symptom | Diagnostic Experiment | What It Reveals |
|---|---|---|
| High training error | Overfit a small batch | Model capacity sufficient? |
| High train-test gap | Learning curves by data size | More data needed? |
| Feature importance all low | Train with known-good labels | Features have signal? |
| Performance ceiling | Train with oracle features | What's achievable? |
| Inconsistent results | Multiple random seeds | Variance vs signal? |
| Specific failure modes | Error analysis by segment | Where does model fail? |
```python
import numpy as np
from sklearn.model_selection import learning_curve

def diagnose_with_learning_curves(model, X, y):
    """
    Generate learning curves to diagnose data vs model issues.

    Interpretation:
    - Train score low = underfitting (increase capacity)
    - Large train-val gap = overfitting (regularize or get more data)
    - Val score still rising with data size = more data likely helps
    - Both curves high and converged = well-fit for available data
    """
    train_sizes, train_scores, val_scores = learning_curve(
        model, X, y,
        train_sizes=np.linspace(0.1, 1.0, 10),
        cv=5, scoring='accuracy', n_jobs=-1
    )

    train_mean = train_scores.mean(axis=1)
    val_mean = val_scores.mean(axis=1)
    gap = train_mean - val_mean

    # Diagnosis
    if train_mean[-1] < 0.7:
        print("DIAGNOSIS: Underfitting - increase model capacity")
    elif gap[-1] > 0.15:
        print("DIAGNOSIS: Overfitting - regularize or get more data")
    elif val_mean[-1] > val_mean[-3]:
        print("DIAGNOSIS: More data could help - curves still climbing")
    else:
        print("DIAGNOSIS: Model is well-fit for available data")

    return train_sizes, train_mean, val_mean
```

Before any debugging experiment, manually examine 50-100 errors. This often reveals immediately actionable issues: data bugs, labeling errors, or systematic failure modes that suggest specific fixes. An hour of error analysis can save days of random experimentation.
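The "error analysis by segment" diagnostic from the table above can be as simple as tallying error rates per segment. A minimal sketch, with toy data and segment labels that are purely illustrative:

```python
from collections import defaultdict

def error_rate_by_segment(y_true, y_pred, segments):
    """Return error rate per data segment, worst segment first."""
    errors, totals = defaultdict(int), defaultdict(int)
    for truth, pred, seg in zip(y_true, y_pred, segments):
        totals[seg] += 1
        if truth != pred:
            errors[seg] += 1
    rates = {seg: errors[seg] / totals[seg] for seg in totals}
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))

# Toy example: the model fails on every "mobile" example
y_true   = [1, 0, 1, 1, 0, 1]
y_pred   = [1, 0, 0, 0, 0, 1]
segments = ["web", "web", "mobile", "mobile", "web", "web"]
print(error_rate_by_segment(y_true, y_pred, segments))  # {'mobile': 1.0, 'web': 0.0}
```

Sorting worst-first turns the output into a prioritized to-do list: the top segment is where a targeted fix has the most room to help.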
ML iteration consumes three scarce resources: time, compute, and attention. Effective allocation maximizes learning per unit of resource spent.
Time Allocation:
Typical ML project time breakdown:
| Activity | Allocation | Notes |
|---|---|---|
| Data preparation | 30-40% | Often underestimated |
| Feature engineering | 20-30% | High leverage activity |
| Model development | 20-30% | Actual ML work |
| Evaluation & debugging | 10-20% | Critical for quality |
Compute Allocation:
Not all experiments need full compute. Use tiered allocation: run cheap screening experiments on data subsets or smaller models, promote only the promising candidates to medium-scale runs, and reserve full-scale compute for finalists.
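One way to sketch tiered allocation in code: a screening pass on a small data fraction, with full training only for candidates that clear a bar. The function, fractions, and threshold here are illustrative assumptions, not a standard recipe.

```python
def tiered_run(train_fn, X, y, screen_frac=0.1, screen_threshold=0.6):
    """Run a cheap screening pass; promote to full training only if it passes."""
    n_screen = max(1, int(len(X) * screen_frac))
    screen_score = train_fn(X[:n_screen], y[:n_screen])
    if screen_score < screen_threshold:
        # Candidate killed cheaply: only 10% of the data was ever touched
        return {"promoted": False, "screen_score": screen_score}
    full_score = train_fn(X, y)
    return {"promoted": True, "screen_score": screen_score, "full_score": full_score}

# Demo with a stand-in "training" function whose score grows with data size
def fake_train(X, y):
    return min(1.0, len(X) / 50)

data = list(range(100))
print(tiered_run(fake_train, data, data))
```

The design choice worth noting is that the screening threshold trades false negatives (killing an idea that only shines at scale) against compute saved; set it low when screening signal is noisy.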
Compute can be purchased; time passes regardless. But attention—the ability to think deeply about experiments—is truly limited. Batch experiments to free attention for analysis. Automate routine runs. Reserve cognitive resources for interpretation and strategy, not babysitting training jobs.
One of the hardest decisions in ML is knowing when to change direction. Sunk cost fallacy and optimism bias often lead teams to persist on doomed approaches. Objective decision criteria prevent waste.
Decision Framework:
| Signal | Persist | Pivot | Stop |
|---|---|---|---|
| Improvement rate | Steady gains | Plateaued despite exploration | No improvement for 20+ experiments |
| Gap to goal | On track | Large gap, time remaining | Gap too large for timeline |
| Diagnostic findings | Clear improvement path | Fundamental blockers identified | Problem proven infeasible |
| Resource status | Sufficient runway | Resources available for new direction | Resources exhausted |
| Stakeholder alignment | Support continues | Appetite for different approach | Interest withdrawn |
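A few of the framework's signals can be checked mechanically, which helps counter the sunk-cost bias the next paragraph warns about. This is a hedged sketch: the thresholds and signal names are illustrative, and real decisions weigh all five rows of the table.

```python
def iteration_decision(recent_gains, experiments_since_improvement, resources_remaining):
    """Suggest persist / pivot / stop from simple, objective signals."""
    # Stop: resources exhausted, or no improvement for 20+ experiments (per the table)
    if not resources_remaining or experiments_since_improvement >= 20:
        return "stop"
    # Pivot: metric has plateaued despite continued experiments
    if sum(recent_gains) <= 0:
        return "pivot"
    # Persist: steady gains are still arriving
    return "persist"

print(iteration_decision([0.01, 0.005, 0.02], 2, True))  # persist
print(iteration_decision([0.0, 0.0, 0.0], 8, True))      # pivot
print(iteration_decision([], 25, True))                  # stop
```

Writing the criteria down before the project stalls is the point: a rule agreed on in advance is much easier to follow than one invented mid-argument.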
Time already spent is irrelevant to future decisions. The question is: "Given what we know now, is this the best use of remaining resources?" Teams that can objectively kill failing projects and reallocate resources outperform those that persist out of emotional attachment.
You now understand iteration strategies that accelerate ML development. Next, we'll explore launch criteria—the rigorous standards that determine when a model is ready for production deployment and how to ensure safe, successful launches.