Every failed ML project shares a common origin: insufficient understanding of requirements. Not bad algorithms. Not inadequate data. Not poor engineering. The root cause is almost always a gap between what was built and what was actually needed.
This isn't hyperbole. Industry surveys consistently show that 85-90% of ML projects fail to make it into production, and the primary reason isn't technical—it's organizational. The model works on test data but doesn't solve the actual business problem. The system achieves impressive metrics but doesn't integrate with existing workflows. The solution is technically elegant but answers the wrong question.
Requirements gathering for ML systems is fundamentally different from traditional software development. In conventional software, you can often specify exact inputs, outputs, and behaviors. In ML, you're building systems that operate under uncertainty, whose behavior emerges from data, and whose success depends on probabilistic guarantees that stakeholders may not intuitively understand.
By the end of this page, you will understand how to systematically gather requirements for ML systems—translating ambiguous business objectives into precise technical specifications, aligning stakeholders with realistic expectations, defining success metrics that matter, and identifying constraints before they derail your project.
Effective ML requirements gathering operates across multiple dimensions simultaneously. Unlike traditional software where functional requirements dominate, ML systems require careful consideration of data, model behavior, operational constraints, and business objectives—all intertwined and mutually constraining.
The ML Requirements Framework organizes these concerns into a structured approach that ensures nothing critical is overlooked. Each dimension informs the others, creating a coherent picture of what success looks like and how to achieve it.
These dimensions must be addressed roughly in order. Jumping to operational constraints before defining the problem leads to solutions looking for problems. Discussing data before establishing success metrics leads to 'we have this data, what can we do with it?' thinking—which rarely produces valuable outcomes.
The most critical and most frequently botched aspect of ML requirements is problem definition. Stakeholders often come with solutions in mind ("we need a recommendation engine" or "let's use deep learning for this") rather than clearly articulated problems. Your first task is to work backward from the proposed solution to the underlying problem, then forward again to potentially different—and often simpler—solutions.
The Problem Definition Canvas captures the essential questions that must be answered before any ML work begins:
| Question | Why It Matters | Red Flags If Missing |
|---|---|---|
| What decision will change? | ML systems are decision-support tools. If no decision changes, no value is created. | Answers like 'we'll have better insights' or 'more data-driven culture' |
| Who makes this decision today? | Understanding current process reveals opportunities and constraints. | Nobody knows or 'it's handled manually somehow' |
| How is this decision made now? | Baseline for comparison. Often reveals simpler alternatives. | Vague answers or inability to describe current state |
| What's the cost of a wrong prediction? | Determines acceptable error rates and need for human-in-loop. | All predictions treated as equally important |
| What's the expected impact of improvement? | Quantifies ROI and sets realistic expectations. | Vague claims like 'significant improvement' without numbers |
| Is this actually an ML problem? | Many 'ML projects' are better solved with rules or simple heuristics. | Assumption that ML is always the answer |
### The Translation Challenge
Business stakeholders speak in terms of outcomes: "We want to reduce customer churn" or "We need to optimize pricing." These high-level objectives must be translated into precise ML problem formulations.
Each translation involves choices that significantly impact the solution approach. Framing churn as binary classification is simpler than predicting days-until-churn (regression) or optimal intervention timing (reinforcement learning). The right framing depends on what decisions will actually be made with the predictions.
When stakeholders request an ML system, ask 'why' five times. 'We need a churn prediction model.' Why? 'To identify at-risk customers.' Why? 'So we can intervene before they leave.' Why? 'Because retention is cheaper than acquisition.' Why? 'Because our CAC is $200 but LTV is $2000.' Now you understand the economics that define success: if intervention costs $20 and saves 10% of addressed users, you need to target customers with >10% churn probability for ROI.
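A quick back-of-the-envelope check makes this concrete. The sketch below (plain Python, using the numbers from the example above) computes the break-even churn probability for targeting:

```python
# Break-even targeting threshold for a churn intervention, using the
# illustrative numbers from the five-whys example above.

ltv = 2000.0              # lifetime value of a retained customer ($)
intervention_cost = 20.0  # cost of one retention intervention ($)
save_rate = 0.10          # fraction of addressed at-risk customers actually retained

# Expected benefit of intervening on a customer with churn probability p:
#   p * save_rate * ltv. Intervene only when that exceeds the cost.
breakeven_p = intervention_cost / (save_rate * ltv)
print(f"Target customers with churn probability > {breakeven_p:.0%}")  # > 10%
```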
### Problem Decomposition
Complex business problems often require decomposition into multiple ML tasks. Consider a content moderation system:
High-level need: Remove harmful content from the platform.
Decomposed ML tasks:
- Detection: classify content against specific policy categories (hate speech, spam, graphic violence), each with its own precision/recall target
- Severity estimation: score how harmful a flagged item is, since not every violation warrants the same response
- Action routing: decide whether to auto-remove, auto-approve, or escalate to human review based on confidence and severity
Each sub-problem has different requirements, evaluation criteria, and acceptable error rates. Treating this as a single classification task would miss the nuanced decision-making that effective moderation requires.
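As a sketch of what this decomposition buys you, the routing logic below applies different thresholds per decision and escalates uncertain cases. The thresholds and severity labels are placeholders, not values from a real system; real values would come from the cost analysis described earlier:

```python
# Sketch: routing logic enabled by decomposing moderation into
# detection, severity estimation, and action routing.
# All thresholds below are illustrative placeholders.

def moderation_decision(p_violation: float, severity: str) -> str:
    if severity == "severe" and p_violation > 0.5:
        return "remove"        # low tolerance for severe harm
    if p_violation > 0.95:
        return "remove"        # high-confidence violations auto-removed
    if p_violation > 0.60:
        return "human_review"  # uncertain cases escalated to humans
    return "approve"

print(moderation_decision(0.7, "mild"))    # human_review
print(moderation_decision(0.6, "severe"))  # remove
```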
Defining success metrics for ML systems is deceptively complex. The challenge is bridging three distinct worlds: the business metrics that stakeholders care about, the model metrics that ML engineers optimize, and the system metrics that ensure reliable operation. These three types of metrics are interconnected but distinct, and confusion between them is a leading cause of ML project failure.
### The Metrics Alignment Problem
The fundamental challenge is that improving model metrics doesn't guarantee improving business metrics. This happens for several reasons:
Offline-Online Gap: A model trained to minimize log loss on historical data might not perform well on live traffic where data distributions shift.
Proxy Metric Divergence: Click-through rate is easy to measure but may not correlate with user satisfaction or purchase intent.
Goodhart's Law: Once a metric becomes a target, it ceases to be a good metric. Optimizing for engagement can lead to addictive design patterns that hurt long-term retention.
Simpson's Paradox: Model improvements within segments can disappear or reverse when aggregated, especially if the segment distribution shifts (a toy numeric example follows this list).
Confounding Variables: Business metrics are affected by seasonality, marketing campaigns, product changes, and competitor actions—making causal attribution difficult.
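To see Simpson's paradox in action, here is a toy comparison (invented numbers) in which model B wins in every segment yet loses in aggregate, because its traffic skews toward the harder segment:

```python
# Simpson's paradox with toy numbers: model B beats model A in each
# segment, yet loses in aggregate because of a shifted traffic mix.

def aggregate_accuracy(segments):  # segments: [(n_requests, accuracy), ...]
    total = sum(n for n, _ in segments)
    return sum(n * acc for n, acc in segments) / total

model_a = [(1000, 0.70), (9000, 0.90)]  # (mobile, desktop) traffic
model_b = [(9000, 0.72), (1000, 0.92)]  # better per segment, mostly mobile

print(aggregate_accuracy(model_a))  # 0.88
print(aggregate_accuracy(model_b))  # 0.74: worse overall despite per-segment wins
```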
| Business Metric | Model Metric | System Metric | Connection Challenge |
|---|---|---|---|
| Revenue per session | NDCG@10, Hit Rate@K | P50 latency < 100ms | Higher ranking quality should improve revenue, but depends on pricing and inventory |
| Conversion rate | Precision@recommendation | Availability > 99.9% | Good recommendations need to be shown consistently to impact conversion |
| Average order value | Cross-category exposure | Cold-boot time < 5s | Diverse recommendations may increase AOV but decrease immediate clicks |
| Customer retention | Long-term engagement score | Error rate < 0.1% | Short-term optimization can hurt long-term loyalty (e.g., recommending only deals) |
Be wary of vanity metrics that look impressive but don't connect to business value. '95% accuracy' is meaningless without context: on a dataset where 95% of examples belong to one class, a model that always predicts the majority class scores 95% while providing no value. Always ask: 'If this metric improves by X%, what changes in the real world? Can we quantify the dollar impact?'
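A minimal demonstration, using a hypothetical 5%-positive fraud dataset: the do-nothing baseline already "achieves" 95% accuracy while catching no fraud at all.

```python
# Why raw accuracy needs context: on an imbalanced dataset, a model that
# always predicts the majority class already scores high accuracy.
# The 5%-positive fraud label distribution here is hypothetical.

import numpy as np
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.05).astype(int)  # ~5% positive (fraud)
y_majority = np.zeros_like(y_true)                # "always legitimate" baseline

print(accuracy_score(y_true, y_majority))  # ~0.95, with zero business value
print(recall_score(y_true, y_majority))    # 0.0: catches no fraud at all
```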
### Establishing Metric Baselines
Before any ML work begins, establish baselines for all relevant metrics:
Simple Baselines: majority-class prediction, random guessing, or a hand-written heuristic. This is the floor any model must beat to justify its complexity.
Current State Baseline: the performance of today's process, whether human judgment, a rules engine, or a legacy model, measured on the same metric planned for the ML system.
Theoretical Ceiling: the best performance achievable given label noise and irreducible uncertainty, often estimated from inter-annotator agreement or expert human performance.
The Improvement Target: With baselines established, define the minimum viable improvement. A 2% lift in AUC might be meaningless, or it might translate to $10M in annual revenue. The business translation determines whether the project is worth pursuing.
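The translation itself is simple arithmetic once the funnel numbers are known. The sketch below uses hypothetical values; the real mapping must come from your own measurements, ideally an A/B test:

```python
# Translating a relative model-metric lift into business terms.
# All inputs are hypothetical and must be replaced with measured values.

sessions_per_year = 50_000_000
revenue_per_session = 0.80   # current average ($)
relative_lift = 0.02         # e.g., measured via A/B test

incremental_revenue = sessions_per_year * revenue_per_session * relative_lift
print(f"${incremental_revenue:,.0f} incremental revenue/year")  # $800,000
```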
ML projects typically involve stakeholders with fundamentally different perspectives, incentives, and risk tolerances. Aligning these stakeholders is not a soft skill afterthought—it's a core technical requirement. Misalignment results in projects that are technically successful but organizationally rejected.
Understanding the different stakeholder archetypes and their concerns enables proactive alignment:
| Stakeholder | Primary Concern | Common Objections | Alignment Strategy |
|---|---|---|---|
| Business Sponsor | ROI, timeline, business impact | "When will we see results? What's the cost?" | Quantify expected impact, establish milestones, define success criteria upfront |
| End Users | Usability, trust, workflow integration | "I don't trust black boxes. This will automate my job." | Involve early, demonstrate explainability, position as augmentation not replacement |
| Data Teams | Data quality, access, governance | "The data isn't ready. Privacy constraints apply." | Data audit early, define minimum viable data, address compliance proactively |
| Engineering | Integration, maintenance, reliability | "How does this fit our stack? Who maintains it?" | Design for operations from day one, involve in architecture decisions |
| Legal/Compliance | Risk, liability, regulatory adherence | "What if the model is biased? Who's liable?" | Document decision processes, plan for audits, address fairness explicitly |
| Executive Leadership | Strategic alignment, competitive position | "Why this project over others? What's the risk?" | Connect to strategic goals, present honest risk assessment, define abort criteria |
### Managing Expectations with the ML Uncertainty Principle
Unlike traditional software where you can often guarantee specific behaviors, ML systems have inherent uncertainty. Stakeholders accustomed to deterministic software need to understand this fundamental difference:
What ML can provide: calibrated probability estimates, performance guarantees in aggregate (for example, roughly 90% precision measured across thousands of predictions), and systematic improvement as more data accumulates.
What ML cannot provide: guarantees about any individual prediction, fully deterministic behavior, or 100% accuracy on real-world data.
This isn't a limitation to hide—it's a fundamental characteristic to communicate early and often. Stakeholders who expect 100% accuracy will be disappointed; stakeholders who understand they're getting probabilistic decision support will be satisfied.
Abstract discussions about precision and recall rarely resonate. Instead, prepare concrete examples: 'With 90% precision, for every 100 alerts your team investigates, 90 will be real issues and 10 will be false alarms. Is that acceptable workload?' This makes abstract metrics tangible and enables informed trade-off discussions.
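A tiny helper like the one below (a sketch, with hypothetical alert volumes) turns a precision number into the workload statement stakeholders actually need:

```python
# Turn an abstract precision metric into a stakeholder-facing workload
# statement. The daily alert volume is a hypothetical input.

def alert_workload(daily_alerts: int, precision: float) -> str:
    true_issues = round(daily_alerts * precision)
    false_alarms = daily_alerts - true_issues
    return (f"Of {daily_alerts} alerts/day, ~{true_issues} are real issues "
            f"and ~{false_alarms} are false alarms.")

print(alert_workload(100, 0.90))
```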
### The Requirements Sign-Off Process
Formalize stakeholder alignment with a requirements sign-off that includes:
- The agreed success metrics, with target values and how they will be measured
- Documented constraints and their relative priorities
- Known risks, open questions, and explicit abort criteria
- Named owners for data access, integration, and evaluation
- Written acknowledgment from each stakeholder group
This document serves as a reference point throughout the project, preventing scope creep and misunderstandings.
Data is the fuel for ML systems, and data requirements gathering is often where projects discover they can't proceed. Better to discover data gaps early—before investing in model development—than to build a sophisticated model that can't be trained or deployed due to data limitations.
The Data Requirements Assessment covers seven critical dimensions: source identity, access method, update frequency, historical depth, schema stability, quality issues, and legal status. The audit checklist below walks through each.
### The Data Audit Checklist
For each data source identified as relevant, conduct a systematic audit:
| Dimension | Questions to Answer | Documentation Required |
|---|---|---|
| Source Identity | What systems generate this data? Who owns it? | Data lineage, ownership documentation |
| Access Method | API, database query, file transfer? | Access procedures, credentials management |
| Update Frequency | Real-time, hourly, daily, weekly? | SLAs, freshness guarantees |
| Historical Depth | How far back does data exist? | Retention policies, archival status |
| Schema Stability | Does the schema change? How often? | Schema versioning, change notification |
| Quality Issues | Known problems, biases, gaps? | Data quality reports, known limitations |
| Legal Status | Consent basis, permitted uses, restrictions? | Legal review, DPA agreements |
Supervised learning requires labels, and labels are expensive. A project needing 100,000 labeled examples at $0.50/label has a $50,000 data cost before any model development. Include labeling costs, timelines, and quality assurance in your requirements. Consider: Can you use active learning to reduce labeling needs? Can you leverage weak supervision? Is unsupervised or self-supervised learning viable?
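Labeling budgets are worth computing explicitly. The sketch below extends the example above with an assumed redundancy factor for quality assurance and an assumed annotator throughput; all three inputs should be adjusted to your project:

```python
# Rough labeling budget estimate. Every figure here is an assumption:
# redundancy and throughput in particular vary widely by task.

n_examples = 100_000
cost_per_label = 0.50    # $ per label, from the example above
labels_per_example = 3   # assumed redundant labels for quality assurance
labels_per_day = 300     # assumed throughput per annotator per day

total_cost = n_examples * labels_per_example * cost_per_label
person_days = n_examples * labels_per_example / labels_per_day
print(f"Labeling cost: ${total_cost:,.0f}, effort: {person_days:,.0f} person-days")
```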
### Feature Availability Analysis
Beyond raw data, assess whether the features needed for prediction will be available at inference time:
Training-Serving Skew Risks:
- Features computed from batch warehouses during training may be unavailable, stale, or expensive to compute at inference time
- Label leakage: training features accidentally computed using information from after the prediction point
- Divergent code paths: training and serving implement "the same" feature differently, producing silently inconsistent values
Example: A churn prediction model uses 'last 30 days of activity' as a feature. During training, this is computed accurately from historical data. In production, for real-time serving, you need streaming infrastructure to maintain rolling 30-day aggregations—fundamentally different architecture.
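The training side of that feature can be made point-in-time correct with a windowed lookup like the sketch below (the event schema is hypothetical); the production side needs a streaming system that maintains the same 30-day window continuously:

```python
# A point-in-time-correct training feature: "activity in the 30 days
# before each prediction date". Computing it over the full history
# instead would leak future information into training.

import pandas as pd

events = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "ts": pd.to_datetime(["2024-01-02", "2024-01-20", "2024-03-01", "2024-02-10"]),
})

def activity_last_30d(events: pd.DataFrame, user_id: int, as_of: pd.Timestamp) -> int:
    # Count this user's events strictly before as_of, within a 30-day window.
    window = events[(events.user_id == user_id)
                    & (events.ts < as_of)
                    & (events.ts >= as_of - pd.Timedelta(days=30))]
    return len(window)

print(activity_last_30d(events, 1, pd.Timestamp("2024-01-25")))  # 2
```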
The Feature Feasibility Matrix: For each candidate feature, assess:
- Availability: can it be computed before the prediction is needed?
- Cost: what does computing and storing it require at serving scale?
- Freshness: how stale can it be before it loses predictive value?
- Legal status: is it cleared for this use?
- Expected signal: how much predictive value does it plausibly add relative to the engineering effort?
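One lightweight way to record these assessments is a small structured record per feature, as in the sketch below; the field names are one possible schema, not a standard:

```python
# A minimal record of the feature-feasibility assessment described above.
# Field names and example values are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class FeatureFeasibility:
    name: str
    available_at_inference: bool  # computable before the prediction is needed?
    freshness: str                # e.g., "real-time", "daily batch"
    compute_cost: str             # e.g., "cheap lookup", "streaming aggregate"
    legal_ok: bool                # cleared for this use by legal/compliance
    expected_signal: str          # e.g., "high (top feature in prior model)"

features = [
    FeatureFeasibility("last_30d_activity", False, "real-time needed",
                       "streaming aggregate", True, "high"),
    FeatureFeasibility("account_age_days", True, "daily batch",
                       "cheap lookup", True, "medium"),
]
blocked = [f.name for f in features if not f.available_at_inference or not f.legal_ok]
print("Needs infrastructure or legal work:", blocked)
```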
Every ML system operates within constraints—computational resources, latency requirements, budget limitations, regulatory mandates, organizational capabilities. Identifying these constraints early prevents building systems that can't be deployed or sustained.
The Constraint Categories: technical (infrastructure, integration, performance budgets), business (budget, timeline, team capabilities), and regulatory (explainability, fairness, audit requirements).
### The Iron Triangle of ML
ML systems face fundamental trade-offs that cannot be escaped—only navigated. Understanding these trade-offs enables explicit prioritization:
Accuracy vs. Latency: More complex models are often more accurate but slower. A 100-layer transformer beats a logistic regression on most tasks—but not if you need sub-millisecond response times.
Precision vs. Recall: You cannot maximize both simultaneously. A spam filter tuned for high precision rarely flags legitimate email, but it also misses more spam. Tuned for high recall, it catches nearly all spam but also flags many legitimate messages.
Freshness vs. Cost: Real-time model updates enable rapid adaptation but require expensive streaming infrastructure. Daily retraining is cheaper but responds slowly to distribution shifts.
Generalization vs. Personalization: Global models are simpler to train and deploy. Per-user models capture individual preferences but create cold-start problems and scalability challenges.
Interpretability vs. Performance: Linear models are easy to explain but limited in what they can learn. Deep networks capture complex patterns but act as black boxes.
Force stakeholders to rank constraints. Give them 100 points to distribute across: accuracy, latency, explainability, cost, time-to-market. This exercise reveals true priorities and prevents the 'everything is critical' trap that leads to impossible requirements.
A comprehensive ML requirements document synthesizes all gathered information into a single reference that guides development and enables accountability. This document should be living—updated as understanding evolves—but versioned to track how requirements changed over time.
### Standard ML Requirements Document Structure
```markdown
# ML System Requirements Document

## 1. Executive Summary
- One-paragraph description of system purpose
- Expected business impact (quantified)
- Key success metrics
- High-level timeline

## 2. Problem Definition

### 2.1 Business Context
- Current state and pain points
- Why ML is the right approach
- What alternatives were considered

### 2.2 ML Problem Formulation
- Task type (classification, regression, ranking, etc.)
- Input specification
- Output specification
- Decision that will be made with predictions

## 3. Success Criteria

### 3.1 Business Metrics
- Primary metric with target value
- Secondary metrics with targets
- Measurement methodology

### 3.2 Model Metrics
- Offline evaluation metrics
- Online evaluation metrics
- Baseline performance to beat

### 3.3 System Metrics
- Latency requirements (P50, P95, P99)
- Throughput requirements
- Availability requirements

## 4. Data Requirements

### 4.1 Training Data
- Data sources with access method
- Volume and time range
- Labeling strategy and cost

### 4.2 Serving Data
- Real-time data requirements
- Feature computation strategy
- Freshness requirements

### 4.3 Data Governance
- Privacy compliance (GDPR, CCPA, etc.)
- Retention policies
- Access controls

## 5. Constraints

### 5.1 Technical Constraints
- Infrastructure limitations
- Integration requirements
- Performance budgets

### 5.2 Business Constraints
- Budget (development + operational)
- Timeline with milestones
- Team capabilities

### 5.3 Regulatory Constraints
- Explainability requirements
- Fairness requirements
- Audit requirements

## 6. Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| ... | ... | ... | ... |

## 7. Stakeholder Sign-off
| Stakeholder | Role | Date | Signature |
|-------------|------|------|-----------|
| ... | ... | ... | ... |

## Appendix: Glossary
- Domain-specific terms defined
```

The value of this document isn't just in creating it—it's in sharing it. All stakeholders should have access, and the document should be referenced in design reviews, stand-ups, and retrospectives. When trade-offs are debated, point to the documented priorities. When scope creeps, reference the agreed constraints.
Experience reveals recurring patterns of requirements failure. Recognizing these anti-patterns helps you avoid them in your own projects:
- Solution-first framing: starting from "we need a recommendation engine" rather than the decision that should change
- Vanity metrics: optimizing numbers that never connect to business value
- Data optimism: assuming data is available, clean, labeled, and legally usable without auditing it
- "Everything is critical": refusing to rank constraints, producing requirements no system can satisfy
- Set-and-forget requirements: treating the document as static while the business and data evolve
The most dangerous assumption in ML requirements is that offline model performance predicts online system value. Many metrics that look great in development provide no business value in production—or even negative value. Build feedback loops from production outcomes back to requirements early.
Requirements gathering for ML systems is a discipline unto itself—distinct from traditional software requirements and deserving of serious investment. The time spent here pays dividends throughout the project lifecycle.
### What's next
With requirements gathered and documented, the next step is designing the data infrastructure that will feed your ML system. The next page explores Data Pipeline Design—how to build the data infrastructure that transforms raw data into training sets and features, enabling reliable model development and production serving.
You now understand how to systematically gather requirements for ML systems. This foundation—clear problem definition, aligned stakeholders, defined metrics, audited data, and documented constraints—enables everything that follows. Next, we design the data pipelines that make ML systems possible.