Clarifying Requirements - Learning Module

Essential Questions to Ask

The First Five Minutes That Define Everything

The moment an interviewer says "Design Twitter" or "Design a URL shortener," most candidates make a critical mistake: they immediately reach for their mental architecture diagrams and start sketching boxes. Load balancers here, databases there, cache layers everywhere. They're solving the problem before they understand it.

The best candidates do something counterintuitive: they slow down.

They spend the first 5-10 minutes of a 45-60 minute interview asking questions, often more than the interviewer expected. This apparent "delay" tends to improve their results rather than hurt them. Why? Because system design interviews aren't tests of whether you can remember distributed systems components. They're evaluations of how you think through ambiguity.

The questions you ask—and how you ask them—reveal more about your engineering judgment than any architecture diagram you'll draw.

What You Will Learn

By the end of this page, you will master the art of asking the right questions at the right time. You'll understand the taxonomy of system design questions, learn frameworks for comprehensive requirements gathering, and develop the instinct for which questions matter most in different problem domains.

Why Questions Matter More Than Answers

In production engineering, ambiguous requirements are a common root cause of project failures. A system designed for the wrong constraints is a failure, no matter how elegantly architected. Interviewers know this. They deliberately leave prompts vague to see if candidates recognize the need for clarification.

Consider what happens when a candidate doesn't clarify:

"Design a chat application."

Without questions, a candidate might design for 1,000 concurrent users when the interviewer expected 10 million. They might assume text-only when the interviewer wanted file sharing. They might design for eventual consistency when strict ordering was essential. Every assumption is a fork in the road—and wrong forks lead to wrong destinations.

What Interviewers Evaluate During Requirements Clarification

•Intellectual Humility — Do you recognize what you don't know? Can you admit uncertainty?
•Structured Thinking — Do you approach requirements systematically or randomly?
•Domain Intuition — Do your questions reveal understanding of the problem space?
•Communication Skills — Can you engage in productive dialogue with stakeholders?
•Prioritization Judgment — Do you distinguish essential requirements from nice-to-haves?
•Collaborative Instinct — Do you treat the interviewer as a partner or proceed in isolation?

The Meta-Signal

When you ask thoughtful clarifying questions, you're demonstrating the exact behavior that makes senior engineers valuable: the ability to navigate ambiguity before committing to solutions. This is the behavior that prevents expensive mistakes in production. Interviewers are watching for this signal explicitly.

The Competence vs. Confidence Trap:

Junior engineers often equate confidence with competence. They feel that asking questions reveals uncertainty—weakness. The opposite is true. In complex systems work, the engineer who charges ahead without exploring constraints is the dangerous one. Experienced interviewers have seen projects fail because no one asked "What happens when the database is down?" or "What's the expected latency budget?"

Asking questions isn't a sign of uncertainty about your skills—it's a sign of certainty about how real engineering works.

The Question Taxonomy: Categories of Essential Questions

Not all questions are created equal. Random questions feel scattered and waste time. Strategic questions follow a taxonomy—a structured classification that ensures comprehensive coverage. Master this taxonomy, and you'll never miss a critical requirement.

The taxonomy has six primary categories, each revealing different dimensions of the problem:

The Six Categories of System Design Questions
Category	Purpose	Example Questions
Functional Scope	Define what the system does	What are the core features? What can we defer? Who are the primary users?
Scale & Traffic	Define how much the system handles	How many users? Requests per second? Data volume? Growth rate?
Latency & Performance	Define how fast the system responds	What's the acceptable latency? Read-heavy or write-heavy? Real-time requirements?
Availability & Reliability	Define how resilient the system is	What's the uptime requirement? What's the impact of downtime? Any SLAs?
Consistency & Correctness	Define how accurate the data is	Is eventual consistency acceptable? Can we lose any data? What happens during network partitions?
Constraints & Context	Define what limits the system operates within	Budget? Existing infrastructure? Geographic distribution? Regulatory requirements?

Let's explore each category with the depth required to use it effectively.

The SCALE Framework Memory Aid

Some candidates use the mnemonic SCALE: Scope, Capacity, Availability, Latency, Environment. While mnemonics can help, the goal is internalization—not memorization. With practice, the taxonomy becomes instinct.

Category 1: Functional Scope Questions

Functional scope questions establish the boundaries of what the system does. They're the most fundamental category because everything else—scale, latency, availability—depends on understanding the feature set.

The Danger of Scope Creep:

System design prompts are intentionally broad. "Design Instagram" could mean photo upload, stories, reels, direct messages, explore pages, ads, and more. Without explicit scoping, you'll either:

Waste time designing features the interviewer doesn't care about
Miss features the interviewer considers essential
Spread too thin across too many features, demonstrating depth in none

Strategic scoping isn't limiting—it's focusing.

Essential Functional Scope Questions

•"What are the primary use cases we need to support?" — Forces the interviewer to prioritize. Listen for signals about what they emphasize.
•"Who are the main users of this system?" — Different user types often require different features and experiences.
•"Should we focus on any specific feature in depth?" — Directly addresses prioritization; shows you understand depth vs. breadth tradeoffs.
•"Are there any features we should explicitly exclude for now?" — Equally important: knowing what not to design. Prevents wasted effort.
•"What does the user journey look like?" — Reveals sequences of operations and helps identify critical paths.
•"Are there any administrative or operational features to consider?" — Often overlooked: moderation, analytics, monitoring.

**Prompt:** "Design WhatsApp"
 
**Before Scoping (Too Broad):**
- 1:1 messaging
- Group messaging  
- Voice calls
- Video calls
- Status/Stories
- End-to-end encryption
- File sharing
- Location sharing
- Payments
- Business accounts
- Web/Desktop clients
→ Impossible to cover in 45 minutes
 
**After Scoping Questions:**
Candidate: "What are the core features we should focus on?"
Interviewer: "Let's focus on 1:1 and group messaging, with message 
             delivery guarantees."
 
**After Scoping (Focused):**
- 1:1 text messaging
- Group text messaging (up to 256 members)
- Message delivery states (sent, delivered, read)
- Offline message handling
→ Deep, achievable design in 45 minutes

Don't Assume—Ask

Even when a feature seems "obvious," confirm it. Designing a Twitter clone? Ask if we need retweets. Building a URL shortener? Ask if we need analytics. Assumptions feel natural because they're often correct—but wrong assumptions are silent killers.

Category 2: Scale & Traffic Questions

Scale questions establish the magnitude of the system's demands. The architecture for 1,000 users differs fundamentally from the architecture for 100 million users. Without understanding scale, you might under-engineer (building something that collapses at moderate load) or over-engineer (designing for billions when thousands suffice).

Scale Isn't Just "Big Numbers":

Scale manifests in multiple dimensions:

User scale: How many total users? How many daily active users (DAU)?
Request scale: How many operations per second? Read:write ratio?
Data scale: How much data stored? How fast does it grow?
Concurrent scale: How many users simultaneously active?
Geographic scale: Single region? Multi-region? Global?

Essential Scale & Traffic Questions

•"How many users should we design for?" — Get both current and projected numbers. Ask about MAU (monthly active users) and DAU (daily active users).
•"What's the expected requests per second (RPS)?" — Essential for capacity planning. Often derived from DAU × actions per user ÷ seconds in a day.
•"What's the read-to-write ratio?" — Drastically affects architecture. A 100:1 read-heavy system differs from a 1:1 balanced system.
•"How much data are we storing, and how fast does it grow?" — Determines storage strategy, sharding needs, and retention policies.
•"Are there traffic spikes? What patterns?" — Burst traffic (sports events, marketing campaigns) requires different planning than steady traffic.
•"What's the geographic distribution of users?" — Single-region simplifies design; global distribution introduces latency and replication challenges.
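The RPS derivation and the read-to-write split mentioned above can be sketched in a few lines. This is a minimal illustration, not a capacity-planning tool: the function names and the input numbers (1M DAU, 20 actions per user per day, a 100:1 ratio) are assumptions chosen for the example, and the result is an average that ignores peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(dau: int, actions_per_user_per_day: float) -> float:
    """Average requests per second: DAU x actions per user / seconds in a day."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY


def split_by_ratio(total_rps: float, reads_per_write: float) -> tuple[float, float]:
    """Split total RPS into (read_rps, write_rps) for a reads:writes ratio of N:1."""
    write_rps = total_rps / (reads_per_write + 1)
    return total_rps - write_rps, write_rps


# Hypothetical inputs: 1M DAU, 20 actions per user per day, 100:1 read-to-write ratio.
total = average_rps(1_000_000, 20)
reads, writes = split_by_ratio(total, 100)
print(f"total ~{total:.0f} rps, reads ~{reads:.0f} rps, writes ~{writes:.1f} rps")
```

Even at a million daily users, the average here is only a few hundred requests per second, with writes in the low single digits. Numbers like these are what let you justify a simpler architecture out loud.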

Scale Ranges and Their Architectural Implications
Metric	Small Scale	Medium Scale	Large Scale	Massive Scale
Users (DAU)	<10K	10K - 1M	1M - 100M	100M+
Typical Infra	Single server	Load balancer + app tier	Distributed, sharded	Global, multi-region
Database	Single instance	Replicas, read scaling	Sharded, partitioned	Globally distributed
Cache	Optional	Recommended	Essential	Multi-tier, distributed
QPS (Queries/Sec)	<100	100 - 10K	10K - 1M	1M+

The 80/20 Rule for Scale

In interviews, scale numbers often define the "80%" case. Design for that case first, then discuss how the architecture would scale further. Don't over-engineer for scale you weren't asked to handle—it wastes time and signals poor prioritization.

Deriving Scale When Interviewers Are Vague:

Sometimes interviewers respond with "You tell me." This is a test: can you make reasonable estimates?

Example for a social media feed:

Assume 100M DAU (reasonable for a major platform)
Each user views feed 10 times/day: 1B feed requests/day
Averaged over 24 hours that is ~11.6K requests/second; if the traffic concentrates in 8 peak hours, plan for ~35K feed requests/second
Each feed load fetches 50 posts: ~1.75M post reads/second at peak

This derivation shows analytical thinking—exactly what interviewers want to see.
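The same back-of-the-envelope derivation can be written out as arithmetic. The sketch below uses only the assumptions stated in the example (100M DAU, 10 feed views per day, 8 peak hours, 50 posts per load); the variable names are illustrative.

```python
SECONDS_PER_HOUR = 3600

dau = 100_000_000          # assumed daily active users
feed_views_per_user = 10   # assumed feed loads per user per day
peak_hours = 8             # assume the day's traffic lands in 8 peak hours
posts_per_feed_load = 50   # assumed posts fetched per feed load

feed_requests_per_day = dau * feed_views_per_user
average_rps = feed_requests_per_day / (24 * SECONDS_PER_HOUR)
peak_rps = feed_requests_per_day / (peak_hours * SECONDS_PER_HOUR)
post_reads_per_second = peak_rps * posts_per_feed_load

print(f"feed requests/day:  {feed_requests_per_day:,}")
print(f"average feed rps:   {average_rps:,.0f}")
print(f"peak feed rps:      {peak_rps:,.0f}")
print(f"peak post reads/s:  {post_reads_per_second:,.0f}")
```

The unrounded peak comes out slightly under 35K requests per second, so the post-read figure lands near 1.74M. Rounding to 35K and 1.75M, as the example does, is normal for interview estimates.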

Category 3: Latency & Performance Questions

Latency questions establish the speed expectations for the system. A 200ms response might be acceptable for batch processing but unacceptable for real-time gaming. Without understanding latency constraints, you might design systems that are functionally correct but practically unusable.

The Perception of Speed:

Human perception of latency is often described with rough, commonly cited thresholds:

< 100ms: Feels instantaneous
100-300ms: Noticeable but acceptable
300ms-1s: Feels sluggish, user patience tested
> 1s: Users assume something is wrong

These thresholds should inform your clarifying questions.

Essential Latency & Performance Questions

•"What's the acceptable latency for the main operations?" — Get specific numbers. p50, p95, p99 if relevant.
•"Are there any real-time requirements?" — Real-time (chat, gaming) versus near-real-time (notifications) versus batch (analytics) have very different architectures.
•"Is this read-heavy or write-heavy?" — Read-heavy systems optimize differently (caching, read replicas) than write-heavy systems (write-behind, eventual consistency).
•"What's the acceptable lag for data propagation?" — How quickly must a posted tweet appear in followers' feeds? Seconds? Minutes?
•"Are there any offline or degraded-mode scenarios?" — Mobile apps often need offline support; the caching strategy differs.
•"What's more important: throughput or latency?" — Sometimes you can batch requests for throughput at the cost of individual request latency.

Low-Latency Requirements

•< 100ms response time expected
•Edge caching critical
•Aggressive read replicas
•Pre-computation & materialized views
•Connection pooling essential
•Minimize network hops

Throughput-Focused Requirements

•Bulk operations preferred
•Batching acceptable
•Async processing welcomed
•Write-behind caching viable
•Queue-based architectures
•Trade latency for scale

Latency Budgets

If the interviewer says 'p99 latency must be under 200ms,' you now have a budget. Work backwards: network round trip (50ms) + database lookup (50ms) + business logic (30ms) + serialization (20ms) = 150ms, leaving 50ms buffer. This is how senior engineers think.
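The budget arithmetic above is simple enough to check in code. This is a rough sketch: the component names and millisecond values are the ones from the example, and `remaining_budget_ms` is a hypothetical helper. It treats the budget as additive, which is a simplification, since per-component percentiles do not sum exactly to an end-to-end percentile.

```python
def remaining_budget_ms(target_ms: float, components_ms: dict[str, float]) -> float:
    """Return the headroom left after summing per-component latency estimates."""
    return target_ms - sum(components_ms.values())


# Component estimates from the example; real numbers would come from measurement.
components = {
    "network_round_trip": 50,
    "database_lookup": 50,
    "business_logic": 30,
    "serialization": 20,
}

headroom = remaining_budget_ms(200, components)
print(f"spent {sum(components.values())} ms, headroom {headroom} ms")
```

A negative result would mean the design already exceeds the target before any buffer, which is a cue to remove a hop or add a cache.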

Category 4: Availability & Reliability Questions

Availability questions establish the uptime requirements and failure tolerance. A personal blog can tolerate occasional downtime; a payment system cannot. Without understanding availability needs, you might design inadequate redundancy or waste resources on unnecessary fault tolerance.

Understanding the Nines:

Availability is typically expressed in "nines":

99% (two nines): 3.65 days downtime/year — Acceptable for internal tools
99.9% (three nines): 8.76 hours downtime/year — Standard for most web apps
99.99% (four nines): 52.56 minutes downtime/year — E-commerce, SaaS
99.999% (five nines): 5.26 minutes downtime/year — Financial systems, critical infrastructure

Each additional nine cuts the allowed downtime by a factor of ten, and the engineering effort and cost needed to achieve it rise steeply.
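The downtime figures listed above follow directly from the availability percentage. A minimal sketch, assuming a 365-day year (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Allowed downtime per year, in minutes, for a given availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.2f} min/year "
          f"({minutes / 60:.2f} hours, {minutes / 1440:.2f} days)")
```

Being able to derive these numbers on the spot is more useful than memorizing them, because interviewers sometimes give targets such as 99.95% that fall between the usual rows.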

Essential Availability & Reliability Questions

•"What's the expected uptime? Any SLA requirements?" — The answer determines redundancy investment. Clarify if this is aspirational or contractual.
•"What's the impact of downtime?" — Revenue loss per hour? User trust damage? Regulatory penalties? This context drives design decisions.
•"Should the system be highly available across regions?" — Single-region HA differs dramatically from multi-region HA in complexity and cost.
•"What failure scenarios must we handle?" — Single server failure? Datacenter failure? Region failure? Define the blast radius tolerance.
•"What's the recovery time objective (RTO)?" — How quickly must the system recover after failure? Minutes? Hours?
•"What's the recovery point objective (RPO)?" — How much data loss is acceptable? Zero data loss requires synchronous replication.

**Scenario 1: Social Media App**
Availability: 99.9% (8.76 hours/year downtime acceptable)
- Users annoyed but rarely leave permanently
- Revenue impact: Moderate (ads, engagement loss)
- Solution: Multi-AZ, auto-scaling, graceful degradation
 
**Scenario 2: Stock Trading Platform**
Availability: 99.999% (5.26 minutes/year downtime)
- Users lose real money during downtime
- Regulatory implications, lawsuits possible
- Solution: Multi-region active-active, synchronous replication, 
  instant failover, chaos engineering
 
**Scenario 3: Internal Analytics Dashboard**  
Availability: 99% (3.65 days/year downtime acceptable)
- Only internal users affected
- Business can wait for reports
- Solution: Single-region, basic redundancy, cost-optimized

Don't Over-Engineer

Designing for five nines when three nines suffice wastes interview time and signals poor prioritization. Ask the availability question early, get a clear answer, and design accordingly. Not every system is mission-critical.

Category 5: Consistency & Correctness Questions

Consistency questions establish the data accuracy guarantees. Can users temporarily see stale data? Can the system ever lose data? These questions invoke the CAP theorem and determine fundamental architecture choices like replication strategies and database selection.

The Spectrum of Consistency:

Consistency isn't binary—it's a spectrum:

Strong Consistency: All readers see the same data immediately after a write
Eventual Consistency: Readers may see stale data temporarily, but will converge
Causal Consistency: Related operations appear in order; unrelated may vary
Read-Your-Writes: A user always sees their own updates immediately

Different features within the same system might require different consistency levels.

Essential Consistency & Correctness Questions

•"Is eventual consistency acceptable, or do we need strong consistency?" — The single most important consistency question. Determines if you can use async replication.
•"Can we ever lose data? What data is critical?" — Some data (financial transactions) is sacred; other data (view counts) is expendable.
•"What happens if users see slightly stale data?" — A follower count being 1 second behind is fine; an account balance being stale is not.
•"Are there any ordering guarantees required?" — Chat messages must appear in order; likes don't need to.
•"Do we need transactional support across operations?" — "Transfer money from A to B" must either fully succeed or fully fail.
•"What happens during a network partition?" — CAP theorem forces choices: favor availability (keep serving, possibly stale) or consistency (reject requests).

Consistency Requirements by Feature Type
Feature Type	Consistency Needs	Rationale
Financial transactions	Strong consistency	Money cannot be stale or lost
User authentication	Strong consistency	Security-critical, no stale sessions
Social media feed	Eventual consistency	Users tolerate slight delay
Like/view counts	Eventual consistency	Approximations acceptable
Chat messages	Causal + Read-your-writes	Own messages appear instantly; order preserved
Search index	Eventual consistency	Small lag between post and searchability is acceptable

The Hybrid Approach

Real systems often mix consistency levels. A social network might need ordering guarantees for direct messages (causal consistency or stronger, because order matters) but only eventual consistency for like counts (approximations are fine). Ask about each major feature separately.
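One way to keep per-feature answers straight is to record them as you go. The sketch below mirrors the table above as a small lookup; the enum, the feature keys, and the helper are illustrative names rather than any real library's API.

```python
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"
    EVENTUAL = "eventual"
    CAUSAL_RYW = "causal + read-your-writes"


# Mirrors the table above; feature names are illustrative keys, not a real API.
FEATURE_CONSISTENCY = {
    "financial_transactions": Consistency.STRONG,
    "user_authentication": Consistency.STRONG,
    "social_feed": Consistency.EVENTUAL,
    "like_view_counts": Consistency.EVENTUAL,
    "chat_messages": Consistency.CAUSAL_RYW,
    "search_index": Consistency.EVENTUAL,
}


def tolerates_stale_reads(feature: str) -> bool:
    """True when the feature was scoped as eventually consistent."""
    return FEATURE_CONSISTENCY[feature] is Consistency.EVENTUAL


for name in ("like_view_counts", "financial_transactions", "chat_messages"):
    print(f"{name}: {FEATURE_CONSISTENCY[name].value}, "
          f"stale reads ok: {tolerates_stale_reads(name)}")
```

Features that tolerate stale reads are the natural candidates for caching and asynchronous replication. The rest need a closer look at the write path.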

Category 6: Constraints & Context Questions

Constraint questions establish the boundaries and limitations within which the system must operate. Budget limits, existing infrastructure, regulatory requirements, and timeline pressures all constrain design choices in ways that pure technical considerations do not.

Reality Anchors Design:

In production, no system is designed in a vacuum:

Budget limits dictate whether you can afford managed services
Existing infrastructure determines integration requirements
Team expertise influences technology choices
Regulations mandate specific security and auditing measures
Timeline pressure affects build-vs-buy decisions

These questions reveal engineering maturity—you understand that technical purity often yields to practical constraints.

Essential Constraint & Context Questions

•"Are there budget constraints we should consider?" — Rarely asked in interviews, but shows real-world awareness. Interviewers appreciate it.
•"Is there existing infrastructure we should integrate with?" — Greenfield vs. brownfield designs differ substantially.
•"Any regulatory or compliance requirements?" — Healthcare (HIPAA), finance (SOX/PCI), Europe (GDPR) all impose constraints.
•"What's the timeline? MVP vs. full solution?" — Perfect is the enemy of good. Phase designs appropriately.
•"Are there technology preferences or restrictions?" — Some organizations are all-in on AWS; others mandate on-premise.
•"What about internationalization and localization?" — Multi-language support affects data models and storage.

Interview Context

In interviews, constraint questions are often answered with 'Assume we have no constraints' or 'Use whatever technology you prefer.' That's fine—by asking, you've demonstrated that you consider these dimensions. If they do impose constraints, you've avoided designing something unusable.

Security Questions (Often Overlooked):

Security is a constraint that many candidates forget to ask about:

"What's the authentication and authorization model?" — OAuth, SSO, custom tokens?
"Is there sensitive data requiring encryption?" — At rest? In transit? Client-side?
"Who can access what?" — Role-based access control? Multi-tenancy?
"Any audit logging requirements?" — Financial systems often require immutable audit trails.

Asking these questions signals that you think about systems holistically, not just happy paths.

The Question Flow: Orchestrating Your Inquiry

Knowing what to ask is necessary but insufficient. How you ask—the sequence, pacing, and framing—matters equally. A scattershot interrogation feels unfocused; a structured exploration demonstrates mastery.

The Recommended Flow:

Time Investment

The entire requirements clarification should take 5-10 minutes in a 45-60 minute interview. This isn't wasted time—it's ensuring the remaining 35-50 minutes are spent solving the right problem. Nothing wastes more time than designing for the wrong constraints.

Framing Techniques: How to Ask Effectively

The framing of your questions affects how interviewers perceive you. Good framing sounds collaborative, thoughtful, and efficient. Poor framing sounds like interrogation or stalling.

Best Practices for Framing:

Effective Framing

•Propose, then ask: 'I'm thinking we should focus on X and Y—does that align with what you had in mind?'
•Show reasoning: 'At this scale, we'll need sharding. Is there a preference between hash-based and range-based?'
•Bundle related questions: 'For availability—what SLA, and should we plan for multi-region?'
•Defer non-essential: 'I'll assume standard auth for now; happy to dive in later if needed.'
•Seek confirmation: 'So if I understand correctly, eventual consistency is acceptable for the feed?'

Ineffective Framing

•Rapid-fire interrogation: 'How many users? What's the latency? What consistency? Multi-region?'
•Asking before thinking: 'Wait, what features do you want?' without considering yourself first.
•Asking obvious questions: 'Should the system be fast?' (Of course—ask how fast.)
•Unstructured wandering: Jumping between scope, latency, budget, back to scope.
•Not listening to answers: Asking questions already answered in the problem statement.

**Prompt:** "Design a notification system"
 
**Strong Opening:**
"Before we dive in, I'd like to clarify some requirements. This sounds like 
it could involve push notifications, email, SMS, and in-app—should we focus 
on any specific channels, or design for all of them?"
 
[Interviewer: "Let's focus on push and in-app for now."]
 
"Great. For scale—are we targeting a consumer app with millions of users, 
or something smaller? And should I assume notifications need to be delivered 
in near-real-time, or is some delay acceptable?"
 
[Interviewer: "Consumer scale, 50M users. Real-time for in-app, best-effort for push."]
 
"Understood. One more thing—do we need to handle user preferences, 
like quiet hours or topic subscriptions, or should we assume all 
notifications are delivered?"
 
[Interviewer: "Basic preferences for now."]
 
"Perfect. Let me summarize: we're designing an in-app and push 
notification system for 50M users, near-real-time delivery for in-app, 
basic preference support. Does that capture it?"
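The closing summary in this exchange is easier to produce if the answers are captured in a structured form as they arrive. Below is a hedged sketch of such a record; the `Requirements` class and its fields are hypothetical, filled in with the answers from the dialogue above.

```python
from dataclasses import dataclass, field


@dataclass
class Requirements:
    """Clarified requirements, recorded as the interviewer answers."""
    system: str
    channels: list[str]
    users: int
    delivery: dict[str, str]  # channel -> delivery expectation
    notes: list[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render the one-sentence recap to read back to the interviewer."""
        delivery = ", ".join(
            f"{mode} for {channel}" for channel, mode in self.delivery.items()
        )
        parts = [
            f"{self.system} covering {' and '.join(self.channels)}",
            f"{self.users:,} users",
            f"delivery: {delivery}",
            *self.notes,
        ]
        return "; ".join(parts)


reqs = Requirements(
    system="notification system",
    channels=["in-app", "push"],
    users=50_000_000,
    delivery={"in-app": "near-real-time", "push": "best-effort"},
    notes=["basic user preferences"],
)
print(reqs.summary())
```

On a whiteboard the same structure is just a short list in the corner. The point is that every later design decision can be traced back to a line in it.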

Dialogue, Not Monologue

Requirements clarification should feel like a collaborative conversation, not an interrogation or a checklist. Engage with the interviewer's answers. Ask follow-ups when something is unclear. This back-and-forth is itself a signal of how you work with stakeholders.

Common Anti-Patterns to Avoid

Even candidates who know that they should ask questions often falter in how they do it. Here are the most common anti-patterns and how to avoid them:

Anti-Patterns in Requirements Clarification

•Asking Too Few Questions — Proceeding after 1-2 questions leaves critical assumptions unexplored. Even if the problem seems clear, probe deeper.
•Asking Too Many Questions — Spending 15+ minutes questioning every detail signals analysis paralysis. Use judgment about what matters.
•Asking Questions Already Answered — The problem statement often contains constraints. 'Design a real-time chat' already implies low latency.
•Not Writing Down Answers — Interviewers notice when you forget what they said. Document key constraints visibly.
•Treating Questions as Stalling — If you're asking questions to delay designing, interviewers will notice. Questions should accelerate clarity, not delay action.
•Skipping the Summary — Without a confirmation summary, you and the interviewer might have different understandings. Always verify.
•Ignoring Interviewer Signals — If they're eager to move on, adapt. If they're giving rich detail, dig deeper. Read the room.

The Balance

The goal is productive ambiguity resolution, not exhaustive interrogation. Ask enough to design confidently. You can always clarify more as you design—and should. Requirements clarification isn't a single phase; it's a mindset throughout the interview.

Summary: Essential Questions to Ask

We've covered the comprehensive taxonomy of questions that distinguish exceptional system design candidates. Let's consolidate the key insights:

Key Takeaways

•Questions reveal engineering maturity — How you explore ambiguity signals how you work in production environments.
•Use the six-category taxonomy — Functional scope, scale, latency, availability, consistency, and constraints. Cover each.
•Functional scope comes first — You can't estimate scale or define latency for features you haven't scoped.
•Quantify wherever possible — 'High traffic' is vague; '10K requests per second' is actionable.
•Frame questions collaboratively — Propose assumptions, seek confirmation, summarize understanding.
•Time-box appropriately — 5-10 minutes for clarification in a 45-60 minute interview is ideal.
•Document as you go — Write down constraints where the interviewer can see them. Reference them as you design.

What's Next:

Now that you know which questions to ask, the next page explores how to scope the problem—using your clarified requirements to define a focused, achievable design boundary. Scoping is where you transform broad requirements into specific design targets.

You now possess a comprehensive framework for requirements clarification in system design interviews. The questions you ask in the first minutes set the trajectory for everything that follows. Next, we'll learn to scope problems strategically—turning those clarified requirements into actionable design boundaries.