Common Mistakes - Learning Module


Ignoring Non-Functional Requirements

The Hidden Dimensions of System Design

A candidate designs a beautiful messaging system. The architecture flows logically—users send messages through an API, messages are stored in a database, recipients retrieve them. The interviewer nods along, then asks: "What happens when you have a million concurrent users? How do you guarantee message delivery? What's your latency target for message receipt?"

The candidate pauses. They hadn't thought about any of that.

This is the second most damaging mistake in system design interviews: treating non-functional requirements (NFRs) as optional. The candidate built something that would work on a laptop. The interviewer wanted something that would work in production at scale.

The distinction between a system that works and a system that works well is entirely defined by non-functional requirements. Ignoring them doesn't just miss interview points—it reveals a fundamental gap in engineering maturity.

The Prototype Trap

Many engineers spend most of their time building features—functional requirements. NFRs are often handled by infrastructure teams, senior architects, or inherited from existing systems. This creates a dangerous blind spot: engineers who can build anything but can't reason about whether their build will survive real-world conditions.

Understanding Non-Functional Requirements

Non-functional requirements describe how a system should behave, as opposed to what it should do. They're also called quality attributes, operational requirements, or system properties. While functional requirements describe features, NFRs describe constraints and qualities.

The critical NFRs for system design:

The Essential Non-Functional Requirements
NFR	Definition	Example Specification	Why It Matters
Scalability	Ability to handle growth in load, data, or users	Support 10x traffic increase with linear cost scaling	Systems that can't scale become bottlenecks or require rewrites
Availability	Percentage of time the system is operational	99.99% uptime (52 minutes downtime/year)	Unavailability directly translates to lost revenue and user trust
Latency	Time to complete an operation	p99 response time < 100ms for API calls	User experience and system usability depend on responsiveness
Consistency	Data correctness guarantees across operations	Strong consistency for account balances	Incorrect data can cause business-critical failures
Durability	Data survival across failures	Zero data loss for confirmed writes	Lost data destroys user trust and may have legal implications
Security	Protection against unauthorized access	Encryption in transit and at rest; audit logging	Breaches cause legal, financial, and reputational damage
Maintainability	Ease of operating and evolving the system	New features deployable in < 1 week	Systems that can't evolve become legacy anchors

Why NFRs are easy to miss:

NFRs are often implicit. When someone says "build a URL shortener," they don't explicitly say "and it should handle 10,000 requests per second, never lose data, and be available 99.9% of the time." But these requirements exist, and ignoring them produces toy solutions.

In an interview, the interviewer expects you to surface these requirements through questions. Failing to ask about NFRs signals that you would ship production systems without considering operational characteristics—a concerning liability for any engineering team.

The Iceberg Analogy

Functional requirements are the visible tip of the iceberg—what users see and interact with. Non-functional requirements are the 90% below the waterline—invisible to users but essential for the system to actually work. Interviews probe both, but NFRs often carry more weight for senior roles.

How This Mistake Manifests in Interviews

Ignoring NFRs isn't a single behavior—it manifests in several recognizable patterns. Understanding these patterns helps you catch yourself before they damage your interview performance.

Manifestation Patterns

•The feature-only design — The candidate describes what the system does ("users can upload photos, add filters, share with friends") without addressing how it handles millions of uploads, what happens during outages, or how quickly operations complete.
•The single-server fantasy — All components in the design exist on one conceptual machine. No discussion of horizontal scaling, load balancing, or geographic distribution. This works for prototypes, not production.
•The happy path obsession — The design covers normal operation beautifully but has no answer for failures: "What if the database goes down?" "What if write volume spikes 10x?" "What if a data center catches fire?"
•The missing numbers — No quantitative specifications appear in the design. No mention of request rates, latency targets, storage volumes, or availability requirements. The design exists in an abstract, number-free zone.
•The security afterthought — Security, authentication, and authorization are mentioned only if the interviewer explicitly asks—or not at all. This suggests the candidate treats security as someone else's problem.

Real Interview Example: The Photo Storage System

The prompt: "Design a system for storing and serving user photos, like Instagram's core storage."

The NFR-ignoring response: "We'll have an API that receives photo uploads, stores them in a database, and serves them when requested. I'll use a CDN for faster access. For the database, we can use PostgreSQL with the photo as a blob column..."

What's missing?

Scale: How many photos per day? How much total storage? (At Instagram's scale, the total runs well into the tens of billions of photos)
Latency: What's acceptable for photo load time? (Users abandon slow-loading images)
Availability: What uptime is required? (For a social app, outages mean lost engagement)
Durability: What happens if photos are lost? (Losing user photos is existential for a photo app)
Consistency: Can users see partial uploads? Can deleted photos be cached?

Without addressing these, the candidate has designed a college project, not Instagram.

The Interviewer's Interpretation

When a candidate ignores NFRs, interviewers conclude: "This person has never operated a production system. They don't understand what makes systems succeed or fail in the real world. They would require constant senior oversight." For senior roles, this is disqualifying.

The NFR-First Approach: A Structured Framework

Top candidates don't just remember to ask about NFRs—they follow a systematic approach that ensures comprehensive coverage. This framework should become second nature.

Step 1: The Scale Question

Always start here. Scale determines almost everything else.

Essential scale questions:

"How many users do we need to support? Daily active users?"
"What's the read/write ratio? How many reads per second? Writes?"
"How much data are we storing? What's the growth rate?"
"What geographic distribution do we need?"

Why this matters: A system for 1,000 users needs a fundamentally different architecture than one for 100 million. Caching strategies, database choices, replication needs: everything changes with scale. You cannot make intelligent design decisions without understanding scale.
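The answers to the scale questions can be turned into rough capacity numbers with a few lines of arithmetic. The sketch below is a back-of-envelope estimate only: the function name, the peak factor of 3, and the example inputs (10 million daily active users, 100 reads and 2 writes per user per day, 1 KB per write) are illustrative assumptions, not figures from any real system.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class LoadEstimate:
    avg_reads_per_sec: float
    avg_writes_per_sec: float
    peak_reads_per_sec: float
    peak_writes_per_sec: float
    storage_tb_per_year: float


def estimate_load(daily_active_users, reads_per_user_per_day,
                  writes_per_user_per_day, bytes_per_write, peak_factor=3.0):
    """Back-of-envelope load estimate; peak_factor is an assumed multiplier."""
    avg_reads = daily_active_users * reads_per_user_per_day / SECONDS_PER_DAY
    avg_writes = daily_active_users * writes_per_user_per_day / SECONDS_PER_DAY
    # Raw storage growth only: ignores replication, indexes, and compression.
    bytes_per_year = (daily_active_users * writes_per_user_per_day
                      * bytes_per_write * 365)
    return LoadEstimate(
        avg_reads_per_sec=avg_reads,
        avg_writes_per_sec=avg_writes,
        peak_reads_per_sec=avg_reads * peak_factor,
        peak_writes_per_sec=avg_writes * peak_factor,
        storage_tb_per_year=bytes_per_year / 1e12,
    )


# Hypothetical inputs chosen purely for illustration.
estimate = estimate_load(
    daily_active_users=10_000_000,
    reads_per_user_per_day=100,
    writes_per_user_per_day=2,
    bytes_per_write=1_000,
)
print(f"reads/s  avg {estimate.avg_reads_per_sec:,.0f}, "
      f"peak {estimate.peak_reads_per_sec:,.0f}")
print(f"writes/s avg {estimate.avg_writes_per_sec:,.0f}, "
      f"peak {estimate.peak_writes_per_sec:,.0f}")
print(f"storage growth: {estimate.storage_tb_per_year:.1f} TB/year")
```

Even a rough estimate like this shows whether the design is read-heavy or write-heavy and whether a single database node is plausible.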

Step 2: The Availability Question

How much downtime is acceptable?

Essential availability questions:

"What uptime target are we aiming for? 99.9%? 99.99%?"
"Are there different availability tiers for different features?"
"What happens during maintenance windows?"
"Is the system customer-facing or internal?"

Understanding the numbers:

Availability	Downtime/Year	Downtime/Day	Typical Use Case
99%	3.65 days	~14.4 minutes	Internal tools
99.9%	8.76 hours	~86 seconds	Business apps
99.99%	52.6 minutes	~8.6 seconds	E-commerce, payments
99.999%	5.26 minutes	~0.86 seconds	Critical infrastructure

Each additional nine cuts the allowed downtime by a factor of ten, and typically demands disproportionately more architectural investment. Designing for 99.999% without knowing it's needed wastes engineering effort.
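The downtime figures in the table follow directly from the availability percentage. As a minimal sketch (the function name is an assumption, and a 365-day year is used), the conversion looks like this:

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def allowed_downtime_seconds(availability_percent, period_seconds=SECONDS_PER_YEAR):
    """Return the downtime budget, in seconds, for a given availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return period_seconds * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    per_year_min = allowed_downtime_seconds(target) / 60
    per_day_s = allowed_downtime_seconds(target, SECONDS_PER_DAY)
    print(f"{target}%: {per_year_min:.1f} min/year, {per_day_s:.2f} s/day")
```

Being able to derive these numbers on the spot is more convincing than reciting them from memory.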

Step 3: The Latency Question

How fast must operations complete?

Essential latency questions:

"What latency is acceptable for the primary user-facing operations?"
"Are we measuring p50, p95, or p99 latency?"
"Are there operations that can be asynchronous vs requiring immediate response?"
"What are users' expectations based on comparable products?"

Why percentiles matter: Average and median latency hide problems. A system with a 50ms median latency but a 5-second p99 latency means 1% of requests take at least 100x longer than the typical one. Your most engaged users make the most requests, so they are the most likely to hit that slow tail, causing outsized negative impact.
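A small synthetic example makes the point concrete. The sketch below uses a made-up sample of 1,000 requests (980 fast, 20 slow) and a simple nearest-rank percentile; the `percentile` helper and the sample data are assumptions for illustration, and production monitoring systems usually compute percentiles from histograms instead.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic data: 98% of requests take 50 ms, 2% take 5 seconds.
latencies_ms = [50] * 980 + [5000] * 20

print("median:", statistics.median(latencies_ms), "ms")
print("mean:  ", statistics.mean(latencies_ms), "ms")
print("p99:   ", percentile(latencies_ms, 99), "ms")
```

The median reports a healthy 50ms and the mean looks only mildly elevated, while the p99 exposes the 5-second tail.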

Step 4: The Consistency Question

What data correctness guarantees are required?

Essential consistency questions:

"Is strong consistency required, or is eventual consistency acceptable?"
"Which operations require transactional guarantees?"
"What's the acceptable lag for data propagation across regions?"
"Are there operations where stale data would cause business harm?"

The consistency-availability trade-off: The CAP theorem tells us we can't have perfect consistency and perfect availability during network partitions. Strong consistency typically costs availability or latency. Understanding requirements helps navigate this trade-off.

NFR Questionnaire Checklist

•Scale: Users, requests/second, data volume, growth rate
•Availability: Uptime target, maintenance windows, failure tolerance
•Latency: Response time targets, percentile focus, sync vs async
•Consistency: Strong vs eventual, transactional requirements, staleness tolerance
•Durability: Data loss tolerance, backup requirements, recovery time
•Security: Authentication needs, data sensitivity, compliance requirements
•Geography: Regional requirements, latency constraints, data residency laws

Connecting NFRs to Design Decisions

Eliciting NFRs is only half the job. You must demonstrate how requirements drive design choices. This is where senior candidates distinguish themselves—they trace every architectural decision back to a requirement.

The decision traceability pattern:

"Because we need 99.99% availability [requirement], we'll deploy across multiple availability zones with automatic failover [design decision]. This trades some cost [trade-off] for resilience against datacenter failures [justification]."

This pattern—requirement → decision → trade-off → justification—demonstrates sophisticated engineering thinking.
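The availability example in that pattern can be backed with simple arithmetic. The sketch below assumes that zones fail independently and that failover is instantaneous, both of which are optimistic in practice, and the 99.9% per-zone figure is a hypothetical input rather than a published number.

```python
def parallel_availability(single, replicas):
    """Availability when any one of N redundant replicas is enough.

    Assumes independent failures and perfect failover (optimistic).
    """
    return 1 - (1 - single) ** replicas


def serial_availability(*components):
    """Availability of a request path that needs every component to be up."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


zone = 0.999  # hypothetical availability of a single zone
print(f"one zone:      {parallel_availability(zone, 1):.6f}")
print(f"two zones:     {parallel_availability(zone, 2):.6f}")
print(f"two-hop chain: {serial_availability(zone, zone):.6f}")
```

Redundancy raises availability while every serial dependency lowers it, which is why the justification step should name which failures the extra cost actually protects against.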

NFRs Driving Design Choices
NFR	Requirement Example	Design Decision	Trade-off
Scalability	10x traffic growth in 2 years	Stateless services + horizontal scaling	Increased operational complexity
Availability	99.99% uptime	Multi-AZ deployment with failover	2-3x infrastructure cost
Latency	< 100ms p99 response	Regional edge caching + CDN	Cache invalidation complexity
Consistency	Strong consistency for payments	Synchronous replication with consensus	Higher latency, lower availability
Durability	Zero data loss for user content	Multi-region replication + write-ahead log (WAL)	Increased write latency
Security	PCI compliance for payments	Encryption at rest + tokenization	Performance overhead, complexity

Case Study: Designing with NFRs

Problem: Design a real-time bidding system for online advertising.

NFR Discovery:

Scale: 1 million bid requests per second
Latency: 100ms total budget (including network), so internal processing < 20ms
Availability: 99.9% (ads are revenue; downtime = lost money)
Consistency: Eventual is fine (ads don't require strong consistency)
Geography: Global reach, latency-sensitive

Design decisions driven by NFRs:

Edge processing (driven by latency requirement): Bid servers at edge locations, not centralized, because network latency would consume the entire budget
In-memory data stores (driven by latency requirement): No time for disk I/O. All campaign data cached in memory with background sync
Eventual consistency (driven by consistency requirement + latency): Campaign updates propagate asynchronously; a few stale bids are acceptable
Multi-region active-active (driven by availability + geography): No single point of failure; each region operates independently
Aggressive timeouts (driven by latency): If a component doesn't respond in 10ms, skip it rather than delay the response

Every decision traces directly to an NFR. This is the thought process interviews evaluate.
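The "aggressive timeouts" decision can be sketched with Python's `asyncio.wait_for`. Everything here is illustrative: the signal functions, their return values, and the averaging step are hypothetical stand-ins for real bidding components, and only the 10ms deadline comes from the case study.

```python
import asyncio


async def call_with_deadline(coro, timeout_s, default=None):
    """Await coro, but give up and return default once the deadline passes."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return default


async def fast_signal():
    # Hypothetical component that answers almost immediately.
    await asyncio.sleep(0)
    return 0.8


async def slow_signal():
    # Hypothetical component that is far too slow for the budget.
    await asyncio.sleep(0.2)
    return 0.5


async def compute_bid(deadline_s=0.010):
    """Query both components concurrently and skip any that miss the deadline."""
    results = await asyncio.gather(
        call_with_deadline(fast_signal(), deadline_s),
        call_with_deadline(slow_signal(), deadline_s),
    )
    signals = [value for value in results if value is not None]
    # Toy scoring: average whatever arrived in time, or bid nothing.
    return sum(signals) / len(signals) if signals else 0.0


bid = asyncio.run(compute_bid())
print("bid computed from on-time signals:", bid)
```

The slow component is dropped rather than awaited, so the response stays within budget at the cost of a less informed bid.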

Verbalize the Connection

Don't assume interviewers will follow your reasoning. Explicitly state the connection: "Given the latency requirement of 100ms, I'm choosing in-memory caching because each database read costs roughly 5-10ms, and several sequential reads would consume most of the budget." This demonstrates intentional design rather than pattern-matching.

Common NFR Blindspots by System Type

Different system types have different critical NFRs. Knowing which NFRs matter most for each system type prevents you from asking irrelevant questions while missing essential ones.

Critical NFRs by System Type
System Type	Most Critical NFRs	Commonly Overlooked NFR	Why It's Critical
Messaging (WhatsApp)	Latency, Consistency (ordering)	Delivery guarantees	Users notice missed messages immediately
Social Feed (Twitter)	Latency, Scalability	Consistency (timeline)	Stale feeds frustrate users; duplicate posts confuse
Payments (Stripe)	Consistency, Durability	Idempotency	Duplicate charges destroy trust
Video Streaming (Netflix)	Latency, Availability	Adaptive quality	Buffering causes abandonment
Search (Google)	Latency, Relevance	Freshness	Stale results reduce utility
Gaming (Fortnite)	Latency, Consistency	Fairness across latencies	High-ping players have bad experience
E-commerce (Amazon)	Availability, Consistency	Inventory accuracy	Overselling creates fulfillment nightmares

Deep Dive: The Payment System Blindspot

Payment systems illustrate how missing a single NFR can invalidate an otherwise good design.

The overlooked requirement: Idempotency

What happens if a payment request is sent twice? Network issues, client retries, and load balancer quirks can all cause duplicate requests. Without idempotency guarantees, a user might be charged twice for the same purchase.

The inexperienced design: "The API receives a payment request, validates the card, and charges the user."

The problem: If the response is lost and the client retries, the user is charged twice.

The mature design: "Each payment request includes an idempotency key. The system checks if this key has been seen before. If yes, return the previous result. If no, process the payment and store the key-result mapping. This makes retries safe."
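The mature design can be sketched in a few lines. This is a minimal in-memory illustration, not a production implementation: the class and field names are hypothetical, and a real system would need an atomic check-and-store (for example, a unique constraint in a database), key expiry, and handling for a reused key that arrives with different request parameters.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PaymentResult:
    charge_id: str
    amount_cents: int
    status: str


@dataclass
class PaymentService:
    """Toy payment service that makes retries safe via idempotency keys."""
    results_by_key: dict = field(default_factory=dict)
    charges_made: int = 0

    def charge(self, idempotency_key, amount_cents):
        # Seen this key before: return the stored result, do not charge again.
        previous = self.results_by_key.get(idempotency_key)
        if previous is not None:
            return previous
        # First time: perform the (simulated) charge and remember the outcome.
        self.charges_made += 1
        result = PaymentResult(f"ch_{self.charges_made}", amount_cents, "succeeded")
        self.results_by_key[idempotency_key] = result
        return result


service = PaymentService()
first = service.charge("order-123", 5_000)
retry = service.charge("order-123", 5_000)   # client retry after a lost response
other = service.charge("order-456", 2_500)   # a genuinely new payment
print(first == retry, service.charges_made)
```

The retry returns the original result and the user is charged once, which is exactly the guarantee the inexperienced design lacks.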

The lesson: Domain expertise includes knowing which NFRs are critical for that domain. For payments, it's idempotency. For messaging, it's ordering. For gaming, it's latency fairness.

Research the Domain

Before interviews, research what NFRs matter for common system types. Reading engineering blogs from companies like Uber, Netflix, and Stripe reveals which requirements they prioritize. This preparation lets you ask informed questions and demonstrate domain awareness.

Balancing NFR Trade-offs

Here's the advanced insight: NFRs often conflict with each other. You can't maximize everything. Recognizing and articulating these trade-offs separates senior candidates from the rest.

The fundamental tensions:

Common NFR Trade-offs

•Consistency vs. Latency — Strong consistency requires coordination (locks, consensus protocols) that adds latency. Relaxing consistency enables faster responses.
•Consistency vs. Availability — The CAP theorem: during network partitions, you must choose between consistency and availability. You can't have both.
•Latency vs. Cost — Lower latency requires more resources (caching, edge servers, faster hardware). There's a direct cost relationship.
•Replication vs. Consistency — More replicas increase availability, but every additional replica is another copy that must be kept in sync, which makes consistency harder to maintain.
•Security vs. Usability — Stronger security often means more friction (MFA, session timeouts, complex passwords).
•Scalability vs. Consistency — Horizontal scaling makes maintaining strong consistency across nodes difficult.

Articulating Trade-offs in Interviews

When you identify a tension, don't just acknowledge it—explain your reasoning:

Weak articulation: "We have a trade-off between consistency and latency."

Strong articulation: "We have a trade-off between consistency and latency. For this use case, a social media feed, users expect sub-100ms response times. Strong consistency would require synchronizing across our geo-distributed database replicas, adding 50-100ms to every read. Users are more tolerant of occasionally seeing a slightly stale feed than of waiting an extra 100ms. Therefore, I'd choose eventual consistency here, accepting that a user might not see a brand-new post for up to 30 seconds."

This demonstrates:

Understanding of the trade-off mechanism
Domain awareness (what users tolerate)
Quantitative reasoning (latency numbers)
Decision-making with justification

The trade-off matrix interview technique:

For complex systems, consider verbalizing a trade-off matrix: "Looking at our requirements, we have tension between latency and consistency. For the timeline read path, I'd prioritize latency. For the write path where we're storing user posts, I'd prioritize consistency to avoid data loss. By splitting the paths, we can optimize each for its priority."

There's No Perfect Answer

Interviewers don't expect you to choose the 'right' trade-off—there often isn't one. They want to see that you recognize trade-offs exist, can articulate them clearly, and can make a reasoned decision. The quality of your reasoning matters more than the specific choice.

Quantifying NFRs: Numbers That Matter

Abstract discussions of NFRs are less impressive than quantitative ones. Having reference numbers in your head allows you to make your points concrete and demonstrates practical experience.

Reference numbers every system designer should know:

Essential Reference Numbers
Metric	Typical Value	Context
L1 cache reference	~1 ns	The fastest memory access
RAM read	~100 ns	In-memory data structures
SSD read	~100 μs	Fast persistent storage
Network round-trip (same datacenter)	~500 μs	Service-to-service calls
HDD read	~10 ms	Spinning disk access
Cross-continent network round-trip	~100-200 ms	Geo-distributed systems
Human perception threshold	~100 ms	Delays above this feel 'slow'
Acceptable web page load	< 3 seconds	Beyond this, users abandon

Using Numbers Effectively

Before (vague): "We'll cache the data because database lookups are slow."

After (quantified): "A database read takes about 5-10ms. With caching, we can get that down to under 1ms from in-memory access. Given our 100ms latency budget and the need for multiple lookups per request, caching is essential to hit our target."

The quantified version demonstrates:

Knowledge of real system latencies
Understanding of how latencies compound
Ability to do back-of-envelope math
Connection to requirements
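The compounding argument is easy to check with arithmetic. In the sketch below, the 100ms budget and the per-lookup latencies (10ms for a database read, 1ms for a cache hit) come from the quantified statement above; the count of eight sequential lookups per request is an assumption chosen for illustration.

```python
def remaining_budget_ms(budget_ms, step_latencies_ms):
    """Return what is left of a latency budget after sequential steps."""
    return budget_ms - sum(step_latencies_ms)


BUDGET_MS = 100
LOOKUPS_PER_REQUEST = 8  # assumed number of sequential lookups

db_left = remaining_budget_ms(BUDGET_MS, [10] * LOOKUPS_PER_REQUEST)
cache_left = remaining_budget_ms(BUDGET_MS, [1] * LOOKUPS_PER_REQUEST)

print(f"database reads leave {db_left} ms for everything else")
print(f"cached reads leave   {cache_left} ms for everything else")
```

With uncached reads, most of the budget is gone before any network transfer or business logic runs; with cached reads, nearly all of it remains.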

Scaling numbers:

Entity	Order of Magnitude	Reference
Tweets per day	~500 million	Twitter's actual scale
Messages per day	~100 billion	WhatsApp's actual scale
Searches per day	~8 billion	Google's approximate scale
Video uploaded per minute	~500 hours	YouTube's actual scale
Transactions per second	~65,000	Visa's peak capacity

Knowing these numbers lets you contextualize interview problems: "We're designing at Twitter scale, so roughly 500 million items per day, or about 6,000 per second average."
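Converting a daily volume to an average per-second rate is a single division. A minimal sketch, using the daily figures from the table (the helper name is an assumption, and real traffic peaks well above the average):

```python
SECONDS_PER_DAY = 86_400


def average_per_second(per_day):
    """Convert a daily volume into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


daily_volumes = {
    "tweets": 500_000_000,
    "messages": 100_000_000_000,
    "searches": 8_000_000_000,
}

for name, per_day in daily_volumes.items():
    print(f"{name}: ~{average_per_second(per_day):,.0f} per second on average")
```

A useful shortcut is that a day has roughly 100,000 seconds, so dividing a daily figure by 10^5 gives the right order of magnitude.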

Approximation is Fine

You don't need exact numbers. The difference between 5ms and 7ms is rarely significant. What matters is order of magnitude: is it microseconds, milliseconds, or seconds? Is it thousands, millions, or billions? Getting these right demonstrates practical understanding.

Summary: Making NFRs Second Nature

Ignoring non-functional requirements is a career-limiting mistake in system design interviews. It signals inexperience with production systems and an inability to think beyond features to quality attributes.

Key Takeaways

•NFRs distinguish production from prototypes. Features describe what; NFRs describe how well. Both are essential.
•Follow the NFR questionnaire systematically. Scale, availability, latency, consistency, durability, security, geography. Cover them all.
•Connect NFRs to design decisions explicitly. Don't assume interviewers follow your reasoning. State the connection: requirement → decision → trade-off.
•Know domain-critical NFRs. Payments need idempotency. Messaging needs ordering. Learn what matters for each system type.
•Articulate trade-offs with reasoning. NFRs conflict. Show you understand the tensions and can make reasoned decisions.
•Use quantitative reasoning. Numbers make abstract concepts concrete. Have reference latencies and scales memorized.
•Treat NFRs as first-class requirements. They're not optional additions—they're central to design. Ask about them before you design.

Practice exercises:

Take any system design problem and list all possible NFRs before designing. Force yourself to specify numbers for each.
For a system you use daily (Netflix, Uber, Amazon), try to infer its NFRs from user experience. What availability do you observe? What latency?
Practice the trade-off articulation pattern. Pick two conflicting NFRs and explain which you'd prioritize for a specific system and why.
Create flashcards with reference numbers (latencies, scales, availability percentages). Being able to recall these quickly helps in interviews.
Read engineering blogs from major tech companies. Note which NFRs they emphasize for different products.

Page Complete

You now understand why ignoring non-functional requirements derails system design interviews, and you have frameworks for ensuring comprehensive NFR coverage. On the next page, we'll tackle the third critical mistake: not asking questions, the silence that speaks volumes about engineering maturity.