Consider two messaging applications. Both let you send text messages. Both support image sharing. Both offer group chats. Functionally, they're identical. Yet one has 500 million users while the other struggles with a few thousand. The difference? Non-functional requirements.
One delivers messages in 100 milliseconds; the other takes 3 seconds. One stays online during traffic spikes; the other crashes whenever a celebrity tweets. One protects your data with end-to-end encryption; the other has suffered three data breaches. One works seamlessly on a 2G connection in rural India; the other demands fiber-optic internet.
Non-functional requirements (NFRs) define the quality attributes that determine whether a system is merely functional or truly excellent. They answer not 'what does the system do?' but 'how well does it do it?' — and this 'how well' often decides whether a product succeeds or fails in the market.
By the end of this page, you will understand the entire landscape of non-functional requirements, how to identify and specify them, their critical importance in system design, quantification techniques, trade-offs between competing NFRs, and how to leverage them in system design interviews.
Non-functional requirements (NFRs), also called quality attributes, system qualities, or -ilities (because many end in 'ility' — scalability, reliability, usability), describe the properties and constraints that govern HOW the system performs its functions.
While functional requirements define the features and behaviors (the verbs of your system), non-functional requirements define the adjectives: how fast, how secure, how available, how scalable, how usable, how maintainable.
Formal Definition:
A non-functional requirement is a specification of a quality criterion that the system must satisfy, typically expressed as a measurable attribute with an acceptable range or threshold. NFRs constrain the design space by eliminating solutions that would otherwise satisfy functional requirements but fail to meet quality standards.
Notice the critical difference: functional requirements state what happens; non-functional requirements quantify how well it happens. This quantification is essential—vague NFRs like 'the system should be fast' are nearly useless. 'Response time < 200ms at p99 under 10,000 concurrent users' is actionable.
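One way to see why the quantified form is actionable is to write it down as data and check a measurement against it. The sketch below is illustrative only: the `LatencyRequirement` class and its field names are hypothetical, not part of any standard tool, and the measured values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyRequirement:
    """A quantified latency NFR: 'p<percentile> < threshold_ms under <load>'.

    Hypothetical structure for illustration, not a standard API.
    """
    percentile: float       # e.g. 99.0 for p99
    threshold_ms: float     # e.g. 200.0
    concurrent_users: int   # load condition the threshold applies to

    def is_met(self, measured_ms: float, tested_users: int) -> bool:
        # A measurement only counts if the test reached the stated load.
        reached_load = tested_users >= self.concurrent_users
        return reached_load and measured_ms < self.threshold_ms


# "Response time < 200ms at p99 under 10,000 concurrent users"
req = LatencyRequirement(percentile=99.0, threshold_ms=200.0, concurrent_users=10_000)

print(req.is_met(measured_ms=180.0, tested_users=10_000))  # True
print(req.is_met(measured_ms=180.0, tested_users=2_000))   # False: load condition not reached
print(req.is_met(measured_ms=240.0, tested_users=10_000))  # False: threshold exceeded
```

A requirement like 'the system should be fast' cannot be expressed this way at all, which is a quick test for whether an NFR is specified well enough to verify.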
The Iceberg Metaphor:
Functional requirements are the visible tip of the system iceberg—the features users consciously interact with. Non-functional requirements are the massive underwater portion—invisible when working correctly, catastrophically apparent when they fail. Users don't praise latency of 50ms; they just expect it. But a latency spike to 5 seconds triggers complaints, churn, and viral Twitter threads.
NFRs are often overlooked because they're invisible in requirements discussions that focus on features. Product managers ask 'What should users be able to do?' but rarely 'How fast should each action be?' or 'What happens when a datacenter fails?' This invisibility makes NFRs the most common source of post-launch crises.
NFRs span a wide landscape of quality attributes. Understanding this taxonomy ensures you don't overlook critical requirements. Let's explore each category in depth:
Performance measures how quickly and efficiently the system responds to user actions and processes workloads.
Key Metrics:
Examples of Well-Specified Performance Requirements:
Why It Matters:
Additional NFR Categories:
| Category | Description | Example Requirement |
|---|---|---|
| Usability | Ease of use, learnability, accessibility | New users complete core task within 5 minutes without training |
| Portability | Ability to run on different platforms/environments | Application runs on Chrome, Firefox, Safari, Edge without modification |
| Interoperability | Ability to exchange data with other systems | REST APIs follow OpenAPI 3.0 specification |
| Compliance | Adherence to laws, regulations, standards | System compliant with HIPAA, GDPR, SOC 2 Type II |
| Observability | Ability to understand system state from outputs | All services emit structured logs, metrics, and distributed traces |
| Cost Efficiency | Resource utilization relative to cost | Infrastructure cost < $0.01 per 1000 transactions |
| Extensibility | Ability to add new features with minimal changes | Plugin architecture for custom integrations |
| Recoverability | Ability to restore after failure | Full system restoration from backup within 4 hours |
The cardinal sin of NFR specification is vagueness. 'The system should be fast' is not a requirement—it's a wish. Quantification transforms wishes into verifiable specifications.
The SMART Framework for NFRs:
Apply SMART criteria to every non-functional requirement:
Percentiles vs Averages:
A critical distinction in performance requirements is the difference between averages and percentiles:
Why p99 matters:
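Power users often generate disproportionate traffic, so they are the most likely to hit the slow tail. If your p99 is bad, your most valuable users suffer most. Additionally, in microservices architectures, tail latencies compound. If a request touches 10 services, each with a 100ms p99, then (assuming the calls are independent) the chance that at least one call lands in its slowest 1% is about 10% (1 - 0.99^10). A delay that is rare for any single service becomes common for the request as a whole, and if the calls run sequentially the delays also add up.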
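The gap between averages and percentiles, and the way tail latency spreads across a fan-out, can be checked with a few lines of arithmetic. This is a minimal sketch: the sample data is invented, the `percentile` helper uses the simple nearest-rank definition (monitoring tools may interpolate differently), and the fan-out estimate assumes calls are independent.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical sample: 98 fast requests and 2 very slow ones (milliseconds).
latencies_ms = [50] * 98 + [3000] * 2

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms}ms p50={p50_ms}ms p99={p99_ms}ms")


def chance_of_hitting_tail(num_services, per_call_quantile=0.99):
    """Probability that at least one of num_services independent calls
    is slower than its own p99 (or whichever quantile is given)."""
    return 1 - per_call_quantile ** num_services


print(f"{chance_of_hitting_tail(10):.1%} of requests touch at least one slow call")
```

In this made-up sample the mean (109ms) looks healthy and the median (50ms) looks excellent, yet 2% of requests take 3 seconds, which only the p99 reveals.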
SLI (Service Level Indicator): A metric that measures some aspect of service (e.g., latency, error rate).
SLO (Service Level Objective): A target value for an SLI (e.g., latency < 200ms for 99% of requests).
SLA (Service Level Agreement): A contract with consequences if SLOs are violated (e.g., refunds, credits).
NFRs typically become SLOs, which may be formalized into SLAs.
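An SLO target also implies an error budget: the amount of failure the objective still tolerates within its window. The sketch below shows the arithmetic under simple assumptions (a time-based budget for availability and a request-based budget for error rate). The function names are hypothetical and the request counts are invented.

```python
def allowed_downtime_minutes(slo_percent, window_days):
    """Minutes of full unavailability that an availability SLO tolerates per window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - slo_percent / 100)


def budget_remaining(slo_percent, total_requests, failed_requests):
    """Fraction of a request-based error budget still unspent (negative once exhausted)."""
    allowed_failures = total_requests * (1 - slo_percent / 100)
    return 1 - failed_requests / allowed_failures


# A 99.95% availability SLO over a 30-day window.
print(f"{allowed_downtime_minutes(99.95, 30):.1f} minutes of downtime allowed")

# A 99.9% success SLO with 1,000,000 requests so far, 250 of them failed.
print(f"{budget_remaining(99.9, 1_000_000, 250):.0%} of the error budget remains")
```

Expressing the budget as a number makes policies such as "slow down releases when little budget remains" something a team can evaluate mechanically rather than argue about.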
```markdown
# Service Level Objectives: Payment Service

## SLO 1: Availability
- SLI: Percentage of successful HTTP responses (non-5xx)
- Target: 99.95% over 30-day rolling window
- Measurement: Synthetic monitoring every 60 seconds from 5 global locations

## SLO 2: Latency
- SLI: Time from request received to response sent
- Targets:
  - p50 < 100ms
  - p95 < 250ms
  - p99 < 500ms
- Measurement: Application metrics, aggregated per minute

## SLO 3: Error Rate
- SLI: Percentage of requests resulting in user-facing errors
- Target: < 0.1% over 24-hour rolling window
- Measurement: Application error logs, excluding client errors (4xx)

## SLO 4: Throughput
- SLI: Successful transactions per second
- Target: Sustain 10,000 TPS during peak hours (9 AM - 6 PM local time)
- Measurement: Transaction counter with 1-minute resolution

## Error Budget Policy
- When error budget < 25% remaining, initiate reliability sprint
- When error budget exhausted, freeze feature deployments until recovered
```

Here's the uncomfortable truth: you cannot optimize for all NFRs simultaneously. Improving one quality attribute often degrades another. System design is fundamentally about navigating these trade-offs intelligently.
Common NFR Trade-off Pairs:
| Optimizing For | Often Conflicts With | Example |
|---|---|---|
| Performance | Cost Efficiency | Faster hardware and more replicas cost more |
| Consistency | Availability | CAP theorem — partitions force a choice |
| Security | Usability | Multi-factor auth and frequent re-authentication increase friction |
| Scalability | Consistency | Distributed writes are harder to keep consistent |
| Maintainability | Performance | Clean abstractions may add overhead |
| Availability | Cost | Redundancy for high availability is expensive |
| Latency | Throughput | Batching improves throughput but increases latency |
| Flexibility | Simplicity | Highly configurable systems are harder to understand |
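The availability-versus-cost row in the table can be made concrete with the standard availability formulas. This is a rough sketch that assumes component failures are independent, which real systems often violate (shared networks, shared deployments). The function names and the example figures are illustrative.

```python
def serial_availability(*components):
    """All components must be up for the request to succeed: availabilities multiply."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


def redundant_availability(single, replicas):
    """The service is up if at least one of `replicas` independent copies is up."""
    return 1 - (1 - single) ** replicas


# Three dependencies in a chain, each 99.9% available.
print(f"chain of three: {serial_availability(0.999, 0.999, 0.999):.4%}")

# One 99% instance versus two or three redundant copies of it.
for n in (1, 2, 3):
    print(f"{n} replica(s): {redundant_availability(0.99, n):.4%}")
```

Each extra replica adds roughly two nines in this idealized model but also adds a full copy of the infrastructure cost, which is the trade-off in a single line of arithmetic.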
The CAP Theorem — The Canonical NFR Trade-off:
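The CAP theorem states that a distributed system can guarantee at most two of the following three properties at the same time: consistency (every read sees the most recent write), availability (every request to a non-failing node receives a response), and partition tolerance (the system keeps operating when messages between nodes are lost or delayed).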
Since network partitions are inevitable in distributed systems, you effectively choose between CP (consistent but may be unavailable during partition) or AP (available but may serve stale data during partition).
Example Trade-off Decisions:
Banking System — Chooses CP. Account balances must be consistent; it's better to reject a transaction than allow an overdraft due to stale data.
Social Media Feed — Chooses AP. Showing a slightly stale feed is acceptable; being unavailable is not.
E-commerce Inventory — Often a hybrid. Shopping cart can be AP (eventual consistency acceptable), but final checkout must be CP (can't sell item twice).
PACELC extends CAP: if Partition, choose Availability or Consistency; Else (normal operation), choose Latency or Consistency. This acknowledges that even without partitions, there's a latency-consistency trade-off in distributed systems.
Making Trade-off Decisions:
When NFRs conflict, use this framework:
The worst outcome is an implicit, undocumented trade-off that the team doesn't realize they've made until production incidents reveal it.
NFRs don't emerge from thin air—they derive from the business context, user expectations, regulatory requirements, and operational constraints. Let's explore how to systematically derive them:
Traffic and Scale Estimation:
Many NFRs require estimating traffic and data volumes. Use back-of-envelope calculations:
Example: Estimating for a Social Media Platform
From this:
Data Storage Estimation:
From this:
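A back-of-envelope estimate of this kind can be sketched in a few lines. Every input below is a hypothetical assumption chosen for round numbers, not a figure from this page. The point is the shape of the calculation: daily volume, average rate, peak rate, then storage growth.

```python
SECONDS_PER_DAY = 86_400

# All inputs are hypothetical assumptions for illustration.
daily_active_users = 100_000_000
posts_per_user_per_day = 2
reads_per_user_per_day = 100
avg_post_bytes = 1_000
peak_to_average = 3  # assumed ratio of peak traffic to average traffic

# Traffic estimates.
writes_per_day = daily_active_users * posts_per_user_per_day
avg_write_qps = writes_per_day / SECONDS_PER_DAY
avg_read_qps = daily_active_users * reads_per_user_per_day / SECONDS_PER_DAY
peak_read_qps = avg_read_qps * peak_to_average

# Storage estimates (decimal units: 1 GB = 1e9 bytes, 1 TB = 1,000 GB).
storage_per_day_gb = writes_per_day * avg_post_bytes / 1e9
storage_5y_tb = storage_per_day_gb * 365 * 5 / 1_000

print(f"avg writes/s:  {avg_write_qps:,.0f}")
print(f"avg reads/s:   {avg_read_qps:,.0f}")
print(f"peak reads/s:  {peak_read_qps:,.0f}")
print(f"storage/day:   {storage_per_day_gb:,.0f} GB")
print(f"storage/5 yrs: {storage_5y_tb:,.0f} TB")
```

Numbers like these feed directly into NFRs: the peak read rate becomes a throughput target, and the five-year storage figure becomes a data-growth requirement.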
Design for 10x your current scale, plan for 100x. Your architecture shouldn't require fundamental changes to handle 10x traffic. You should have a documented path (even if it requires significant work) to handle 100x.
In system design interviews, NFRs are where candidates differentiate themselves. Asking about and reasoning through NFRs signals senior-level thinking.
The NFR Interview Framework:
After understanding functional requirements, systematically probe NFRs:
```markdown
# NFR Clarification Questions for System Design Interviews

## Scale
- "How many users do we expect? DAU/MAU?"
- "What's the read vs write ratio?"
- "Are there predictable traffic patterns (time zones, events)?"
- "What's the expected growth rate?"

## Performance
- "What latency is acceptable for critical paths?"
- "Are there batch processing requirements with specific SLAs?"
- "Any real-time requirements (chat, gaming, financial)?"

## Availability
- "What's the availability target? (99.9%? 99.99%?)"
- "Is downtime during maintenance acceptable?"
- "Are certain features more critical than others?"

## Consistency
- "Does the system need strong consistency or is eventual okay?"
- "For which operations is consistency critical?"
- "What's an acceptable staleness window?"

## Security
- "What data is sensitive? PII? Financial? Health?"
- "Any compliance requirements? (GDPR, HIPAA, PCI)"
- "Multi-tenancy isolation requirements?"

## Geography
- "Is this single-region or multi-region?"
- "Where are users located?"
- "Are there data residency requirements?"
```

Demonstrating NFR Mastery in Interviews:
When you proactively discuss NFR trade-offs without being prompted—explaining why you're choosing eventual consistency for the feed but strong consistency for the purchase flow—you demonstrate principal engineer-level thinking.
NFR documentation requires particular care because these requirements drive critical architectural decisions and serve as acceptance criteria for system validation.
```markdown
# Non-Functional Requirements: [System Name]

## 1. Performance Requirements

### PRF-001: API Response Time
- **Requirement**: Core API endpoints respond within latency thresholds
- **Thresholds**:
  - p50 (median): < 50ms
  - p95: < 200ms
  - p99: < 500ms
- **Measurement**: Application Performance Monitoring (APM) tool, sampled at 100%
- **Scope**: All public API endpoints except bulk operations
- **Load Condition**: Under normal load (0-80% of capacity)
- **Priority**: Must Have

### PRF-002: Throughput
- **Requirement**: System sustains target transaction rate
- **Threshold**: 10,000 requests per second
- **Measurement**: Load testing with production-representative traffic mix
- **Scope**: Aggregate across all API endpoints
- **Priority**: Must Have

## 2. Availability Requirements

### AVL-001: Service Availability
- **Requirement**: Core services maintain availability SLO
- **Threshold**: 99.95% measured monthly
- **Measurement**: Synthetic monitoring from 5 geographic locations, 60-second intervals
- **Exclusions**: Scheduled maintenance (max 4 hours/month, announced 72 hours ahead)
- **Priority**: Must Have

### AVL-002: Disaster Recovery
- **Requirement**: Full recovery from regional outage
- **RTO (Recovery Time Objective)**: 4 hours
- **RPO (Recovery Point Objective)**: 15 minutes (max data loss)
- **Measurement**: Annual DR drill
- **Priority**: Must Have

## 3. Scalability Requirements

### SCL-001: Horizontal Scaling
- **Requirement**: System scales horizontally to meet demand
- **Threshold**: Scale from baseline to 10x capacity within 30 minutes
- **Measurement**: Load testing with automated scaling triggers
- **Priority**: Should Have

### SCL-002: Data Growth
- **Requirement**: System handles projected data growth
- **Threshold**: 5 years of data (~10PB) without architecture change
- **Measurement**: Capacity planning reviews quarterly
- **Priority**: Must Have

## 4. Security Requirements

### SEC-001: Encryption at Rest
- **Requirement**: All sensitive data encrypted at rest
- **Standard**: AES-256
- **Scope**: All databases, file storage, backups
- **Audit**: Annual third-party security audit
- **Priority**: Must Have

### SEC-002: Encryption in Transit
- **Requirement**: All network traffic encrypted
- **Standard**: TLS 1.3 (minimum TLS 1.2)
- **Scope**: All external traffic, internal service-to-service traffic
- **Priority**: Must Have

## Appendix: NFR Trade-off Decisions

| Decision | Option A | Option B | Chosen | Rationale |
|----------|----------|----------|--------|-----------|
| Feed Consistency | Strong (CP) | Eventual (AP) | Eventual | Prioritize availability; stale feeds acceptable for seconds |
| Payment Consistency | Strong (CP) | Eventual (AP) | Strong | Financial correctness mandatory |
```

NFR documentation must evolve with the system. As you learn more about actual system behavior, refine thresholds. As business needs change, update requirements. Outdated NFRs are worse than none: they create false confidence.
We've explored the critical world of non-functional requirements—the quality attributes that determine whether your system is merely functional or truly excellent. Let's consolidate:
What's Next:
We now understand what the system does (functional requirements) and how well it does it (non-functional requirements). But how do we extract these requirements effectively? The next page explores the art of asking the right questions—the systematic inquiry process that separates thorough requirements gathering from superficial fact-finding.
You now possess a comprehensive understanding of non-functional requirements. You can identify them, quantify them, navigate their trade-offs, and communicate them effectively. This knowledge is essential for every architectural decision you'll make.