Every ambitious system eventually confronts the same question: What happens when success arrives? A startup's prototype handling 100 users per day suddenly faces 100,000. A regional service expanding nationally sees traffic grow 50x overnight. An API serving a handful of partners becomes the backbone for an entire ecosystem.
Scalability requirements define the answer to this question before it becomes a crisis. They are not merely technical specifications—they are strategic decisions that determine whether your system will gracefully accommodate growth or collapse under its own success.
In system design interviews and real-world architecture, the ability to precisely articulate scalability requirements separates engineers whose systems merely survive growth from those whose systems thrive on it.
By the end of this page, you will master the complete framework for defining scalability requirements: understanding the dimensions of scale, quantifying growth trajectories, specifying scaling thresholds, and translating business needs into precise technical specifications that drive architectural decisions.
Scalability is one of the most frequently invoked yet poorly understood concepts in system design. Many engineers equate scalability with "handling more users," but this oversimplification masks the nuanced reality of what scalability actually means.
The Formal Definition:
Scalability is a system's ability to maintain or improve its performance characteristics (throughput, latency, reliability) as the demand placed upon it increases across one or more dimensions, typically by adding resources proportionally.
This definition reveals three aspects that are often overlooked: performance means more than raw speed (throughput, latency, and reliability must all hold up), demand grows along more than one dimension, and capacity should increase roughly in proportion to the resources you add. The dimensions of scale are summarized below:
| Dimension | What Scales | Typical Challenges | Example |
|---|---|---|---|
| User Scale | Concurrent users, requests/second | Session management, authentication overhead | 1K → 1M daily active users |
| Data Scale | Storage volume, data growth rate | Query performance, backup windows, cost | 1TB → 1PB stored data |
| Geographic Scale | Regions, distance from users | Latency, data residency, consistency | Single region → global deployment |
| Feature Scale | System complexity, integrations | Testing surface, dependency management | MVP → full-featured platform |
| Organizational Scale | Teams, developers, deployments | Coordination, ownership boundaries | 1 team → 100 teams contributing |
When defining scalability requirements, always specify WHICH dimension you're scaling. A system can be highly scalable along one dimension (user count) while being severely constrained along another (data volume). The most dangerous oversights occur when teams assume scalability is a single boolean property rather than a multi-dimensional characteristic.
Every scalability decision fundamentally reduces to choosing between two strategies—or more precisely, determining the right blend of both. Understanding the deep implications of vertical and horizontal scaling is essential for defining appropriate scalability requirements.
Vertical Scaling (Scaling Up):
Vertical scaling increases system capacity by adding more resources to existing nodes—more CPU cores, more RAM, faster storage, enhanced network bandwidth. This approach is conceptually simple: when capacity is exhausted, upgrade the hardware.
Horizontal Scaling (Scaling Out):
Horizontal scaling increases capacity by adding more nodes to the system. Instead of making one server more powerful, you add more servers and distribute the workload across them.
The Strategic Decision Framework:
When specifying scalability requirements, you must indicate which scaling strategy your design assumes or mandates:
| Requirement Framing | Implication |
|---|---|
| "Support 10K concurrent users on a single deployment" | Vertical scaling path; simpler ops, limited ceiling |
| "Scale linearly to 100K+ concurrent users" | Horizontal scaling required; distributed architecture |
| "Maintain sub-100ms latency at any scale" | May require vertical scaling to minimize network hops |
| "Handle 10x traffic spikes within 5 minutes" | Horizontal scaling with auto-scaling infrastructure |
The Hybrid Reality:
In practice, most scalable systems employ both strategies. Databases often scale vertically first (larger instances) before sharding horizontally. Compute tiers typically scale horizontally (stateless web servers) while specialized components scale vertically (in-memory caches).
Your scalability requirements should acknowledge this complexity: "The system shall scale horizontally for stateless compute workloads while supporting vertical scaling for database read replicas up to [instance size], with horizontal sharding available when vertical limits are reached."
Many systems are architected for vertical scaling because it's simpler initially. But when you hit the ceiling of the largest available instance, your only path forward is a complete re-architecture for horizontal scaling—often under production pressure with aggressive timelines. Define your scalability requirements to anticipate this ceiling BEFORE you hit it.
Vague scalability requirements like "the system should be scalable" or "handle high traffic" are meaningless for architectural decisions. Precise quantification transforms aspirations into actionable specifications.
The Three-Horizon Framework:
Scalability requirements should be defined across three time horizons: Day 1 (launch capacity), Year 1 (expected growth), and Year 3-5 (strategic scale).
This framework prevents both over-engineering (building for 100M users when you have 100) and under-engineering (re-architecting every 6 months).
| Metric | Day 1 | Year 1 | Year 3-5 | Notes |
|---|---|---|---|---|
| Daily Active Users (DAU) | 10K | 500K | 10M | Drives session infrastructure |
| Peak Concurrent Users | 1K | 50K | 1M | Determines connection pool sizing |
| Requests per Second (RPS) | 100 | 5K | 100K | API gateway and compute capacity |
| Data Ingestion Rate | 1 GB/day | 100 GB/day | 10 TB/day | Write path architecture |
| Total Data Volume | 100 GB | 10 TB | 1 PB | Storage tier selection |
| Write:Read Ratio | 1:10 | 1:50 | 1:100 | Read replica strategy |
| Events per Second | 50 | 10K | 500K | Event streaming infrastructure |
| Payload Size (avg) | 5 KB | 5 KB | 10 KB | Bandwidth and serialization |
| Geographic Regions | 1 | 3 | 7 | Multi-region data strategy |
Deriving Metrics from Business Requirements:
Scalability metrics don't appear from thin air—they flow from business context. Here's a structured approach to deriving them:
Step 1: Identify User Archetypes
Step 2: Map User Behavior to System Load
Step 3: Apply Traffic Pattern Analysis
Step 4: Calculate Derived Metrics
```
# Example: Social Feed Application Scalability Estimation

## Business Inputs
- Target: 1 million DAU by end of Year 1
- Average session: 20 minutes, 3 sessions/day
- Actions per session: 50 (views) + 5 (interactions) + 2 (posts)
- Geographic: US-first, expanding to EU in Year 2
- Strategic growth: 5M DAU by Year 3

## Derived Metrics (Year 1)

### Request Rate Calculation
- Active hours per day: 16 hours (6 AM - 10 PM local time)
- Peak multiplier: 4x average (lunch and evening peaks)
- Daily actions: 1M users × 3 sessions × 57 actions = 171M actions/day
- Average RPS: 171M ÷ (16 × 3600) = 2,968 RPS
- Peak RPS: 2,968 × 4 = ~12,000 RPS

### Storage Calculation
- User profile: 50 KB (including preferences, settings)
- Posts per user per day: 2, average 10 KB each
- Daily new storage: 1M × 20 KB = 20 GB/day for posts
- Cumulative Year 1: ~7 TB (excluding archives and deletions)

### Connection Requirements
- Peak concurrent users: DAU × 0.05 = 50,000 (5% concurrency assumption)
- Connections per user: 3 (app, notification websocket, background sync)
- Peak concurrent connections: 150,000

## Scalability Requirements Summary
1. API tier: Scale horizontally to 15K RPS with 100ms P99 latency
2. Database: Support 7+ TB of storage with 50K concurrent connections
3. CDN: Serve 5M unique cached assets per day
4. Event bus: Process 12K events/second for feeds and notifications
```

When specifying scalability requirements for architectural decisions, apply a 10x buffer to your Year 1 estimates. If your calculations suggest you need to handle 10K RPS, specify 100K as your architectural limit. This ensures your design choices won't require replacement during normal growth and provides headroom for unexpected success.
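Because these estimates get revisited whenever the business inputs change, it helps to capture the arithmetic in a small script rather than a one-off calculation. The following Python sketch reproduces the numbers from the example above; the function and parameter names are illustrative, not part of any standard tool.

```python
# Back-of-envelope scalability estimator (illustrative values from the example above).

SECONDS_PER_ACTIVE_DAY = 16 * 3600   # 16 active hours per day (6 AM - 10 PM)

def estimate(dau: int,
             sessions_per_day: float,
             actions_per_session: float,
             peak_multiplier: float,
             post_kb_per_user_per_day: float,
             concurrency_ratio: float,
             connections_per_user: int) -> dict:
    daily_actions = dau * sessions_per_day * actions_per_session
    avg_rps = daily_actions / SECONDS_PER_ACTIVE_DAY
    peak_rps = avg_rps * peak_multiplier

    daily_storage_gb = dau * post_kb_per_user_per_day / 1_000_000   # KB -> GB
    year1_storage_tb = daily_storage_gb * 365 / 1_000               # GB -> TB

    peak_concurrent_users = dau * concurrency_ratio
    peak_connections = peak_concurrent_users * connections_per_user

    return {
        "avg_rps": round(avg_rps),
        "peak_rps": round(peak_rps),
        "daily_storage_gb": round(daily_storage_gb),
        "year1_storage_tb": round(year1_storage_tb, 1),
        "peak_concurrent_users": int(peak_concurrent_users),
        "peak_connections": int(peak_connections),
    }

if __name__ == "__main__":
    print(estimate(dau=1_000_000,
                   sessions_per_day=3,
                   actions_per_session=57,    # 50 views + 5 interactions + 2 posts
                   peak_multiplier=4,
                   post_kb_per_user_per_day=20,
                   concurrency_ratio=0.05,
                   connections_per_user=3))
    # -> roughly 2,969 average RPS, ~11,900 peak RPS, 20 GB/day of new storage,
    #    ~7.3 TB after Year 1, 50,000 peak concurrent users, 150,000 connections
```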
Scalability requirements must go beyond stating target capacities—they should define when and how scaling actions occur. This transforms scalability from a static property into a dynamic capability.
Elastic Scaling Specifications:
Modern systems don't have fixed capacity; they adjust dynamically based on demand. Your requirements should specify the elasticity characteristics:
| Parameter | Specification | Rationale |
|---|---|---|
| Scale-Up Trigger | CPU > 70% for 2 consecutive minutes OR RPS > 5K per instance | Leading indicator before latency degradation |
| Scale-Up Increment | Add 50% more instances (minimum 2) | Aggressive response to demand surge |
| Scale-Up Latency | New capacity available within 3 minutes | Faster than typical traffic ramps |
| Scale-Down Trigger | CPU < 30% for 15 minutes AND RPS < 2K per instance | Conservative to prevent flapping |
| Scale-Down Decrement | Remove 25% of instances (maximum) | Gradual reduction protects stability |
| Scale-Down Cooldown | No scale-down for 30 minutes after scale-up | Prevents oscillation |
| Minimum Instances | 6 (across 3 AZs, 2 per AZ) | Fault tolerance baseline |
| Maximum Instances | 200 (soft limit with alerting at 150) | Cost protection and capacity planning |
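A policy like the one above is easier to review and test when it is captured as code or version-controlled configuration rather than prose. The Python sketch below models the scale-up/scale-down decision implied by the table; the class and parameter names are illustrative, and the duration qualifiers ("for 2 consecutive minutes", "for 15 minutes") are assumed to be enforced by how the input metrics are aggregated before this function is called.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    cpu_pct: float           # CPU utilization averaged over the evaluation window
    rps_per_instance: float  # request rate per instance over the same window

@dataclass
class ScalingPolicy:
    min_instances: int = 6                 # fault-tolerance baseline (2 per AZ across 3 AZs)
    max_instances: int = 200               # cost-protection ceiling
    cooldown_after_scale_up_s: int = 30 * 60

    def desired_count(self, current: int, sample: Sample,
                      seconds_since_scale_up: int) -> int:
        # Scale-up trigger is disjunctive: either condition alone is enough.
        if sample.cpu_pct > 70 or sample.rps_per_instance > 5_000:
            increment = max(2, current // 2)      # add 50% more instances, minimum 2
            return min(self.max_instances, current + increment)

        # Scale-down trigger is conjunctive and respects the post-scale-up cooldown.
        if (sample.cpu_pct < 30 and sample.rps_per_instance < 2_000
                and seconds_since_scale_up >= self.cooldown_after_scale_up_s):
            decrement = max(1, current // 4)      # remove at most 25% of instances
            return max(self.min_instances, current - decrement)

        return current
```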
Metric Selection for Scaling Triggers:
The metrics you choose for scaling triggers dramatically affect system behavior:
| Metric | Pros | Cons | Best For |
|---|---|---|---|
| CPU Utilization | Universal, simple | Lagging indicator | Compute-bound workloads |
| Request Queue Depth | Leading indicator | Requires queue monitoring | Async processing |
| Request Latency (P99) | User-experience aligned | Can spike from outliers | Latency-sensitive APIs |
| Requests per Second | Predictable, easy to project | Doesn't reflect request cost | Homogeneous workloads |
| Memory Utilization | Essential for memory-bound work | Often stable until sudden exhaustion | Caching, in-memory databases |
| Custom Business Metric | Maps to business outcomes | Requires instrumentation | Specialized workloads |
Production systems often use composite triggers: 'Scale up if (CPU > 70% AND queue depth > 100) OR (P99 latency > 500ms for 1 minute).' This prevents false positives while ensuring multiple failure modes are covered. Your scalability requirements should specify whether triggers are conjunctive (all conditions) or disjunctive (any condition).
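To make the conjunctive/disjunctive distinction concrete, here is a minimal sketch of the composite trigger quoted above as a single predicate; the metric names are placeholders, and the duration qualifier ("for 1 minute") is assumed to be applied when the metrics are aggregated upstream.

```python
def should_scale_up(cpu_pct: float, queue_depth: int, p99_latency_ms: float) -> bool:
    """Composite trigger: (CPU > 70% AND queue depth > 100) OR (P99 latency > 500 ms).

    The conjunctive half avoids reacting to CPU spikes that are not actually
    backing up work; the disjunctive latency clause still catches failure
    modes that CPU utilization alone would miss.
    """
    return (cpu_pct > 70 and queue_depth > 100) or p99_latency_ms > 500
```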
Pre-Emptive vs. Reactive Scaling:
Advanced scalability requirements distinguish between reactive scaling (responding to current load) and pre-emptive scaling (anticipating future load):
Reactive Scaling Specification: "The system shall automatically scale compute capacity in response to observed load, maintaining target utilization between 40-70% with actions triggered within 60 seconds of threshold breach."
Pre-Emptive Scaling Specification: "The system shall pre-scale capacity 30 minutes before scheduled events (marketing campaigns, product launches per schedule feed), achieving target capacity before the first anticipated traffic increase."
Predictive Scaling Specification: "The system shall utilize historical traffic patterns to predictively scale capacity, targeting 20% headroom above the 7-day moving average for the corresponding time window."
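The predictive specification above reduces to a simple capacity formula. A minimal sketch, assuming the system keeps per-window traffic history as requests-per-second observations (the function name and example values are illustrative):

```python
from statistics import mean

def predictive_capacity_target(same_window_history_rps: list[float],
                               headroom: float = 0.20) -> float:
    """Target capacity = 7-day moving average for this time window, plus 20% headroom.

    `same_window_history_rps` holds the RPS observed for the corresponding time
    window (e.g. Tuesday 18:00-18:05) on each of the last 7 days.
    """
    if len(same_window_history_rps) < 7:
        raise ValueError("need at least 7 days of history for a 7-day moving average")
    moving_avg = mean(same_window_history_rps[-7:])
    return moving_avg * (1 + headroom)

# Example: last 7 days of RPS for the same evening window
# predictive_capacity_target([8200, 8500, 9100, 8800, 9300, 9700, 10100]) ≈ 10,920 RPS
```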
No system scales infinitely. Understanding and documenting scalability constraints is as important as specifying targets. These constraints drive architectural trade-offs and inform capacity planning.
Universal System Laws:
Two fundamental laws govern system scalability:
Amdahl's Law:
Speedup from parallelization is limited by the sequential portion of the workload. If 20% of your processing must happen sequentially (e.g., database locks, ordered operations), maximum speedup is 5x regardless of how many processors you add.
Implication: Identify sequential bottlenecks in your requirements: "System scalability is constrained by sequential order processing, which limits throughput to 10K orders/second regardless of compute scale."
```
# Amdahl's Law Calculation

Speedup = 1 / (S + P/N)

Where:
- S = Sequential portion (fraction)
- P = Parallel portion (fraction, where S + P = 1)
- N = Number of processors/instances

## Example: Order Processing System
- Sequential operations: 20% (database transaction, order validation)
- Parallel operations: 80% (inventory check, payment processing, notifications)

With 1 instance:    Speedup = 1.0x (baseline)
With 4 instances:   Speedup = 1 / (0.2 + 0.8/4)   = 2.5x
With 16 instances:  Speedup = 1 / (0.2 + 0.8/16)  = 4.0x
With 100 instances: Speedup = 1 / (0.2 + 0.8/100) = 4.8x
With ∞ instances:   Speedup = 1 / 0.2             = 5.0x (maximum possible)

## Scalability Requirement Implication
"Order processing throughput can scale up to 5x baseline (50K orders/second)
regardless of compute resources due to inherent sequential constraints.
Beyond this, architectural changes are required."
```

Universal Scalability Law (USL):
USL extends Amdahl's Law to account for both contention (queueing for shared resources) and coherency delays (the cost of keeping nodes consistent) in distributed systems. As you add nodes, this coordination overhead can actually cause negative scaling: more instances mean lower throughput.
Implication: "Cross-instance coordination for distributed transactions introduces non-linear overhead. Beyond 50 instances, throughput per instance degrades by approximately 5% per additional instance."
Every system has a bottleneck—the component that limits overall capacity. If you don't know your bottleneck, you don't understand your scalability. Scalability requirements should explicitly identify the expected bottleneck and the capacity at which it manifests: 'Database write throughput is the primary scalability constraint, expected to limit system capacity at approximately 50K writes/second.'
Scalability requirements are meaningless without corresponding testing requirements. How will you verify your system actually meets its scalability specifications?
Categories of Scalability Testing:
| Test Type | Purpose | Methodology | When to Run |
|---|---|---|---|
| Load Testing | Verify capacity at expected load | Simulate realistic traffic at target levels | Pre-launch, major releases |
| Stress Testing | Find breaking point | Increase load until failure occurs | Quarterly, architecture changes |
| Spike Testing | Verify elastic scaling | Apply sudden traffic bursts | After scaling config changes |
| Soak Testing | Detect resource leaks | Sustained load over extended period | Weekly in staging |
| Breakpoint Testing | Identify performance cliffs | Gradually increase load, measure degradation | Architecture validation |
| Scalability Testing | Verify horizontal scaling | Measure throughput vs. instance count | Infrastructure changes |
Scalability Testing Specification Template:
Your scalability requirements should include testing specifications:
Scalability Testing Requirements:
1. Load Test Baseline
- Frequency: Weekly automated, daily in pre-prod
- Target: 100% of Day 1 specification (5K RPS)
- Duration: 30 minutes sustained
- Pass criteria: P99 latency < 200ms, error rate < 0.1%
2. Stress Test Maximum Capacity
- Frequency: Monthly
- Target: Increase until 1% error rate sustained
- Documentation: Record breaking point capacity
- Minimum acceptable: 150% of Year 1 target (15K RPS)
3. Spike Test Elasticity
- Frequency: After any scaling configuration change
- Pattern: 0% → 100% → 300% → 100% → 0% over 30 minutes
- Pass criteria: P99 recovers within 5 minutes of scale event
- Auto-scaling: Verify instances added within SLA
4. Soak Test Stability
- Frequency: Weekly, 24-hour duration
- Target: 70% of maximum capacity
- Pass criteria: No memory growth > 10%, no latency degradation > 20%
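One way to automate the load-test baseline above (item 1) is with a scripted load generator. The sketch below uses Locust as an example harness; the endpoints, task mix, and host are hypothetical, and the pass criteria (P99 < 200ms, error rate < 0.1%) would be evaluated against the statistics the run produces rather than enforced by the script itself.

```python
# locustfile.py -- sketch of the weekly load-test baseline (endpoints are hypothetical).
from locust import HttpUser, task, between

class FeedUser(HttpUser):
    # Think time between actions; tune user count and spawn rate to reach the 5K RPS target.
    wait_time = between(1, 3)

    @task(10)
    def view_feed(self):
        # Read-heavy traffic dominates, mirroring the skewed write:read ratio.
        self.client.get("/api/feed")

    @task(1)
    def create_post(self):
        self.client.post("/api/posts", json={"body": "load-test post"})

# Example 30-minute run against a staging host:
#   locust -f locustfile.py --headless -u 5000 -r 200 --run-time 30m \
#          --host https://staging.example.com
# Afterwards, compare the recorded P99 latency and failure ratio against the
# pass criteria (P99 < 200ms, error rate < 0.1%).
```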
Scalability characteristics often differ dramatically between environments. A test environment with 1/10th the database size may not exhibit the same query patterns. Specify: 'Scalability testing shall be performed with production-representative data volumes (minimum 70% of production data size) and realistic traffic patterns derived from production logs.'
With all the concepts covered, let's synthesize them into a framework for writing complete, actionable scalability requirements.
The SMART-S Framework for Scalability:
```
# Scalability Requirements Document Template

## 1. Executive Summary
Brief overview of scalability requirements and their business justification.

## 2. Capacity Targets

### 2.1 User Scale
| Metric | Day 1 | Year 1 | Year 3 |
|--------|-------|--------|--------|
| Daily Active Users | [value] | [value] | [value] |
| Peak Concurrent Users | [value] | [value] | [value] |
| Geographic Distribution | [regions] | [regions] | [regions] |

### 2.2 Throughput Scale
| Metric | Day 1 | Year 1 | Year 3 |
|--------|-------|--------|--------|
| Read Requests/Second | [value] | [value] | [value] |
| Write Requests/Second | [value] | [value] | [value] |
| Peak:Average Ratio | [value] | [value] | [value] |

### 2.3 Data Scale
| Metric | Day 1 | Year 1 | Year 3 |
|--------|-------|--------|--------|
| Total Storage Volume | [value] | [value] | [value] |
| Daily Data Ingestion | [value] | [value] | [value] |
| Data Retention Period | [value] | [value] | [value] |

## 3. Scaling Strategy

### 3.1 Scaling Approach
- Compute Tier: [Horizontal/Vertical/Hybrid]
- Database Tier: [Approach with details]
- Cache Tier: [Approach]

### 3.2 Elastic Scaling Parameters
| Parameter | Value | Rationale |
|-----------|-------|-----------|
| Scale-up trigger | [condition] | [why] |
| Scale-up increment | [amount] | [why] |
| Scale-up latency | [time] | [why] |
| Scale-down trigger | [condition] | [why] |
| Min instances | [count] | [why] |
| Max instances | [count] | [why] |

## 4. Scalability Constraints

### 4.1 Known Bottlenecks
- Primary: [component] at [capacity]
- Secondary: [component] at [capacity]

### 4.2 Architectural Limitations
- [Limitation with quantified impact]

## 5. Testing Requirements

### 5.1 Load Testing
- Frequency: [schedule]
- Targets: [specifications]
- Pass criteria: [metrics]

### 5.2 Stress Testing
- Frequency: [schedule]
- Methodology: [approach]
- Documentation: [requirements]

## 6. Monitoring and Alerting
- Key metrics: [list]
- Alert thresholds: [specifications]
- Dashboard requirements: [details]
```

We have now covered the complete landscape of scalability requirements: the dimensions of scale, quantified growth targets across three time horizons, vertical versus horizontal scaling strategies, elastic scaling triggers and thresholds, theoretical scaling limits, and the testing needed to verify them.
What's Next:
With scalability requirements mastered, we turn to the next critical non-functional requirement: Availability. While scalability determines whether your system can handle growth, availability determines whether your system is accessible when users need it. These two requirements often create tension—optimizing for one can compromise the other—making it essential to understand both deeply.
You now have a comprehensive framework for defining scalability requirements. These specifications will drive every major architectural decision in your system design—from database selection to compute infrastructure to deployment topology. In the next page, we'll explore availability requirements with equal rigor.