We've explored vertical and horizontal scaling in depth and analyzed their trade-offs across multiple dimensions. Now comes the practical question: given a specific system, workload, and organizational context, which approach should you choose?
This page provides concrete decision frameworks. Not abstract principles, but actionable guidance: "If X, then Y." The goal is to transform your understanding of trade-offs into confident decision-making.
Of course, real systems resist simple rules. But having default positions—informed starting points that you can deviate from with evidence—dramatically improves decision quality. Let's build those defaults.
By the end of this page, you will have clear decision criteria for scaling approach selection, understand which workload characteristics strongly favor each approach, know how to map organizational context to architectural decisions, and be equipped with a practical decision tree that applies to most common scenarios.
Different workloads have different characteristics that favor different scaling approaches. Let's examine the key workload dimensions.
Request Volume and Patterns:
Low to moderate request volume (< 10,000 RPS)
Most web applications fall here. A single well-configured server can handle 5,000-20,000 RPS for typical web traffic. At this scale, horizontal scaling is rarely required for capacity—though it may be desired for availability.
Decision: Prefer vertical scaling unless availability requirements demand redundancy.
High request volume (> 10,000 RPS)
Above 10K RPS, vertical scaling becomes constrained: single-machine limits on CPU, network throughput, and connection handling begin to dominate, and each further hardware upgrade buys proportionally less headroom.
Decision: Horizontal scaling of stateless tiers. Keep stateful tiers (database) vertically scaled as long as possible.
Extreme request volume (> 100,000 RPS)
At this scale, you're operating internet infrastructure. YouTube, Twitter, Netflix territory.
Decision: Full horizontal scaling across all tiers. Advanced patterns (geographic distribution, tiered caching, traffic shaping) required.
| Request Volume | Typical Systems | Recommended Approach |
|---|---|---|
| < 1,000 RPS | Most SaaS apps, internal tools, small-to-medium consumer apps | Vertical scaling (single server + database) |
| 1,000 - 10,000 RPS | Popular consumer apps, high-traffic SaaS, mid-size e-commerce | Vertical database + horizontally scaled API tier |
| 10,000 - 100,000 RPS | Large-scale consumer apps, major e-commerce, popular games | Horizontal API tier + sharded/replicated database |
| > 100,000 RPS | Internet giants, CDN, real-time platforms | Full horizontal across all layers, geo-distribution |
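As a sanity check on the thresholds in this table, the capacity question reduces to a back-of-envelope division. The per-server figure and headroom below are illustrative placeholders, not benchmarks; measure your own workload before trusting any such number:

```python
import math

def servers_needed(peak_rps: float, per_server_rps: float = 5_000,
                   headroom: float = 0.5) -> int:
    """Estimate how many app servers a peak load needs.

    per_server_rps and headroom are placeholder assumptions; real
    capacity depends entirely on your workload and hardware.
    """
    usable_rps = per_server_rps * headroom  # keep 50% headroom for spikes
    return max(1, math.ceil(peak_rps / usable_rps))

# servers_needed(2_000)  -> 1:  a single (vertically scaled) server suffices
# servers_needed(40_000) -> 16: a horizontally scaled tier is unavoidable
```

The point of the headroom factor is that you should never plan to run servers at full capacity; traffic spikes and failover of a peer both demand slack.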
Data Volume:
Small data sets (< 100GB)
Fits comfortably in RAM on a single server. All queries can be memory-resident for maximum performance. Sharding adds complexity without benefit.
Decision: Vertical scaling. Use a single database with plenty of RAM.
Medium data sets (100GB - 1TB)
Still fits on a single server with sufficient RAM. Modern databases with good indexing can handle this efficiently. Sharding may be relevant for write scaling but not capacity.
Decision: Vertical scaling remains optimal. Reserve horizontal for specific performance needs.
Large data sets (1TB - 100TB)
Pushing the limits of single-server capacity. Sharding or distributed databases become necessary. But not every query needs the full data set—consider hybrid approaches.
Decision: Evaluate sharding for high-volume tables. Keep low-volume tables on a single node.
Very large data sets (> 100TB)
Clearly beyond single-server capacity. Distributed storage is required. This is data warehouse, analytics, or large-scale user-generated content territory.
Decision: Distributed databases or data lakes (BigQuery, Snowflake, Spark on HDFS) are necessary.
Access Pattern Considerations:
Read-heavy workloads (> 90% reads)
Replication is highly effective. Add read replicas to scale reads while keeping a single write primary. This is the "sweet spot" for easy scaling.
Decision: Vertical primary + horizontal read replicas. This combination is simple and effective.
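In application code, the "vertical primary + read replicas" pattern usually shows up as a small routing layer. A minimal sketch, where the connection objects are placeholder strings standing in for real database connection pools:

```python
import random

class ReadWriteRouter:
    """Send writes to the primary; spread reads across replicas.

    The 'connections' here are placeholder strings; in practice they
    would be real database connection pools.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)

    def connection_for(self, is_write: bool):
        if is_write or not self.replicas:
            return self.primary              # every write hits the single primary
        return random.choice(self.replicas)  # reads scale with replica count

router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
router.connection_for(is_write=True)   # the primary
router.connection_for(is_write=False)  # one of the replicas
```

One caveat: replicas can serve slightly stale data, so reads that must see their own just-completed writes should be pinned to the primary.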
Write-heavy workloads (> 30% writes)
Replication helps less; all writes still go to the primary. Sharding or accepting eventual consistency becomes necessary.
Decision: If writes must be consistent, shard by partition key. If eventual consistency is acceptable, consider leaderless replication (Cassandra, DynamoDB).
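The essential property of sharding by partition key is that every operation for a given key deterministically lands on the same shard. A minimal hash-based sketch; the shard count and key format are illustrative assumptions:

```python
import hashlib

def shard_for(partition_key: str, num_shards: int = 4) -> int:
    """Map a partition key (e.g. a user ID) to a shard index.

    A cryptographic hash spreads keys evenly. Note that changing
    num_shards remaps almost every key -- this is why resharding is
    painful and the text calls sharding a one-way door.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# The same key always routes to the same shard, so all operations
# for one user stay single-node and strongly consistent:
# shard_for("user-42") == shard_for("user-42")
```

Schemes like consistent hashing reduce how many keys move when the shard count changes, but the routing idea is the same.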
90% of systems never need true horizontal scaling of their database. They need: better indexing, query optimization, caching, or read replicas. Before sharding, exhaust these simpler options. Sharding is a one-way door that adds permanent complexity.
Availability requirements are often the primary driver for horizontal scaling—not capacity. Different availability targets require different approaches.
Understanding the nines:
Availability is typically expressed as "nines"—99.9% is "three nines," 99.99% is "four nines." Each additional nine requires roughly 10× the effort and cost.
| Availability | Annual Downtime | Typical For | Minimum Architecture |
|---|---|---|---|
| 99% (two nines) | 3.65 days | Internal tools, batch systems | Single server, automated restart |
| 99.9% (three nines) | 8.76 hours | Most SaaS, B2B apps | Single server + automated failover |
| 99.95% | 4.38 hours | E-commerce, consumer apps | Active-passive with warm standby |
| 99.99% (four nines) | 52.6 minutes | Payment systems, healthcare | Multi-node active-active, multi-AZ |
| 99.999% (five nines) | 5.26 minutes | Telecom, core banking, emergency services | Multi-region active-active, extensive redundancy |
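The downtime column follows directly from the arithmetic of the availability percentage:

```python
def annual_downtime_minutes(availability_pct: float) -> float:
    """Allowed annual downtime for a given availability percentage."""
    minutes_per_year = 365 * 24 * 60  # 525,600 (ignoring leap years)
    return (1 - availability_pct / 100) * minutes_per_year

# 99.9%   -> ~525.6 min (~8.76 hours)
# 99.99%  -> ~52.6 min
# 99.999% -> ~5.3 min
```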
Mapping availability to scaling:
99% - 99.9% availability (most applications)
Achievable with vertical scaling and basic redundancy: a single powerful server plus an automated-failover standby covers these targets.
Decision: Vertical scaling with active-passive redundancy. Horizontal scaling not required.
99.95% - 99.99% availability (high-reliability applications)
Requires eliminating single points of failure: no single server, load balancer, or availability zone can be allowed to take the system down on its own.
Decision: Horizontal scaling at load balancer and application tier. Evaluate database: replicated primary-standby often sufficient.
99.999%+ availability (critical infrastructure)
Requires eliminating correlated failures: redundancy within one datacenter is no longer enough, because the datacenter itself is a shared failure domain.
Decision: Full horizontal scaling and geographic distribution. This is a significant investment.
The cost step function:
Moving from 99% to 99.9% might cost 2× more. Moving from 99.9% to 99.99% might cost 5-10× more. Moving from 99.99% to 99.999% might cost 10-20× more. These costs include infrastructure, engineering, and operational complexity.
Most applications should target 99.9% and be honest about whether higher targets are genuinely required by the business.
Many organizations claim they need 99.99% availability but have never calculated the business impact of downtime. An hour of downtime costing $10,000 doesn't justify spending $500,000/year on infrastructure to prevent it. Calculate your real cost of downtime before setting availability targets.
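That tip reduces to a single comparison. A sketch using the numbers from the example above (all figures illustrative):

```python
def upgrade_pays_off(current_pct: float, target_pct: float,
                     downtime_cost_per_hour: float,
                     extra_annual_cost: float) -> bool:
    """Does higher availability save more in avoided downtime than it costs?"""
    hours_per_year = 365 * 24
    downtime_hours_avoided = (target_pct - current_pct) / 100 * hours_per_year
    return downtime_hours_avoided * downtime_cost_per_hour > extra_annual_cost

# Going from 99.9% to 99.99% avoids ~7.9 hours of downtime per year.
# At $10,000/hour that saves ~$78,800 -- nowhere near a $500,000/year bill:
# upgrade_pays_off(99.9, 99.99, 10_000, 500_000) -> False
```

This ignores second-order effects (reputation, SLA penalties, lost future customers), which is exactly why those should be quantified too rather than waved at.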
Latency requirements create unique constraints because horizontal scaling can actually increase latency due to network overhead. Understanding this is crucial for latency-sensitive applications.
Latency categories:
Interactive latency (< 100ms p99)
Users perceive latency above 100ms as "slow." Applications needing snappy feel must achieve this target.
At this latency target, every network hop hurts. A cross-datacenter database call (5-10ms) consumes 5-10% of your budget. A complex service mesh with 5 hops adds 5-15ms overhead.
Decision: Minimize distribution. Vertical scaling where possible. If horizontal is required, co-locate services to minimize network hops. Consider data locality optimizations.
Near-real-time latency (< 50ms p99)
Gaming, live collaboration, trading applications. Users notice delays and the experience degrades.
Decision: Vertical scaling strongly preferred. Any horizontal scaling must be within the same datacenter, ideally same rack. Use kernel-bypass networking and optimized data structures for the hottest paths.
Real-time latency (< 10ms p99)
High-frequency trading, voice/video processing, industrial control systems.
Decision: Vertical scaling is almost mandatory. These systems often use specialized hardware, in-memory processing, and FPGA acceleration. Distribution introduces unacceptable latency.
| Category | Typical Latency Addition | Impact at 100ms Budget | Impact at 10ms Budget |
|---|---|---|---|
| Same-machine (local socket) | < 0.1ms | < 0.1% | < 1% |
| Same-rack (top-of-rack switch) | ~0.1ms | ~0.1% | ~1% |
| Same-datacenter (cross-rack) | ~0.5ms | ~0.5% | ~5% |
| Same-region (cross-AZ) | 1-3ms | 1-3% | 10-30% |
| Cross-region (US-East to US-West) | 50-70ms | 50-70% | IMPOSSIBLE |
| Cross-continent (US to EU) | 70-120ms | 70-120% (over budget) | IMPOSSIBLE |
| Global (US to Asia) | 150-250ms | 150-250% (far over budget) | IMPOSSIBLE |
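The table translates into a simple budget calculation. The per-hop figures below mirror the table's order-of-magnitude estimates; they are not measurements of any particular network:

```python
HOP_MS = {                 # rough per-hop additions, matching the table
    "same_machine": 0.1,
    "same_rack": 0.1,
    "same_dc": 0.5,
    "cross_az": 2.0,       # midpoint of the 1-3ms range
    "cross_region": 60.0,  # midpoint of the 50-70ms range
}

def budget_fraction_consumed(hops: list[str], budget_ms: float) -> float:
    """Fraction of a p99 budget eaten by network hops before any work is done."""
    return sum(HOP_MS[h] for h in hops) / budget_ms

# A 5-hop service mesh crossing AZs, against a 100ms interactive budget:
# budget_fraction_consumed(["cross_az"] * 5, 100) -> 0.1 (10% gone)
# The same topology against a 10ms real-time budget -> 1.0 (the whole budget)
```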
Geographic distribution paradox:
For global applications with tight latency requirements, horizontal scaling becomes required even though it adds latency: a single datacenter cannot serve distant users within budget (cross-continent round trips alone run 150-250ms), so capacity must be placed near users.
Decision: Global latency requirements mandate geographic distribution, but each region can be (and often should be) vertically scaled. Run independent stacks in each region rather than distributing a single system globally.
The hybrid pattern for global low-latency:
┌─────────────────────────────────────────────────────────────────┐
│                         GLOBAL ROUTING                          │
│                    (GeoDNS, Anycast, Edge LB)                   │
└─────────────────────────────┬───────────────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
 ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
 │     US-EAST     │ │     EU-WEST     │ │  AP-NORTHEAST   │
 │ ┌─────────────┐ │ │ ┌─────────────┐ │ │ ┌─────────────┐ │
 │ │ Vertically  │ │ │ │ Vertically  │ │ │ │ Vertically  │ │
 │ │ Scaled      │ │ │ │ Scaled      │ │ │ │ Scaled      │ │
 │ │ Stack       │ │ │ │ Stack       │ │ │ │ Stack       │ │
 │ └─────────────┘ │ │ └─────────────┘ │ │ └─────────────┘ │
 └─────────────────┘ └─────────────────┘ └─────────────────┘
Each region runs an independent, vertically-scaled stack. No cross-region calls in the request path. Data replication between regions happens asynchronously.
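In this pattern the routing layer's only job is "send each user to the closest independent stack." A toy illustration of that selection step; the region names and RTT values are made up:

```python
def nearest_region(rtt_ms_by_region: dict[str, float]) -> str:
    """Pick the region with the lowest measured round-trip time.

    GeoDNS and anycast approximate this automatically; the point is
    that region selection happens once, at the edge, and the request
    then stays entirely inside one vertically scaled stack.
    """
    return min(rtt_ms_by_region, key=rtt_ms_by_region.get)

# A user in Virginia might measure something like:
# nearest_region({"us-east": 12.0, "eu-west": 85.0, "ap-northeast": 190.0})
# -> "us-east"
```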
Many teams assume their application needs low latency without measuring current latency or understanding user impact. Before optimizing for latency, measure your current p50, p90, and p99, and run user tests at different latency levels. Often, going from 200ms to 100ms delivers more user impact than going from 100ms to 50ms, and costs far less.
Technical requirements are only half the picture. Organizational context strongly influences which scaling approach is practical and sustainable.
Team size and structure:
| Team Size | Architecture Recommendation | Rationale |
|---|---|---|
| 1-5 engineers | Single vertically-scaled system | No capacity for distributed systems complexity; speed of development is critical |
| 5-15 engineers | Modular monolith with selected scaling | Can handle some complexity; modules may enable future service extraction |
| 15-50 engineers | Selective microservices (3-10 services) | Team size supports several independent teams; Conway's Law enables some service ownership |
| 50-200 engineers | Microservices architecture (10-50 services) | Many teams require independence; deployment coordination becomes a bottleneck otherwise |
| > 200 engineers | Full microservices + platform teams | At this scale, shared services and platform teams enable efficiency |
Expertise and experience:
Team with limited distributed systems experience
Distributed systems have failure modes that even experienced engineers find surprising. A team learning distributed systems will make mistakes that cause production incidents.
Decision: Start with vertical scaling. Introduce distribution gradually as the team builds expertise. Have senior engineers or consultants review distributed designs.
Team with distributed systems expertise
Experienced teams can navigate distributed complexity efficiently. They've seen the failure modes and built the mental models.
Decision: Choose based purely on technical requirements. The team can handle either approach.
Hiring constraints:
If finding distributed systems engineers is hard in your market, building a distributed system creates a bottleneck: design reviews, incident response, and on-call all funnel through the few people who understand it.
Decision: If you can't hire for it, don't build it. A simpler architecture that generalist engineers can maintain is more sustainable.
Operational maturity:
Horizontal scaling requires operational capabilities that vertical scaling doesn't:
| Capability | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Monitoring | Basic server monitoring | Distributed tracing, fleet aggregation |
| Deployment | Standard CI/CD | Rolling deploys, canaries, feature flags |
| Incident response | SSH, check logs | Runbooks, automated remediation |
| On-call | One system to understand | Many services to understand |
| Debugging | Standard tools | Cross-service correlation |
Decision: If your operational maturity is low, horizontal scaling will be painful. Build operational capability before adding distributed complexity—or accept that incidents will be more frequent and longer.
Ask yourself: Can my current team debug a distributed transaction that failed across three services with a network partition in the middle? If not, and you build that system, you'll have production incidents you can't resolve quickly. Match architecture to team capability, not aspiration.
Technical architecture exists to serve business goals. Business context shapes which scaling trade-offs are acceptable.
Stage of company:
Early stage / Pre-product-market-fit
The product will change dramatically. Features will be added, removed, and radically redesigned. Speed of iteration is everything.
Decision: Maximize simplicity. Vertical scaling. Single database. Optimize for developer velocity above all else.
Growth stage / Scaling what works
Product-market fit is established. User base is growing. The product is relatively stable but scale is increasing.
Decision: Begin selective horizontal scaling where bottlenecks appear. Prioritize stateless tier scaling and database optimization.
Mature stage / Optimizing efficiency
Growth is slower but base is large. Efficiency and cost optimization matter. Reliability expectations are high.
Decision: Right-size architecture. This might mean more horizontal scaling for cost efficiency at scale, or conversely, consolidation to reduce operational overhead.
Business model considerations:
| Business Model | Key Technical Priority | Scaling Implication |
|---|---|---|
| B2B SaaS (< 1000 customers) | Feature velocity, stability | Vertical scaling usually sufficient; availability via HA pairs |
| B2B SaaS (1000+ customers) | Multi-tenant isolation, SLAs | Horizontal for tenant isolation; vertical per tenant where practical |
| Consumer app (free, ad-supported) | Scale, cost efficiency | Horizontal scaling to handle scale at low cost-per-user |
| Consumer app (subscription) | Reliability, feature velocity | Balance based on user expectations and competitive pressure |
| E-commerce | Availability during peaks, speed | Horizontal for burst capacity; optimize checkout path |
| FinTech | Reliability, correctness, compliance | Vertical where possible for simplicity; horizontal for availability requirements |
| Gaming | Low latency, scale for events | Vertical for game servers; horizontal for matchmaking and events |
Cost of downtime:
Different businesses have different costs of downtime:
Low cost of downtime: Internal tools, back-office applications, low-traffic B2B apps. Users can wait or retry. Downtime is annoying but not damaging.
Decision: Optimize for simplicity and velocity. Some downtime is acceptable.
Medium cost of downtime: Standard SaaS, consumer apps, e-commerce during normal periods. Users may go to competitors; reputation may suffer.
Decision: Standard availability (99.9%). Active-passive redundancy. Rapid recovery procedures.
High cost of downtime: Payment processing, healthcare, e-commerce during peak periods (Black Friday), real-time services. Downtime has direct financial or safety impact.
Decision: High availability (99.99%+). Active-active redundancy. Multi-zone/multi-region deployment. This justifies horizontal scaling investment.
Regulatory and compliance:
Some industries have regulatory requirements that affect architecture: data residency rules, tenant isolation mandates, and auditability requirements all constrain where and how data can be distributed.
Decision: Compliance requirements can mandate horizontal scaling (for isolation) or favor vertical scaling (for simplified audit).
Architecture is an investment. Vertical scaling costs less upfront and has lower ongoing maintenance. Horizontal scaling costs more upfront but can reduce per-user costs at scale and enable capabilities (availability, geographic reach) that may have direct revenue impact. Frame scaling decisions as business investments with quantified costs and benefits.
Here's a practical decision tree that synthesizes our criteria. Use this as a starting point; real situations may require deviation based on specific context.
START HERE: What is your dominant constraint?
        ┌─────────────────────────────────────┐
        │  What is your dominant constraint?  │
        └──────────────────┬──────────────────┘
                           │
     ┌─────────────┬───────┴───────┬─────────────┐
     ▼             ▼               ▼             ▼
┌─────────┐   ┌─────────┐     ┌─────────┐   ┌─────────┐
│Capacity │   │Availa-  │     │Latency  │   │Develop- │
│(Volume) │   │bility   │     │         │   │ment     │
│         │   │(Uptime) │     │         │   │Velocity │
└─────────┘   └─────────┘     └─────────┘   └─────────┘
     │             │               │             │
     ▼             ▼               ▼             ▼
   See A         See B           See C         See D
[A] Capacity is the constraint:
┌──────────────────────────────┐
│   Peak load > 10,000 RPS?    │
└──────────────┬───────────────┘
               │
    ┌──────────┴──────────┐
    No                   Yes
    │                     │
    ▼                     ▼
┌────────────┐    ┌────────────────┐
│ VERTICAL   │    │  Data volume   │
│ Single     │    │    > 1TB?      │
│ powerful   │    └───────┬────────┘
│ server     │            │
└────────────┘    ┌───────┴───────┐
                  No             Yes
                  │               │
                  ▼               ▼
           ┌────────────┐   ┌────────────┐
           │ HORIZONTAL │   │ HORIZONTAL │
           │ Stateless  │   │ + Sharded  │
           │ tier only  │   │  Database  │
           └────────────┘   └────────────┘
[B] Availability is the constraint:
┌──────────────────────────────┐
│    Required availability?    │
└──────────────┬───────────────┘
               │
    ┌──────────┼──────────┐
  99.9%     99.99%    99.999%
    │          │          │
    ▼          ▼          ▼
┌─────────┐ ┌──────────┐ ┌─────────────┐
│VERTICAL │ │HORIZONTAL│ │GEOGRAPHIC   │
│+ Active │ │Multi-AZ  │ │Multi-Region │
│Passive  │ │Redundancy│ │Active-Active│
└─────────┘ └──────────┘ └─────────────┘
[C] Latency is the constraint:
┌──────────────────────────────┐
│    Target latency (p99)?     │
└──────────────┬───────────────┘
               │
    ┌──────────┼──────────┐
 >100ms     50-100ms    <50ms
    │          │          │
    ▼          ▼          ▼
┌─────────┐ ┌─────────┐ ┌─────────────┐
│Either   │ │Minimize │ │VERTICAL     │
│approach │ │network  │ │Co-located   │
│works    │ │hops     │ │Specialized  │
└─────────┘ └─────────┘ └─────────────┘
NOTE: For global low-latency, geographic distribution
becomes necessary (horizontal), but each region should
be vertically optimized.
[D] Development velocity is the constraint:
┌──────────────────────────────┐
│    Team size (engineers)?    │
└──────────────┬───────────────┘
               │
    ┌──────────┼──────────┐
   <15       15-50       >50
    │          │          │
    ▼          ▼          ▼
┌──────────┐ ┌──────────┐ ┌─────────────┐
│VERTICAL  │ │Modular   │ │Services may │
│Monolith  │ │Monolith  │ │improve      │
│for       │ │with      │ │velocity via │
│simplicity│ │selected  │ │independence │
└──────────┘ │services  │ └─────────────┘
             └──────────┘
Most real systems should use a hybrid: vertically scaled database (as long as possible) with horizontally scaled stateless application tier (for availability and deployment flexibility). This combination captures most of horizontal scaling's benefits while avoiding its hardest problems (distributed data). Deviate from this default only with clear justification.
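For reference, the whole tree fits in a few dozen lines of code. The thresholds are taken directly from branches A-D above; the returned strings are informal labels for starting points, not prescriptions:

```python
def recommend(constraint: str, *, rps: float = 0, data_tb: float = 0,
              availability: float = 99.9, p99_ms: float = 100,
              team_size: int = 10) -> str:
    """Starting-point recommendation following the decision tree (A-D)."""
    if constraint == "capacity":                         # [A]
        if rps <= 10_000:
            return "vertical: single powerful server"
        if data_tb > 1:
            return "horizontal + sharded database"
        return "horizontal: stateless tier only"
    if constraint == "availability":                     # [B]
        if availability <= 99.9:
            return "vertical + active-passive"
        if availability <= 99.99:
            return "horizontal: multi-AZ redundancy"
        return "geographic: multi-region active-active"
    if constraint == "latency":                          # [C]
        if p99_ms > 100:
            return "either approach works"
        if p99_ms >= 50:
            return "minimize network hops"
        return "vertical: co-located, specialized"
    if constraint == "velocity":                         # [D]
        if team_size < 15:
            return "vertical monolith"
        if team_size <= 50:
            return "modular monolith with selected services"
        return "services for team independence"
    raise ValueError(f"unknown constraint: {constraint!r}")
```

Real decisions rarely have a single dominant constraint, which is why the prose caveats above matter more than any one branch.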
Let's apply the framework to common real-world scenarios:
Scenario 1: Early-stage startup, new product
Context: 5 engineers, finding product-market fit, 1,000 DAU, uncertain growth
Decision: Vertical scaling
Why: Development velocity is everything. You'll rebuild this system three times before you need to scale it.
Scenario 2: B2B SaaS with steady growth
Context: 20 engineers, 500 enterprise customers, 50,000 DAU, 99.9% SLA commitments
Decision: Hybrid—vertical database, horizontal API tier
Why: Enterprise customers expect reliability. Horizontal API tier provides availability without database complexity. This architecture handles 10× growth.
Scenario 3: Consumer mobile app with high engagement
Context: 50 engineers, 10M MAU, global users, real-time features, high traffic variability
Decision: Full horizontal with geographic distribution
Why: Scale and global latency requirements mandate distribution. Engineering team is large enough to handle complexity.
| Scenario | Primary Approach | Key Considerations |
|---|---|---|
| Internal tool / Back office | Vertical | Low traffic, low availability needs |
| MVP / New product | Vertical | Speed of iteration is paramount |
| Small-medium B2B SaaS | Hybrid (vertical DB, horizontal API) | Availability for SLA; DB complexity not needed |
| High-traffic B2C app (single region) | Horizontal | Capacity required; latency not global |
| Global consumer app | Horizontal + Geographic | Latency requirements mandate distribution |
| Real-time gaming / Trading | Vertical with HA | Latency critical; minimize network hops |
| Data-intensive analytics | Horizontal (specialized) | Data volume exceeds single node |
| Multi-tenant enterprise SaaS | Hybrid (tenant isolation varies) | Large tenants may need dedicated resources |
These recommendations assume typical characteristics. Your specific situation may differ. A B2B SaaS serving 3 massive enterprises with 100,000 users each has different needs than one serving 500 small businesses. Always validate recommendations against your actual constraints.
We've built a comprehensive framework for scaling decisions. The key is having informed defaults that you can override when evidence warrants.
What's next:
With "when to use which" addressed, we'll examine the practical limits of both approaches. The final page explores the real-world ceilings: what happens when you push vertical scaling to its maximum, and what happens when horizontal scaling's complexity becomes its own bottleneck.
You now have a practical decision framework for scaling approach selection. This framework—grounded in workload characteristics, availability requirements, latency constraints, and organizational context—enables confident, defensible architectural decisions for any system you encounter.