We've explored vertical and horizontal scaling in depth and analyzed their trade-offs across multiple dimensions. Now comes the practical question: given a specific system, workload, and organizational context, which approach should you choose?
This page provides concrete decision frameworks. Not abstract principles, but actionable guidance: "If X, then Y." The goal is to transform your understanding of trade-offs into confident decision-making.
Of course, real systems resist simple rules. But having default positions—informed starting points that you can deviate from with evidence—dramatically improves decision quality. Let's build those defaults.
By the end of this page, you will have clear decision criteria for scaling approach selection, understand which workload characteristics strongly favor each approach, know how to map organizational context to architectural decisions, and be equipped with a practical decision tree that applies to most common scenarios.
Different workloads have different characteristics that favor different scaling approaches. Let's examine the key workload dimensions.
Request Volume and Patterns:
Low to moderate request volume (< 10,000 RPS)
Most web applications fall here. A single well-configured server can handle 5,000-20,000 RPS for typical web traffic. At this scale, horizontal scaling is rarely required for capacity—though it may be desired for availability.
Decision: Prefer vertical scaling unless availability requirements demand redundancy.
High request volume (> 10,000 RPS)
Above 10K RPS, vertical scaling becomes constrained: single-machine limits on CPU, network throughput, and connection handling begin to dominate, and each further hardware upgrade buys proportionally less headroom.
Decision: Horizontal scaling of stateless tiers. Keep stateful tiers (database) vertically scaled as long as possible.
Extreme request volume (> 100,000 RPS)
At this scale, you're operating internet infrastructure. YouTube, Twitter, Netflix territory.
Decision: Full horizontal scaling across all tiers. Advanced patterns (geographic distribution, tiered caching, traffic shaping) required.
| Request Volume | Typical Systems | Recommended Approach |
|---|---|---|
| < 1,000 RPS | Most SaaS apps, internal tools, small-to-medium consumer apps | Vertical scaling (single server + database) |
| 1,000 - 10,000 RPS | Popular consumer apps, high-traffic SaaS, mid-size e-commerce | Vertical database + horizontally scaled API tier |
| 10,000 - 100,000 RPS | Large-scale consumer apps, major e-commerce, popular games | Horizontal API tier + sharded/replicated database |
| > 100,000 RPS | Internet giants, CDN, real-time platforms | Full horizontal across all layers, geo-distribution |
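As a sanity check on the thresholds in this table, the capacity question reduces to a back-of-envelope division. The per-server figure and headroom below are illustrative placeholders, not benchmarks; measure your own workload before trusting any such number:

```python
import math

def servers_needed(peak_rps: float, per_server_rps: float = 5_000,
                   headroom: float = 0.5) -> int:
    """Estimate how many app servers a peak load needs.

    per_server_rps and headroom are placeholder assumptions; real
    capacity depends entirely on your workload and hardware.
    """
    usable_rps = per_server_rps * headroom  # keep 50% headroom for spikes
    return max(1, math.ceil(peak_rps / usable_rps))

# servers_needed(2_000)  -> 1:  a single (vertically scaled) server suffices
# servers_needed(40_000) -> 16: a horizontally scaled tier is unavoidable
```

The point of the headroom factor is that you should never plan to run servers at full capacity; traffic spikes and failover of a peer both demand slack.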
Data Volume:
Small data sets (< 100GB)
Fits comfortably in RAM on a single server. All queries can be memory-resident for maximum performance. Sharding adds complexity without benefit.
Decision: Vertical scaling. Use a single database with plenty of RAM.
Medium data sets (100GB - 1TB)
Still fits on a single server with sufficient RAM. Modern databases with good indexing can handle this efficiently. Sharding may be relevant for write scaling but not capacity.
Decision: Vertical scaling remains optimal. Reserve horizontal for specific performance needs.
Large data sets (1TB - 100TB)
Pushing the limits of single-server capacity. Sharding or distributed databases become necessary. But not every query needs the full data set—consider hybrid approaches.
Decision: Evaluate sharding for high-volume tables. Keep low-volume tables on a single node.
Very large data sets (> 100TB)
Clearly beyond single-server capacity. Distributed storage is required. This is data warehouse, analytics, or large-scale user-generated content territory.
Decision: Distributed databases or data lakes (BigQuery, Snowflake, Spark on HDFS) are necessary.
Access Pattern Considerations:
Read-heavy workloads (> 90% reads)
Replication is highly effective. Add read replicas to scale reads while keeping a single write primary. This is the "sweet spot" for easy scaling.
Decision: Vertical primary + horizontal read replicas. This combination is simple and effective.
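In application code, the "vertical primary + read replicas" pattern usually shows up as a small routing layer. A minimal sketch, where the connection objects are placeholder strings standing in for real database connection pools:

```python
import random

class ReadWriteRouter:
    """Send writes to the primary; spread reads across replicas.

    The 'connections' here are placeholder strings; in practice they
    would be real database connection pools.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)

    def connection_for(self, is_write: bool):
        if is_write or not self.replicas:
            return self.primary              # every write hits the single primary
        return random.choice(self.replicas)  # reads scale with replica count

router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
router.connection_for(is_write=True)   # the primary
router.connection_for(is_write=False)  # one of the replicas
```

One caveat: replicas can serve slightly stale data, so reads that must see their own just-completed writes should be pinned to the primary.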
Write-heavy workloads (> 30% writes)
Replication helps less; all writes still go to the primary. Sharding or accepting eventual consistency becomes necessary.
Decision: If writes must be consistent, shard by partition key. If eventual consistency is acceptable, consider leaderless replication (Cassandra, DynamoDB).
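The essential property of sharding by partition key is that every operation for a given key deterministically lands on the same shard. A minimal hash-based sketch; the shard count and key format are illustrative assumptions:

```python
import hashlib

def shard_for(partition_key: str, num_shards: int = 4) -> int:
    """Map a partition key (e.g. a user ID) to a shard index.

    A cryptographic hash spreads keys evenly. Note that changing
    num_shards remaps almost every key -- this is why resharding is
    painful and the text calls sharding a one-way door.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# The same key always routes to the same shard, so all operations
# for one user stay single-node and strongly consistent:
# shard_for("user-42") == shard_for("user-42")
```

Schemes like consistent hashing reduce how many keys move when the shard count changes, but the routing idea is the same.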
90% of systems never need true horizontal scaling of their database. They need: better indexing, query optimization, caching, or read replicas. Before sharding, exhaust these simpler options. Sharding is a one-way door that adds permanent complexity.
Availability requirements are often the primary driver for horizontal scaling—not capacity. Different availability targets require different approaches.
Understanding the nines:
Availability is typically expressed as "nines"—99.9% is "three nines," 99.99% is "four nines." Each additional nine requires roughly 10× the effort and cost.
| Availability | Annual Downtime | Typical For | Minimum Architecture |
|---|---|---|---|
| 99% (two nines) | 3.65 days | Internal tools, batch systems | Single server, automated restart |
| 99.9% (three nines) | 8.76 hours | Most SaaS, B2B apps | Single server + automated failover |
| 99.95% | 4.38 hours | E-commerce, consumer apps | Active-passive with warm standby |
| 99.99% (four nines) | 52.6 minutes | Payment systems, healthcare | Multi-node active-active, multi-AZ |
| 99.999% (five nines) | 5.26 minutes | Telecom, core banking, emergency services | Multi-region active-active, extensive redundancy |
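The downtime column follows directly from the arithmetic of the availability percentage:

```python
def annual_downtime_minutes(availability_pct: float) -> float:
    """Allowed annual downtime for a given availability percentage."""
    minutes_per_year = 365 * 24 * 60  # 525,600 (ignoring leap years)
    return (1 - availability_pct / 100) * minutes_per_year

# 99.9%   -> ~525.6 min (~8.76 hours)
# 99.99%  -> ~52.6 min
# 99.999% -> ~5.3 min
```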
Mapping availability to scaling:
99% - 99.9% availability (most applications)
Achievable with vertical scaling and basic redundancy: a single powerful server plus an automated-failover standby covers these targets.
Decision: Vertical scaling with active-passive redundancy. Horizontal scaling not required.
99.95% - 99.99% availability (high-reliability applications)
Requires eliminating single points of failure: no single server, load balancer, or availability zone can be allowed to take the system down on its own.
Decision: Horizontal scaling at load balancer and application tier. Evaluate database: replicated primary-standby often sufficient.
99.999%+ availability (critical infrastructure)
Requires eliminating correlated failures: redundancy within one datacenter is no longer enough, because the datacenter itself is a shared failure domain.
Decision: Full horizontal scaling and geographic distribution. This is a significant investment.
The cost step function:
Moving from 99% to 99.9% might cost 2× more. Moving from 99.9% to 99.99% might cost 5-10× more. Moving from 99.99% to 99.999% might cost 10-20× more. These costs include infrastructure, engineering, and operational complexity.
Most applications should target 99.9% and be honest about whether higher targets are genuinely required by the business.
Many organizations claim they need 99.99% availability but have never calculated the business impact of downtime. An hour of downtime costing $10,000 doesn't justify spending $500,000/year on infrastructure to prevent it. Calculate your real cost of downtime before setting availability targets.
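That tip reduces to a single comparison. A sketch using the numbers from the example above (all figures illustrative):

```python
def upgrade_pays_off(current_pct: float, target_pct: float,
                     downtime_cost_per_hour: float,
                     extra_annual_cost: float) -> bool:
    """Does higher availability save more in avoided downtime than it costs?"""
    hours_per_year = 365 * 24
    downtime_hours_avoided = (target_pct - current_pct) / 100 * hours_per_year
    return downtime_hours_avoided * downtime_cost_per_hour > extra_annual_cost

# Going from 99.9% to 99.99% avoids ~7.9 hours of downtime per year.
# At $10,000/hour that saves ~$78,800 -- nowhere near a $500,000/year bill:
# upgrade_pays_off(99.9, 99.99, 10_000, 500_000) -> False
```

This ignores second-order effects (reputation, SLA penalties, lost future customers), which is exactly why those should be quantified too rather than waved at.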
Latency requirements create unique constraints because horizontal scaling can actually increase latency due to network overhead. Understanding this is crucial for latency-sensitive applications.
Latency categories:
Interactive latency (< 100ms p99)
Users perceive latency above 100ms as "slow." Applications needing snappy feel must achieve this target.
At this latency target, every network hop hurts. A cross-datacenter database call (5-10ms) consumes 5-10% of your budget. A complex service mesh with 5 hops adds 5-15ms overhead.
Decision: Minimize distribution. Vertical scaling where possible. If horizontal is required, co-locate services to minimize network hops. Consider data locality optimizations.
Near-real-time latency (< 50ms p99)
Gaming, live collaboration, trading applications. Users notice delays and the experience degrades.
Decision: Vertical scaling strongly preferred. Any horizontal scaling must be within the same datacenter, ideally same rack. Use kernel-bypass networking and optimized data structures for the hottest paths.
Real-time latency (< 10ms p99)
High-frequency trading, voice/video processing, industrial control systems.
Decision: Vertical scaling is almost mandatory. These systems often use specialized hardware, in-memory processing, and FPGA acceleration. Distribution introduces unacceptable latency.
| Category | Typical Latency Addition | Impact at 100ms Budget | Impact at 10ms Budget |
|---|---|---|---|
| Same-machine (local socket) | < 0.1ms | < 0.1% | < 1% |
| Same-rack (top-of-rack switch) | ~0.1ms | ~0.1% | ~1% |
| Same-datacenter (cross-rack) | ~0.5ms | ~0.5% | ~5% |
| Same-region (cross-AZ) | 1-3ms | 1-3% | 10-30% |
| Cross-region (US-East to US-West) | 50-70ms | 50-70% | IMPOSSIBLE |
| Cross-continent (US to EU) | 70-120ms | 70-120% (over budget) | IMPOSSIBLE |
| Global (US to Asia) | 150-250ms | 150-250% (far over budget) | IMPOSSIBLE |
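The table translates into a simple budget calculation. The per-hop figures below mirror the table's order-of-magnitude estimates; they are not measurements of any particular network:

```python
HOP_MS = {                 # rough per-hop additions, matching the table
    "same_machine": 0.1,
    "same_rack": 0.1,
    "same_dc": 0.5,
    "cross_az": 2.0,       # midpoint of the 1-3ms range
    "cross_region": 60.0,  # midpoint of the 50-70ms range
}

def budget_fraction_consumed(hops: list[str], budget_ms: float) -> float:
    """Fraction of a p99 budget eaten by network hops before any work is done."""
    return sum(HOP_MS[h] for h in hops) / budget_ms

# A 5-hop service mesh crossing AZs, against a 100ms interactive budget:
# budget_fraction_consumed(["cross_az"] * 5, 100) -> 0.1 (10% gone)
# The same topology against a 10ms real-time budget -> 1.0 (the whole budget)
```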
Geographic distribution paradox:
For global applications with tight latency requirements, horizontal scaling becomes required even though it adds latency: a single datacenter cannot serve distant users within budget (cross-continent round trips alone run 150-250ms), so capacity must be placed near users.
Decision: Global latency requirements mandate geographic distribution, but each region can be (and often should be) vertically scaled. Run independent stacks in each region rather than distributing a single system globally.
The hybrid pattern for global low-latency:
┌─────────────────────────────────────────────────────────────────┐
│                         GLOBAL ROUTING                          │
│                    (GeoDNS, Anycast, Edge LB)                   │
└─────────────────────────────┬───────────────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
 ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
 │     US-EAST     │ │     EU-WEST     │ │  AP-NORTHEAST   │
 │ ┌─────────────┐ │ │ ┌─────────────┐ │ │ ┌─────────────┐ │
 │ │ Vertically  │ │ │ │ Vertically  │ │ │ │ Vertically  │ │
 │ │ Scaled      │ │ │ │ Scaled      │ │ │ │ Scaled      │ │
 │ │ Stack       │ │ │ │ Stack       │ │ │ │ Stack       │ │
 │ └─────────────┘ │ │ └─────────────┘ │ │ └─────────────┘ │
 └─────────────────┘ └─────────────────┘ └─────────────────┘
Each region runs an independent, vertically-scaled stack. No cross-region calls in the request path. Data replication between regions happens asynchronously.
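In this pattern the routing layer's only job is "send each user to the closest independent stack." A toy illustration of that selection step; the region names and RTT values are made up:

```python
def nearest_region(rtt_ms_by_region: dict[str, float]) -> str:
    """Pick the region with the lowest measured round-trip time.

    GeoDNS and anycast approximate this automatically; the point is
    that region selection happens once, at the edge, and the request
    then stays entirely inside one vertically scaled stack.
    """
    return min(rtt_ms_by_region, key=rtt_ms_by_region.get)

# A user in Virginia might measure something like:
# nearest_region({"us-east": 12.0, "eu-west": 85.0, "ap-northeast": 190.0})
# -> "us-east"
```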
Many teams assume their application needs low latency without measuring current latency or understanding user impact. Before optimizing for latency, measure your current p50, p90, and p99, and run user tests at different latency levels. Often, going from 200ms to 100ms delivers more user impact than going from 100ms to 50ms, and costs far less.
Technical requirements are only half the picture. Organizational context strongly influences which scaling approach is practical and sustainable.
Team size and structure:
| Team Size | Architecture Recommendation | Rationale |
|---|---|---|
| 1-5 engineers | Single vertically-scaled system | No capacity for distributed systems complexity; speed of development is critical |
| 5-15 engineers | Modular monolith with selected scaling | Can handle some complexity; modules may enable future service extraction |
| 15-50 engineers | Selective microservices (3-10 services) | Team size supports several independent teams; Conway's Law enables some service ownership |
| 50-200 engineers | Microservices architecture (10-50 services) | Many teams require independence; deployment coordination becomes a bottleneck otherwise |
| > 200 engineers | Full microservices + platform teams | At this scale, shared services and platform teams enable efficiency |
Expertise and experience:
Team with limited distributed systems experience
Distributed systems have failure modes that even experienced engineers find surprising. A team learning distributed systems will make mistakes that cause production incidents.
Decision: Start with vertical scaling. Introduce distribution gradually as the team builds expertise. Have senior engineers or consultants review distributed designs.
Team with distributed systems expertise
Experienced teams can navigate distributed complexity efficiently. They've seen the failure modes and built the mental models.
Decision: Choose based purely on technical requirements. The team can handle either approach.
Hiring constraints:
If finding distributed systems engineers is hard in your market, building a distributed system creates a bottleneck: design reviews, incident response, and on-call all funnel through the few people who understand it.
Decision: If you can't hire for it, don't build it. A simpler architecture that generalist engineers can maintain is more sustainable.
Operational maturity:
Horizontal scaling requires operational capabilities that vertical scaling doesn't:
| Capability | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Monitoring | Basic server monitoring | Distributed tracing, fleet aggregation |
| Deployment | Standard CI/CD | Rolling deploys, canaries, feature flags |
| Incident response | SSH, check logs | Runbooks, automated remediation |
| On-call | One system to understand | Many services to understand |
| Debugging | Standard tools | Cross-service correlation |
Decision: If your operational maturity is low, horizontal scaling will be painful. Build operational capability before adding distributed complexity—or accept that incidents will be more frequent and longer.
Ask yourself: Can my current team debug a distributed transaction that failed across three services with a network partition in the middle? If not, and you build that system, you'll have production incidents you can't resolve quickly. Match architecture to team capability, not aspiration.
Technical architecture exists to serve business goals. Business context shapes which scaling trade-offs are acceptable.
Stage of company:
Early stage / Pre-product-market-fit
The product will change dramatically. Features will be added, removed, and radically redesigned. Speed of iteration is everything.
Decision: Maximize simplicity. Vertical scaling. Single database. Optimize for developer velocity above all else.
Growth stage / Scaling what works
Product-market fit is established. User base is growing. The product is relatively stable but scale is increasing.
Decision: Begin selective horizontal scaling where bottlenecks appear. Prioritize stateless tier scaling and database optimization.
Mature stage / Optimizing efficiency
Growth is slower but base is large. Efficiency and cost optimization matter. Reliability expectations are high.
Decision: Right-size architecture. This might mean more horizontal scaling for cost efficiency at scale, or conversely, consolidation to reduce operational overhead.
Business model considerations:
| Business Model | Key Technical Priority | Scaling Implication |
|---|---|---|
| B2B SaaS (< 1000 customers) | Feature velocity, stability | Vertical scaling usually sufficient; availability via HA pairs |
| B2B SaaS (1000+ customers) | Multi-tenant isolation, SLAs | Horizontal for tenant isolation; vertical per tenant where practical |
| Consumer app (free, ad-supported) | Scale, cost efficiency | Horizontal scaling to handle scale at low cost-per-user |
| Consumer app (subscription) | Reliability, feature velocity | Balance based on user expectations and competitive pressure |
| E-commerce | Availability during peaks, speed | Horizontal for burst capacity; optimize checkout path |
| FinTech | Reliability, correctness, compliance | Vertical where possible for simplicity; horizontal for availability requirements |
| Gaming | Low latency, scale for events | Vertical for game servers; horizontal for matchmaking and events |
Cost of downtime:
Different businesses have different costs of downtime:
Low cost of downtime: Internal tools, back-office applications, low-traffic B2B apps. Users can wait or retry. Downtime is annoying but not damaging.
Decision: Optimize for simplicity and velocity. Some downtime is acceptable.
Medium cost of downtime: Standard SaaS, consumer apps, e-commerce during normal periods. Users may go to competitors; reputation may suffer.
Decision: Standard availability (99.9%). Active-passive redundancy. Rapid recovery procedures.
High cost of downtime: Payment processing, healthcare, e-commerce during peak periods (Black Friday), real-time services. Downtime has direct financial or safety impact.
Decision: High availability (99.99%+). Active-active redundancy. Multi-zone/multi-region deployment. This justifies horizontal scaling investment.
Regulatory and compliance:
Some industries have regulatory requirements that affect architecture: data residency rules, tenant isolation mandates, and auditability requirements all constrain where and how data can be distributed.
Decision: Compliance requirements can mandate horizontal scaling (for isolation) or favor vertical scaling (for simplified audit).
Architecture is an investment. Vertical scaling costs less upfront and has lower ongoing maintenance. Horizontal scaling costs more upfront but can reduce per-user costs at scale and enable capabilities (availability, geographic reach) that may have direct revenue impact. Frame scaling decisions as business investments with quantified costs and benefits.
Here's a practical decision tree that synthesizes our criteria. Use this as a starting point; real situations may require deviation based on specific context.
START HERE: What is your dominant constraint?
        ┌─────────────────────────────────────┐
        │  What is your dominant constraint?  │
        └──────────────────┬──────────────────┘
                           │
     ┌─────────────┬───────┴───────┬─────────────┐
     ▼             ▼               ▼             ▼
┌─────────┐   ┌─────────┐     ┌─────────┐   ┌─────────┐
│Capacity │   │Availa-  │     │Latency  │   │Develop- │
│(Volume) │   │bility   │     │         │   │ment     │
│         │   │(Uptime) │     │         │   │Velocity │
└─────────┘   └─────────┘     └─────────┘   └─────────┘
     │             │               │             │
     ▼             ▼               ▼             ▼
   See A         See B           See C         See D
[A] Capacity is the constraint:
┌──────────────────────────────┐
│   Peak load > 10,000 RPS?    │
└──────────────┬───────────────┘
               │
    ┌──────────┴──────────┐
    No                   Yes
    │                     │
    ▼                     ▼
┌────────────┐    ┌────────────────┐
│ VERTICAL   │    │  Data volume   │
│ Single     │    │    > 1TB?      │
│ powerful   │    └───────┬────────┘
│ server     │            │
└────────────┘    ┌───────┴───────┐
                  No             Yes
                  │               │
                  ▼               ▼
           ┌────────────┐   ┌────────────┐
           │ HORIZONTAL │   │ HORIZONTAL │
           │ Stateless  │   │ + Sharded  │
           │ tier only  │   │  Database  │
           └────────────┘   └────────────┘
[B] Availability is the constraint:
┌──────────────────────────────┐
│    Required availability?    │
└──────────────┬───────────────┘
               │
    ┌──────────┼──────────┐
  99.9%     99.99%    99.999%
    │          │          │
    ▼          ▼          ▼
┌─────────┐ ┌──────────┐ ┌─────────────┐
│VERTICAL │ │HORIZONTAL│ │GEOGRAPHIC   │
│+ Active │ │Multi-AZ  │ │Multi-Region │
│Passive  │ │Redundancy│ │Active-Active│
└─────────┘ └──────────┘ └─────────────┘
[C] Latency is the constraint:
┌──────────────────────────────┐
│    Target latency (p99)?     │
└──────────────┬───────────────┘
               │
    ┌──────────┼──────────┐
 >100ms     50-100ms    <50ms
    │          │          │
    ▼          ▼          ▼
┌─────────┐ ┌─────────┐ ┌─────────────┐
│Either   │ │Minimize │ │VERTICAL     │
│approach │ │network  │ │Co-located   │
│works    │ │hops     │ │Specialized  │
└─────────┘ └─────────┘ └─────────────┘
NOTE: For global low-latency, geographic distribution
becomes necessary (horizontal), but each region should
be vertically optimized.
[D] Development velocity is the constraint:
┌──────────────────────────────┐
│    Team size (engineers)?    │
└──────────────┬───────────────┘
               │
    ┌──────────┼──────────┐
   <15       15-50       >50
    │          │          │
    ▼          ▼          ▼
┌──────────┐ ┌──────────┐ ┌─────────────┐
│VERTICAL  │ │Modular   │ │Services may │
│Monolith  │ │Monolith  │ │improve      │
│for       │ │with      │ │velocity via │
│simplicity│ │selected  │ │independence │
└──────────┘ │services  │ └─────────────┘
             └──────────┘
Most real systems should use a hybrid: vertically scaled database (as long as possible) with horizontally scaled stateless application tier (for availability and deployment flexibility). This combination captures most of horizontal scaling's benefits while avoiding its hardest problems (distributed data). Deviate from this default only with clear justification.
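For reference, the whole tree fits in a few dozen lines of code. The thresholds are taken directly from branches A-D above; the returned strings are informal labels for starting points, not prescriptions:

```python
def recommend(constraint: str, *, rps: float = 0, data_tb: float = 0,
              availability: float = 99.9, p99_ms: float = 100,
              team_size: int = 10) -> str:
    """Starting-point recommendation following the decision tree (A-D)."""
    if constraint == "capacity":                         # [A]
        if rps <= 10_000:
            return "vertical: single powerful server"
        if data_tb > 1:
            return "horizontal + sharded database"
        return "horizontal: stateless tier only"
    if constraint == "availability":                     # [B]
        if availability <= 99.9:
            return "vertical + active-passive"
        if availability <= 99.99:
            return "horizontal: multi-AZ redundancy"
        return "geographic: multi-region active-active"
    if constraint == "latency":                          # [C]
        if p99_ms > 100:
            return "either approach works"
        if p99_ms >= 50:
            return "minimize network hops"
        return "vertical: co-located, specialized"
    if constraint == "velocity":                         # [D]
        if team_size < 15:
            return "vertical monolith"
        if team_size <= 50:
            return "modular monolith with selected services"
        return "services for team independence"
    raise ValueError(f"unknown constraint: {constraint!r}")
```

Real decisions rarely have a single dominant constraint, which is why the prose caveats above matter more than any one branch.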
Let's apply the framework to common real-world scenarios:
Scenario 1: Early-stage startup, new product
Context: 5 engineers, finding product-market fit, 1,000 DAU, uncertain growth
Decision: Vertical scaling
Why: Development velocity is everything. You'll rebuild this system three times before you need to scale it.
Scenario 2: B2B SaaS with steady growth
Context: 20 engineers, 500 enterprise customers, 50,000 DAU, 99.9% SLA commitments
Decision: Hybrid—vertical database, horizontal API tier
Why: Enterprise customers expect reliability. Horizontal API tier provides availability without database complexity. This architecture handles 10× growth.
Scenario 3: Consumer mobile app with high engagement
Context: 50 engineers, 10M MAU, global users, real-time features, high traffic variability
Decision: Full horizontal with geographic distribution
Why: Scale and global latency requirements mandate distribution. Engineering team is large enough to handle complexity.
| Scenario | Primary Approach | Key Considerations |
|---|---|---|
| Internal tool / Back office | Vertical | Low traffic, low availability needs |
| MVP / New product | Vertical | Speed of iteration is paramount |
| Small-medium B2B SaaS | Hybrid (vertical DB, horizontal API) | Availability for SLA; DB complexity not needed |
| High-traffic B2C app (single region) | Horizontal | Capacity required; latency not global |
| Global consumer app | Horizontal + Geographic | Latency requirements mandate distribution |
| Real-time gaming / Trading | Vertical with HA | Latency critical; minimize network hops |
| Data-intensive analytics | Horizontal (specialized) | Data volume exceeds single node |
| Multi-tenant enterprise SaaS | Hybrid (tenant isolation varies) | Large tenants may need dedicated resources |
These recommendations assume typical characteristics. Your specific situation may differ. A B2B SaaS serving 3 massive enterprises with 100,000 users each has different needs than one serving 500 small businesses. Always validate recommendations against your actual constraints.
We've built a comprehensive framework for scaling decisions. The key is having informed defaults that you can override when evidence warrants.
What's next:
With "when to use which" addressed, we'll examine the practical limits of both approaches. The final page explores the real-world ceilings: what happens when you push vertical scaling to its maximum, and what happens when horizontal scaling's complexity becomes its own bottleneck.
You now have a practical decision framework for scaling approach selection. This framework—grounded in workload characteristics, availability requirements, latency constraints, and organizational context—enables confident, defensible architectural decisions for any system you encounter.