If distributed systems are so complex—so prone to subtle failures, so difficult to debug, so expensive to operate—why do we build them? The answer is simple: we have no choice.
The demands of modern software exceed what any single machine can provide. A smartphone in your pocket has more computing power than the machines that sent humans to the moon, yet that power is insufficient for the services you use daily. When you search Google, you're not querying one computer—you're querying thousands, coordinated to return results in under 200 milliseconds. When Netflix serves a movie, it's not streaming from one server—it's orchestrating content delivery from edge nodes distributed across the globe.
This page examines the fundamental forces that make distribution inevitable and why understanding these forces is essential for sound engineering decisions.
By the end of this page, you will understand the fundamental limitations of single-machine computing, the forces that drive distribution (scale, reliability, geography, cost), and how to reason about when distribution becomes necessary versus premature optimization.
For decades, the solution to performance problems was simple: buy a bigger machine. This approach—vertical scaling or "scaling up"—worked remarkably well while Moore's Law kept doubling transistor counts every 18-24 months and those transistors translated into faster single-threaded performance. But that era has ended: clock speeds plateaued in the mid-2000s, and per-core gains now arrive slowly.
The Physical Limits of Single Machines:
Every resource on a single machine has a ceiling: CPU core count, addressable RAM, attached storage capacity and bandwidth, and network throughput. The table below summarizes roughly where those ceilings sit on current hardware and where distribution typically becomes unavoidable.
| Resource | Commodity Server | Extreme High-End | Distribution Threshold |
|---|---|---|---|
| CPU Cores | 64-128 cores | Several hundred cores (multi-socket servers) | Many parallel workloads, or a single-thread bottleneck |
| RAM | 1-2 TB | 12 TB | Working sets beyond ~10 TB |
| Storage | 100-200 TB (NVMe) | ~1 PB | Beyond ~1 PB, or sustained bandwidth a single array can't serve |
| Network | 25-100 Gbps | 400 Gbps | 400 Gbps aggregate throughput |
| Requests/sec | 50K-100K | ~500K (optimized) | Millions of RPS |
The Cost Curve Reality:
Vertical scaling follows a non-linear cost curve: doubling capacity often costs far more than double, because high-end CPUs, large memory modules, and specialized interconnects carry steep price premiums.
Beyond certain thresholds, horizontal scaling (adding more commodity machines) becomes economically rational even before hitting physical limits. This is why hyperscalers run millions of modest machines rather than thousands of extreme ones.
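A back-of-the-envelope sketch of that trade-off, using a hypothetical base price and an assumed superlinear premium (both numbers are illustrative, not vendor pricing):

```python
# Back-of-the-envelope comparison of scaling up vs. scaling out.
# Every number here is an illustrative assumption, not a vendor quote.

BASE_UNIT_COST = 4_000   # hypothetical cost of one commodity server, in dollars
PREMIUM_EXPONENT = 1.6   # assumed superlinear price premium for ever-bigger single boxes

def scale_up_cost(capacity_units: int) -> float:
    """One big machine: price grows faster than linearly with capacity."""
    return BASE_UNIT_COST * capacity_units ** PREMIUM_EXPONENT

def scale_out_cost(capacity_units: int) -> float:
    """Many commodity machines: price grows roughly linearly with capacity."""
    return BASE_UNIT_COST * capacity_units

for units in (1, 2, 4, 8, 16):
    print(f"{units:>2}x capacity: scale-up ${scale_up_cost(units):>10,.0f}"
          f"   scale-out ${scale_out_cost(units):>10,.0f}")
```

With these assumed parameters, 16x capacity costs roughly $338K as one machine versus $64K as sixteen commodity machines; the exact numbers don't matter, the shape of the curve does.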
Herb Sutter's famous 2005 article 'The Free Lunch Is Over' predicted this shift: free, automatic performance gains from ever-faster CPUs would end, and software would have to embrace concurrency and scale out. Nearly two decades later, this prediction has proven accurate. Distribution is not a choice—it's an architectural necessity for scale.
Modern applications serve global audiences at scales that were unimaginable a decade ago. These scales fundamentally require distributed architectures.
The Numbers That Drive Distribution:
| Service | Users | Requests/Day | Data Volume |
|---|---|---|---|
| Google Search | 8.5 billion searches/day | 99,000 queries/sec | ~100 PB index |
| YouTube | 2 billion users | 1 billion hours watched/day | 800+ million videos |
| Facebook | 3 billion MAU | 4 million API calls/sec | 2.5 billion new content items/day |
| WhatsApp | 2 billion users | 100 billion messages/day | 3 billion minutes of calls/day |
| Netflix | 260+ million subscribers | 450 million hours streamed/day | 17,000+ titles, 30k encoding jobs/day |
| Amazon | 310+ million customers | 40% of US e-commerce | 100,000+ orders/minute at peak |
Scaling Dimensions:
Scale manifests across multiple dimensions, each requiring different distribution strategies:
1. User Scale (Concurrent Sessions)
2. Data Scale (Storage Volume)
3. Transaction Scale (Operations/Second)
4. Compute Scale (Processing Requirements)
These scaling dimensions multiply. A billion users, each generating multiple requests, each requiring data lookups, each triggering background computations, creates astronomical demands. You're not scaling one thing—you're scaling an interconnected system where bottlenecks cascade.
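To see how the dimensions multiply, here is a rough sketch; every per-user rate below is a hypothetical assumption chosen only to show the arithmetic:

```python
# Rough aggregate-load estimate. All input rates are hypothetical assumptions.
USERS = 1_000_000_000              # assumed user base
REQUESTS_PER_USER_PER_DAY = 50     # assumed average requests per user per day
LOOKUPS_PER_REQUEST = 10           # assumed data lookups triggered per request
SECONDS_PER_DAY = 86_400

requests_per_sec = USERS * REQUESTS_PER_USER_PER_DAY / SECONDS_PER_DAY
lookups_per_sec = requests_per_sec * LOOKUPS_PER_REQUEST

print(f"~{requests_per_sec:,.0f} requests/sec")  # ~578,704 requests/sec
print(f"~{lookups_per_sec:,.0f} lookups/sec")    # ~5,787,037 lookups/sec
```

Even with modest per-user assumptions, the product of the dimensions lands in the hundreds of thousands of requests and millions of lookups per second.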
Single points of failure are unacceptable for modern services. Distribution provides redundancy—if one component fails, others continue operating. This is not optional for systems that society depends on.
Understanding Availability Math:
| Availability | Downtime/Year | Downtime/Month | Downtime/Day | Typical Use Case |
|---|---|---|---|---|
| 99% (two nines) | 3.65 days | 7.3 hours | 14.4 minutes | Personal projects, internal tools |
| 99.9% (three nines) | 8.76 hours | 43.8 minutes | 1.4 minutes | Standard business applications |
| 99.95% | 4.38 hours | 21.9 minutes | 43 seconds | E-commerce, SaaS platforms |
| 99.99% (four nines) | 52.6 minutes | 4.4 minutes | 8.6 seconds | Financial services, healthcare |
| 99.999% (five nines) | 5.26 minutes | 26 seconds | 0.86 seconds | Telecom, critical infrastructure |
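The downtime columns in the table fall out of simple arithmetic; a minimal sketch:

```python
# Convert an availability target into allowed downtime.
def downtime(availability_pct: float) -> tuple[float, float, float]:
    """Return (minutes/year, minutes/month, seconds/day) of allowed downtime."""
    unavailable = 1 - availability_pct / 100
    return (unavailable * 365 * 24 * 60,   # minutes per year
            unavailable * 30 * 24 * 60,    # minutes per (30-day) month
            unavailable * 24 * 60 * 60)    # seconds per day

for target in (99.0, 99.9, 99.99, 99.999):
    per_year, per_month, per_day = downtime(target)
    print(f"{target}%: {per_year:,.1f} min/year, {per_month:.1f} min/month, {per_day:.2f} s/day")
```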
Why Single Machines Cannot Achieve High Availability:
Hardware fails at predictable rates: disks, memory modules, power supplies, and cooling fans each have nonzero annual failure rates, and their contributions compound. With a 5% annual server failure rate, a single server achieves ~99.5% availability at best, and that's hardware alone. Add software bugs, deployments, security patches, and operating-system updates, and realistic single-machine availability is 99-99.5%.
The Path to Higher Availability:
Higher availability is architectural: redundancy at every level (servers, network paths, power, storage, data centers), elimination of single points of failure (SPOF), and isolation into independent failure domains so that one fault cannot take down the whole system. A sketch of the underlying math follows.
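Assuming replicas fail independently (an idealization; correlated failures shrink the benefit in practice), the system is down only when every replica is down at once:

```python
# Composite availability of N replicas, assuming independent failures.
def composite_availability(per_node: float, replicas: int) -> float:
    return 1 - (1 - per_node) ** replicas

PER_NODE = 0.995   # the ~99.5% single-machine figure from above

for n in (1, 2, 3):
    a = composite_availability(PER_NODE, n)
    minutes_down_per_year = (1 - a) * 365 * 24 * 60
    print(f"{n} replica(s): {a * 100:.5f}% available, "
          f"~{minutes_down_per_year:,.1f} min downtime/year")
```

Under this independence assumption, two 99.5% replicas already reach ~99.9975%, which is why redundancy, not better single machines, is the practical path to four and five nines.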
For major platforms, downtime costs millions of dollars per minute. Amazon's 2018 Prime Day outage reportedly cost ~$100 million over roughly two hours, and Meta's 2021 outage is estimated to have cost ~$65 million across six hours of downtime. Distribution isn't just a technical preference—it's financial protection.
The speed of light is the universe's ultimate rate limiter—and it makes geographic distribution essential for low-latency global services.
The Physics of Latency:
Light travels at ~299,792 km/s in a vacuum, but in fiber optic cables it's slower (~200,000 km/s, because of the refractive index of the glass). Even at that speed, distance creates unavoidable latency: New York to London (~5,600 km) takes at least ~28 ms one way, or ~56 ms round trip, before any processing happens at all. And these are theoretical minimums; real-world latencies are typically 1.5-2x higher due to indirect cable routes, routing and switching overhead, congestion, and protocol handshakes, as the typical round-trip times below show.
| From | To | Typical RTT | Impact on UX |
|---|---|---|---|
| US East | US West | 60-80ms | Noticeable in real-time apps |
| US East | Europe | 75-100ms | Affects interactive experiences |
| US East | Asia | 150-250ms | Significantly impacts UX |
| US East | Australia | 200-300ms | Unusable for real-time gaming |
| Europe | Asia | 250-350ms | Multiple RTTs = seconds of delay |
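A quick sketch of the theoretical floor, assuming a direct fiber path at ~200,000 km/s; the distances are approximate great-circle figures, and real cable routes are longer, which is part of why the observed RTTs above exceed these minimums:

```python
# Theoretical minimum latencies over fiber (~200,000 km/s).
# Distances are approximate great-circle figures; real routes are longer.
FIBER_KM_PER_SEC = 200_000

ROUTES_KM = {
    "New York -> London": 5_600,
    "New York -> Tokyo": 10_900,
    "London -> Sydney": 17_000,
}

for route, km in ROUTES_KM.items():
    one_way_ms = km / FIBER_KM_PER_SEC * 1_000
    print(f"{route}: ~{one_way_ms:.0f} ms one way, ~{2 * one_way_ms:.0f} ms minimum RTT")
```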
Why Latency Matters:
For user experience, large web properties have repeatedly found that even a few hundred extra milliseconds of response time measurably reduce engagement and revenue. For application design, real-time workloads (collaboration, gaming, trading) and chatty interactions that require several round trips per user action become sluggish or unusable once each round trip costs hundreds of milliseconds.
The Solution: Bring Computation Closer to Users
Content delivery networks (CDNs) cache static assets at edge locations near users; regional deployments run full application stacks on multiple continents so requests terminate nearby; and edge computing pushes latency-sensitive logic out to points of presence even closer to the last mile.
No optimization can make New York-to-Tokyo faster than light allows. You can reduce overhead, but you cannot eliminate propagation delay. The only solution for low global latency is distribution—placing data and computation geographically close to users.
Beyond technical necessity, distributed systems often make economic sense—enabling scale, efficiency, and flexibility that monolithic approaches cannot match.
The Economics of Commodity Hardware:
Distributed-systems pioneers such as Google demonstrated that many cheap machines often outperform a few expensive ones.
The Google Philosophy (circa 2003):
Build on inexpensive commodity hardware, assume components will fail constantly, and make the software (replication, failover, automatic re-execution) responsible for masking those failures.
This philosophy—software solving hardware reliability—revolutionized infrastructure economics.
| Approach | Hardware Cost | Operational Cost | Capability | Risk |
|---|---|---|---|---|
| 1 Large Server ($200K) | $200,000 | Medium (specialized) | Fixed capacity, single point | Total loss on failure |
| 10 Medium Servers ($20K each) | $200,000 | Higher (complexity) | Distributed capacity | Partial loss on failure |
| 50 Small Servers ($4K each) | $200,000 | Highest initially, but automatable | Highly distributed, elastic | Minimal loss per failure |
The Elasticity Advantage:
Distributed architectures enable elastic scaling—adjusting capacity to match demand:
Fixed Resources (Monolithic): capacity must be provisioned for peak load up front, so you pay for hardware that sits idle most of the time and still risk overload whenever demand exceeds the forecast.
Elastic Resources (Distributed): capacity is added and removed as load changes, so spend tracks actual demand.
Example Cost Impact:
Consider a service with 10x traffic variation (normal: 10K RPS, peak: 100K RPS). Provisioning fixed capacity for the 100K RPS peak leaves roughly 90% of that capacity idle during normal hours, while elastic capacity that follows the load pays closer to the average, as the sketch below illustrates.
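A sketch of the cost difference under a hypothetical daily load shape and hypothetical instance pricing (instance capacity, hourly price, and the 20/4-hour split are all assumptions for illustration):

```python
import math

# All capacity and pricing figures below are illustrative assumptions.
INSTANCE_CAPACITY_RPS = 1_000    # assumed requests/sec one instance can serve
INSTANCE_COST_PER_HOUR = 0.50    # assumed hourly price per instance, in dollars

NORMAL_RPS, PEAK_RPS = 10_000, 100_000
NORMAL_HOURS, PEAK_HOURS = 20, 4          # assumed shape of a day

def instances_needed(rps: int) -> int:
    return math.ceil(rps / INSTANCE_CAPACITY_RPS)

# Fixed provisioning: sized for peak around the clock.
fixed_daily_cost = instances_needed(PEAK_RPS) * 24 * INSTANCE_COST_PER_HOUR

# Elastic provisioning: instance count follows the load.
elastic_daily_cost = (instances_needed(NORMAL_RPS) * NORMAL_HOURS +
                      instances_needed(PEAK_RPS) * PEAK_HOURS) * INSTANCE_COST_PER_HOUR

print(f"fixed (provision for peak): ${fixed_daily_cost:,.2f}/day")   # $1,200.00/day
print(f"elastic (track the load):   ${elastic_daily_cost:,.2f}/day") # $300.00/day
```

Under these assumptions, elastic capacity costs a quarter of fixed peak provisioning; the ratio shifts with the load profile, but the direction of the saving is the point.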
Cloud platforms (AWS, GCP, Azure) are built on this premise. The entire cloud computing model—pay-per-use pricing and on-demand scaling—is enabled by distributed architectures.
Organizational Economics:
Beyond infrastructure costs, distributed systems affect organizational efficiency:
Development Velocity: independent teams can build, deploy, and scale their own services without coordinating every release through one shared codebase.
Talent Utilization: specialists can focus on the services where their expertise matters most instead of everyone working in a single monolith.
Risk Distribution: a bad deployment or a failing component affects one service rather than the entire product.
Economic analysis must include total cost: hardware, operations, development time, incident response, and opportunity cost. Distributed systems have higher operational complexity but often lower TCO at scale due to elasticity, resilience, and organizational efficiency.
Distribution provides security and isolation benefits that are difficult to achieve with monolithic systems.
Defense in Depth: security controls are layered across services, so breaching one component does not expose everything.
Network Segmentation: services sit behind separate network boundaries with explicit, auditable paths between them.
Blast Radius Containment: a compromise or failure is confined to one service or region rather than the whole system.
Multi-tenancy Isolation: one customer's data and workload stay separated from another's.
The Shared-Nothing Architecture:
Distributed systems often embrace a "shared-nothing" architecture: each node owns its own CPU, memory, and storage, and shares state with other nodes only through explicit messages over the network.
This isolation model, while complex to coordinate, provides inherent security boundaries that monolithic systems must artificially construct.
Distribution isn't automatically more secure. A larger attack surface (more network endpoints), complex authentication (service-to-service), and coordination vulnerabilities (race conditions across services) introduce new security concerns. Distribution provides tools for security but requires careful implementation.
Understanding why we need distributed systems also requires understanding when we don't need them. Premature distribution is a common and costly mistake.
Signs You Might Be Distributing Too Early:
1. You Haven't Hit Single-Machine Limits
2. You're Optimizing for Imaginary Scale
3. Your Team Is Small
4. Your Problem Is Actually Algorithmic
The Cost of Premature Distribution:
Distributing too early buys the operational complexity, debugging difficulty, and infrastructure overhead described throughout this page without the scale that justifies them, and it slows a small team down at exactly the stage when iteration speed matters most.
The Monolith-First Approach:
Many successful companies started with monoliths and distributed later; Amazon, Twitter, and Netflix, for example, all began as monolithic applications and extracted services only once specific bottlenecks demanded it.
The path: Monolith → Identify bottlenecks → Extract specific services → Repeat as needed.
Don't ask 'Should we be distributed?' Ask 'What specific problem will distribution solve that we cannot solve by optimizing our current system?' If you can't articulate a clear answer with measurements, you probably don't need to distribute yet.
We've examined the forces that drive distributed systems adoption. To consolidate the key insights: single machines run into hard physical and economic limits; global user bases, availability targets, and the speed of light all force distribution; commodity hardware plus elastic capacity makes it economical at scale; and distribution should still be deferred until a measured bottleneck demands it.
What's Next:
We've established what distributed systems are and why we need them. The next page dives deeper into the specific benefits—scalability and fault tolerance—that make distributed systems compelling despite their complexity. We'll explore how these benefits manifest in practice and the architectural patterns that enable them.
You now understand the fundamental forces driving distributed systems adoption: physical limits, scale demands, reliability requirements, geographic constraints, economic efficiency, and security isolation. This understanding helps you evaluate when distribution is truly necessary versus when a simpler architecture suffices.