In a world of distributed systems—microservices architectures, cloud deployments, globally replicated databases—the network has become both the enabling technology and the most fundamental constraint. Every service call, every database query, every message passing between components travels over the network. And unlike CPU or memory, which can be upgraded by buying bigger machines, network constraints are often governed by the laws of physics.
The speed of light is approximately 300,000 kilometers per second. In fiber optic cable, signals travel at about two-thirds that speed—roughly 200,000 km/s or 200 km per millisecond. This means a round trip from New York to London (5,585 km each way) takes a minimum of ~56 milliseconds, just from propagation delay. No amount of money, engineering, or optimization can make light travel faster.
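This floor is easy to compute. A minimal sketch, using the ~200 km per millisecond fiber speed and the New York–London distance quoted above:

```python
# Propagation delay in fiber: signals cover roughly 200,000 km/s,
# i.e. 200 km per millisecond. Round trip = twice the one-way distance.
FIBER_SPEED_KM_PER_MS = 200.0

def min_round_trip_ms(one_way_km: float) -> float:
    """Lower bound on RTT from propagation delay alone."""
    return 2 * one_way_km / FIBER_SPEED_KM_PER_MS

print(min_round_trip_ms(5585))  # New York <-> London: ~55.9 ms minimum
```

Any measured RTT below this number indicates a measurement error, not a faster network.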
Network bottlenecks are particularly challenging because they're often invisible. When a service is slow, engineers instinctively check CPU, memory, and disk. The network—especially the internal network—is frequently overlooked until it becomes a severe problem. Yet in microservices architectures with dozens of inter-service calls per request, network latency and reliability often dominate the end-user experience.
By the end of this page, you will understand the fundamental components of network latency, how to identify network bottlenecks in distributed systems, the difference between bandwidth and latency constraints, and architectural patterns for minimizing network impact on system performance. This knowledge is essential for designing systems that perform well at scale across distributed components.
Network latency is not a single value—it's composed of multiple components, each with different characteristics and mitigation strategies. Understanding this breakdown is essential for effective optimization.
Components of Network Latency:
1. Propagation Delay (Physics)
The time for a signal to travel from source to destination, limited by the speed of light in the medium:
This cannot be reduced—only avoided by placing components closer together.
2. Transmission Delay (Bandwidth)
The time to put all bits of a message onto the wire:
This can be reduced by increasing bandwidth or reducing message size.
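As a sketch, transmission delay is simply message size divided by link bandwidth (the sizes and rates below are illustrative):

```python
def transmission_delay_ms(message_bytes: int, bandwidth_mbps: float) -> float:
    """Time to serialize all bits of a message onto the wire, in milliseconds."""
    bits = message_bytes * 8
    return bits / (bandwidth_mbps * 1_000_000) * 1000

# A 1 MB response on a 100 Mbps link: ~80 ms of pure transmission delay.
print(transmission_delay_ms(1_000_000, 100))
```

Halving the payload or doubling the bandwidth halves this component—unlike propagation delay, it is fully under your control.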
3. Processing Delay (Routers/Switches)
Time spent in networking equipment examining packets, making routing decisions, and forwarding:
4. Queuing Delay (Congestion)
Time spent waiting in buffers when network devices are congested:
| Component | Cause | Typical Range | How to Reduce |
|---|---|---|---|
| Propagation | Distance (speed of light) | ~5μs per km | Move components closer together |
| Transmission | Message size vs. bandwidth | Variable | Reduce message size, increase bandwidth |
| Processing | Router/switch operations | 1-100 μs per hop | Fewer hops, faster equipment |
| Queuing | Network congestion | 0 to seconds | Reduce congestion, QoS, traffic shaping |
Latency Percentiles Matter:
When measuring network latency, averages are misleading. What matters are percentiles:
High percentile latencies (tail latencies) often indicate queuing or congestion. In systems making many network calls per request, tail latencies compound: if you make 10 calls, each with 1% chance of high latency, you have ~10% chance of at least one slow call.
The Tail Latency Amplification Problem:
For a request that requires 100 parallel backend calls:
This is why microservices architectures with many inter-service calls are particularly sensitive to network tail latency.
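The compounding effect is easy to quantify: with probability p that any single call hits tail latency, the chance that at least one of n independent calls is slow is 1 − (1 − p)^n. A sketch:

```python
def p_at_least_one_slow(n_calls: int, p_slow: float) -> float:
    """Probability that at least one of n independent calls hits tail latency."""
    return 1 - (1 - p_slow) ** n_calls

print(p_at_least_one_slow(10, 0.01))   # ~0.096: ~10% of requests see a slow call
print(p_at_least_one_slow(100, 0.01))  # ~0.634: most requests see a slow call
```

With 100 parallel backend calls, a 1% per-call tail probability means the majority of user requests experience at least one slow call—the overall request is only as fast as its slowest dependency.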
Every service-to-service call adds network latency. A modest 2ms per call, across 20 calls in a request path, adds 40ms of pure network tax. This is before any actual processing. As you decompose into more services, this tax compounds. It's a key reason why organizations often consolidate microservices or use service mesh optimization techniques.
Network bottlenecks come in two fundamentally different flavors: bandwidth constraints (not enough capacity) and latency constraints (too slow regardless of capacity). The solutions differ dramatically.
Bandwidth Bottlenecks:
You're bandwidth-constrained when the total data transfer exceeds your network capacity:
Symptoms of Bandwidth Constraints:
Latency Bottlenecks:
You're latency-constrained when round-trip time—not capacity—limits performance:
Symptoms of Latency Constraints:
The Bandwidth-Delay Product:
A critical concept linking bandwidth and latency is the bandwidth-delay product (BDP):
BDP = Bandwidth × Round-Trip Time
This represents the amount of data 'in flight' on a network path at any moment. For optimal throughput:
Example:
On high-bandwidth, high-latency links (e.g., trans-oceanic connections), BDP tuning is essential. Default TCP window sizes (often 64 KB or less) are woefully inadequate for such paths.
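A sketch of the calculation, using an illustrative 1 Gbps trans-Atlantic path with 80 ms RTT:

```python
def bdp_bytes(bandwidth_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes 'in flight' needed to keep the pipe full."""
    return bandwidth_mbps * 1_000_000 / 8 * (rtt_ms / 1000)

# 1 Gbps with 80 ms RTT: the sender must keep ~10 MB in flight,
# far beyond a 64 KB default TCP window.
print(bdp_bytes(1000, 80))

# Conversely, a 64 KB window caps throughput at window / RTT:
print(64 * 1024 / 0.08)  # ~819 KB/s (~6.6 Mbps) on a 1 Gbps link
```

This is why a fat pipe with an untuned TCP window can deliver a tiny fraction of its nominal bandwidth.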
Before optimizing, determine which problem you have. A bandwidth problem won't be solved by reducing round trips; a latency problem won't be solved by compression. Use traceroute for latency analysis and bandwidth tests (iperf3) for capacity assessment. Then choose the right optimization approach.
Microservices architectures are particularly susceptible to network bottlenecks because they transform what were local function calls into network calls. A monolith might make 1-2 database calls per request; a microservices system might make 20+ inter-service calls.
The Call Chain Problem:
Consider a simplified request flow:
That's 7 network calls for one user request. If each call adds 5ms of network latency, the request accumulates 35ms of pure network overhead before any actual processing.
And this is a simple example. Real-world flows can have 30-50+ service calls, with complex fan-out patterns where one service calls multiple downstream services.
Patterns That Amplify Network Pain:
1. Chatty APIs: APIs that require many small calls instead of fewer comprehensive calls:
Better: A single call that returns all needed data.
2. Deep Call Chains: Service A calls B, which calls C, which calls D. Each hop adds latency:
3. Synchronous Orchestration: A central orchestrator making sequential calls:
If calls are independent, parallelize them: max(A, B, C)
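A minimal sketch of the difference, with hypothetical services simulated by sleeps (the names and latencies are illustrative):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_service(name: str, latency_s: float) -> str:
    time.sleep(latency_s)  # stand-in for a blocking network call
    return f"{name} done"

calls = [("A", 0.05), ("B", 0.03), ("C", 0.04)]

# Sequential orchestration: total latency is the SUM (~120 ms here).
start = time.perf_counter()
results = [call_service(name, lat) for name, lat in calls]
print("sequential:", time.perf_counter() - start)

# Parallel fan-out: total latency is the MAX (~50 ms here).
start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda c: call_service(*c), calls))
print("parallel:", time.perf_counter() - start)
```

The same idea applies regardless of concurrency mechanism (threads, async I/O, or reactive clients): independent calls should pay max(latencies), not sum(latencies).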
| Anti-Pattern | Description | Impact | Remediation |
|---|---|---|---|
| Chatty APIs | Many small calls for one operation | Latency multiplied by call count | Aggregate endpoints, BFF pattern |
| Deep chains | A→B→C→D→E call sequences | Latency is sum of all hops | Flatten hierarchy, async events |
| Synchronous fan-out | Sequential calls to independent services | Latency is sum when it could be max | Parallelize independent calls |
| N+1 queries | Fetching list, then calling per item | O(n) network calls | Batch endpoints, GraphQL |
| No timeout/retry | Failed calls block indefinitely | Cascade failures | Timeouts, circuit breakers, retries |
Mitigation Strategies for Microservices:
Backend for Frontend (BFF): Create aggregation layers that combine multiple backend calls into single responses for specific clients. Mobile BFF, Web BFF, etc.
GraphQL: Allow clients to specify exactly what data they need in one request, with the GraphQL layer handling data aggregation from multiple sources.
gRPC and Protocol Buffers: Replace REST/JSON with gRPC for inter-service communication:
Service Mesh (Istio, Linkerd): Infrastructure layer that optimizes service-to-service communication:
Service meshes add their own overhead—sidecar proxies add latency (typically 1-5ms per hop). The benefits (reliability, observability, security) often outweigh this cost, but in latency-critical paths, direct connections may be necessary. Measure before and after deploying a service mesh.
Establishing network connections is expensive. Understanding and minimizing connection overhead is essential for low-latency systems.
TCP Connection Establishment (3-Way Handshake):
This takes 1.5 round trips to complete (in practice the client can piggyback data on its final ACK after one round trip). At 50ms RTT, that's up to 75ms just to establish a connection.
TLS Handshake (on top of TCP):
For HTTPS connections (TLS 1.2):
This adds 2 more round trips. TLS 1.3 improves this to 1 round trip, and 0-RTT for resumed sessions.
Total New Connection Overhead:
At 100ms RTT, a new TLS 1.2 connection takes 350ms before any application data. This is why connection reuse is critical.
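The arithmetic behind that figure, as a sketch (1.5 RTTs for the TCP handshake, plus 2 for TLS 1.2 or 1 for TLS 1.3):

```python
def connection_setup_ms(rtt_ms: float, tls_rtts: float, tcp_rtts: float = 1.5) -> float:
    """Time before the first byte of application data can flow on a new connection."""
    return (tcp_rtts + tls_rtts) * rtt_ms

print(connection_setup_ms(100, tls_rtts=2))  # TLS 1.2: 350.0 ms
print(connection_setup_ms(100, tls_rtts=1))  # TLS 1.3: 250.0 ms
```

Amortized over thousands of reused requests, this cost is negligible; paid on every request, it can dwarf the actual work.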
Connection Pooling:
Maintain pools of established connections:
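A toy illustration of the pattern using a bounded queue. In practice you would rely on a library-managed pool (most HTTP clients and database drivers have one built in); the Connection class here is a stand-in for a real, expensive-to-establish connection:

```python
import queue

class Connection:
    """Stand-in for an expensive-to-establish network connection."""
    def __init__(self, addr: str):
        self.addr = addr  # a real pool would perform TCP/TLS setup here

class ConnectionPool:
    def __init__(self, addr: str, size: int):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):           # pay the setup cost once, up front
            self._pool.put(Connection(addr))

    def acquire(self) -> Connection:
        return self._pool.get()         # blocks if all connections are in use

    def release(self, conn: Connection) -> None:
        self._pool.put(conn)

pool = ConnectionPool("db:5432", size=4)
conn = pool.acquire()
# ... use conn for a request ...
pool.release(conn)
```

The bounded queue also acts as a natural backpressure mechanism: when every connection is busy, callers wait instead of opening unbounded new connections to an already-loaded backend.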
HTTP/2 Multiplexing:
HTTP/1.1 allows only one request at a time per connection (with limited pipelining). HTTP/2 changes this:
A single HTTP/2 connection can serve 100+ concurrent requests, versus 100 HTTP/1.1 connections for the same throughput.
| Protocol | Handshake RTTs | Multiplexing | Header Overhead | Best For |
|---|---|---|---|---|
| HTTP/1.1 + TLS 1.2 | 3.5 | No | High (text) | Browser compatibility |
| HTTP/2 + TLS 1.2 | 3.5 | Yes | Low (HPACK) | Most scenarios |
| HTTP/2 + TLS 1.3 | 2.5 | Yes | Low | Modern deployments |
| gRPC + HTTP/2 | 2.5 | Yes + streaming | Very low (binary) | Internal services |
| WebSocket + TLS | 3.5 (initial) | Yes (after upgrade) | Minimal (binary frames) | Real-time bidirectional |
DNS Resolution Overhead:
Often overlooked, DNS resolution adds latency:
Keep-Alive Best Practices:
Cold start latency—the first request to a new connection or endpoint—is always the worst. It includes DNS resolution, TCP handshake, TLS handshake, and often application-level warm-up. Pre-warming connections during deployment and maintaining connection pools are key to avoiding cold start penalties for users.
For global applications, geography is the ultimate network constraint. The only way to beat the speed of light is to reduce the distance data must travel.
The Geography-Latency Relationship:
Approximate round-trip times (RTT) between major regions:
| Route | Distance | Min RTT (physics) | Typical RTT |
|---|---|---|---|
| Same datacenter | < 1 km | < 0.01 ms | 0.1-0.5 ms |
| Same region (e.g., US-East zones) | ~100 km | ~1 ms | 1-5 ms |
| Cross-continent (US East to West) | ~4,000 km | ~40 ms | 50-80 ms |
| Trans-Atlantic (US East to Europe) | ~6,000 km | ~60 ms | 70-100 ms |
| Trans-Pacific (US West to Asia) | ~10,000 km | ~100 ms | 100-180 ms |
| Global round trip | ~40,000 km | ~400 ms | N/A |
Strategies for Geographic Distribution:
1. CDN (Content Delivery Network): Distribute static content to edge locations worldwide:
Users fetch content from nearby edge servers instead of origin.
2. Multi-Region Deployment: Run application servers in multiple geographic regions:
3. Edge Computing: Execute logic at edge locations, close to users:
| Strategy | What It Distributes | Latency Reduction | Complexity |
|---|---|---|---|
| CDN | Static content, cached API responses | High for static content | Low |
| Multi-region compute | Application servers | High for all requests | Medium-High |
| Edge computing | Light logic (auth, personalization) | High for supported use cases | Medium |
| Geo-replicated database | Data | High for reads; complex for writes | Very High |
| Anycast networking | Network routing | Automatic nearest selection | Medium |
Data Replication Challenges:
The hardest part of geographic distribution is data:
Anycast Routing:
Anycast allows multiple servers to share the same IP address. Network routing automatically directs users to the nearest server:
Trade-offs of Distribution:
Each additional region adds operational complexity: deployments, monitoring, failover testing, data replication. Start with regions that serve the majority of your users. A US-only deployment might serve 80% of users well; adding EU reduces latency for 15% more. Adding Asia-Pacific for the remaining 5% may not be worth the complexity for an early-stage product.
Networks fail. Packets get lost, connections get reset, entire links go down. Designing for network unreliability is a core skill for distributed systems engineers.
The Eight Fallacies of Distributed Computing:
First articulated at Sun Microsystems in 1994, these remain true today:

1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
Assumptions that any of these are true lead to fragile systems.
Designing for Network Failure:
Timeouts: Every network call must have a timeout:
Guideline: set the timeout to roughly 2-3× the expected p99 latency, plus a buffer if retries are enabled.
Retries: Failed requests should be retried (for idempotent operations):
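A sketch of retry with exponential backoff and full jitter (the attempt counts and delays are illustrative; apply this only to idempotent operations):

```python
import random
import time

def retry(operation, max_attempts=4, base_delay_s=0.1, max_delay_s=2.0):
    """Retry a callable with exponential backoff plus full jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise                    # out of attempts: surface the error
            backoff = min(max_delay_s, base_delay_s * 2 ** attempt)
            time.sleep(random.uniform(0, backoff))  # jitter avoids synchronized retry storms
```

The jitter matters: if many clients retry on the same schedule after a shared failure, their synchronized retries can re-congest the recovering service.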
Circuit Breakers: When a service is failing, stop sending traffic:
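A minimal circuit breaker sketch (the thresholds and recovery policy are simplified; production implementations add half-open probing, per-endpoint state, and monitoring):

```python
import time

class CircuitBreaker:
    """Fail fast after consecutive failures; allow traffic again after a cooldown."""
    def __init__(self, failure_threshold=5, reset_timeout_s=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.failures = 0
        self.opened_at = None

    def call(self, operation):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout_s:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None        # cooldown elapsed: allow a probe request
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                # success resets the failure count
        return result
```

Failing fast protects both sides: callers stop burning threads and timeouts on a dead dependency, and the failing service gets breathing room to recover.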
Idempotency: Design operations to be safely retriable:
| Pattern | Purpose | Implementation | Considerations |
|---|---|---|---|
| Timeouts | Prevent indefinite waiting | Configure per-call; adjust based on SLA | Balance between fail-fast and false positives |
| Retries | Handle transient failures | Exponential backoff + jitter | Only for idempotent operations |
| Circuit Breaker | Prevent cascade failures | Track failures; trip when threshold exceeded | Need monitoring and alerting |
| Bulkhead | Isolate failures | Separate thread pools/connections per dependency | Resource overhead |
| Fallback | Graceful degradation | Return cached/default data on failure | Define appropriate fallback behavior |
Graceful Degradation:
When network calls fail, the system should degrade gracefully rather than fail completely:
Example degradation hierarchy:
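One way to express such a hierarchy in code—the tiers here (live call, cached data, static default) are a hypothetical recommendations example, not a prescribed design:

```python
def get_recommendations(user_id, fetch_live, cache):
    """Degrade through tiers instead of failing the whole page."""
    try:
        return fetch_live(user_id)       # tier 1: personalized, live data
    except Exception:
        pass                             # fall through to a degraded tier
    cached = cache.get(user_id)
    if cached is not None:
        return cached                    # tier 2: possibly stale cached data
    return ["popular-item-1", "popular-item-2"]  # tier 3: generic default
```

The user sees slightly worse recommendations instead of an error page—degraded, not broken.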
Failure handling code that's never tested is likely broken. Use chaos engineering—deliberately inject network failures (latency, packet loss, connection resets) in testing and staging environments. Tools like Chaos Monkey, Toxiproxy, and Gremlin help simulate network failures. The goal: verify your system degrades gracefully, not catastrophically.
You can't optimize what you don't measure. Comprehensive network monitoring is essential for identifying bottlenecks and validating optimizations.
Key Metrics to Monitor:
Latency Metrics:
Throughput Metrics:
Error Metrics:
Connection Metrics:
Tools for Network Monitoring:
Infrastructure Level:
- netstat / ss: Socket statistics
- iftop / nethogs: Bandwidth per process/connection
- tcpdump / Wireshark: Packet capture and analysis
- iperf3: Bandwidth testing
- mtr: Combined traceroute + ping for path analysis

Application Level:
Synthetic Monitoring:
Distributed tracing is the single most valuable tool for understanding network impact in microservices. A trace shows exactly where time is spent: 10ms in Service A, 50ms waiting for Service B, 5ms network transfer. Without tracing, you're guessing at where the bottleneck is. With tracing, you know.
Network bottlenecks are often the hidden constraint in distributed systems. Unlike CPU or memory, which can be upgraded, network latency is fundamentally bounded by the speed of light. Let's consolidate the key insights from this page:
Optimization Priority Framework:
Module Complete:
With this page, you've completed Module 4: Constraints and Bottlenecks. You now understand how to identify constraints early, recognize the four fundamental resource bottlenecks (CPU, memory, network, disk), and dive deep into database and network bottlenecks—the two most common constraints in distributed systems.
This knowledge forms the foundation for all system design decisions: understanding what constrains your system allows you to make informed trade-offs and design architectures that work within their real-world boundaries.
Congratulations! These skills are fundamental to effective system design and will serve you in every architectural decision you make.