Network bottlenecks are insidious. Unlike CPU saturation (visible in metrics) or database slowness (visible in query logs), network issues often masquerade as other problems. A service appears slow, but CPU is idle and queries are fast—the time is being consumed by the network, often invisible to standard monitoring.
In distributed systems, every component communicates over the network. Microservices call each other. Applications talk to databases. Caches sit between layers. CDNs serve content from edges. The network is omnipresent, and its constraints shape system performance fundamentally.
Understanding network bottlenecks requires thinking about latency, bandwidth, connection overhead, and the amplification effects that turn small network costs into massive performance problems at scale.
By the end of this page, you will understand the three dimensions of network bottlenecks (bandwidth, latency, connections), how network costs amplify in distributed systems, techniques for diagnosing network issues, and strategies for minimizing network overhead in your architecture.
Network performance bottlenecks manifest in three distinct dimensions, each with different symptoms and solutions:
| Dimension | Definition | Key Metric | Typical Symptom |
|---|---|---|---|
| Bandwidth | Maximum data transfer rate | Megabits/Gigabits per second | Large transfers are slow; throughput plateaus |
| Latency | Time for a packet to travel from source to destination | Milliseconds (round-trip time) | Every request has a fixed time floor; small requests feel slow |
| Connection Overhead | Cost of establishing and maintaining connections | Time-to-first-byte, connection count | Many small requests are slow; connection limits hit |
The Water Pipe Analogy:
Imagine network connections as water pipes:
A wide but very long pipe (high bandwidth, high latency) can move lots of water, but there is a delay before anything arrives, like satellite internet. A short, narrow pipe (low latency, low bandwidth) delivers the first drop quickly, but only a trickle at a time, like dial-up. Each dimension requires different optimization strategies.
Many engineers confuse bandwidth and latency. A 10 Gbps connection (high bandwidth) can still have 100ms latency. Bandwidth is about throughput capacity; latency is about delay. A single small HTTP request on that 10 Gbps link still waits 100ms for the response, regardless of bandwidth.
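To make the distinction concrete, here is a rough back-of-the-envelope sketch (plain Python with illustrative numbers, not a benchmark): total request time is roughly one round-trip of latency plus payload size divided by bandwidth, so latency dominates small requests and bandwidth dominates bulk transfers.

```python
# Rough model: time ≈ RTT + payload_bits / bandwidth
# (ignores handshakes, TCP slow start, and congestion)
def transfer_time_ms(payload_bytes: float, bandwidth_bps: float, rtt_ms: float) -> float:
    serialization_ms = payload_bytes * 8 / bandwidth_bps * 1000
    return rtt_ms + serialization_ms

TEN_GBPS = 10e9

# Small API response on a fast but distant link: latency dominates
print(transfer_time_ms(2_000, TEN_GBPS, rtt_ms=100))           # ~100 ms, almost all latency

# Large file on the same link: bandwidth dominates
print(transfer_time_ms(1_000_000_000, TEN_GBPS, rtt_ms=100))   # ~900 ms, mostly serialization
```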
Bandwidth bottlenecks occur when data transfer requirements exceed the network's capacity. They're most visible when transferring large amounts of data.
Understanding Bandwidth Units:
| Connection Type | Typical Bandwidth | Transfer 1 GB | Transfer 1 TB |
|---|---|---|---|
| Home broadband | 100 Mbps | ~80 seconds | ~22 hours |
| Cloud VM (basic) | 1 Gbps | ~8 seconds | ~2.2 hours |
| Cloud VM (high-perf) | 10-25 Gbps | < 1 second | ~15 minutes |
| Data center interconnect | 100 Gbps | ~0.08 seconds | ~1.5 minutes |
| Cross-region link | Varies, often constrained | Higher latency adds delay | May have transfer caps |
Diagnosing Bandwidth Saturation:
```bash
#!/bin/bash
# =====================================================
# DIAGNOSING BANDWIDTH SATURATION
# =====================================================

# Real-time network interface throughput
iftop                     # Interactive bandwidth by connection
nethogs                   # Bandwidth per process
nload                     # Graph of network utilization

# Interface statistics
ip -s link show eth0      # Packets/bytes transmitted/received
cat /proc/net/dev         # Similar, raw numbers

# Sustained bandwidth test (between two hosts)
iperf3 -s                 # On server: start iperf server
iperf3 -c <server_ip>     # On client: test bandwidth

# Example output:
# [ ID] Interval           Transfer     Bitrate
# [  5]   0.00-10.00 sec   1.09 GBytes  937 Mbits/sec
#
# Interpretation:
# - Close to theoretical max = saturated
# - Well below = no bandwidth issue, look elsewhere

# =====================================================
# IDENTIFYING BANDWIDTH-CONSUMING PROCESSES
# =====================================================

# Which process is using bandwidth?
nethogs eth0
# Shows PID, user, program, and KB/s sent/received

# For container environments
docker stats --no-stream
# NET I/O column shows per-container traffic
# Per-container counters: /proc/<container_pid>/net/dev

# =====================================================
# CLOUD-SPECIFIC BANDWIDTH LIMITS
# =====================================================

# AWS: EBS bandwidth is often the hidden limit
# EC2 instances have EBS bandwidth caps (e.g., 4,750 Mbps for m5.xlarge)
# Symptoms: Disk I/O appears slow, but it's network to EBS

# Check EBS bandwidth consumption in CloudWatch:
# - VolumeReadBytes, VolumeWriteBytes
# - If near instance EBS bandwidth cap, you're throttled

# GCP: Network egress has per-VM limits
# Check instance type documentation for limits

# Azure: Similar per-VM bandwidth throttling
# Accelerated networking can help
```

Strategies for Bandwidth Bottlenecks: compress payloads, transfer less data in the first place, distribute load across more links or instances, and provision more bandwidth where saturation persists.
Network latency is the time for a packet to travel from source to destination. Unlike bandwidth (which can be increased with better hardware), latency is constrained by physics—data cannot travel faster than the speed of light.
Latency Components:
| Component | Description | Typical Range | Mitigation |
|---|---|---|---|
| Propagation delay | Physical travel time (speed of light in fiber) | ~5μs per km | Move closer (co-location, edge computing) |
| Transmission delay | Time to push bits onto the wire | Bandwidth-dependent | Smaller payloads, higher bandwidth |
| Processing delay | Time for routers/switches to process | ~1-10 μs per hop | Fewer network hops, better hardware |
| Queuing delay | Time waiting in router queues | Varies wildly | Avoid congested paths, QoS |
| Application delay | Time for endpoint to process and respond | Application-dependent | Faster application code |
Geographic Latency Reference:
The speed of light in fiber optic cable is approximately 200,000 km/s. A round-trip from New York to London (~5,500 km) has a minimum theoretical latency of ~55ms. Real-world latencies are typically 1.5-2x the theoretical minimum due to non-direct routes and processing delays.
| Route | Distance | Theoretical Min | Typical Observed |
|---|---|---|---|
| Same datacenter | < 1 km | < 0.1 ms | 0.1-0.5 ms |
| Same region (between availability zones) | ~50 km | ~0.5 ms | 1-2 ms |
| Cross-region (US-East to US-West) | ~4,000 km | ~40 ms | 60-80 ms |
| Transatlantic (US-East to Europe) | ~5,500 km | ~55 ms | 70-100 ms |
| Transpacific (US-West to Tokyo) | ~8,500 km | ~85 ms | 100-150 ms |
| Global (Sydney to London) | ~17,000 km | ~170 ms | 250-300 ms |
You cannot reduce propagation delay below the speed of light. If your users are 10,000 km from your servers, every single request incurs at least 100ms of latency. The only solutions are edge computing (move computation closer) or reducing the number of round-trips.
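The "Theoretical Min" column above is just this physics. A tiny illustrative sketch (plain Python, using the ~200,000 km/s figure for light in fiber quoted earlier):

```python
# Light in fiber covers roughly 200,000 km/s, i.e. ~200 km per millisecond
FIBER_KM_PER_MS = 200

def min_rtt_ms(distance_km: float) -> float:
    # Out and back, straight-line path, zero processing or queuing delay
    return 2 * distance_km / FIBER_KM_PER_MS

print(min_rtt_ms(5_500))    # New York - London:  ~55 ms floor
print(min_rtt_ms(17_000))   # Sydney - London:   ~170 ms floor
# Observed latencies run 1.5-2x higher due to indirect routes and per-hop processing
```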
The Latency Amplification Problem:
In distributed systems, latency compounds. A request that makes 10 sequential service calls, each with 50ms latency, incurs 500ms of network latency alone—before any computation. This is latency amplification.
```python
# =====================================================
# LATENCY AMPLIFICATION IN MICROSERVICES
# =====================================================
#
# A typical e-commerce checkout request:
#
# Client → API Gateway → Order Service → [
#     Inventory Service,
#     Payment Service → External Payment Gateway,
#     Shipping Service,
#     Notification Service
# ]
#
# If each hop is 5ms within datacenter:
# Sequential: 5 + 5 + 5 + 5 + 5 + 5 = 30ms network latency
# Plus 100ms to external payment gateway
# Total: 130ms+ just in network latency

import asyncio
import time

# =====================================================
# ANTI-PATTERN: SEQUENTIAL CALLS
# =====================================================

async def checkout_sequential(order):
    """Each call waits for the previous - latency accumulates."""
    start = time.time()

    # Sequential calls - each waits for previous
    inventory = await check_inventory(order.items)        # 5ms
    payment = await process_payment(order.payment)        # 100ms (external)
    shipping = await calculate_shipping(order.address)    # 5ms
    await reserve_inventory(order.items)                  # 5ms
    await send_confirmation(order.user)                   # 5ms

    total_latency = time.time() - start
    # Result: ~120ms of sequential network latency
    return total_latency

# =====================================================
# PATTERN: PARALLEL CALLS WHERE POSSIBLE
# =====================================================

async def checkout_parallel(order):
    """
    Identify independent operations and parallelize.
    Still respect dependencies.
    """
    start = time.time()

    # Phase 1: Independent operations in parallel
    inventory_check, shipping_calc = await asyncio.gather(
        check_inventory(order.items),        # 5ms
        calculate_shipping(order.address),   # 5ms
    )
    # Total: 5ms (parallel)

    # Phase 2: Dependent on inventory check
    if not inventory_check.available:
        raise OutOfStockError()

    payment = await process_payment(order.payment)   # 100ms (external, must be sequential)

    # Phase 3: After payment, can parallelize again
    await asyncio.gather(
        reserve_inventory(order.items),    # 5ms
        send_confirmation(order.user),     # 5ms
    )
    # Total: 5ms (parallel)

    total_latency = time.time() - start
    # Result: ~110ms (5 + 100 + 5 = 110ms)
    # Saved 10ms compared to sequential
    return total_latency

# =====================================================
# PATTERN: REDUCING ROUND-TRIPS
# =====================================================

# ANTI-PATTERN: Multiple calls to same service
async def get_product_details_bad(product_ids):
    results = []
    for pid in product_ids:
        # N round-trips for N products!
        product = await product_service.get(pid)
        results.append(product)
    return results

# PATTERN: Batch API
async def get_product_details_good(product_ids):
    # Single round-trip for all products
    return await product_service.get_batch(product_ids)

# =====================================================
# MEASURING CALL CHAIN LATENCY
# =====================================================

# Distributed tracing (Jaeger, Zipkin, Datadog APM) shows:
# 1. Total request latency
# 2. Per-service latency breakdown
# 3. Sequential vs parallel call patterns
# 4. Where latency is introduced
#
# Critical for identifying which calls to optimize or parallelize
```

Strategies for Latency Bottlenecks: reduce the number of round-trips (batch calls, parallelize independent ones), cache results close to where they are consumed, and move services or computation closer to users (co-location, edge computing).
Every network connection has setup and teardown costs. For short-lived connections or high request rates, this overhead can dominate actual data transfer time.
TCP Connection Establishment:
```text
TCP Three-Way Handshake:

  Client                                   Server
    |                                        |
    |-------- SYN (seq=x) ------------------>|   RTT #1 start
    |                                        |
    |<------- SYN-ACK (seq=y, ack=x+1) ------|   RTT #1 end / RTT #2 start
    |                                        |
    |-------- ACK (ack=y+1) ---------------->|   RTT #2 end
    |                                        |
    |    Connection established, can send    |
    |                                        |

Time cost: 1.5 RTT (round-trip times)
For 50ms RTT: 75ms just to open connection before any data

TLS Handshake (on top of TCP):

  Client                                   Server
    |                                        |
    |-------- TCP SYN ---------------------->|
    |<------- TCP SYN-ACK -------------------|
    |-------- TCP ACK ---------------------->|   (1.5 RTT for TCP)
    |                                        |
    |-------- ClientHello ------------------>|
    |<------- ServerHello, Certificate ------|
    |-------- KeyExchange, Finished -------->|
    |<------- Finished ----------------------|   (2 RTT for TLS 1.2)
    |                                        |
    |    Secure connection established       |

Time cost: 3.5 RTT total (TCP + TLS 1.2)
TLS 1.3: 2.5 RTT (1 fewer round-trip)
0-RTT resumption: 1.5 RTT for subsequent connections

For 50ms RTT:
- TCP only:      75ms
- TLS 1.2:       175ms
- TLS 1.3:       125ms
- TLS 1.3 0-RTT: 75ms (resumed)
```

The Impact of Connection Overhead:
Consider a web page that loads 50 resources. With HTTP/1.1 and no keep-alive, every resource pays the full connection setup cost: at 50ms RTT with TLS 1.2, that is roughly 175ms of handshake per resource, or about 8.75 seconds of pure connection overhead across the page. The browser's six parallel connections hide some of it, but handshakes still dominate load time.
This is why connection reuse, HTTP/2, and connection pooling are critical.
| Technique | How It Helps | Typical Improvement | Considerations |
|---|---|---|---|
| HTTP Keep-Alive | Reuses TCP connections across requests | Eliminates 1.5+ RTT per request | Enabled by default in HTTP/1.1+ |
| HTTP/2 Multiplexing | Multiple requests over single connection | One connection serves all requests | Requires TLS in practice; TCP-level head-of-line blocking remains |
| HTTP/3 (QUIC) | UDP-based transport with 0-RTT resumption | 1-RTT setup for new connections, ~0-RTT when resumed | New protocol; growing support |
| Connection Pooling | Pre-established DB/service connections | Eliminates per-request connect cost | Pool sizing is critical |
| TLS Session Tickets | Resume TLS sessions without full handshake | Save 1+ RTT on repeated connections | Requires server support |
| Persistent Connections | Long-lived connections (WebSocket, gRPC) | One handshake, unlimited requests | Connection management complexity |
```python
# =====================================================
# HTTP CONNECTION POOLING (CLIENT SIDE)
# =====================================================

import aiohttp

# ANTI-PATTERN: New connection per request
async def fetch_bad(url):
    async with aiohttp.ClientSession() as session:
        # Creates new session (connection pool) per call!
        async with session.get(url) as response:
            return await response.text()
    # Each call: TCP + TLS handshake = 150ms+ overhead

# PATTERN: Shared connection pool
class HttpClient:
    """Singleton pattern for shared connection pool."""
    _session = None

    @classmethod
    async def get_session(cls):
        if cls._session is None or cls._session.closed:
            # Pool configuration
            connector = aiohttp.TCPConnector(
                limit=100,              # Max total connections
                limit_per_host=10,      # Max per destination host
                keepalive_timeout=30,   # Keep idle connections for 30s
                enable_cleanup_closed=True,
            )
            cls._session = aiohttp.ClientSession(connector=connector)
        return cls._session

    @classmethod
    async def fetch(cls, url):
        session = await cls.get_session()
        async with session.get(url) as response:
            return await response.text()

# First call: 150ms (new connection)
# Subsequent calls: 5ms (reused connection from pool)

# =====================================================
# GRPC CONNECTION MANAGEMENT
# =====================================================

import grpc
# MyServiceStub is the client stub generated from the service's .proto file

# ANTI-PATTERN: New channel per call
def call_service_bad(request):
    channel = grpc.insecure_channel('service:50051')
    stub = MyServiceStub(channel)
    return stub.Method(request)
    # Channel discarded after call - connection wasted

# PATTERN: Shared channel with keepalive
class GrpcClient:
    def __init__(self, target):
        self.channel = grpc.insecure_channel(
            target,
            options=[
                ('grpc.keepalive_time_ms', 30000),
                ('grpc.keepalive_timeout_ms', 10000),
                ('grpc.keepalive_permit_without_calls', True),
                ('grpc.http2.max_pings_without_data', 0),
            ]
        )
        self.stub = MyServiceStub(self.channel)

    def call(self, request):
        return self.stub.Method(request)

# Connection established once, reused indefinitely

# =====================================================
# MONITORING CONNECTION HEALTH
# =====================================================

# Key metrics to track:
# 1. Connection pool utilization: active / max connections
# 2. Wait time for connection from pool
# 3. Connection error rate (failed to establish)
# 4. New connections per second (should be low with good pooling)
# 5. Connection lifetime (long = healthy reuse)

# Detect connection leaks:
# - Pool exhausted but few active connections
# - Connections not returned to pool after use
# - Always use context managers / try-finally for cleanup
```

HTTP/2 multiplexes all requests to a host over a single TCP connection. Instead of 6 parallel connections (the HTTP/1.1 browser limit), you have one connection handling hundreds of concurrent requests. This dramatically reduces connection overhead and makes connection reuse automatic.
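As a client-side illustration of that multiplexing, here is a minimal sketch using httpx (an illustrative choice, not part of the examples above; it needs the optional HTTP/2 extra, e.g. `pip install "httpx[http2]"`), where concurrent requests to the same host share one connection:

```python
import asyncio
import httpx

async def fetch_all(urls: list[str]) -> list[str]:
    # One AsyncClient = one connection pool; with http2=True, requests to the
    # same host are multiplexed as streams over a single TCP+TLS connection.
    async with httpx.AsyncClient(http2=True) as client:
        responses = await asyncio.gather(*(client.get(u) for u in urls))
        for r in responses:
            print(r.http_version, r.url)   # "HTTP/2" when the server supports it
        return [r.text for r in responses]

# Example (hypothetical URLs):
# asyncio.run(fetch_all([f"https://example.com/item/{i}" for i in range(20)]))
```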
Network problems are often misdiagnosed because they manifest as application slowness. A systematic approach is required:
Step 1: Confirm Network Is the Bottleneck
```bash
#!/bin/bash
# =====================================================
# STEP 1: BASIC CONNECTIVITY AND LATENCY
# =====================================================

# Ping: basic latency measurement
ping -c 10 target-host
# Check: min/avg/max/stddev
# High stddev = jitter = unstable connection

# MTR: traceroute with continuous stats
mtr target-host
# Shows per-hop latency and packet loss
# Identifies which network segment is slow

# TCP-specific latency (more realistic than ICMP)
hping3 -S -p 443 target-host
# Or use tcping:
tcping target-host 443

# =====================================================
# STEP 2: BANDWIDTH TESTING
# =====================================================

# iperf3: point-to-point bandwidth test
# On server:
iperf3 -s

# On client:
iperf3 -c server-ip -t 30 -P 4
# -t 30: run for 30 seconds
# -P 4: use 4 parallel streams

# Interpretation:
# Measured vs expected = saturation indicator

# =====================================================
# STEP 3: DNS RESOLUTION
# =====================================================

# DNS lookup time (often overlooked latency source)
dig target-host | grep "Query time"
# Or:
time nslookup target-host

# If high:
# - Check DNS server proximity
# - Implement DNS caching (dnsmasq, systemd-resolved)
# - Consider DNS prefetching in applications

# =====================================================
# STEP 4: TLS HANDSHAKE ANALYSIS
# =====================================================

# Measure TLS handshake time
curl -w "DNS: %{time_namelookup}s\nConnect: %{time_connect}s\nTLS: %{time_appconnect}s\nFirst byte: %{time_starttransfer}s\nTotal: %{time_total}s\n" \
     -o /dev/null -s https://target-host

# Example output:
# DNS: 0.024s
# Connect: 0.055s         (TCP handshake done)
# TLS: 0.142s             (TLS handshake done)
# First byte: 0.198s      (server processing + first response)
# Total: 0.234s

# Breakdown:
# DNS time:          24ms
# TCP handshake:     55 - 24 = 31ms
# TLS handshake:     142 - 55 = 87ms
# Server processing: 198 - 142 = 56ms
# Data transfer:     234 - 198 = 36ms

# =====================================================
# STEP 5: PACKET CAPTURE ANALYSIS
# =====================================================

# Capture traffic for detailed analysis
tcpdump -i eth0 -w capture.pcap host target-host

# Analyze in Wireshark:
# - Look for retransmissions (packet loss)
# - Check TCP window size (could be limiting throughput)
# - Identify slow segments in the connection

# Quick stats from tcpdump:
tcpdump -r capture.pcap -q | wc -l                               # packet count
tcpdump -r capture.pcap 'tcp[tcpflags] & tcp-syn != 0' | wc -l   # connections

# =====================================================
# STEP 6: APPLICATION-LEVEL TRACING
# =====================================================

# Distributed tracing shows where time is spent
# Tools: Jaeger, Zipkin, Datadog APM, Honeycomb

# Look for:
# - Time between spans (network latency)
# - External service call duration
# - Database query time vs connection acquisition time
```

Common Network Bottleneck Patterns:
| Observation | Likely Cause | Investigation | Solution |
|---|---|---|---|
| Slow first request, fast subsequent | Connection establishment overhead | Check time_connect vs time_starttransfer | Connection pooling, HTTP/2, keep-alive |
| All requests slow by ~Xms | Network latency (distance) | Check ping/mtr latency to dependencies | Move services closer, add caching layer |
| Throughput plateaus under load | Bandwidth saturation | Monitor interface throughput with iftop | Compress payloads, more bandwidth, distribute load |
| Intermittent slow requests | Packet loss/retransmission | Check for retransmissions in tcpdump | Investigate network path for congestion |
| Slow DNS in curl timing | DNS resolution latency | Check DNS server, resolution time | Local DNS caching, reduce DNS lookups |
| High TLS time in curl timing | TLS handshake overhead | Check TLS version, certificate chain | TLS 1.3, session resumption, smaller cert chains |
In microservices architectures, distributed tracing (Jaeger, Zipkin, etc.) is invaluable for identifying network bottlenecks. It shows exactly where time is spent across service boundaries, revealing whether delays are in computation or network transit.
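As a rough sketch of what that instrumentation can look like (using the OpenTelemetry Python SDK with a console exporter for illustration; the span names and the `call_inventory_service` / `call_payment_gateway` helpers are hypothetical):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Minimal setup: print spans to the console; real deployments export to
# Jaeger, Zipkin, or an OTLP collector instead.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
tracer = trace.get_tracer("checkout")

async def checkout(order_id: str):
    with tracer.start_as_current_span("checkout") as span:
        span.set_attribute("order.id", order_id)
        with tracer.start_as_current_span("inventory-service.call"):
            await call_inventory_service(order_id)   # hypothetical network call
        with tracer.start_as_current_span("payment-gateway.call"):
            await call_payment_gateway(order_id)     # hypothetical network call
    # In the trace viewer, parent time not covered by child spans is time spent
    # in transit or serialization - exactly the network overhead you are hunting.
```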
Good system architecture anticipates network constraints and designs to minimize their impact:
```python
# =====================================================
# BACKEND-FOR-FRONTEND (BFF) PATTERN
# =====================================================
#
# Problem: Mobile app needs data from 5 services
# Without BFF: 5 round-trips to remote services (500ms+ latency)
# With BFF: 1 round-trip to BFF, BFF fans out internally

# =====================================================
# CLIENT PERSPECTIVE
# =====================================================

# WITHOUT BFF (from mobile app)
async def load_product_page_old(product_id):
    # Each call crosses internet to backend
    product = await api.get(f"/products/{product_id}")     # 100ms
    reviews = await api.get(f"/reviews/{product_id}")      # 100ms
    inventory = await api.get(f"/inventory/{product_id}")  # 100ms
    pricing = await api.get(f"/pricing/{product_id}")      # 100ms
    related = await api.get(f"/related/{product_id}")      # 100ms
    # Total: ~500ms of network latency from mobile

# WITH BFF (from mobile app)
async def load_product_page_new(product_id):
    # One call to BFF, which aggregates internally
    response = await api.get(f"/bff/product-page/{product_id}")
    # Total: ~100ms from mobile (one round-trip)
    return response  # Contains all data pre-assembled

# =====================================================
# BFF SERVER IMPLEMENTATION
# =====================================================

from fastapi import FastAPI
import asyncio

app = FastAPI()

@app.get("/bff/product-page/{product_id}")
async def get_product_page(product_id: str):
    """
    Aggregate data from multiple services for product page.
    Services are in same datacenter - ~2ms latency each.
    """
    # Parallel calls within datacenter
    product, reviews, inventory, pricing, related = await asyncio.gather(
        product_service.get(product_id),    # 2ms
        review_service.get(product_id),     # 2ms
        inventory_service.get(product_id),  # 2ms
        pricing_service.get(product_id),    # 2ms
        related_service.get(product_id),    # 2ms
    )
    # Total internal latency: ~2ms (parallel)

    # Return pre-assembled response
    return {
        "product": product,
        "reviews": reviews,
        "inventory": inventory,
        "pricing": pricing,
        "related_products": related,
    }

# Client latency breakdown:
# - Mobile -> BFF: 100ms
# - BFF -> Services (parallel): 2ms
# - Total: ~102ms vs ~500ms = 5x improvement
```

Every network round-trip you eliminate is guaranteed latency reduction. The fastest RPC is the one you don't make. Always ask: Can we batch these calls? Cache the result? Move the computation closer?
Network bottlenecks are often invisible but always impactful. In distributed systems, the network underlies every operation, and its constraints shape what's possible.
What's Next:
With network bottlenecks covered, we'll examine memory constraints in the next page. You'll learn about heap management, garbage collection pauses, memory leaks, cache sizing, and the subtle ways memory pressure manifests as performance degradation.
You now understand the three dimensions of network bottlenecks, can diagnose network issues using systematic tools, and know architectural patterns to minimize network overhead. The network is no longer invisible—you can measure it, understand it, and optimize around it.