When you click a link and wait for a page to load, you're experiencing latency—the delay between initiating an action and seeing the result. While bandwidth determines how much data can flow, latency determines how long each piece takes to arrive.
Latency is the time delay between when data is sent and when it's received. It's measured in milliseconds (ms) for network operations and microseconds (μs) for low-latency systems. Unlike bandwidth, which can be increased by upgrading hardware, latency has fundamental physical limits—the speed of light sets an absolute floor.
By the end of this page, you will understand latency from physical fundamentals through practical measurement. You'll learn to decompose latency into its components, identify which are fixable, and apply strategies that reduce user-perceived delay in real systems.
Network latency encompasses several related but distinct concepts. Precision in terminology prevents miscommunication:
One-Way Latency (OWL): The time for a packet to travel from source to destination. Difficult to measure accurately because it requires synchronized clocks at both endpoints.
Round-Trip Time (RTT): The time for a packet to travel from source to destination AND for a response to return. Most commonly measured and reported; what 'ping' shows.
Request-Response Latency: The time from sending a request to receiving a complete response. Includes server processing time. What users experience.
For most practical purposes: one-way latency ≈ RTT / 2 (assuming symmetric paths, which isn't always true)
| Term | Definition | Measurement | Common Values |
|---|---|---|---|
| One-Way Latency | Source to destination time | NTP-synced hosts | Same LAN: <0.5ms |
| Round-Trip Time | Out and back | ping, TCP handshake | Internet: 20-200ms |
| Request-Response | Full transaction time | Application logs | Web: 100-500ms |
| First Byte (TTFB) | Request to first response byte | Browser tools | Web: 50-200ms |
| Last Byte | Request to complete response | Load testing tools | Varies widely |
Be precise when discussing latency. 'Ping is 50ms' means RTT is 50ms, so one-way latency is ~25ms. Saying 'latency is 50ms' is ambiguous. In TCP discussions, RTT is the proper term because the protocol operates on round-trips for ACKs.
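To make these definitions concrete, here is a minimal sketch that estimates RTT by timing a TCP handshake with Python's standard socket module. The target host and port are placeholders, and connect time is only a ping-like approximation of path RTT:

```python
"""Approximate RTT by timing TCP connection establishment.

A minimal sketch: the TCP three-way handshake completes in roughly one round-trip,
so timing connect() gives a ping-like RTT estimate without needing ICMP.
The target host and port below are placeholders.
"""
import socket
import time

def tcp_rtt_ms(host: str, port: int = 443, samples: int = 5) -> list[float]:
    """Time TCP connects to approximate round-trip time in milliseconds."""
    addr = socket.gethostbyname(host)  # resolve once so DNS doesn't skew the timings
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((addr, port), timeout=3):
            pass  # connection established: roughly one RTT has elapsed
        results.append((time.perf_counter() - start) * 1000)
    return results

if __name__ == "__main__":
    rtts = tcp_rtt_ms("example.com")  # placeholder target
    print(f"RTT samples (ms): {[round(r, 1) for r in rtts]}")
    print(f"Best estimate of path RTT: {min(rtts):.1f} ms")
    print(f"One-way estimate (assumes a symmetric path): {min(rtts) / 2:.1f} ms")
```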
Why Latency Matters: latency sets the floor on responsiveness for every interactive operation, compounds across protocol round-trips, and, unlike bandwidth, cannot be bought back with a hardware upgrade.
Total network latency is the sum of four distinct delays. Understanding each component reveals optimization opportunities:
1. Propagation Delay (dp): Time for a signal to travel through the medium. Determined by distance and propagation speed.
dp = Distance / Propagation Speed
This is the physics-limited component. New York to London (5,500 km) has a minimum propagation delay of ~28ms (one-way) through fiber. No technology can reduce this—only moving endpoints closer helps.
| Route | Distance | Min. Propagation (Fiber) | Typical RTT | Overhead Factor |
|---|---|---|---|---|
| Same building | 500m | ~3 μs | <1ms | ~300x (switching) |
| Same city | 50 km | ~0.25ms | 5-10ms | ~30x |
| Cross-country (US) | 4,000 km | ~20ms | 50-70ms | ~2.5x |
| Transatlantic | 5,500 km | ~28ms | 70-90ms | ~2.5x |
| Transpacific | 10,000 km | ~50ms | 120-180ms | ~2.5x |
| GEO Satellite | 72,000 km† | ~240ms | 500-700ms | ~2.5x |
†GEO satellites orbit at ~36,000 km altitude; a one-way trip travels up to the satellite and back down to a ground station (~2 × 36,000 km = 72,000 km), giving ~240ms at radio propagation speed (roughly the speed of light).
2. Transmission Delay (dt): Time to push all bits of a packet onto the link. Depends on packet size and link bandwidth.
dt = Packet Size / Link Bandwidth
Transmission delay is negligible at high speeds but becomes significant on slow links (DSL, cellular, satellite) or for very large packets.
3. Processing Delay (dproc): Time for network devices to process a packet—examining headers, making routing decisions, checking errors.
Processing delay adds up over many hops but is usually small compared to propagation on wide-area paths. However, complex middleboxes can add significant delay.
4. Queuing Delay (dq): Time a packet waits in buffers before being transmitted. This is the variable, unpredictable component.
Queuing delay is where latency problems usually hide. It varies with traffic load and can spike dramatically during congestion.
Large buffers in network devices (designed to prevent packet loss) can cause enormous queuing delays. A 1MB buffer on a 10 Mbps link can hold 800ms of traffic! This 'bufferbloat' destroys latency while masking congestion signals. Modern solutions include AQM (CoDel, PIE, fq_codel).
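The 800ms figure is just drain-time arithmetic: a full buffer must be transmitted before a newly arriving packet can leave. A minimal sketch (the buffer sizes and link speeds are illustrative):

```python
"""Worst-case queuing delay of a full buffer: drain time = buffer size / link rate."""

def buffer_drain_ms(buffer_bytes: int, link_mbps: float) -> float:
    """Milliseconds needed to drain a completely full buffer onto the link."""
    return (buffer_bytes * 8) / (link_mbps * 1_000_000) * 1000

# Illustrative buffer sizes and link speeds
for buf_mb, mbps in [(1.0, 10), (1.0, 100), (1.0, 1000), (0.25, 10)]:
    delay = buffer_drain_ms(int(buf_mb * 1_000_000), mbps)
    print(f"{buf_mb:5.2f} MB buffer on {mbps:>4} Mbps link -> up to {delay:7.1f} ms of queuing delay")
```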
"""Network latency component calculator.Demonstrates how each delay component contributes to total latency.""" def calculate_latency( distance_km: float, packet_size_bytes: int, link_bandwidth_mbps: float, hops: int = 5, processing_per_hop_us: float = 10, queue_utilization: float = 0.3 # Fraction of link capacity in use) -> dict: """ Calculate network latency components. Args: distance_km: Physical distance in kilometers packet_size_bytes: Packet size in bytes link_bandwidth_mbps: Link speed in Mbps hops: Number of router hops processing_per_hop_us: Processing delay per hop in microseconds queue_utilization: Current link utilization (0-1) Returns: Dictionary with delay components """ # Constants SPEED_IN_FIBER = 200_000 # km/s (0.67 * c) # 1. Propagation delay (ms) propagation_ms = (distance_km / SPEED_IN_FIBER) * 1000 # 2. Transmission delay (ms) packet_bits = packet_size_bytes * 8 link_bps = link_bandwidth_mbps * 1_000_000 transmission_ms = (packet_bits / link_bps) * 1000 # 3. Processing delay (ms) processing_ms = (hops * processing_per_hop_us) / 1000 # 4. Queuing delay - simplified M/M/1 model (ms) # Average queue delay = transmission_time / (1 - utilization) if queue_utilization >= 0.99: queue_utilization = 0.99 # Cap to avoid infinity queue_ms = (transmission_ms / (1 - queue_utilization)) - transmission_ms queue_ms *= hops # Queuing at each hop # Total one-way delay total_ms = propagation_ms + transmission_ms + processing_ms + queue_ms # RTT (assuming symmetric path) rtt_ms = total_ms * 2 return { "propagation_ms": round(propagation_ms, 3), "transmission_ms": round(transmission_ms, 3), "processing_ms": round(processing_ms, 3), "queuing_ms": round(queue_ms, 3), "total_one_way_ms": round(total_ms, 3), "rtt_ms": round(rtt_ms, 3), "dominant_component": max( [("propagation", propagation_ms), ("transmission", transmission_ms), ("processing", processing_ms), ("queuing", queue_ms)], key=lambda x: x[1] )[0] } # Example scenariosscenarios = [ ("LAN - Same Building", 0.1, 1500, 1000, 2, 0.1), ("LAN - Cross Campus", 2, 1500, 1000, 5, 0.3), ("Metro - Same City", 50, 1500, 10000, 8, 0.4), ("WAN - Cross Country", 4000, 1500, 1000, 15, 0.5), ("WAN - Transatlantic", 5500, 1500, 10000, 20, 0.3), ("DSL - Last Mile", 5, 1500, 10, 3, 0.6), # Slow link ("Congested Link", 100, 1500, 100, 5, 0.9), # Heavy load] print("=== Network Latency Component Analysis ===\n")print(f"{'Scenario':<25} {'Prop':>8} {'Trans':>8} {'Proc':>8} {'Queue':>8} {'RTT':>10} {'Dominant':>12}")print("-" * 90) for name, dist, pkt, bw, hops, util in scenarios: result = calculate_latency(dist, pkt, bw, hops, 10, util) print(f"{name:<25} {result['propagation_ms']:>7.3f}ms {result['transmission_ms']:>7.3f}ms " f"{result['processing_ms']:>7.3f}ms {result['queuing_ms']:>7.3f}ms " f"{result['rtt_ms']:>9.2f}ms {result['dominant_component']:>12}")Unlike bandwidth, which can be increased with technology, propagation latency faces a fundamental physical limit: nothing travels faster than light. This has profound implications for global-scale systems.
The Physics: light travels at ~300,000 km/s in a vacuum but only ~200,000 km/s in optical fiber (the glass's refractive index of ~1.5 slows it), so fiber adds roughly 50% to the theoretical minimum delay.
The Geometry: Real network paths are longer than straight-line distance. Cables follow roads, rail lines, and seabeds, and traffic detours through exchange points rather than taking the great-circle route.
Example: New York to London is ~5,500 km in a straight line, but fiber paths run noticeably longer, and actual routing adds more still. The same pattern holds on every major route (approximate one-way latencies):
| Route | Straight Line | Fiber Path | Typical Actual | Potential Savings |
|---|---|---|---|---|
| NYC ↔ Chicago | ~3.9ms | ~4.5ms | ~7ms | ~36% |
| NYC ↔ London | ~18ms | ~22ms | ~35ms | ~37% |
| NYC ↔ Tokyo | ~36ms | ~45ms | ~85ms | ~47% |
| London ↔ Singapore | ~35ms | ~50ms | ~90ms | ~44% |
| SF ↔ Sydney | ~40ms | ~55ms | ~75ms | ~27% |
High-frequency trading firms spend billions building straighter fiber routes and microwave/laser networks that approach theoretical minimums. The Spread Networks Chicago-NYC fiber route (2010) shaved 3ms off the standard path. Later microwave links cut it further. These microseconds are worth millions in arbitrage opportunities.
Implications for System Design:
Edge Deployment is Essential: serving users from nearby points of presence (CDNs, edge compute, regional deployments) is the only practical way to cut propagation delay.
Geographic Architecture Matters: where you place regions, replicas, and cross-region dependencies sets a hard floor on response times.
Some Things Are Impossible: no amount of engineering delivers, say, a 50ms RTT between Sydney and London; the speed of light forbids it. The sketch below computes these physical floors.
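As an illustration, the following sketch computes the minimum possible RTT between city pairs from great-circle distance and fiber propagation speed. The coordinates are approximate, and real routes will always be slower:

```python
"""Minimum possible RTT between two points, ignoring all routing detours.

Even a perfectly straight fiber path cannot beat distance / (0.67 * c).
City coordinates below are approximate.
"""
from math import radians, sin, cos, asin, sqrt

SPEED_IN_FIBER_KM_S = 200_000  # ~0.67 * c

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # Earth radius ~6371 km

def min_rtt_ms(lat1, lon1, lat2, lon2):
    """Lower bound on RTT: straight-line fiber there and back."""
    distance = great_circle_km(lat1, lon1, lat2, lon2)
    return 2 * distance / SPEED_IN_FIBER_KM_S * 1000

cities = {
    "New York": (40.7, -74.0),
    "London": (51.5, -0.1),
    "Sydney": (-33.9, 151.2),
    "Tokyo": (35.7, 139.7),
}

for a, b in [("New York", "London"), ("London", "Sydney"), ("New York", "Tokyo")]:
    rtt = min_rtt_ms(*cities[a], *cities[b])
    print(f"{a:>9} <-> {b:<9} minimum possible RTT: {rtt:6.1f} ms (no design can beat this)")
```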
Accurate latency measurement requires appropriate tools and methodology. Different tools measure different things:
ICMP-Based (Ping): measures raw RTT to a host. Simple and universal, but routers may deprioritize or filter ICMP, so treat the result as a rough floor rather than an application measurement.
TCP-Based: times a handshake to a real service port (hping3, or curl's time_connect), which better reflects what applications actually experience.
Application-Level: times the full request-response path, including DNS, TLS, and server processing (curl timing variables, browser developer tools, load-testing tools).
```bash
#!/bin/bash
# Comprehensive latency measurement toolkit

TARGET="example.com"
COUNT=100

echo "=== Latency Measurement Suite ==="
echo "Target: $TARGET"
echo ""

# === 1. Basic ICMP Ping ===
echo "1. ICMP Ping (basic RTT):"
ping -c $COUNT $TARGET | tail -3
echo ""

# === 2. TCP-based ping (hping3) ===
echo "2. TCP Ping (port 443, more realistic):"
sudo hping3 -S -p 443 -c $COUNT $TARGET 2>&1 | grep "rtt"
echo ""

# === 3. Path analysis (traceroute) ===
echo "3. Traceroute (per-hop latency):"
traceroute -n $TARGET
echo ""

# === 4. MTR (continuous traceroute) ===
echo "4. MTR (10 rounds, shows loss and jitter per hop):"
mtr -r -c 10 -n $TARGET
echo ""

# === 5. TCP Connection Time ===
echo "5. TCP Connection Time (curl):"
for i in {1..5}; do
  curl -o /dev/null -s -w "Connect: %{time_connect}s, TTFB: %{time_starttransfer}s, Total: %{time_total}s\n" https://$TARGET
done
echo ""

# === 6. DNS Latency ===
echo "6. DNS Resolution Time:"
for i in {1..5}; do
  { time dig +short $TARGET; } 2>&1 | grep real | awk '{print $2}'
done
echo ""

# === 7. Latency Distribution (ping with timestamps) ===
echo "7. Generating latency distribution..."
ping -c $COUNT $TARGET | grep "time=" | awk -F'time=' '{print $2}' | awk '{print $1}' > /tmp/latency.txt
echo "Stats (ms):"
awk '{sum+=$1; if($1<min||NR==1)min=$1; if($1>max)max=$1} END{print "  Min:", min, " Avg:", sum/NR, " Max:", max}' /tmp/latency.txt

# Calculate percentiles
sort -n /tmp/latency.txt > /tmp/latency_sorted.txt
LINES=$(wc -l < /tmp/latency_sorted.txt)
P50=$(sed -n "$((LINES/2))p" /tmp/latency_sorted.txt)
P95=$(sed -n "$((LINES*95/100))p" /tmp/latency_sorted.txt)
P99=$(sed -n "$((LINES*99/100))p" /tmp/latency_sorted.txt)
echo "  P50: $P50 ms  P95: $P95 ms  P99: $P99 ms"
```

Understanding Measurement Output:
| Metric | Meaning | When to Use | Watch For |
|---|---|---|---|
| Min RTT | Best-case latency | Baseline/theoretical | Unrealistically low (cached/broken) |
| Avg RTT | Mean latency | General comparison | Skewed by outliers |
| P50 (Median) | Typical experience | User experience | High if avg is low (outliers) |
| P95/P99 | Tail latency | SLA, performance issues | Order of magnitude vs avg |
| Max RTT | Worst-case | Troubleshooting | Transient spikes |
| StdDev/Jitter | Variability | Streaming, VoIP | High variance = problems |
Average latency hides problems. If P50 is 10ms but P99 is 500ms, 1% of requests are 50x slower. For a system handling 1000 requests/second, that's 10 users per second experiencing terrible performance. Always measure and monitor percentiles, especially P95 and P99.
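A quick illustration with synthetic data shows how a small fraction of slow requests barely moves the average while dominating P99 (the distribution below is invented purely for demonstration):

```python
"""Why averages hide tail latency: a synthetic distribution for illustration."""
import random
import statistics

random.seed(42)
# 98% of requests cluster around 10ms; 2% hit a ~500ms stall (synthetic data)
samples = [random.gauss(10, 2) for _ in range(980)] + [random.gauss(500, 50) for _ in range(20)]
samples.sort()

# Nearest-rank percentiles from the sorted samples
p50, p95, p99 = (samples[int(len(samples) * q) - 1] for q in (0.50, 0.95, 0.99))

print(f"Average: {statistics.fmean(samples):6.1f} ms   <- looks acceptable")
print(f"P50:     {p50:6.1f} ms")
print(f"P95:     {p95:6.1f} ms")
print(f"P99:     {p99:6.1f} ms   <- the slow tail is ~50x the median")
```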
Different applications have vastly different latency sensitivities. Understanding these requirements guides architecture decisions:
Human Perception Thresholds: roughly 100ms feels instantaneous, around 1 second is noticeable but preserves the user's flow, and by 10 seconds attention is lost entirely.
For reference, human reaction time averages ~200-250ms, so delays under 100ms are generally imperceptible.
| Application Type | Target Latency | Maximum Tolerable | Why |
|---|---|---|---|
| Voice call (VoIP) | <150ms one-way | 300ms | Conversational flow breaks |
| Video conferencing | <150ms RTT | 400ms | Lip sync, interaction |
| Online gaming (FPS) | <30ms RTT | 75ms | Aim, reactions |
| Online gaming (RTS) | <100ms RTT | 200ms | Unit control |
| Financial trading | <1ms | 10ms | Arbitrage windows |
| Web browsing | <100ms TTFB | 500ms | Perceived responsiveness, abandonment |
| Video streaming | <5s start | 30s | User abandonment |
| Bulk transfer | N/A | N/A | Throughput matters, not latency |
The Latency Multiplication Problem:
Many operations require multiple network round-trips, multiplying latency impact:
Example: Loading a simple HTTPS page (100ms RTT)
- DNS lookup: 1 RTT = 100ms
- TCP handshake: 1 RTT = 100ms
- TLS 1.2 handshake: 2 RTT = 200ms
- HTTP request to first response byte: 1 RTT = 100ms
Total: 500ms before first byte (for a cold connection)
This is why connection reuse, HTTP/2, and TLS session resumption matter enormously.
Every protocol optimization that reduces RTTs provides latency benefit. TCP Fast Open eliminates 1 RTT. TLS 1.3 reduces handshake by 1 RTT. HTTP/2 multiplexing avoids new connection setup. 0-RTT resumption in TLS 1.3 and QUIC eliminates handshake latency entirely for repeat connections.
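To see how these savings add up, the sketch below tallies idealized round-trip counts before the first response byte for a few protocol combinations, using the 100ms RTT from the example above. These are textbook RTT counts; real connections vary with DNS caching, Fast Open support, and 0-RTT eligibility:

```python
"""Round-trips before the first response byte under different protocol stacks.

Idealized RTT counts; real connections vary with DNS caching, TCP Fast Open
support, and 0-RTT eligibility.
"""

def time_to_first_byte_ms(rtt_ms: float, dns: int, transport: int, tls: int,
                          request: int = 1) -> float:
    """Total setup-plus-request cost expressed as RTT multiples."""
    return (dns + transport + tls + request) * rtt_ms

RTT = 100  # ms, as in the example above

stacks = [
    # (label, DNS RTTs, transport-handshake RTTs, TLS RTTs)
    ("Cold: TCP + TLS 1.2", 1, 1, 2),
    ("Cold: TCP + TLS 1.3", 1, 1, 1),
    ("Warm: reused HTTP/2 connection", 0, 0, 0),
    ("QUIC 0-RTT resumption (cached DNS)", 0, 0, 0),
]

for label, dns, transport, tls in stacks:
    total = time_to_first_byte_ms(RTT, dns, transport, tls)
    print(f"{label:<36} ~{total:4.0f} ms before first byte")
```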
A common misconception is that faster internet (more bandwidth) means faster loading. While bandwidth helps for large transfers, latency dominates for interactive use and small requests.
The Fundamental Relationship:
For a data transfer:
Transfer Time = Latency + (Size / Bandwidth)
The Crossover Point:
The size above which bandwidth matters more than latency:
Critical Size = Bandwidth × RTT = BDP
For a 100ms RTT, 100 Mbps connection: 100ms × 100 Mbps = 1.25 MB
Studies show that above ~5-10 Mbps, increasing bandwidth has diminishing returns for web page load time. Most web pages are limited by the number of round-trips required to fetch resources, not by transfer speed. This is why HTTP/2 (fewer connections, multiplexing) often helps more than raw bandwidth upgrades.
"""Demonstrate the relative impact of latency vs bandwidth.Shows when each factor dominates transfer time.""" def transfer_time(size_kb: float, bandwidth_mbps: float, rtt_ms: float, round_trips: int = 1) -> dict: """ Calculate transfer time and breakdown. Args: size_kb: Object size in kilobytes bandwidth_mbps: Link bandwidth in Mbps rtt_ms: Round-trip time in milliseconds round_trips: Number of RTTs required (setup, etc.) Returns: Timing breakdown """ # Calculate components size_bits = size_kb * 8 * 1024 bandwidth_bps = bandwidth_mbps * 1_000_000 transmission_ms = (size_bits / bandwidth_bps) * 1000 latency_cost_ms = rtt_ms * round_trips total_ms = transmission_ms + latency_cost_ms latency_fraction = latency_cost_ms / total_ms * 100 return { "total_ms": round(total_ms, 2), "transmission_ms": round(transmission_ms, 2), "latency_ms": round(latency_cost_ms, 2), "latency_fraction": round(latency_fraction, 1), "bandwidth_limited": latency_fraction < 50, } # Compare typical web objectsobjects = [ ("HTML page", 20, 2), # 20KB, 2 RTTs (DNS+connect+request) ("CSS file", 50, 1), # 50KB, 1 RTT ("JavaScript", 200, 1), # 200KB, 1 RTT ("Image", 500, 1), # 500KB, 1 RTT ("Video chunk", 2000, 1), # 2MB, 1 RTT ("Full video", 500000, 1), # 500MB, 1 RTT] print("=== Latency vs Bandwidth Impact ===\n") for bandwidth in [10, 100, 1000]: # 10 Mbps, 100 Mbps, 1 Gbps print(f"\n--- Bandwidth: {bandwidth} Mbps, RTT: 50ms ---") print(f"{'Object':<15} {'Size':>8} {'Total':>8} {'BW%':>6} {'Lat%':>6} {'Limited By':<12}") for name, size_kb, rtts in objects: result = transfer_time(size_kb, bandwidth, 50, rtts) bw_pct = 100 - result['latency_fraction'] limited = "Bandwidth" if result['bandwidth_limited'] else "Latency" print(f"{name:<15} {size_kb:>7}KB {result['total_ms']:>7}ms {bw_pct:>5.1f}% " f"{result['latency_fraction']:>5.1f}% {limited:<12}") # Critical size calculation (BDP)print("\n=== Critical Object Size (BDP) ===")print("Objects smaller than this are latency-dominated")for bandwidth, rtt in [(10, 50), (100, 50), (1000, 50), (100, 10), (100, 200)]: bdp_kb = (bandwidth * 1000000 * rtt / 1000) / 8 / 1024 print(f" {bandwidth:4d} Mbps, {rtt:3d}ms RTT: {bdp_kb:,.0f} KB critical size")Different latency components require different reduction strategies. Here's a comprehensive approach:
Reducing Propagation Delay (Physics-Limited): move endpoints closer together with CDNs, edge compute, and regional deployments; straighter physical routes help only at the margin.
Reducing Queuing Delay: keep buffers small and actively managed (CoDel, fq_codel, PIE), prioritize latency-sensitive traffic with QoS, and avoid running links near saturation.
Protocol-Level Optimizations:
| Optimization | RTT Savings | Where | Trade-offs |
|---|---|---|---|
| TCP Fast Open | 1 RTT | Connection setup | Security considerations, limited support |
| TLS 1.3 | 1 RTT | TLS handshake | Requires modern clients/servers |
| 0-RTT Resumption | 1-2 RTT | Repeat connections | Replay attack risk |
| HTTP/2 Push | 1 RTT | Resource loading | Hard to predict what to push |
| QUIC | 1-2 RTT | Connection + TLS | UDP blocked by some networks |
| Preconnect/Prefetch | 1-3 RTT | Predictive | Wasted connections if not used |
End System Optimizations: reuse connections (pooling, keep-alive), cache aggressively close to the application, and, for extreme requirements, tune the kernel and NIC (interrupt coalescing, busy polling).
Optimize in impact order: 1) Reduce round-trips (protocol fixes), 2) Deploy closer to users (CDN), 3) Fix queuing delays (bufferbloat, congestion), 4) Low-level tuning (kernel, NIC). Most latency wins come from steps 1-2; step 4 matters only for extreme requirements.
Production latency monitoring requires continuous measurement, proper statistics, and actionable alerting.
What to Monitor: RTT to critical dependencies, DNS resolution time, TTFB, and end-to-end request-response latency, all tracked as percentile distributions (P50/P95/P99) per region and endpoint.
Monitoring Approaches:
Synthetic Monitoring: scripted probes from fixed locations on a schedule; gives consistent baselines and catches outages, but may not reflect real user conditions.
Real User Monitoring (RUM): timings collected from actual user sessions (for example, browser Navigation and Resource Timing APIs); captures true experience but is noisier and covers only the paths users exercise.
Application Performance Monitoring (APM): in-process instrumentation that attributes latency to specific code paths, queries, and downstream calls.
| Metric | Warning | Critical | Action |
|---|---|---|---|
| P50 | 1.5× baseline | 2× baseline | Investigate, may be traffic spike |
| P95 | 2× baseline | 3× baseline | Likely real problem |
| P99 | 2× baseline | 5× baseline | Tail latency issue |
| Absolute P50 | 100ms | 200ms | User experience degradation |
| Absolute P99 | 500ms | 1s | Significant user impact |
Never alert on average latency alone. An average of 50ms could mean consistent 50ms (good) or 10ms with occasional 500ms spikes (bad). The average hides the outliers that ruin user experience. Always use percentiles for latency monitoring.
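As a sketch of how the table above might be applied, the following compares current percentiles against a baseline using the relative multipliers. The baseline numbers and function names are illustrative, not a prescribed implementation:

```python
"""Evaluate latency percentiles against baseline multipliers (per the table above).

Thresholds and baseline values here are illustrative; tune them to your own service.
"""

# (warning multiplier, critical multiplier) per metric, from the alerting table
RELATIVE_THRESHOLDS = {"p50": (1.5, 2.0), "p95": (2.0, 3.0), "p99": (2.0, 5.0)}

def check_latency(baseline_ms: dict, current_ms: dict) -> list[str]:
    """Return alert messages for any percentile exceeding its baseline multiplier."""
    alerts = []
    for metric, (warn_x, crit_x) in RELATIVE_THRESHOLDS.items():
        ratio = current_ms[metric] / baseline_ms[metric]
        if ratio >= crit_x:
            alerts.append(f"CRITICAL: {metric} is {ratio:.1f}x baseline ({current_ms[metric]} ms)")
        elif ratio >= warn_x:
            alerts.append(f"WARNING: {metric} is {ratio:.1f}x baseline ({current_ms[metric]} ms)")
    return alerts

# Example: P99 has blown out while P50 still looks healthy
baseline = {"p50": 40, "p95": 120, "p99": 250}
current = {"p50": 45, "p95": 180, "p99": 1400}
for alert in check_latency(baseline, current) or ["OK: within thresholds"]:
    print(alert)
```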
Latency is the time dimension of network performance—often more critical than bandwidth for user experience. Understanding its components, limitations, and optimization strategies enables you to build responsive systems.
What's Next:
We've covered bandwidth (capacity), throughput (reality), and latency (delay). Next, we'll examine jitter—the variation in latency that causes chaos for real-time applications. Consistent latency is often more important than low latency for streaming, voice, and video.
You now understand latency from physics through practical optimization. This knowledge enables you to diagnose delay problems, architect for responsiveness, and set realistic expectations for geographically distributed systems.