Here's a surprising fact: the theoretical maximum throughput of a TCP connection isn't determined solely by the bandwidth of its links. A 10 Gbps fiber connection between Tokyo and London cannot achieve 10 Gbps throughput with a single TCP stream—not because of bandwidth limitations, but because of physics.
The speed of light in fiber is approximately 200,000 km/s. Tokyo to London is roughly 9,500 km by great-circle distance (real cable routes are longer), giving a one-way propagation delay of at least roughly 47 milliseconds and a round-trip time of at least 95 ms. Combined with bandwidth, this creates a fundamental constraint on how much data can be "in flight" at any moment.
Understanding network capacity requires grasping this interplay between bandwidth (how much data per second) and delay (how long data takes to traverse the path). Together, they define the bandwidth-delay product—the central concept in understanding TCP's performance limits.
By the end of this page, you will understand how to calculate and interpret the bandwidth-delay product, appreciate why high-bandwidth or high-delay networks require careful tuning, and develop intuition for what determines the maximum achievable throughput for any network path.
Bandwidth measures the maximum rate at which bits can be transmitted over a communication link, expressed in bits per second (bps). It represents the capacity of the link—how many bits can "fit" onto the wire in each second.
Physical Intuition:
Think of bandwidth as the diameter of a pipe. A wider pipe allows more water to flow through it per unit time. Similarly, higher bandwidth allows more bits to be transmitted per second.
Modern network links span an enormous range, as the table below illustrates. Each factor-of-10 increase in bandwidth represents a qualitative shift in what's possible.
| Bandwidth | Time to Transfer 1 GB | Example Use Case |
|---|---|---|
| 56 Kbps | ~40 hours | Original dial-up (1990s) |
| 1 Mbps | ~2.2 hours | Basic DSL |
| 100 Mbps | ~80 seconds | Home fiber/cable |
| 1 Gbps | ~8 seconds | Gigabit Ethernet |
| 10 Gbps | ~0.8 seconds | Data center interconnect |
| 100 Gbps | ~80 ms | Backbone links |
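As a quick sanity check, here is a minimal Python sketch (assuming 1 GB = 10⁹ bytes and a fully utilized link with no protocol overhead) that reproduces the transfer times above:

```python
# Rough transfer-time estimates for 1 GB (10^9 bytes = 8 x 10^9 bits),
# ignoring protocol overhead and assuming the link is fully utilized.
GIGABYTE_BITS = 8e9

links_bps = {
    "56 Kbps dial-up": 56e3,
    "1 Mbps DSL": 1e6,
    "100 Mbps fiber/cable": 100e6,
    "1 Gbps Ethernet": 1e9,
    "10 Gbps interconnect": 10e9,
    "100 Gbps backbone": 100e9,
}

for name, bandwidth in links_bps.items():
    seconds = GIGABYTE_BITS / bandwidth
    print(f"{name:>22}: {seconds:12.3f} s")
```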
Bottleneck Bandwidth:
In an end-to-end path, the bottleneck bandwidth is the bandwidth of the slowest link. This is the maximum rate at which data can flow through the path, regardless of other links' capacities.
Consider a path: Source → 1 Gbps → Router A → 100 Mbps → Router B → 1 Gbps → Destination
The bottleneck is the 100 Mbps link. No matter how fast other links are, sustained throughput cannot exceed 100 Mbps. The faster links provide burst capacity and reduce local queuing, but don't increase end-to-end throughput.
Available Bandwidth vs. Capacity:
The physical capacity of a link differs from its available bandwidth at any moment. If a 100 Mbps link is already carrying 60 Mbps of traffic, only 40 Mbps is available for new flows. Congestion control must adapt to available bandwidth, not nominal capacity.
Every link in the Internet is shared among multiple flows. Your TCP connection doesn't get dedicated bandwidth—it shares capacity with potentially thousands of other flows. This is why congestion control exists: to fairly divide this shared resource without central coordination.
Delay (or latency) measures how long it takes for a bit to travel from source to destination. While bandwidth describes how much, delay describes how long.
Components of End-to-End Delay:
Total delay is the sum of four components:
1. Propagation Delay (t_prop): Time for the signal to physically travel through the medium, determined by distance and the speed of light in that medium:
t_prop = Distance / Propagation Speed
2. Transmission Delay (t_trans): Time to push all bits of a packet onto the link:
t_trans = Packet Size / Bandwidth
For a 1,500-byte packet on a 1 Gbps link: 1,500 × 8 bits / 10⁹ bps = 12 microseconds
3. Queuing Delay (t_queue): Time spent waiting in router buffers. Highly variable—zero when queues are empty, potentially seconds when congested.
4. Processing Delay (t_proc): Time for routers to examine headers and make forwarding decisions. Typically microseconds on modern hardware.
| Component | Cross-Campus (1 km) | Cross-Country (4000 km) | Intercontinental (15000 km) |
|---|---|---|---|
| Propagation | ~5 μs | ~20 ms | ~75 ms |
| Transmission (1500B @ 1Gbps) | ~12 μs | ~12 μs | ~12 μs |
| Queuing (light load) | ~10 μs | ~100 μs | ~500 μs |
| Processing | ~1 μs/hop | ~10 μs (10 hops) | ~20 μs (20 hops) |
| One-Way Total | ~30 μs | ~20 ms | ~76 ms |
| Round-Trip Time | ~60 μs | ~40 ms | ~152 ms |
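The following Python sketch adds up the four components for the intercontinental column above; the queuing and per-hop processing values are the illustrative light-load figures from the table, not measurements:

```python
PROPAGATION_SPEED = 2e8  # m/s, roughly the speed of light in fiber

def one_way_delay(distance_m, bandwidth_bps, packet_bytes=1500,
                  queuing_s=0.0, hops=1, per_hop_processing_s=1e-6):
    """Sum the four delay components for a single packet, one way."""
    propagation = distance_m / PROPAGATION_SPEED
    transmission = packet_bytes * 8 / bandwidth_bps
    processing = hops * per_hop_processing_s
    return propagation + transmission + queuing_s + processing

# Intercontinental row from the table: 15,000 km, 1 Gbps, light load, 20 hops
d = one_way_delay(15_000e3, 1e9, queuing_s=500e-6, hops=20)
print(f"one-way ~{d*1e3:.1f} ms, RTT ~{2*d*1e3:.1f} ms")  # ~75.5 ms / ~151 ms
```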
Round-Trip Time (RTT):
For TCP, the critical delay metric is Round-Trip Time (RTT)—the time for a packet to travel from sender to receiver and for the acknowledgment to return. This is approximately twice the one-way delay, plus any asymmetries in the forward and return paths.
RTT matters because TCP's congestion control and reliability mechanisms are inherently round-trip based: acknowledgments arrive no sooner than one RTT after the data they cover, window adjustments take effect one RTT at a time, and retransmission timers are derived from RTT estimates.
You can always add more bandwidth (more fibers, higher frequencies, better encoding). But you cannot reduce propagation delay below the speed of light. For applications sensitive to latency—gaming, finance, video conferencing—no amount of bandwidth can compensate for physical distance.
The Bandwidth-Delay Product (BDP) is perhaps the single most important concept in understanding TCP performance. It represents the maximum amount of data that can be "in transit" on a network path at any instant.
The Formula:
BDP = Bandwidth × RTT
Or equivalently:
BDP = Bandwidth × One-Way Delay × 2
Physical Interpretation:
Imagine the network path as a pipe. Bandwidth is the cross-sectional area; delay (specifically, half the RTT) is the length. The BDP is the volume of the pipe—how much data "fits inside" the network between sender and receiver at any moment.
For maximum throughput, the sender must keep this pipe full. Every moment that the pipe has empty space, potential throughput is wasted.
| Network Type | Bandwidth | RTT | BDP | Interpretation |
|---|---|---|---|---|
| Local Ethernet | 1 Gbps | 0.5 ms | 62.5 KB | Modest buffer needed |
| Transcontinental | 1 Gbps | 80 ms | 10 MB | Large window required |
| Satellite | 50 Mbps | 600 ms | 3.75 MB | High delay dominates |
| Data Center | 100 Gbps | 0.2 ms | 2.5 MB | Ultra-high bandwidth |
| Submarine Cable | 10 Gbps | 150 ms | 187.5 MB | Massive pipe to fill |
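A short Python sketch makes the arithmetic explicit; it reproduces the BDP column above (using 1 MB = 10⁶ bytes):

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: the data 'in flight' needed to fill the pipe."""
    return bandwidth_bps * rtt_s / 8

paths = [
    ("Local Ethernet",   1e9,   0.5e-3),
    ("Transcontinental", 1e9,   80e-3),
    ("Satellite",        50e6,  600e-3),
    ("Data Center",      100e9, 0.2e-3),
    ("Submarine Cable",  10e9,  150e-3),
]

for name, bw, rtt in paths:
    print(f"{name:>16}: BDP = {bdp_bytes(bw, rtt) / 1e6:8.3f} MB")
```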
Why BDP Matters for TCP:
TCP uses a sliding window to control how much data can be in flight (sent but not yet acknowledged). For TCP to achieve full throughput:
TCP Window ≥ BDP
If the window is smaller than the BDP, the sender will run out of data to send before acknowledgments return. This creates "idle time" where bandwidth goes unused.
Example: A 1 Gbps path with 100 ms RTT has a BDP of 12.5 MB. If TCP's window is limited to 64 KB (the maximum expressible in the original 16-bit window field), achievable throughput is:
Throughput = Window / RTT = 64 KB / 100 ms = 5.12 Mbps
That's 0.5% of the available bandwidth! The link can carry 1 Gbps, but TCP self-limits to 5 Mbps because its window is too small.
This is why modern TCP uses window scaling (RFC 1323) to support windows up to 1 GB, and why proper buffer sizing on hosts is critical for high-BDP paths.
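A minimal sketch of this calculation, taking "64 KB" as 64,000 bytes (a 65,535-byte window gives roughly 5.24 Mbps instead):

```python
def window_limited_throughput_bps(window_bytes, rtt_s, bottleneck_bps):
    """Throughput is capped by how fast the window can be recycled each RTT."""
    return min(window_bytes * 8 / rtt_s, bottleneck_bps)

# 64 KB (taken here as 64,000 bytes) window on a 1 Gbps, 100 ms RTT path
tput = window_limited_throughput_bps(64_000, 0.100, 1e9)
print(f"{tput / 1e6:.2f} Mbps")  # ~5.12 Mbps, about 0.5% of the link
```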
Paths with high bandwidth AND high delay are called 'Long Fat Networks' (LFNs or 'elephants'). These pose special challenges: large buffers are needed, congestion response is slow, and RTT unfairness is amplified. Satellite links, intercontinental connections, and links between distant data centers all exhibit LFN characteristics.
Pipelining is the key to efficient use of high-BDP networks. Rather than waiting for acknowledgment of each segment before sending the next (stop-and-wait), TCP sends multiple segments continuously, keeping the pipeline filled.
Stop-and-Wait Inefficiency:
Consider sending data without pipelining: transmit one segment, wait a full RTT for its acknowledgment, then send the next.
With a 1500-byte segment and 100 ms RTT: Throughput = 1500 bytes / 100 ms = 120 Kbps
On a 1 Gbps link, this achieves 0.012% efficiency!
Pipelined Transmission:
With pipelining, the sender continuously transmits up to its window size without waiting:
Efficiency = WindowSize / BDP (ideally 100% when window = BDP)
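The contrast is easy to quantify. This sketch compares stop-and-wait with a fully pipelined window on the same 1 Gbps, 100 ms path used above:

```python
def stop_and_wait_bps(segment_bytes, rtt_s):
    """One segment per RTT: the pipe is almost always empty."""
    return segment_bytes * 8 / rtt_s

def pipelined_bps(window_bytes, rtt_s, bottleneck_bps):
    """Up to a full window in flight per RTT, capped by the bottleneck."""
    return min(window_bytes * 8 / rtt_s, bottleneck_bps)

rtt, link = 0.100, 1e9
window = link * rtt / 8  # window equal to the BDP: 12.5 MB for this path
print(f"stop-and-wait: {stop_and_wait_bps(1500, rtt) / 1e3:.0f} Kbps")
print(f"pipelined (window = BDP): {pipelined_bps(window, rtt, link) / 1e9:.1f} Gbps")
```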
The Self-Clocking Phenomenon:
When the pipeline is full, a beautiful property emerges: the rate at which ACKs return exactly matches the rate at which new segments can be sent. Each ACK "releases" one segment's worth of window space, allowing one new segment to be transmitted.
This is called ACK clocking or self-clocking. The connection naturally adapts to the available bandwidth without explicit rate calculation—ACKs return at the bottleneck rate, so new segments can only be sent at the bottleneck rate.
Van Jacobson's fundamental insight: in a stable, fully loaded connection, the number of packets in flight should remain constant. A new packet enters the network only when an old packet (acknowledged) exits. This conservation principle is the foundation of stable pipelining.
Given the concepts we've established, we can now derive formulas for TCP's effective throughput under various conditions.
Ideal Throughput (No Congestion):
When window size is not limited by congestion:
Throughput = min(WindowSize / RTT, Bottleneck Bandwidth)
If WindowSize ≥ BDP, throughput approaches the bottleneck bandwidth.
Throughput with Packet Loss:
In the presence of congestion-induced loss, TCP's throughput becomes loss-sensitive. The classic Mathis formula (1997) provides a first-order approximation:
Throughput ≈ (MSS / RTT) × (1 / √p)
where MSS is the maximum segment size, RTT is the round-trip time, and p is the packet loss probability.
This formula reveals a critical insight: throughput degrades with the inverse square root of the loss rate, so even modest loss sharply limits what TCP can achieve on a fast path. The table below evaluates the formula for a representative path (MSS = 1460 bytes, RTT = 10 ms):
| Loss Rate (p) | Throughput | Interpretation |
|---|---|---|
| 0.0001 (0.01%) | ~116 Mbps | Near-ideal performance |
| 0.001 (0.1%) | ~36.8 Mbps | Good performance |
| 0.01 (1%) | ~11.6 Mbps | Noticeable degradation |
| 0.1 (10%) | ~3.7 Mbps | Severe degradation |
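The sketch below evaluates the approximation for the loss rates in the table; the MSS of 1460 bytes and RTT of 10 ms are assumed parameters chosen to match the tabulated values:

```python
from math import sqrt

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """First-order Mathis approximation: throughput ~ (MSS / RTT) * 1/sqrt(p)."""
    return (mss_bytes * 8 / rtt_s) / sqrt(loss_rate)

# Illustrative path: MSS = 1460 bytes, RTT = 10 ms (assumed)
for p in (1e-4, 1e-3, 1e-2, 1e-1):
    print(f"p = {p:<7}: {mathis_throughput_bps(1460, 0.010, p) / 1e6:6.1f} Mbps")
```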
Practical Throughput Limits:
In reality, throughput is constrained by multiple factors simultaneously:
Effective Throughput = min(Sender Window, Receiver Window, Congestion Window) / RTT
The result is bounded above by the bottleneck bandwidth.
Theoretical formulas provide bounds and intuition, but real networks exhibit complex behavior. Shared paths, cross-traffic dynamics, queuing variations, and protocol overhead mean that actual throughput often differs from predictions. Measurement tools like iperf3, netperf, and tcptrace help validate models against reality.
The BDP concept directly informs how buffers should be sized—both at network routers and on end hosts.
Router Buffer Sizing:
For a router handling many flows, the traditional rule of thumb (RFC 3439) was:
Buffer Size = BDP = Bandwidth × RTT
This ensures the buffer can absorb a full window's worth of data during congestion response.
However, modern research (Stanford, 2004) showed that with many flows, this can be reduced:
Buffer Size = BDP / √N
Where N is the number of flows. Statistical multiplexing means bursts from different flows are unlikely to coincide perfectly.
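A small sketch of both sizing rules, applied to a hypothetical 10 Gbps link with a 100 ms average RTT:

```python
from math import sqrt

def router_buffer_bytes(bandwidth_bps, rtt_s, n_flows=1):
    """Classic BDP rule, scaled down by sqrt(N) per the Stanford result."""
    bdp = bandwidth_bps * rtt_s / 8
    return bdp / sqrt(n_flows)

# Hypothetical 10 Gbps backbone link with a 100 ms average RTT
for n in (1, 100, 10_000):
    mb = router_buffer_bytes(10e9, 0.100, n) / 1e6
    print(f"{n:>6} flows: {mb:8.2f} MB of buffer")
```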
The Bufferbloat Controversy:
Excessive router buffers cause bufferbloat (as discussed in Page 1). Modern recommendations favor active queue management (AQM), such as CoDel or PIE, combined with modest buffers.
Memory Considerations:
Buffer sizing involves trade-offs: larger buffers sustain throughput on high-BDP paths, but every buffered byte consumes memory and can add queuing delay. For servers handling thousands of connections, per-connection buffer memory adds up quickly. Dynamic buffer sizing algorithms help by growing buffers only as needed, but administrators must still configure appropriate maximum limits.
Modern operating systems automatically adjust TCP buffer sizes based on observed network conditions. Linux, Windows, and macOS all implement this. Manual tuning is now mainly needed for high-performance scenarios (10+ Gbps, very high latency) rather than general-purpose computing.
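When manual tuning is needed, socket buffers can be requested per connection. This sketch asks for send and receive buffers sized to an assumed 12.5 MB BDP; the OS may clamp the request to its configured maximums, and on Linux explicitly setting SO_RCVBUF disables receive-buffer autotuning for that socket, so it should be done deliberately:

```python
import socket

# Hypothetical target: a 1 Gbps path with 100 ms RTT -> BDP of 12.5 MB
BDP_BYTES = int(1e9 * 0.100 / 8)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Request send/receive buffers at least as large as the BDP. The OS may
# clamp these to its configured maximums (Linux also doubles the value
# internally for bookkeeping), so verify with getsockopt afterwards.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BDP_BYTES)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BDP_BYTES)

print("SO_SNDBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("SO_RCVBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```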
TCP needs to estimate network capacity to operate effectively. Unlike bandwidth, which can sometimes be known a priori, available capacity must be discovered dynamically.
Passive Measurements:
TCP can infer capacity from observation:
RTT Measurement: TCP timestamps (RFC 1323) or simple segment-ACK timing reveal round-trip delay. TCP tracks a smoothed RTT estimate (SRTT) and its variation (RTTVAR), which together drive the retransmission timeout.
ACK Rate: The rate at which ACKs return indicates available throughput at the bottleneck. If ACKs for 10 segments arrive in 10 ms, the path is delivering roughly MSS × 10 / 10ms worth of data.
Active Probing:
For more precise capacity estimation, dedicated measurement techniques exist:
Packet Pair: Send two back-to-back packets and measure their arrival spacing at the receiver. The spacing approximates the bottleneck serialization delay (packet size / bottleneck bandwidth), from which capacity can be inferred (a minimal sketch appears below).
Pathrate, Pathload: Tools that inject probe traffic and measure response characteristics to estimate both capacity (maximum) and available bandwidth (current).
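An idealized sketch of the packet-pair calculation; real tools must filter out cross-traffic and timing noise, which this ignores:

```python
def packet_pair_capacity_bps(packet_bytes, arrival_gap_s):
    """Bottleneck capacity from the spacing of two back-to-back packets.

    The second packet queues behind the first at the bottleneck, so the
    receiver-side gap approximates the bottleneck serialization delay:
    gap = packet_size / capacity  =>  capacity = packet_size / gap.
    """
    return packet_bytes * 8 / arrival_gap_s

# Example: 1500-byte probes arrive 120 microseconds apart
print(f"{packet_pair_capacity_bps(1500, 120e-6) / 1e6:.0f} Mbps")  # ~100 Mbps
```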
| Method | Measures | Accuracy | Overhead |
|---|---|---|---|
| ACK timing | Available bandwidth (rough) | Low-Medium | None (passive) |
| Packet pair | Bottleneck capacity | Medium-High | Minimal probes |
| Pathload | Available bandwidth | High | Significant probe traffic |
| iperf3 | Achievable throughput | High (for TCP) | Full saturation needed |
| BBR modeling | Both capacity and RTT | Medium | Ongoing probing |
The BBR Approach:
Google's BBR (Bottleneck Bandwidth and RTT) congestion control takes a model-based approach: it continuously estimates the bottleneck bandwidth from the observed delivery rate and the propagation delay from the minimum observed RTT, then paces transmissions to match that model, probing periodically to refresh both estimates.
By explicitly modeling capacity rather than just reacting to loss, BBR aims to achieve high throughput with minimal queuing. This represents a philosophical shift from loss-based to capacity-based congestion control.
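As a conceptual illustration only (not Google's implementation), the sketch below keeps windowed estimates of delivery rate and RTT in the spirit of BBR's model; the class name, window length, and sample values are assumptions:

```python
from collections import deque

class PathModel:
    """Toy capacity model: track the recent maximum delivery rate (bottleneck
    bandwidth estimate) and the recent minimum RTT (propagation delay
    estimate). Window lengths here are arbitrary."""

    def __init__(self, window=10):
        self.rates = deque(maxlen=window)   # delivery-rate samples (bps)
        self.rtts = deque(maxlen=window)    # RTT samples (seconds)

    def on_ack(self, delivered_bytes, interval_s, rtt_s):
        self.rates.append(delivered_bytes * 8 / interval_s)
        self.rtts.append(rtt_s)

    def estimated_bdp_bytes(self):
        if not self.rates or not self.rtts:
            return None
        btl_bw = max(self.rates)     # bandwidth: recent max delivery rate
        rt_prop = min(self.rtts)     # propagation delay: recent min RTT
        return btl_bw * rt_prop / 8

model = PathModel()
model.on_ack(delivered_bytes=145_000, interval_s=0.010, rtt_s=0.052)
model.on_ack(delivered_bytes=150_000, interval_s=0.010, rtt_s=0.050)
print(f"estimated BDP ~ {model.estimated_bdp_bytes() / 1e6:.2f} MB")
```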
Network capacity is inherently variable. Cross-traffic changes available bandwidth continuously. Routing changes can alter the path entirely. Wireless links have capacity that varies with signal quality. Any capacity estimate is a snapshot that may become stale immediately. Robust congestion control must handle this uncertainty.
We've developed a comprehensive understanding of network capacity and its implications for TCP performance. The key insights: the bottleneck link caps end-to-end throughput; propagation delay is bounded by physics and cannot be bought back with bandwidth; the bandwidth-delay product defines how much data must be in flight to fill the pipe; a window smaller than the BDP wastes capacity, and packet loss further caps throughput per the Mathis relation; and buffers, at routers and hosts alike, should be sized with the BDP in mind.
What's Next:
Understanding network capacity tells us what TCP is trying to achieve. The next page explores sender-based control—how TCP adjusts its transmission rate using only information available at the sender, without explicit network feedback.
You now understand the fundamental physics and mathematics of network capacity. This knowledge is essential for understanding why TCP behaves as it does—why window sizes matter, why high-RTT paths are challenging, and why congestion control algorithms must carefully estimate and adapt to changing network conditions.