Here's a surprising fact: the theoretical maximum throughput of a TCP connection isn't determined solely by the bandwidth of its links. A 10 Gbps fiber connection between Tokyo and London cannot achieve 10 Gbps throughput with a single TCP stream—not because of bandwidth limitations, but because of physics.
The speed of light in fiber is approximately 200,000 km/s. Tokyo to London is roughly 9,500 km by great-circle distance (real cable routes are longer), giving a one-way propagation delay of at least roughly 47 milliseconds and a round-trip time of at least 95 ms. Combined with bandwidth, this creates a fundamental constraint on how much data can be "in flight" at any moment.
Understanding network capacity requires grasping this interplay between bandwidth (how much data per second) and delay (how long data takes to traverse the path). Together, they define the bandwidth-delay product—the central concept in understanding TCP's performance limits.
By the end of this page, you will understand how to calculate and interpret the bandwidth-delay product, appreciate why high-bandwidth or high-delay networks require careful tuning, and develop intuition for what determines the maximum achievable throughput for any network path.
Bandwidth measures the maximum rate at which bits can be transmitted over a communication link, expressed in bits per second (bps). It represents the capacity of the link—how many bits can "fit" onto the wire in each second.
Physical Intuition:
Think of bandwidth as the diameter of a pipe. A wider pipe allows more water to flow through it per unit time. Similarly, higher bandwidth allows more bits to be transmitted per second.
Modern network links span an enormous range, as the table below illustrates. Each factor-of-10 increase in bandwidth represents a qualitative shift in what's possible.
| Bandwidth | Time to Transfer 1 GB | Example Use Case |
|---|---|---|
| 56 Kbps | ~40 hours | Original dial-up (1990s) |
| 1 Mbps | ~2.2 hours | Basic DSL |
| 100 Mbps | ~80 seconds | Home fiber/cable |
| 1 Gbps | ~8 seconds | Gigabit Ethernet |
| 10 Gbps | ~0.8 seconds | Data center interconnect |
| 100 Gbps | ~80 ms | Backbone links |
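As a quick sanity check, here is a minimal Python sketch (assuming 1 GB = 10⁹ bytes and a fully utilized link with no protocol overhead) that reproduces the transfer times above:

```python
# Rough transfer-time estimates for 1 GB (10^9 bytes = 8 x 10^9 bits),
# ignoring protocol overhead and assuming the link is fully utilized.
GIGABYTE_BITS = 8e9

links_bps = {
    "56 Kbps dial-up": 56e3,
    "1 Mbps DSL": 1e6,
    "100 Mbps fiber/cable": 100e6,
    "1 Gbps Ethernet": 1e9,
    "10 Gbps interconnect": 10e9,
    "100 Gbps backbone": 100e9,
}

for name, bandwidth in links_bps.items():
    seconds = GIGABYTE_BITS / bandwidth
    print(f"{name:>22}: {seconds:12.3f} s")
```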
Bottleneck Bandwidth:
In an end-to-end path, the bottleneck bandwidth is the bandwidth of the slowest link. This is the maximum rate at which data can flow through the path, regardless of other links' capacities.
Consider a path: Source → 1 Gbps → Router A → 100 Mbps → Router B → 1 Gbps → Destination
The bottleneck is the 100 Mbps link. No matter how fast other links are, sustained throughput cannot exceed 100 Mbps. The faster links provide burst capacity and reduce local queuing, but don't increase end-to-end throughput.
Available Bandwidth vs. Capacity:
The physical capacity of a link differs from its available bandwidth at any moment. If a 100 Mbps link is already carrying 60 Mbps of traffic, only 40 Mbps is available for new flows. Congestion control must adapt to available bandwidth, not nominal capacity.
Every link in the Internet is shared among multiple flows. Your TCP connection doesn't get dedicated bandwidth—it shares capacity with potentially thousands of other flows. This is why congestion control exists: to fairly divide this shared resource without central coordination.
Delay (or latency) measures how long it takes for a bit to travel from source to destination. While bandwidth describes how much, delay describes how long.
Components of End-to-End Delay:
Total delay is the sum of four components:
1. Propagation Delay (t_prop): Time for the signal to physically travel through the medium, determined by distance and the speed of light in that medium:
t_prop = Distance / Propagation Speed
2. Transmission Delay (t_trans): Time to push all bits of a packet onto the link:
t_trans = Packet Size / Bandwidth
For a 1,500-byte packet on a 1 Gbps link: 1,500 × 8 bits / 10⁹ bps = 12 microseconds
3. Queuing Delay (t_queue): Time spent waiting in router buffers. Highly variable—zero when queues are empty, potentially seconds when congested.
4. Processing Delay (t_proc): Time for routers to examine headers and make forwarding decisions. Typically microseconds on modern hardware.
| Component | Cross-Campus (1 km) | Cross-Country (4000 km) | Intercontinental (15000 km) |
|---|---|---|---|
| Propagation | ~5 μs | ~20 ms | ~75 ms |
| Transmission (1500B @ 1Gbps) | ~12 μs | ~12 μs | ~12 μs |
| Queuing (light load) | ~10 μs | ~100 μs | ~500 μs |
| Processing | ~1 μs/hop | ~10 μs (10 hops) | ~20 μs (20 hops) |
| One-Way Total | ~30 μs | ~20 ms | ~76 ms |
| Round-Trip Time | ~60 μs | ~40 ms | ~152 ms |
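The following Python sketch adds up the four components for the intercontinental column above; the queuing and per-hop processing values are the illustrative light-load figures from the table, not measurements:

```python
PROPAGATION_SPEED = 2e8  # m/s, roughly the speed of light in fiber

def one_way_delay(distance_m, bandwidth_bps, packet_bytes=1500,
                  queuing_s=0.0, hops=1, per_hop_processing_s=1e-6):
    """Sum the four delay components for a single packet, one way."""
    propagation = distance_m / PROPAGATION_SPEED
    transmission = packet_bytes * 8 / bandwidth_bps
    processing = hops * per_hop_processing_s
    return propagation + transmission + queuing_s + processing

# Intercontinental row from the table: 15,000 km, 1 Gbps, light load, 20 hops
d = one_way_delay(15_000e3, 1e9, queuing_s=500e-6, hops=20)
print(f"one-way ~{d*1e3:.1f} ms, RTT ~{2*d*1e3:.1f} ms")  # ~75.5 ms / ~151 ms
```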
Round-Trip Time (RTT):
For TCP, the critical delay metric is Round-Trip Time (RTT)—the time for a packet to travel from sender to receiver and for the acknowledgment to return. This is approximately twice the one-way delay, plus any asymmetries in the forward and return paths.
RTT matters because TCP's congestion control and reliability mechanisms are inherently round-trip based: acknowledgments arrive no sooner than one RTT after the data they cover, window adjustments take effect one RTT at a time, and retransmission timers are derived from RTT estimates.
You can always add more bandwidth (more fibers, higher frequencies, better encoding). But you cannot reduce propagation delay below the speed of light. For applications sensitive to latency—gaming, finance, video conferencing—no amount of bandwidth can compensate for physical distance.
The Bandwidth-Delay Product (BDP) is perhaps the single most important concept in understanding TCP performance. It represents the maximum amount of data that can be "in transit" on a network path at any instant.
The Formula:
BDP = Bandwidth × RTT
Or equivalently:
BDP = Bandwidth × One-Way Delay × 2
Physical Interpretation:
Imagine the network path as a pipe. Bandwidth is the cross-sectional area; delay (specifically, half the RTT) is the length. The BDP is the volume of the pipe—how much data "fits inside" the network between sender and receiver at any moment.
For maximum throughput, the sender must keep this pipe full. Every moment that the pipe has empty space, potential throughput is wasted.
| Network Type | Bandwidth | RTT | BDP | Interpretation |
|---|---|---|---|---|
| Local Ethernet | 1 Gbps | 0.5 ms | 62.5 KB | Modest buffer needed |
| Transcontinental | 1 Gbps | 80 ms | 10 MB | Large window required |
| Satellite | 50 Mbps | 600 ms | 3.75 MB | High delay dominates |
| Data Center | 100 Gbps | 0.2 ms | 2.5 MB | Ultra-high bandwidth |
| Submarine Cable | 10 Gbps | 150 ms | 187.5 MB | Massive pipe to fill |
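A short Python sketch makes the arithmetic explicit; it reproduces the BDP column above (using 1 MB = 10⁶ bytes):

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: the data 'in flight' needed to fill the pipe."""
    return bandwidth_bps * rtt_s / 8

paths = [
    ("Local Ethernet",   1e9,   0.5e-3),
    ("Transcontinental", 1e9,   80e-3),
    ("Satellite",        50e6,  600e-3),
    ("Data Center",      100e9, 0.2e-3),
    ("Submarine Cable",  10e9,  150e-3),
]

for name, bw, rtt in paths:
    print(f"{name:>16}: BDP = {bdp_bytes(bw, rtt) / 1e6:8.3f} MB")
```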
Why BDP Matters for TCP:
TCP uses a sliding window to control how much data can be in flight (sent but not yet acknowledged). For TCP to achieve full throughput:
TCP Window ≥ BDP
If the window is smaller than the BDP, the sender will run out of data to send before acknowledgments return. This creates "idle time" where bandwidth goes unused.
Example: A 1 Gbps path with 100 ms RTT has a BDP of 12.5 MB. If TCP's window is limited to 64 KB (the maximum expressible in the original 16-bit window field), achievable throughput is:
Throughput = Window / RTT = 64 KB / 100 ms = 5.12 Mbps
That's 0.5% of the available bandwidth! The link can carry 1 Gbps, but TCP self-limits to 5 Mbps because its window is too small.
This is why modern TCP uses window scaling (RFC 1323) to support windows up to 1 GB, and why proper buffer sizing on hosts is critical for high-BDP paths.
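A minimal sketch of this calculation, taking "64 KB" as 64,000 bytes (a 65,535-byte window gives roughly 5.24 Mbps instead):

```python
def window_limited_throughput_bps(window_bytes, rtt_s, bottleneck_bps):
    """Throughput is capped by how fast the window can be recycled each RTT."""
    return min(window_bytes * 8 / rtt_s, bottleneck_bps)

# 64 KB (taken here as 64,000 bytes) window on a 1 Gbps, 100 ms RTT path
tput = window_limited_throughput_bps(64_000, 0.100, 1e9)
print(f"{tput / 1e6:.2f} Mbps")  # ~5.12 Mbps, about 0.5% of the link
```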
Paths with high bandwidth AND high delay are called 'Long Fat Networks' (LFNs or 'elephants'). These pose special challenges: large buffers are needed, congestion response is slow, and RTT unfairness is amplified. Satellite links, intercontinental connections, and links between distant data centers all exhibit LFN characteristics.
Pipelining is the key to efficient use of high-BDP networks. Rather than waiting for acknowledgment of each segment before sending the next (stop-and-wait), TCP sends multiple segments continuously, keeping the pipeline filled.
Stop-and-Wait Inefficiency:
Consider sending data without pipelining: transmit one segment, wait a full RTT for its acknowledgment, then send the next.
With a 1500-byte segment and 100 ms RTT: Throughput = 1500 bytes / 100 ms = 120 Kbps
On a 1 Gbps link, this achieves 0.012% efficiency!
Pipelined Transmission:
With pipelining, the sender continuously transmits up to its window size without waiting:
Efficiency = WindowSize / BDP (ideally 100% when window = BDP)
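The contrast is easy to quantify. This sketch compares stop-and-wait with a fully pipelined window on the same 1 Gbps, 100 ms path used above:

```python
def stop_and_wait_bps(segment_bytes, rtt_s):
    """One segment per RTT: the pipe is almost always empty."""
    return segment_bytes * 8 / rtt_s

def pipelined_bps(window_bytes, rtt_s, bottleneck_bps):
    """Up to a full window in flight per RTT, capped by the bottleneck."""
    return min(window_bytes * 8 / rtt_s, bottleneck_bps)

rtt, link = 0.100, 1e9
window = link * rtt / 8  # window equal to the BDP: 12.5 MB for this path
print(f"stop-and-wait: {stop_and_wait_bps(1500, rtt) / 1e3:.0f} Kbps")
print(f"pipelined (window = BDP): {pipelined_bps(window, rtt, link) / 1e9:.1f} Gbps")
```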
The Self-Clocking Phenomenon:
When the pipeline is full, a beautiful property emerges: the rate at which ACKs return exactly matches the rate at which new segments can be sent. Each ACK "releases" one segment's worth of window space, allowing one new segment to be transmitted.
This is called ACK clocking or self-clocking. The connection naturally adapts to the available bandwidth without explicit rate calculation—ACKs return at the bottleneck rate, so new segments can only be sent at the bottleneck rate.
Van Jacobson's fundamental insight: in a stable, fully loaded connection, the number of packets in flight should remain constant. A new packet enters the network only when an old packet (acknowledged) exits. This conservation principle is the foundation of stable pipelining.
Given the concepts we've established, we can now derive formulas for TCP's effective throughput under various conditions.
Ideal Throughput (No Congestion):
When window size is not limited by congestion:
Throughput = min(WindowSize / RTT, Bottleneck Bandwidth)
If WindowSize ≥ BDP, throughput approaches the bottleneck bandwidth.
Throughput with Packet Loss:
In the presence of congestion-induced loss, TCP's throughput becomes loss-sensitive. The classic Mathis formula (1997) provides a first-order approximation:
Throughput ≈ (MSS / RTT) × (1 / √p)
where MSS is the maximum segment size, RTT is the round-trip time, and p is the packet loss probability.
This formula reveals a critical insight: throughput degrades with the inverse square root of the loss rate, so even modest loss sharply limits what TCP can achieve on a fast path. The table below evaluates the formula for a representative path (MSS = 1460 bytes, RTT = 10 ms):
| Loss Rate (p) | Throughput | Interpretation |
|---|---|---|
| 0.0001 (0.01%) | ~116 Mbps | Near-ideal performance |
| 0.001 (0.1%) | ~36.8 Mbps | Good performance |
| 0.01 (1%) | ~11.6 Mbps | Noticeable degradation |
| 0.1 (10%) | ~3.7 Mbps | Severe degradation |
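The sketch below evaluates the approximation for the loss rates in the table; the MSS of 1460 bytes and RTT of 10 ms are assumed parameters chosen to match the tabulated values:

```python
from math import sqrt

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """First-order Mathis approximation: throughput ~ (MSS / RTT) * 1/sqrt(p)."""
    return (mss_bytes * 8 / rtt_s) / sqrt(loss_rate)

# Illustrative path: MSS = 1460 bytes, RTT = 10 ms (assumed)
for p in (1e-4, 1e-3, 1e-2, 1e-1):
    print(f"p = {p:<7}: {mathis_throughput_bps(1460, 0.010, p) / 1e6:6.1f} Mbps")
```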
Practical Throughput Limits:
In reality, throughput is constrained by multiple factors simultaneously:
Effective Throughput = min(Sender Window, Receiver Window, Congestion Window) / RTT
The result is bounded above by the bottleneck bandwidth.
Theoretical formulas provide bounds and intuition, but real networks exhibit complex behavior. Shared paths, cross-traffic dynamics, queuing variations, and protocol overhead mean that actual throughput often differs from predictions. Measurement tools like iperf3, netperf, and tcptrace help validate models against reality.
The BDP concept directly informs how buffers should be sized—both at network routers and on end hosts.
Router Buffer Sizing:
For a router handling many flows, the traditional rule of thumb (RFC 3439) was:
Buffer Size = BDP = Bandwidth × RTT
This ensures the buffer can absorb a full window's worth of data during congestion response.
However, modern research (Stanford, 2004) showed that with many flows, this can be reduced:
Buffer Size = BDP / √N
Where N is the number of flows. Statistical multiplexing means bursts from different flows are unlikely to coincide perfectly.
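A small sketch of both sizing rules, applied to a hypothetical 10 Gbps link with a 100 ms average RTT:

```python
from math import sqrt

def router_buffer_bytes(bandwidth_bps, rtt_s, n_flows=1):
    """Classic BDP rule, scaled down by sqrt(N) per the Stanford result."""
    bdp = bandwidth_bps * rtt_s / 8
    return bdp / sqrt(n_flows)

# Hypothetical 10 Gbps backbone link with a 100 ms average RTT
for n in (1, 100, 10_000):
    mb = router_buffer_bytes(10e9, 0.100, n) / 1e6
    print(f"{n:>6} flows: {mb:8.2f} MB of buffer")
```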
The Bufferbloat Controversy:
Excessive router buffers cause bufferbloat (as discussed in Page 1). Modern recommendations favor active queue management (AQM), such as CoDel or PIE, combined with modest buffers.
Memory Considerations:
Buffer sizing involves trade-offs: larger buffers sustain throughput on high-BDP paths, but every buffered byte consumes memory and can add queuing delay. For servers handling thousands of connections, per-connection buffer memory adds up quickly. Dynamic buffer sizing algorithms help by growing buffers only as needed, but administrators must still configure appropriate maximum limits.
Modern operating systems automatically adjust TCP buffer sizes based on observed network conditions. Linux, Windows, and macOS all implement this. Manual tuning is now mainly needed for high-performance scenarios (10+ Gbps, very high latency) rather than general-purpose computing.
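When manual tuning is needed, socket buffers can be requested per connection. This sketch asks for send and receive buffers sized to an assumed 12.5 MB BDP; the OS may clamp the request to its configured maximums, and on Linux explicitly setting SO_RCVBUF disables receive-buffer autotuning for that socket, so it should be done deliberately:

```python
import socket

# Hypothetical target: a 1 Gbps path with 100 ms RTT -> BDP of 12.5 MB
BDP_BYTES = int(1e9 * 0.100 / 8)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Request send/receive buffers at least as large as the BDP. The OS may
# clamp these to its configured maximums (Linux also doubles the value
# internally for bookkeeping), so verify with getsockopt afterwards.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BDP_BYTES)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BDP_BYTES)

print("SO_SNDBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("SO_RCVBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```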
TCP needs to estimate network capacity to operate effectively. Unlike bandwidth, which can sometimes be known a priori, available capacity must be discovered dynamically.
Passive Measurements:
TCP can infer capacity from observation:
RTT Measurement: TCP timestamps (RFC 1323) or simple segment-ACK timing reveal round-trip delay. TCP tracks a smoothed RTT estimate (SRTT) and its variation (RTTVAR), which together drive the retransmission timeout.
ACK Rate: The rate at which ACKs return indicates available throughput at the bottleneck. If ACKs for 10 segments arrive in 10 ms, the path is delivering roughly MSS × 10 / 10ms worth of data.
Active Probing:
For more precise capacity estimation, dedicated measurement techniques exist:
Packet Pair: Send two back-to-back packets and measure their arrival spacing at the receiver. The spacing approximates the bottleneck serialization delay (packet size / bottleneck bandwidth), from which capacity can be inferred (a minimal sketch appears below).
Pathrate, Pathload: Tools that inject probe traffic and measure response characteristics to estimate both capacity (maximum) and available bandwidth (current).
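An idealized sketch of the packet-pair calculation; real tools must filter out cross-traffic and timing noise, which this ignores:

```python
def packet_pair_capacity_bps(packet_bytes, arrival_gap_s):
    """Bottleneck capacity from the spacing of two back-to-back packets.

    The second packet queues behind the first at the bottleneck, so the
    receiver-side gap approximates the bottleneck serialization delay:
    gap = packet_size / capacity  =>  capacity = packet_size / gap.
    """
    return packet_bytes * 8 / arrival_gap_s

# Example: 1500-byte probes arrive 120 microseconds apart
print(f"{packet_pair_capacity_bps(1500, 120e-6) / 1e6:.0f} Mbps")  # ~100 Mbps
```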
| Method | Measures | Accuracy | Overhead |
|---|---|---|---|
| ACK timing | Available bandwidth (rough) | Low-Medium | None (passive) |
| Packet pair | Bottleneck capacity | Medium-High | Minimal probes |
| Pathload | Available bandwidth | High | Significant probe traffic |
| iperf3 | Achievable throughput | High (for TCP) | Full saturation needed |
| BBR modeling | Both capacity and RTT | Medium | Ongoing probing |
The BBR Approach:
Google's BBR (Bottleneck Bandwidth and RTT) congestion control takes a model-based approach: it continuously estimates the bottleneck bandwidth from the observed delivery rate and the propagation delay from the minimum observed RTT, then paces transmissions to match that model, probing periodically to refresh both estimates.
By explicitly modeling capacity rather than just reacting to loss, BBR aims to achieve high throughput with minimal queuing. This represents a philosophical shift from loss-based to capacity-based congestion control.
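As a conceptual illustration only (not Google's implementation), the sketch below keeps windowed estimates of delivery rate and RTT in the spirit of BBR's model; the class name, window length, and sample values are assumptions:

```python
from collections import deque

class PathModel:
    """Toy capacity model: track the recent maximum delivery rate (bottleneck
    bandwidth estimate) and the recent minimum RTT (propagation delay
    estimate). Window lengths here are arbitrary."""

    def __init__(self, window=10):
        self.rates = deque(maxlen=window)   # delivery-rate samples (bps)
        self.rtts = deque(maxlen=window)    # RTT samples (seconds)

    def on_ack(self, delivered_bytes, interval_s, rtt_s):
        self.rates.append(delivered_bytes * 8 / interval_s)
        self.rtts.append(rtt_s)

    def estimated_bdp_bytes(self):
        if not self.rates or not self.rtts:
            return None
        btl_bw = max(self.rates)     # bandwidth: recent max delivery rate
        rt_prop = min(self.rtts)     # propagation delay: recent min RTT
        return btl_bw * rt_prop / 8

model = PathModel()
model.on_ack(delivered_bytes=145_000, interval_s=0.010, rtt_s=0.052)
model.on_ack(delivered_bytes=150_000, interval_s=0.010, rtt_s=0.050)
print(f"estimated BDP ~ {model.estimated_bdp_bytes() / 1e6:.2f} MB")
```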
Network capacity is inherently variable. Cross-traffic changes available bandwidth continuously. Routing changes can alter the path entirely. Wireless links have capacity that varies with signal quality. Any capacity estimate is a snapshot that may become stale immediately. Robust congestion control must handle this uncertainty.
We've developed a comprehensive understanding of network capacity and its implications for TCP performance. The key insights: the bottleneck link caps end-to-end throughput; propagation delay is bounded by physics and cannot be bought back with bandwidth; the bandwidth-delay product defines how much data must be in flight to fill the pipe; a window smaller than the BDP wastes capacity, and packet loss further caps throughput per the Mathis relation; and buffers, at routers and hosts alike, should be sized with the BDP in mind.
What's Next:
Understanding network capacity tells us what TCP is trying to achieve. The next page explores sender-based control—how TCP adjusts its transmission rate using only information available at the sender, without explicit network feedback.
You now understand the fundamental physics and mathematics of network capacity. This knowledge is essential for understanding why TCP behaves as it does—why window sizes matter, why high-RTT paths are challenging, and why congestion control algorithms must carefully estimate and adapt to changing network conditions.