Imagine trying to control traffic flow on a highway network where you can only see the cars at the entrance ramps and exit ramps—never the highway itself. You don't know how congested any segment is. You can't see the accidents or the volume of traffic from other entrances. All you can observe is how long the cars you send take to reach their destinations, and whether they arrive at all.
This is precisely the challenge TCP faces. A TCP sender sits at the edge of the network with no visibility into routers, no knowledge of other connections, and no authority to request special treatment. Yet it must somehow regulate its transmission rate to be:

- High enough to use the available capacity efficiently
- Low enough to avoid overwhelming the network
- Fair to the other connections sharing the path
This is the essence of sender-based congestion control: making optimal decisions using only end-host observations.
By the end of this page, you will understand why TCP uses sender-based control, what information is available to the sender, how the congestion window mechanism works, and the fundamental trade-offs inherent in controlling rate from the network's edge.
Congestion control could, in principle, be implemented by the network itself. Routers could allocate bandwidth, schedule packets fairly, and reject traffic that exceeds capacity. This network-assisted approach exists in some networks (ATM, MPLS traffic engineering, Software-Defined Networks).
So why does the Internet use sender-based control? The answer lies in the Internet's end-to-end design philosophy.
The End-to-End Argument:
First articulated by Saltzer, Reed, and Clark in 1984, the end-to-end principle holds that functions should be implemented at the endpoints rather than in the network when:

- The function can be implemented completely and correctly only with the knowledge and help of the endpoints
- A partial implementation inside the network would add complexity for all traffic while still requiring endpoint involvement

For congestion control, following this principle keeps routers simple and stateless, lets new algorithms deploy through endpoint software updates alone, and allows the network core to scale without per-flow state.
The Trade-off:
Sender-based control pays for these advantages with significant limitations:
Information asymmetry — The sender doesn't know the network state. It must infer congestion from indirect signals (loss, delay) that arrive delayed by one RTT.
Delayed feedback — By the time a congestion signal reaches the sender, the congestion has existed for at least one RTT. Any response takes another RTT to reach the congestion point.
Limited coordination — Each sender operates independently. There's no mechanism to ensure fair allocation except the emergent behavior of compatible algorithms.
Conservative approach required — Unable to detect network state precisely, TCP must err on the side of caution, potentially underutilizing capacity.
Some modern extensions add network feedback: ECN (Explicit Congestion Notification) allows routers to mark packets during congestion, and DCTCP/L4S use this extensively. But these enhance rather than replace sender-based control—the sender still makes all rate decisions.
To implement congestion control, TCP senders must extract meaningful signals from limited observations. Understanding what information is available—and what isn't—clarifies why congestion control algorithms work the way they do.
Direct Observations:
The sender can directly observe:
1. Segment Transmission Times: when each segment was sent (local timestamp)
2. ACK Arrival Times: when acknowledgments arrive (local timestamp)
3. ACK Contents: which bytes are acknowledged (the cumulative ACK number, any SACK blocks, and the advertised receive window)
4. Timeout Events: when expected ACKs don't arrive within the expected time

Derived Metrics:
| Derived Metric | How It's Calculated | What It Indicates |
|---|---|---|
| Round-Trip Time (RTT) | ACK arrival - segment send time | Current path delay (including queuing) |
| RTT variance | Deviation from smooth RTT | Delay stability/predictability |
| Delivery rate | Bytes ACKed / time interval | Current throughput achieved |
| Loss events | Timeout or 3+ duplicate ACKs | Packet drops (likely congestion) |
| Out-of-order delivery | SACK blocks or ACK gaps | Network-induced reordering |
| Receive window changes | Advertised window in ACK | Receiver buffer availability |
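To make the table concrete, here is a minimal Python sketch, with illustrative names like `on_send` and `on_ack`, of how a sender could derive an RTT sample and a delivery-rate estimate purely from its own timestamps and the ACKs it receives:

```python
import time

send_times: dict[int, float] = {}   # seq -> local send timestamp
acked_bytes = 0
epoch_start = time.monotonic()

def on_send(seq: int) -> None:
    # Direct observation 1: when each segment was sent (local clock).
    send_times[seq] = time.monotonic()

def on_ack(seq: int, length: int) -> tuple[float, float]:
    # Direct observation 2: when the acknowledgment arrived (local clock).
    global acked_bytes
    now = time.monotonic()
    # Derived: RTT sample = ACK arrival time - segment send time.
    # (Real TCP skips samples for retransmitted segments: Karn's algorithm.)
    rtt_sample = now - send_times.pop(seq)
    # Derived: delivery rate = bytes ACKed / elapsed interval.
    acked_bytes += length
    delivery_rate = acked_bytes / (now - epoch_start)
    return rtt_sample, delivery_rate
```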
What's NOT Available:
Critically, the sender cannot directly observe:

- Queue occupancy at any router along the path
- The capacity of the bottleneck link
- How many other flows share the bottleneck
- Where along the path congestion or loss occurred
- Whether a loss was caused by congestion or by corruption (e.g., on a wireless link)
This information gap is the fundamental challenge. TCP must infer network state from noisy, delayed signals that mix multiple causes together.
Consider what RTT tells you. If RTT increases, possible causes include: more queuing at a router, more queuing at the receiver, longer route (path change), slower links (wireless rate adaptation), or more processing delay. Without additional information, you can't determine which. This ambiguity constrains algorithm design.
TCP's primary mechanism for controlling transmission rate is the congestion window (cwnd). This variable, maintained at the sender, limits how much data can be in flight at any time.
The Relationship:
Effective Window = min(cwnd, receiver_window)
The sender can have at most Effective Window bytes unacknowledged at any time.
Sending Rate ≈ Effective Window / RTT
By adjusting cwnd, TCP indirectly controls its sending rate without explicit rate specification.
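These relationships translate directly into code; a small sketch (the variable names are ours):

```python
def effective_window(cwnd: int, rwnd: int) -> int:
    # At most min(cwnd, receiver_window) bytes may be unacknowledged.
    return min(cwnd, rwnd)

def sending_rate(cwnd: int, rwnd: int, rtt_s: float) -> float:
    # One effective window of data drains per round trip.
    return effective_window(cwnd, rwnd) / rtt_s

# Example: a 64 KB window, ample receiver buffer, 50 ms RTT -> ~1.3 MB/s.
print(sending_rate(64 * 1024, 1 << 20, 0.050))
```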
Why a Window Rather Than a Rate?
Defining rate directly (e.g., "send at 100 Mbps") would require:

- Fine-grained timers to schedule every packet transmission
- An explicit estimate of the available bandwidth, which the sender cannot observe
- A separate mechanism to stop sending when the network stalls
The window approach is self-regulating: new packets are sent only when ACKs arrive, naturally adapting to network conditions. If the network is fast, ACKs return quickly, enabling fast transmission. If the network is slow, ACKs are delayed, automatically slowing transmission.
The cwnd Lifecycle:

- Initialization: cwnd starts small at connection setup (commonly 10 segments on modern systems)
- Growth: slow start increases cwnd rapidly to probe for available capacity
- Steady state: congestion avoidance adjusts cwnd gradually around the operating point
- Reduction: loss or other congestion signals cut cwnd back, and the cycle repeats
Units:
cwnd is typically measured in bytes, though algorithms often think in terms of segments (MSS). A cwnd of 64 KB with MSS of 1460 bytes allows ~44 segments in flight.
State Tracking:
Beyond cwnd, TCP tracks:

- ssthresh: the slow start threshold that separates slow start from congestion avoidance
- Smoothed RTT and RTT variance: inputs to the retransmission timeout (RTO)
- Bytes in flight: data sent but not yet acknowledged
- Duplicate ACK counts: used to detect loss for fast retransmit
While cwnd controls the window, the effective rate depends on RTT. A cwnd of 100 KB means 100 Mbps with 8 ms RTT, but only 8 Mbps with 100 ms RTT. High-RTT paths need larger windows to achieve the same rate, which is why window scaling matters for long-distance connections.
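Checking those numbers, and inverting the formula to find the window a target rate requires (the bandwidth-delay product):

```python
def rate_mbps(cwnd_bytes: float, rtt_s: float) -> float:
    # Throughput implied by a window: cwnd / RTT, converted to Mbps.
    return cwnd_bytes / rtt_s * 8 / 1e6

print(rate_mbps(100_000, 0.008))   # 100 KB at   8 ms RTT -> 100.0 Mbps
print(rate_mbps(100_000, 0.100))   # 100 KB at 100 ms RTT ->   8.0 Mbps

def window_for(target_mbps: float, rtt_s: float) -> float:
    # Bandwidth-delay product: bytes in flight needed to sustain the rate.
    return target_mbps * 1e6 / 8 * rtt_s

# 100 Mbps over a 100 ms path needs ~1.25 MB in flight, far beyond the
# classic 64 KB window limit; hence window scaling.
print(window_for(100, 0.100))
```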
One of TCP's most elegant properties is ACK clocking (also called self-clocking). When the network is properly loaded, the rate at which ACKs return naturally matches the rate at which the sender can safely transmit.
How ACK Clocking Works:
Consider a fully loaded connection:

- The sender has a full window of segments in flight
- Segments queue at the bottleneck link and are serialized onto it one at a time
- The receiver acknowledges each segment as it arrives
- ACKs travel back to the sender with the same spacing the bottleneck imposed
The key insight: The bottleneck link serializes segments at its capacity. The spacing between segment arrivals at the receiver (and thus ACK departures) reflects the bottleneck bandwidth. These inter-ACK spacings are preserved on the return path. When ACKs arrive at the sender, they're naturally spaced at the bottleneck rate.
Consequence: If the sender transmits one new segment for each ACK received, it automatically sends at the bottleneck rate—no explicit rate calculation needed!
Benefits of ACK Clocking:

- Automatic rate matching: the sending rate converges to the bottleneck rate without ever measuring it
- No timers required: ACK arrivals themselves trigger transmissions
- Burst avoidance: in steady state, packets are spaced out rather than clumped
- Stability: the amount of data in the network stays roughly constant
Breaking ACK Clocking:
Several situations disrupt ACK clocking:
- ACK compression: multiple ACKs arrive together due to queuing on the return path
- Delayed ACKs: the receiver waits before ACKing, bunching acknowledgments
- Packet loss: lost packets break the ACK stream, requiring retransmissions
- Path changes: new routes have different timing characteristics
- Application stalls: the sender runs out of data, breaking the continuous flow
After disruption, TCP must re-establish proper clocking through slow start or careful probing.
Van Jacobson described this as 'packet conservation': in steady state, a new packet should enter the network only when an old packet leaves (signaled by its ACK). This ensures the network never has more packets than it can handle, maintaining stability.
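Packet conservation is simple enough to express as a toy model (class and method names are illustrative):

```python
from collections import deque

class SelfClockedSender:
    """Toy model of packet conservation: a new segment enters the network
    only when an ACK reports that an old one has left."""

    def __init__(self, cwnd_segments: int):
        self.cwnd = cwnd_segments
        self.in_flight: deque[int] = deque()
        self.next_seq = 0

    def start(self) -> None:
        # The only burst: fill the window once. Afterwards, ACKs set the
        # pace, arriving spaced at the bottleneck rate (ACK clocking).
        while len(self.in_flight) < self.cwnd:
            self._transmit()

    def on_ack(self) -> None:
        self.in_flight.popleft()   # an old packet has left the network...
        self._transmit()           # ...so one new packet may enter it

    def _transmit(self) -> None:
        print(f"send segment {self.next_seq}")
        self.in_flight.append(self.next_seq)
        self.next_seq += 1

sender = SelfClockedSender(cwnd_segments=4)
sender.start()    # initial window
sender.on_ack()   # thereafter: one out, one in
```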
TCP's traditional window-based control is not the only approach. Modern systems often combine windows with explicit pacing to better regulate transmission patterns.
Window-Only Control (Traditional):
The sender transmits immediately whenever:

- Data is waiting in the send buffer, and
- The bytes currently in flight are below the effective window (min(cwnd, receiver window))
Problem: This can create bursts. If 10 ACKs arrive nearly simultaneously (delayed ACKs, ACK compression), the sender may immediately transmit 10 segments back-to-back. These bursts can:

- Overflow shallow buffers at switches and routers, causing loss
- Create short spikes of queuing delay for competing traffic
- Distort RTT measurements taken while the burst drains
Pacing Implementation:
The basic pacing approach:
Pacing Rate = cwnd / RTT
Packets are spaced at intervals of MSS / Pacing Rate.
Example: cwnd = 1 MB, RTT = 100 ms, MSS = 1500 bytes
A packet is transmitted every 150 μs, smoothing the transmission pattern.
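Spelling out that arithmetic:

```python
MSS = 1500                    # bytes per packet
cwnd = 1_000_000              # 1 MB congestion window
rtt = 0.100                   # 100 ms round-trip time

pacing_rate = cwnd / rtt      # 10,000,000 bytes/s (80 Mbps)
interval = MSS / pacing_rate  # 0.00015 s
print(f"one {MSS}-byte packet every {interval * 1e6:.0f} microseconds")
```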
Modern Usage:
Google's BBR uses pacing extensively. QUIC (which rebuilds congestion control from scratch) includes pacing as a core mechanism. Linux TCP also supports pacing through the FQ (Fair Queue) packet scheduler.
Trade-offs:
Pacing helps at high speeds but adds overhead at low speeds (more timer events). It also requires kernel support for fine-grained packet scheduling. Modern systems typically enable pacing above a certain threshold rate.
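On Linux, an application can cap a socket's pacing rate with the SO_MAX_PACING_RATE socket option, which the FQ scheduler mentioned above enforces (newer kernels can also pace within TCP itself). A sketch; the fallback value 47 is the option's number on common Linux ABIs, since Python's socket module may not export the constant:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# SO_MAX_PACING_RATE takes a rate in bytes per second.
SO_MAX_PACING_RATE = getattr(socket, "SO_MAX_PACING_RATE", 47)
rate = 12_500_000   # ~100 Mbps, expressed in bytes/s
sock.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, rate)
```

Enabling the scheduler itself is a one-line qdisc change, e.g. `tc qdisc replace dev eth0 root fq`.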
Some modern NICs support hardware-based packet pacing, offloading the scheduling work from the CPU. This enables accurate microsecond-level pacing at 100+ Gbps speeds where software timing is imprecise. TSN (Time-Sensitive Networking) and DPDK frameworks leverage this capability.
TCP congestion control can be analyzed as a feedback control system. Understanding control loop dynamics explains why TCP behaves as it does and why certain parameter choices are made.
The Control Loop:
┌─────────────────────────────────┐
│ Network │
│ (Unknown, time-varying) │
└─────────────────────────────────┘
▲ │
│ │ Feedback (ACKs,
Packets │ │ losses, delay)
│ ▼
┌─────────────────────────────────┐
│ TCP Sender │
│ │
│ cwnd = f(feedback signals) │
│ rate ≈ cwnd / RTT │
└─────────────────────────────────┘
Key Control Theory Concepts:

- Feedback delay: congestion signals describe conditions at least one RTT in the past
- Gain: how strongly the controller adjusts cwnd in response to each signal
- Stability: whether the system settles near an operating point or oscillates
- Convergence: whether competing flows move toward a fair share of capacity
The Stability Challenge:
TCP must respond to congestion without overreacting:
Too aggressive increase: Multiple flows simultaneously increase, causing oscillation and repeated congestion events.
Too aggressive decrease: Minor congestion triggers severe rate cuts, leaving capacity unused.
Too slow increase: After congestion, recovery takes too long, wasting capacity.
Too slow decrease: Congestion persists because flows don't reduce rate fast enough.
The AIMD (Additive Increase, Multiplicative Decrease) approach—which we'll explore in detail on Page 5—was specifically designed to provide stability and fairness in this control loop.
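As a preview, the two AIMD update rules fit in a few lines; this sketch uses the common per-ACK approximation of additive increase:

```python
def additive_increase(cwnd: float, mss: float = 1460.0) -> float:
    # Grow by ~1 MSS per RTT, approximated per ACK as MSS*MSS/cwnd.
    return cwnd + mss * mss / cwnd

def multiplicative_decrease(cwnd: float, mss: float = 1460.0) -> float:
    # Halve the window on a congestion signal, floored at 2 segments.
    return max(cwnd / 2, 2 * mss)
```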
RTT Sensitivity:
The control loop's stability depends on RTT:

- Short-RTT flows receive feedback sooner and can adapt quickly
- Long-RTT flows react slowly; by the time their response arrives, conditions may have changed
- Because window growth is paced per RTT, low-RTT flows also ramp up their rate faster
This creates inherent challenges for flows with very different RTTs sharing a bottleneck.
By the time a loss signal reaches the sender (indicating congestion that happened 1 RTT ago), the sender has already transmitted another RTT's worth of data into a potentially congested network. This 'overshoot' is unavoidable with delayed feedback and is why TCP cuts cwnd sharply on loss—to quickly drain the excess data.
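The size of that overshoot is simply rate times RTT. For example, assuming a 100 Mbps sending rate on the 200 ms path above:

```python
rate_bps = 100e6   # 100 Mbps sending rate (illustrative)
rtt = 0.200        # 200 ms round-trip time

overshoot = rate_bps / 8 * rtt   # data sent before the signal can land
print(f"{overshoot / 1e6:.1f} MB already in flight")   # -> 2.5 MB
```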
Despite its elegance, sender-based control has inherent limitations that drive ongoing research and algorithm development.
Challenge 1: Limited Visibility
The sender sees only end-to-end behavior. If performance degrades, it cannot determine:

- Which link along the path is the bottleneck
- Whether a loss was caused by congestion or by corruption (e.g., a wireless hop)
- How many other flows it is competing with
- Whether the path itself has changed
Challenge 2: Delayed Response
All feedback arrives at least 1 RTT after the event occurred. For a flow with 200 ms RTT, congestion that started 200 ms ago is just now being signaled. The response won't affect the congestion point until another 200 ms passes.
Challenge 3: Fairness Depends on Cooperation
Sender-based control relies on all senders implementing compatible algorithms. An aggressive sender ignoring congestion signals gains unfair bandwidth share. This led to the concept of TCP-friendliness: new algorithms should achieve similar throughput to standard TCP under the same conditions.
| Challenge | Impact | Mitigation Approaches |
|---|---|---|
| RTT unfairness | Low-RTT flows get larger share | Delay-based CC, rate-based CC |
| Random loss | Treated as congestion, hurts throughput | ECN, loss differentiation, link-layer retrans |
| Bufferbloat | No loss signal despite congestion | Delay-based CC, AQM at routers |
| Startup delay | Slow start takes multiple RTTs to ramp | IW10, BBR startup, 0-RTT protocols |
| Incast | Synchronized bursts cause collapse | Application-level shaping, pausing |
| Competing algorithms | Mixed CC algorithms interact poorly | Algorithm coexistence testing, ECN L4S |
The Alternatives:
Recognizing these limitations, various approaches augment or partially replace pure sender-based control:
ECN (Explicit Congestion Notification): Routers mark packets (rather than dropping them) when queues build, so senders can respond before loss occurs. This reduces both delay and loss variability (see the sketch after this list).
Active Queue Management (AQM): Routers (RED, CoDel, PIE) actively manage queues, providing earlier and clearer congestion signals to senders.
Network-Assisted CC: In controlled environments (data centers), network information can be exposed to endpoints (DCQCN, HPCC).
Receiver-Side Modifications: Receivers can help by providing more information (accurate timestamps, congestion indicators) or by pacing ACKs.
Despite these enhancements, the sender remains the decision-maker. These approaches provide better information, but the fundamental architecture remains sender-based.
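For example, an ECN-capable sender reacts to an ACK carrying the ECN-Echo flag much as it would to a loss (per RFC 3168), except that nothing needs retransmitting; a minimal sketch:

```python
MSS = 1460

def on_ack_with_ece(cwnd: int) -> int:
    # Treat the router's mark like a loss signal: halve the window
    # (floored at 2 segments). No data was lost, so retransmit nothing.
    return max(cwnd // 2, 2 * MSS)

cwnd = on_ack_with_ece(80_000)   # back off before queues overflow
```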
The limitations of sender-based control are also its strength. Because intelligence is at the edges, the Internet scales massively without central coordination. Routers make simple, fast forwarding decisions while endpoints manage the complexity of adaptation. This trade-off has served the Internet remarkably well for decades.
We've established a comprehensive understanding of how TCP controls congestion from the sender. Let's consolidate the key insights:

- The Internet places congestion control at the sender, consistent with the end-to-end principle: routers stay simple while endpoints carry the complexity of adaptation.
- The sender must infer network state from indirect, RTT-delayed signals: ACK timing, losses, and delay changes.
- The congestion window (cwnd) bounds the data in flight; the sending rate emerges as roughly cwnd / RTT.
- ACK clocking naturally paces transmission at the bottleneck rate, and explicit pacing smooths bursts when that clocking breaks down.
- Feedback delayed by at least one RTT makes overshoot unavoidable, which is why responses to congestion must be both prompt and conservative.
What's Next:
Sender-based control needs signals to know when congestion is occurring. The next page explores congestion signals—how TCP interprets packet loss, delay changes, and explicit marks to infer network state and react appropriately.
You now understand the architectural choice of sender-based control, what information TCP has access to, and how the congestion window mechanism translates that information into rate control. This foundation prepares you to understand the specific signals TCP uses to detect and respond to congestion.