Consider a tightrope walker who has successfully reached the middle of the rope. The aggressive climbing phase is over—now comes the delicate task of maintaining balance while inching forward. One misstep could mean disaster, yet standing still means never reaching the goal.
This is TCP's situation in congestion avoidance. Having used slow start to rapidly discover network capacity, TCP now faces a more subtle challenge: how to fully utilize available bandwidth while avoiding the congestion that would force a painful restart. The answer is conservative, linear growth—adding just enough capacity to test for headroom without overwhelming the network.
Congestion avoidance represents the steady-state behavior of most long-lived TCP connections. Understanding it is essential for optimizing throughput, diagnosing performance issues, and appreciating how millions of connections cooperatively share the Internet's capacity.
This page covers why congestion avoidance uses linear growth instead of exponential, the exact mechanics of the cwnd increase algorithm, how connections probe for additional capacity, the relationship between congestion avoidance and network stability, and the AIMD (Additive Increase, Multiplicative Decrease) principle that ensures fairness.
Congestion avoidance begins when slow start ends. Recall the two termination conditions for slow start: cwnd reaches ssthresh, or a loss event signals that the network's limit has already been crossed.
In both cases, continuing exponential growth would be dangerous. The network is either near its limit (ssthresh approached) or has already signaled overload (loss). Congestion avoidance responds with a fundamentally different growth strategy.
The philosophical shift:
Slow start asks: "What's the maximum capacity I can discover?"
Congestion avoidance asks: "Is there any additional capacity I can safely claim?"
This shift from exploration to exploitation changes the growth rate from exponential to linear. The sender transitions from doubling the transmission rate each RTT to adding one segment per RTT—roughly a factor of N difference for a window of N segments.
| Property | Slow Start | Congestion Avoidance |
|---|---|---|
| Goal | Discover capacity rapidly | Maintain near-capacity utilization |
| Growth type | Exponential (×2 per RTT) | Linear (+1 segment per RTT) |
| Risk level | High (will overshoot) | Low (gentle probing) |
| Typical duration | 3-10 RTTs | Indefinite (entire connection) |
| cwnd update rule | cwnd += MSS per ACK | cwnd += MSS²/cwnd per ACK |
| Triggered by | Connection start, loss events | cwnd ≥ ssthresh |
| Terminated by | cwnd ≥ ssthresh or loss | Loss only |
The seamless transition:
The transition from slow start to congestion avoidance is often invisible to the user. There's no pause or reset—the sender simply changes the formula for cwnd updates. What was "add one MSS per ACK" becomes "add a fraction of MSS per ACK."
Example transition (cwnd = ssthresh = 64 segments):
Last slow start RTT (cwnd = 32→64):
- 32 segments sent
- 32 ACKs received
- cwnd grows by 32 × MSS, reaching 64 MSS total
- Growth: 2×
First congestion avoidance RTT (cwnd = 64→65):
- 64 segments sent
- 64 ACKs received
- cwnd grows by 64 × (MSS/64) = 1 MSS, reaching 65 MSS total
- Growth: 1.015625×
The growth rate drops dramatically, from doubling to a roughly 1.6% increase. This is the intended behavior: slow start found the capacity, now congestion avoidance maintains it.
Many short connections (HTTP requests, DNS queries) never leave slow start. If the transfer completes before cwnd reaches ssthresh, congestion avoidance is never entered. This is why initial window size (IW=10) and connection reuse matter so much for web performance.
The congestion avoidance algorithm can be expressed in several equivalent ways. Let's explore each formulation:
Per-ACK formulation (standard):
For each ACK received:
cwnd = cwnd + (MSS × MSS) / cwnd
Or equivalently:
cwnd = cwnd + MSS / cwnd (when cwnd is in MSS units)
Why this formula?
Let's analyze what happens over one RTT when cwnd = N segments: the sender transmits N segments and receives N ACKs, and each ACK adds MSS/N to cwnd, for a total increase of N × (MSS/N) = 1 MSS.
The result: cwnd increases by exactly one MSS per RTT, regardless of the current cwnd value. This is linear growth—additive, not multiplicative.
```
// Congestion Avoidance Implementation
// Called when ACK is received and cwnd >= ssthresh

function on_ack_received_congestion_avoidance():

    // Method 1: Fractional MSS per ACK (common)
    cwnd_increment = MSS * MSS / cwnd
    cwnd = cwnd + cwnd_increment

    // Method 2: Counter-based (avoids fractions)
    // Increment counter for each ACK
    cwnd_counter = cwnd_counter + 1
    if cwnd_counter >= cwnd / MSS:
        cwnd = cwnd + MSS
        cwnd_counter = 0

    // Method 3: Direct formula per RTT
    // (Used in some implementations)
    // After receiving cwnd/MSS ACKs:
    //     new_cwnd = old_cwnd + MSS

// Parameters
MSS = 1460        // Maximum Segment Size (bytes)
cwnd = 65535      // Example: ~45 segments
ssthresh = 60000  // Below cwnd, so in congestion avoidance

// Per-ACK calculation
increment = 1460 * 1460 / 65535   // ≈ 32.5 bytes
// After 45 ACKs: 45 × 32.5 ≈ 1462 bytes ≈ 1 MSS
```
Mathematical analysis:
Let cwnd(t) be the congestion window at time t (measured in RTTs). During congestion avoidance:
cwnd(t+1) = cwnd(t) + 1 (in MSS units)
Solving this linear recurrence:
cwnd(t) = cwnd(0) + t (in MSS units)
This is pure linear growth—a straight line on a cwnd vs. time graph. Compare to slow start's exponential:
Slow start: cwnd(t) = cwnd(0) × 2^t
Congestion avoidance: cwnd(t) = cwnd(0) + t
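To make the contrast concrete, here's a tiny script (illustrative only) that evaluates both closed forms from a common starting window:

```python
# Evaluate the two closed forms above in MSS units, starting from
# cwnd(0) = 10 segments. Purely illustrative numbers.

cwnd0 = 10
for t in range(6):
    slow_start = cwnd0 * 2 ** t   # exponential: doubles each RTT
    avoidance = cwnd0 + t         # linear: one extra segment each RTT
    print(f"RTT {t}: slow start {slow_start:4d} MSS, avoidance {avoidance:3d} MSS")
# By RTT 5: slow start reaches 320 MSS, congestion avoidance just 15 MSS.
```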
Practical implications:
The formula cwnd += MSS²/cwnd assumes one ACK per segment. With delayed ACKs (one ACK per two segments), uncorrected implementations would grow at half the expected rate. Appropriate Byte Counting (ABC, RFC 3465) addresses this by incrementing based on bytes acknowledged, not ACK count.
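A minimal sketch of the byte-counting idea, assuming illustrative names (on_ack, bytes_acked_accum) of our own rather than any real stack's API:

```python
# Appropriate Byte Counting (RFC 3465) sketch: key cwnd growth to bytes
# acknowledged rather than to the number of ACKs received.

MSS = 1460             # Maximum Segment Size (bytes)
cwnd = 45 * MSS        # congestion window in bytes
bytes_acked_accum = 0  # bytes ACKed since the last cwnd increase

def on_ack(acked_bytes: int) -> None:
    """Add 1 MSS per cwnd-worth of ACKed bytes: delayed ACKs (one ACK
    covering two segments) no longer halve the growth rate."""
    global cwnd, bytes_acked_accum
    bytes_acked_accum += acked_bytes
    if bytes_acked_accum >= cwnd:   # a full window has been acknowledged
        bytes_acked_accum -= cwnd
        cwnd += MSS                 # additive increase: +1 MSS per RTT
```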
TCP's congestion control follows a principle called AIMD: Additive Increase, Multiplicative Decrease. This pattern is fundamental to understanding both individual connection behavior and network-wide fairness.
Additive Increase (AI):
During congestion avoidance, cwnd increases by a constant amount each RTT:
cwnd(t+1) = cwnd(t) + α
where α = 1 MSS in standard TCP. This is the "additive" part.
Multiplicative Decrease (MD):
When loss is detected, cwnd is reduced by a multiplicative factor:
cwnd(new) = cwnd(old) × β
where β = 0.5 in standard TCP (i.e., cwnd is halved). This is the "multiplicative" part: the reduction is proportional to the current rate, so faster flows shed more.
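The two rules fit in a few lines. This sketch uses α and β as parameters with the standard Reno values; the function names are ours, not from any particular implementation:

```python
# AIMD in MSS units: constant growth per RTT, proportional cut on loss.

ALPHA = 1.0  # additive increase: segments added per loss-free RTT
BETA = 0.5   # multiplicative decrease: factor applied on loss

def on_rtt_without_loss(cwnd: float) -> float:
    """Additive increase: probe upward by a fixed amount."""
    return cwnd + ALPHA

def on_loss(cwnd: float) -> float:
    """Multiplicative decrease: shed load in proportion to current rate."""
    return cwnd * BETA
```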
Why this combination?
AIMD has a remarkable property: it converges to fair bandwidth allocation. Let's see why:
Fairness convergence proof (intuitive):
Imagine two flows sharing a bottleneck. Flow A has cwnd = 100, Flow B has cwnd = 20 (unfair).
Scenario: Bottleneck capacity = 60 segments
Phase 1 - Additive Increase:
A: 100→101, B: 20→21 (both add 1)
Combined: 101 + 21 = 122, exceeds capacity, loss occurs
Phase 2 - Multiplicative Decrease (both flows experience loss):
A: 101→50.5, B: 21→10.5
Phase 3 - Additive Increase again:
A: 50.5→51.5, B: 10.5→11.5
... continues ...
After many cycles:
Both flows converge toward 30 segments each (fair share)
The key insight:
Over many cycles, the ratio of flow rates approaches 1:1.
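You can watch this convergence numerically. The toy simulation below (our own construction) assumes both flows see the same bottleneck, add 1 MSS per RTT, and halve simultaneously when the shared capacity is exceeded:

```python
# Two AIMD flows sharing one bottleneck: unequal windows converge because
# the gap between them halves at every synchronized loss.

def aimd_cycles(a: float, b: float, capacity: float, cycles: int = 30):
    for _ in range(cycles):
        while a + b <= capacity:  # additive increase: both gain 1 per RTT
            a, b = a + 1, b + 1
        a, b = a * 0.5, b * 0.5   # multiplicative decrease on shared loss
    return a, b

a, b = aimd_cycles(100.0, 20.0, capacity=60.0)
print(f"A = {a:.2f}, B = {b:.2f}, ratio = {a / b:.3f}")
# The ratio approaches 1:1; each post-loss window settles near
# capacity/4, oscillating up toward the fair share of capacity/2.
```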
| Variant | Additive Increase (α) | Multiplicative Decrease (β) | Fairness Trade-off |
|---|---|---|---|
| TCP Reno | 1 MSS/RTT | 0.5 | Classical fairness |
| TCP CUBIC | ~1 MSS/RTT (adjusted) | 0.7 | Better high-BDP performance |
| TCP Vegas | 1 MSS/RTT while measured RTT stays low | Backs off when RTT rises (before loss) | Delay-based, preemptive |
| Compound TCP | 1 MSS/RTT + dwnd | 0.5 | Hybrid loss/delay |
| DCTCP | 1 MSS/RTT | Variable (ECN-based) | Fine-grained for data centers |
AIMD isn't just about fairness—it also ensures network stability. Multiplicative decrease quickly reduces aggregate load when congestion occurs, creating headroom for recovery. Additive increase slowly fills this headroom, preventing synchronized oscillations. This combination makes the Internet stable even with millions of competing flows.
The combination of additive increase and multiplicative decrease creates a characteristic sawtooth pattern in the cwnd over time. This pattern is one of the most recognizable signatures of TCP behavior.
Understanding the sawtooth:
cwnd ▲
│ ╱│ ╱│ ╱│
│ ╱ │ ╱ │ ╱ │
│ ╱ │ ╱ │ ╱ │
│ ╱ │ ╱ │ ╱ │
│ ╱ │ ╱ │ ╱ │
│ ╱ │╱ │╱ │
└─────────────────────────────────────► time
Rise Drop Rise Drop
(linear) (half) (linear) (half)
Phases of the sawtooth: a linear rise as additive increase probes for headroom, a loss event when cwnd exceeds capacity, then a multiplicative drop to half, after which the cycle repeats.
Sawtooth characteristics:
Let W be the point where loss occurs (the network capacity in segments). The sawtooth oscillates between W/2 and W:
Minimum cwnd: W/2 (immediately after loss)
Maximum cwnd: W (immediately before loss)
Average cwnd: 3W/4 (due to linear rise)
Cycle duration: W/2 RTTs (to grow from W/2 to W)
Throughput analysis:
The average throughput during the sawtooth is:
Avg throughput = 3W/4 × MSS / RTT
Compared to the ideal throughput (cwnd always at W):
Efficiency = (3W/4) / W = 75%
This 25% loss is the cost of TCP's probing behavior. The connection cannot maintain exactly the right rate—it must probe upward until loss confirms the limit.
Real-world deviations:
Practical sawtooths differ from the ideal: losses arrive irregularly, competing flows interact, and router buffers smooth the peaks. The table below shows how the ideal cycle scales with link speed and RTT, assuming MSS = 1460 bytes:
| Network Capacity | BDP | Sawtooth Period | Avg Throughput | RTT Impact |
|---|---|---|---|---|
| 10 Mbps, 50ms RTT | 62 KB | ~21 RTTs (1.05s) | ~7.5 Mbps | Moderate oscillation |
| 100 Mbps, 50ms RTT | 625 KB | ~214 RTTs (10.7s) | ~75 Mbps | Long smooth cycles |
| 1 Gbps, 5ms RTT | 625 KB | ~214 RTTs (1.07s) | ~750 Mbps | Fast cycles |
| 1 Gbps, 100ms RTT | 12.5 MB | ~4267 RTTs (7.1min) | ~750 Mbps | Very slow recovery |
| 10 Gbps, 1ms RTT | 1.25 MB | ~427 RTTs (0.43s) | ~7.5 Gbps | Rapid oscillation |
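The table's numbers follow from the ideal-sawtooth formulas above. Here's a small back-of-the-envelope calculator, assuming MSS = 1460 bytes:

```python
# Ideal Reno sawtooth statistics: cwnd oscillates between W/2 and W,
# the period is W/2 RTTs, and average throughput is 3/4 of capacity.

MSS = 1460  # bytes

def sawtooth_stats(link_bps: float, rtt_s: float):
    bdp_bytes = link_bps * rtt_s / 8     # bandwidth-delay product
    w_segments = bdp_bytes / MSS         # window at the loss point
    period_rtts = w_segments / 2         # RTTs to regrow from W/2 to W
    return bdp_bytes, period_rtts, period_rtts * rtt_s, 0.75 * link_bps

bdp, rtts, secs, avg = sawtooth_stats(100e6, 0.050)  # 100 Mbps, 50 ms RTT
print(f"BDP {bdp / 1e3:.0f} KB, period ~{rtts:.0f} RTTs ({secs:.1f} s), "
      f"avg ~{avg / 1e6:.0f} Mbps")
# → BDP 625 KB, period ~214 RTTs (10.7 s), avg ~75 Mbps
```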
On high-bandwidth, high-latency links ("long fat networks" or LFNs), the sawtooth period becomes extremely long. A 10 Gbps transatlantic link (100ms RTT) has a BDP of 125 MB. After a loss halves cwnd, recovering to full capacity takes over 42,000 RTTs—more than an hour! This is why CUBIC and BBR were developed—to accelerate recovery on high-BDP networks.
Even in congestion avoidance, TCP continues to probe for additional capacity. Network conditions change—competing flows end, routing changes, or capacity increases. Congestion avoidance's linear growth is a continuous probe: each RTT, the sender tests whether one more segment can fit.
Why linear probing works:
The question "Is there more capacity?" has an unknown answer that changes over time. TCP's approach:
This probing continues indefinitely. Even a connection that has been running for hours will still add 1 MSS per RTT, constantly testing for available capacity.
Probing efficiency:
Linear probing has a specific cost-benefit trade-off: each RTT risks at most one segment of overshoot, while each successful probe claims one additional segment of capacity.
When probing finds new capacity:
Suppose a competing flow ends, leaving the bottleneck's full capacity of 100 MSS to our flow:
RTT 0: Available capacity = 100 MSS, our cwnd = 50 MSS
RTT 1: cwnd = 51, capacity used = 51, no loss
RTT 2: cwnd = 52, capacity used = 52, no loss
...
RTT 50: cwnd = 100, all capacity claimed
RTT 51: cwnd = 101, exceeds capacity, loss, halve cwnd
The probing eventually discovers and claims the new capacity, though it takes O(capacity) RTTs. This is slow compared to slow start but safe.
Faster probing approaches:
Recognizing that linear probing is slow for large capacity changes, some TCP variants use hybrid approaches, such as CUBIC's faster-than-linear growth once the window moves well past the last loss point (described below).
To see congestion avoidance in action, use ss -ti on Linux during a file transfer. The cwnd field should show gradual increases (typically 1-2 segments per snapshot interval). If cwnd is growing rapidly after the initial phase, something unusual may be happening (application-limited, or non-standard congestion control).
One of congestion avoidance's most important properties is its contribution to fair bandwidth allocation among competing flows. Let's analyze how this works:
The fairness definition:
N flows sharing a bottleneck of capacity C are "fair" when each receives C/N throughput. In practice, some variation is acceptable, measured by Jain's Fairness Index:
Fairness Index = (Σxᵢ)² / (N × Σxᵢ²)
where xᵢ is flow i's throughput. Perfect fairness = 1.0.
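The index is easy to compute; the example inputs below are illustrative:

```python
# Jain's Fairness Index: 1.0 means all flows get equal throughput;
# it falls toward 1/N as one flow dominates.

def jain_index(throughputs: list[float]) -> float:
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(x * x for x in throughputs))

print(jain_index([50, 50]))  # 1.0    (perfectly fair split)
print(jain_index([80, 20]))  # ~0.735 (the unfair split analyzed below)
```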
Why congestion avoidance promotes fairness:
Consider two flows, A and B, sharing a 100 Mbps link:
Case 1: Both start fair (50 Mbps each)
- Both add 1 MSS/RTT
- Both reach loss at same time
- Both halve → still fair
Case 2: A has 80 Mbps, B has 20 Mbps (unfair)
- Both add 1 MSS/RTT (equal absolute increase)
- Both lose when combined exceeds 100 Mbps
- A loses 40 Mbps, B loses 10 Mbps (proportional decrease)
- A now has 40 Mbps, B has 10 Mbps; the ratio is unchanged by the halving, but each additive phase narrows it
After many cycles:
- Ratio A:B approaches 1:1
| Cycle | Flow A (Mbps) | Flow B (Mbps) | Ratio A:B | Combined |
|---|---|---|---|---|
| 0 (initial) | 80 | 20 | 4:1 | 100 (loss) |
| After halve | 40 | 10 | 4:1 | 50 |
| After growth | 60 | 30 | 2:1 | 90 |
| After growth | 70 | 40 | 1.75:1 | 110 (loss) |
| After halve | 35 | 20 | 1.75:1 | 55 |
| ... | ... | ... | ... | ... |
| Converged | 50 | 50 | 1:1 | 100 (loss) |
RTT fairness issue:
There's an important caveat: flows with different RTTs won't converge to equal throughput. A flow with half the RTT completes its feedback loop twice as often, so its additive increase runs twice as fast in real time and it claims a disproportionate share of the bottleneck.
This "RTT unfairness" means that connections to nearby servers tend to dominate over connections to distant servers. Some TCP variants (CUBIC, BBR) attempt to reduce RTT bias.
Multiple bottleneck fairness:
The fairness analysis assumes a single bottleneck. In practice, networks have complex topologies. A flow's fair share depends on which specific links it traverses and their individual capacities. TCP's AIMD ensures local fairness at each congested link but doesn't guarantee end-to-end max-min fairness.
Perfect fairness (equal throughput for all flows) may not be efficient. A large file transfer might benefit from more bandwidth than a small HTTP request. Some argue for 'proportional fairness' or 'utility-based fairness' instead. However, TCP's simple AIMD provides a reasonable default that doesn't require explicit coordination.
The original congestion avoidance algorithm (Reno-style linear growth) has limitations on modern high-speed networks. Several enhancements have been developed:
CUBIC Congestion Avoidance (Linux default):
CUBIC replaces linear growth with a cubic function centered on Wmax (the window size before the last loss):
W(t) = C × (t - K)³ + Wmax
where K = ∛(Wmax × (1 - β) / C) is the time needed to climb back to Wmax, and β is CUBIC's multiplicative decrease factor (0.7).
This creates three regions: steep concave growth while the window is well below Wmax, a near-flat plateau around Wmax (cautious probing near the last known limit), and convex, accelerating growth beyond Wmax once no loss appears.
The result: faster recovery from loss with the same stability near capacity.
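Here's a sketch of that window curve using the default constants from RFC 8312 (C = 0.4, β = 0.7); note that W(0) = β × Wmax, matching the post-loss window:

```python
# CUBIC window curve: W(t) = C*(t - K)^3 + Wmax, with K chosen so the
# curve starts at beta*Wmax right after the loss and regains Wmax at t = K.

C = 0.4      # CUBIC scaling constant (RFC 8312 default)
BETA = 0.7   # multiplicative decrease factor (RFC 8312 default)

def cubic_window(t: float, w_max: float) -> float:
    """Window size in MSS, t seconds after a loss event."""
    k = (w_max * (1 - BETA) / C) ** (1 / 3)  # time to return to w_max
    return C * (t - k) ** 3 + w_max

for t in (0.0, 2.0, 4.0, 6.0):
    print(f"t = {t:.0f} s: W = {cubic_window(t, 100.0):.1f} MSS")
# 70.0 → 95.6 → 100.0 → 102.3: steep, then flat near Wmax, then convex.
```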
Pacing:
Modern implementations often pace segment transmissions rather than sending bursts. During congestion avoidance, segments are spread across the RTT:
Classic: |seg1|seg2|seg3|......wait......|
Paced: |.seg1.|..seg2.|..seg3.|..wait..|
Pacing reduces burstiness, smooths queue occupancy, and improves coexistence with other traffic.
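The pacing rate itself is simple arithmetic: one window per RTT, spread evenly. A sketch with illustrative numbers:

```python
# Pacing interval: instead of bursting cwnd segments back-to-back,
# transmit one segment every RTT / cwnd.

cwnd_segments = 40
rtt_ms = 50.0

pacing_rate = cwnd_segments / (rtt_ms / 1000)  # segments per second
interval_ms = rtt_ms / cwnd_segments           # gap between segments

print(f"{pacing_rate:.0f} segments/s, one every {interval_ms:.2f} ms")
# → 800 segments/s, one every 1.25 ms
```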
| Enhancement | Mechanism | Benefit | Deployed In |
|---|---|---|---|
| CUBIC | Cubic function instead of linear | Faster recovery on high-BDP | Linux (default) |
| Compound | Delay + loss windows combined | Better high-BDP performance | Windows Server 2008+ |
| Proportional Rate Reduction | Smoother recovery after loss | Fewer spurious retransmits | Linux 3.2+ |
| Pacing | Spread packets over RTT | Reduced burstiness | Most modern stacks |
| HyStart++ | Improved slow start exit | Fewer exit-induced losses | QUIC, modern TCP |
| ECN support | Explicit congestion notification | React before loss | Widely supported |
ECN-based congestion response:
With Explicit Congestion Notification, routers can signal congestion before dropping packets. When an ECN congestion signal (CE mark) is received:
Classic response (similar to loss):
ssthresh = cwnd / 2
cwnd = cwnd / 2
Continue in congestion avoidance
DCTCP response (data centers):
Reduce in proportion to the measured congestion level. DCTCP maintains a moving average α of the fraction of ECN-marked packets and cuts cwnd by α/2 each window:
Light marking (α ≈ 2%) → reduce cwnd by ~1%
Every packet marked (α = 1) → halve cwnd, as in the classic response
ECN allows earlier, more precise congestion signals, enabling smoother congestion avoidance without packet loss.
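A sketch of the DCTCP update rules from RFC 8257, where α is an exponentially weighted moving average of the marked fraction and g = 1/16 is the RFC's default gain:

```python
# DCTCP congestion response: track the fraction of ECN-marked packets
# and cut cwnd by alpha/2 once per window.

G = 1.0 / 16.0  # EWMA gain (RFC 8257 default)

def update_alpha(alpha: float, marked: int, total: int) -> float:
    """Once per window: fold this window's marked fraction into alpha."""
    return (1 - G) * alpha + G * (marked / total)

def dctcp_decrease(cwnd: float, alpha: float) -> float:
    """Scale the cut to the congestion level: all-marked means halve."""
    return cwnd * (1 - alpha / 2)

print(dctcp_decrease(100.0, 0.02))  # light marking: cwnd 100 -> 99
print(dctcp_decrease(100.0, 1.0))   # every packet marked: cwnd 100 -> 50
```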
BBR's different approach:
Google's BBR algorithm doesn't use traditional congestion avoidance at all. Instead of additive increase until loss, BBR continuously estimates the bottleneck bandwidth and the minimum RTT, paces transmissions at the estimated bandwidth, and caps the data in flight near the measured bandwidth-delay product, periodically probing to refresh both estimates.
This represents a fundamental departure from AIMD while preserving the goal of efficient, fair bandwidth utilization.
On Linux, view the current algorithm with sysctl net.ipv4.tcp_congestion_control. List available options with sysctl net.ipv4.tcp_available_congestion_control. Per-socket selection is possible via setsockopt(), allowing applications to choose algorithms suited to their traffic patterns.
We've thoroughly explored congestion avoidance, the phase where TCP connections spend most of their time. Let's consolidate the key insights:
- cwnd grows linearly, one MSS per RTT, rather than exponentially
- AIMD (α = 1 MSS, β = 0.5) produces the characteristic sawtooth between W/2 and W, averaging about 75% utilization
- Equal additive steps and proportional decreases drive competing flows toward fair shares
- Modern refinements (CUBIC, pacing, ECN, DCTCP, BBR) address its limitations on high-BDP paths
What's next:
The final topic in this module examines linear growth in greater detail—analyzing its mathematical properties, its impact on long-term throughput, and its interaction with other aspects of TCP behavior.
You now understand TCP congestion avoidance—the steady-state phase where connections carefully balance throughput against congestion risk. The additive increase, multiplicative decrease pattern creates a stable, fair, and efficient system that enables millions of connections to share Internet capacity. Next, we'll examine the linear growth pattern in greater depth.