Imagine driving on an unfamiliar road at night. You accelerate confidently until your headlights reveal a sharp curve ahead—you brake hard, navigate the curve, and continue. But now you know: there's a dangerous curve somewhere around this speed. You won't accelerate as aggressively the next time.
This is exactly how TCP's slow start threshold (ssthresh) works. It's the protocol's memory of where congestion was previously encountered. Once TCP discovers (through packet loss) that a certain transmission rate overwhelms the network, ssthresh records this experience. Future slow start phases will transition to cautious linear growth before repeating the same mistake.
Without ssthresh, TCP would perpetually oscillate between aggressive exponential growth and catastrophic losses. With ssthresh, TCP learns and adapts, becoming increasingly efficient at utilizing available capacity without repeatedly causing congestion.
This page explains what ssthresh represents conceptually, how it's initialized and updated, its role in controlling slow start termination, how it captures network capacity estimates, and why this single variable transforms TCP from memoryless probing to intelligent adaptation.
The slow start threshold (ssthresh) serves a critical function in TCP congestion control: it marks the boundary between aggressive probing (slow start) and conservative probing (congestion avoidance).
Why this boundary matters:
During slow start, cwnd grows exponentially—doubling every RTT. This aggressive growth is appropriate when the network's capacity is unknown and could be very large. But exponential growth has a dangerous property: it will always overshoot the actual capacity, causing packet loss.
The first time slow start overshoots, TCP learns something valuable: the network couldn't handle cwnd at the moment of loss. Rather than forgetting this and repeating the exact same overshoot next time, TCP stores an estimate of the safe limit in ssthresh.
The ssthresh contract:
If cwnd < ssthresh:
We're in slow start (exponential growth)
Network capacity is still being discovered
If cwnd >= ssthresh:
We're in congestion avoidance (linear growth)
We're near the network's estimated limit
Proceed cautiously to avoid loss
| Condition | Current Phase | Growth Behavior | Rationale |
|---|---|---|---|
| cwnd < ssthresh | Slow Start | cwnd += MSS per ACK (exponential) | Capacity unknown, probe aggressively |
| cwnd >= ssthresh | Congestion Avoidance | cwnd += MSS²/cwnd per ACK (linear) | Near capacity, probe conservatively |
| Packet loss (timeout) | Recovery | ssthresh = cwnd/2, cwnd = 1 | Severe congestion, restart cautiously |
| 3 duplicate ACKs | Fast Recovery | ssthresh = cwnd/2, cwnd = ssthresh | Mild congestion, don't reset completely |
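To make this contract concrete, here is a minimal sketch of the per-ACK update rule the table summarizes (Python-style, byte-based counters; a simplification, not any particular kernel's code):

```python
MSS = 1460  # assumed maximum segment size in bytes

def on_ack(cwnd: int, ssthresh: int) -> int:
    """Return the new cwnd (bytes) after one ACK for new data arrives."""
    if cwnd < ssthresh:
        # Slow start: add a full MSS per ACK, so cwnd roughly doubles each RTT
        return cwnd + MSS
    # Congestion avoidance: add MSS*MSS/cwnd per ACK, about one MSS per RTT
    return cwnd + max(1, MSS * MSS // cwnd)
```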
The learning cycle:
1. Slow start grows cwnd exponentially until the network drops a packet.
2. The loss sets ssthresh to roughly half the window that caused it.
3. After recovery, growth is exponential only up to ssthresh, then linear beyond it.
4. The next loss updates ssthresh again, refining the estimate.
Over time, ssthresh converges to a value that reflects the network's capacity at the bottleneck link. The connection spends most of its time in congestion avoidance, making small adjustments around this learned value.
Network conditions are dynamic—other flows start and stop, routing changes, interference varies. ssthresh captures a snapshot of past capacity, not a prediction of future capacity. This is why TCP continues probing (via congestion avoidance) even after reaching ssthresh, and why ssthresh gets updated on every loss event.
When a new TCP connection begins, it has no knowledge of the network path's capacity. How should ssthresh be initialized?
The default approach:
Most TCP implementations initialize ssthresh to a very large value:
ssthresh = 65535 bytes (traditional)
ssthresh = INT_MAX (common in modern kernels)
ssthresh = advertised window from SYN (some implementations)
With ssthresh set to a huge value, the initial slow start phase will continue until:
- Packet loss occurs (the usual terminator, which then records a real ssthresh)
- cwnd reaches the receiver's advertised window
- The application runs out of data to send
This is the correct behavior—a new connection should probe aggressively since it knows nothing about path capacity.
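A minimal sketch of that default initialization, assuming byte-based counters and the modern 10-segment initial window of RFC 6928 (names are illustrative, not taken from any specific stack):

```python
MSS = 1460                     # assumed segment size in bytes
INFINITE_SSTHRESH = 2**31 - 1  # "effectively unlimited" threshold
INITIAL_WINDOW = 10 * MSS      # common modern initial window (RFC 6928)

def new_connection_state():
    cwnd = INITIAL_WINDOW
    ssthresh = INFINITE_SSTHRESH  # no capacity estimate yet: slow start rules until loss
    return cwnd, ssthresh
```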
Why not start with a conservative ssthresh?
Starting with a low ssthresh (say, 10 segments) would cause the connection to immediately enter congestion avoidance, growing linearly from the start. The result: reaching even a moderately large window would take hundreds of RTTs instead of a handful, so short transfers would finish long before the connection approached the path's capacity.
Platform-specific initialization:
Different operating systems handle initialization slightly differently:
Linux (recent kernels):
ssthresh = TCP_INFINITE_SSTHRESH (0x7FFFFFFF)
FreeBSD:
ssthresh = min(sb_hiwat, TCP_MAXWIN) // Socket buffer high water mark
Windows:
ssthresh = 65535 (traditional) or larger with window scaling
macOS/iOS:
Similar to FreeBSD, based on socket buffer configuration
Connection reuse and ssthresh preservation:
Some TCP implementations cache ssthresh values from previous connections to the same destination. This allows subsequent connections to benefit from learned capacity without repeating the probing phase. However, this optimization is complex and rarely used due to concerns about stale estimates.
You can observe ssthresh values using: ss -i (shows ssthresh in socket info) or ip tcp_metrics show (shows cached values per destination). The latter reveals TCP's memory of past connections, including remembered ssthresh, RTT, and RTO values.
The ssthresh variable is updated whenever TCP detects congestion (packet loss). The update rule is designed to capture the last known "safe" transmission rate:
On loss detection:
ssthresh = max(FlightSize/2, 2*MSS)
Where FlightSize is the amount of data in flight at the time of loss detection. In practice, this approximates:
ssthresh = max(cwnd/2, 2*MSS)
Why halve cwnd?
The halving rule is based on a key insight: if cwnd caused congestion, then cwnd was too large. But by how much? TCP's conservative assumption is that capacity is roughly half the current window. This is because:
- In slow start, cwnd doubles every RTT, so one RTT before the loss the window was about half its current size and was delivered successfully
- Loss is only detected about one RTT after the network was first overloaded, so the current window already overshoots the true limit
- Halving leaves headroom for the bottleneck queue to drain and for competing flows to take their share
The 2*MSS minimum:
The minimum of 2*MSS ensures that ssthresh never drops so low that the connection effectively stops. Even in severe congestion, two segments can be transmitted, allowing the connection to recover.
| Event | cwnd Before | ssthresh Update | cwnd After | Next Phase |
|---|---|---|---|---|
| Timeout (RTO expires) | 100 MSS | max(50, 2) = 50 MSS | 1 MSS | Slow start to 50, then CA |
| 3 duplicate ACKs | 100 MSS | max(50, 2) = 50 MSS | 50 + 3 MSS | Fast recovery, then CA |
| Timeout at low cwnd | 4 MSS | max(2, 2) = 2 MSS | 1 MSS | Slow start to 2, then CA |
| Multiple losses in RTT | 200 MSS | max(100, 2) = 100 MSS | Varies by variant | Depends on TCP variant |
| Loss after long idle | 50 MSS | max(25, 2) = 25 MSS | 1 MSS or restart | May restart slow start |
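The rows above all follow from the RFC 5681 rule. A minimal sketch, with a hypothetical flight_size argument standing in for FlightSize (values in bytes):

```python
MSS = 1460

def on_loss(flight_size: int, timeout: bool):
    """Return (new_ssthresh, new_cwnd) after loss is detected."""
    ssthresh = max(flight_size // 2, 2 * MSS)  # RFC 5681: max(FlightSize/2, 2*MSS)
    if timeout:
        cwnd = 1 * MSS                # RTO: severe congestion, restart slow start
    else:
        cwnd = ssthresh + 3 * MSS     # 3 dup ACKs: fast recovery, inflate by the dup ACKs
    return ssthresh, cwnd
```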
ssthresh only decreases on loss:
A critical observation: ssthresh is only updated when loss occurs. It never increases during normal operation. This means:
- ssthresh always reflects the most recent congestion event, however long ago it happened
- If capacity later increases, TCP discovers this only slowly, through linear growth in congestion avoidance
- A single unlucky loss can pin ssthresh at a low value for a long time
Implications for performance:
After a period of congestion passes, ssthresh may be "stuck" at a low value from past losses. The connection continues in congestion avoidance (linear growth) even though slow start (exponential growth) would be appropriate.
Some TCP variants address this by restarting slow start after long idle periods or by allowing ssthresh to increase under certain conditions, but the core behavior remains: ssthresh is a conservative estimate that errs on the side of caution.
When loss is detected via timeout (RTO expiration) rather than fast retransmit (duplicate ACKs), TCP resets cwnd to 1 MSS and enters slow start. This is a severe penalty—throughput may drop by 99% instantly. The rationale is that timeout indicates severe congestion (the network is so overwhelmed that no ACKs are getting through), warranting the most conservative response.
The transition from slow start to congestion avoidance is one of TCP's most important state changes. Understanding exactly when and how this transition occurs is crucial for analyzing connection behavior.
The transition condition:
When cwnd >= ssthresh:
Exit slow start
Enter congestion avoidance
Change growth from exponential to linear
This condition is checked after every cwnd update. The moment cwnd equals or exceeds ssthresh, the growth rule changes from one full MSS per ACK (exponential per RTT) to roughly one MSS per RTT (linear).
Before transition (slow start):
On ACK: cwnd = cwnd + MSS
Effect: cwnd doubles per RTT
After transition (congestion avoidance):
On ACK: cwnd = cwnd + MSS * (MSS / cwnd)
Effect: cwnd increases by 1 MSS per RTT
Detailed transition example:
Let's trace a connection with MSS=1460, initial cwnd=10, and ssthresh=64 (set by previous loss):
RTT 0: cwnd=10, slow start, send 10 segments
RTT 1: cwnd=20, slow start, send 20 segments
RTT 2: cwnd=40, slow start, send 40 segments
RTT 3: doubling would take cwnd to 80, exceeding ssthresh=64
→ But wait: did we cross the threshold mid-RTT?
The transition isn't perfectly clean. If cwnd=40 at the start of RTT 3 and each of the 40 returning ACKs adds 1 MSS, then after 24 ACKs cwnd reaches 64 = ssthresh. The remaining 16 ACKs are processed under the congestion avoidance rule, so cwnd ends the RTT at roughly 64, not 80.
In practice, implementations handle this by checking the threshold continuously. The result: a smooth transition from exponential to linear growth right at ssthresh.
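A small simulation of that RTT, assuming one returning ACK per in-flight segment and windows counted in whole segments, shows the crossover:

```python
def apply_one_rtt_of_acks(cwnd: int, ssthresh: int, acks: int) -> int:
    """Apply ACKs one at a time; cwnd and ssthresh are in segments."""
    ca_credit = 0.0
    for _ in range(acks):
        if cwnd < ssthresh:
            cwnd += 1                  # slow start: +1 segment per ACK
        else:
            ca_credit += 1.0 / cwnd    # congestion avoidance: +1/cwnd per ACK
            if ca_credit >= 1.0:
                cwnd += 1
                ca_credit -= 1.0
    return cwnd

print(apply_one_rtt_of_acks(cwnd=40, ssthresh=64, acks=40))  # prints 64, not 80
```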
Modern TCP implementations like CUBIC use HyStart (Hybrid Slow Start) to detect the approaching threshold before loss occurs. By monitoring RTT increases during slow start, HyStart can transition to congestion avoidance early, avoiding the packet loss that would otherwise terminate slow start. This preserves ssthresh's role while reducing unnecessary losses.
The ssthresh variable can be viewed as TCP's internal estimate of the network's capacity (or more precisely, its safe operating window). Let's analyze how accurate this estimate is and its implications:
What ssthresh actually represents:
When loss occurs at cwnd = X, ssthresh is set to X/2. This represents:
- In slow start, roughly the window one RTT earlier, which was delivered without loss
- A deliberately conservative estimate of the safe operating window, not the exact capacity
- The point at which any subsequent slow start should hand over to linear growth
Why X/2 and not exactly the capacity?
The network's capacity (BDP) isn't precisely at the loss point. Several factors create uncertainty:
- Bottleneck buffering: loss occurs only after the queue fills, so the loss point reflects BDP plus queue size, not BDP alone
- Competing traffic: other flows consume a changing share of the bottleneck
- Burstiness: a momentary burst can trigger loss even when the average rate is sustainable
- Detection delay: by the time loss is detected, cwnd has already grown past the value that caused it
ssthresh convergence behavior:
Over time, ssthresh tends to oscillate around the actual fair-share capacity:
- If ssthresh is below the fair share, congestion avoidance grows cwnd well past it before the next loss, and that loss records a higher ssthresh
- If ssthresh is above what the path can carry, loss arrives sooner and pulls the estimate back down
- The result is a sawtooth in which ssthresh drifts toward, then hovers around, the connection's achievable share
Steady-state analysis:
In a stable network with N identical TCP flows sharing a bottleneck of capacity C:
- Each flow converges to roughly C/N of the bandwidth
- Each flow's window oscillates between about W/2 and W, where W is approximately (C/N) × RTT plus its share of the bottleneck buffer
- ssthresh sits near W/2, the bottom of the sawtooth
The "pipe" analogy:
ssthresh can be thought of as TCP's estimate of the pipe size (BDP). During slow start, TCP rapidly fills the pipe. Once full (loss), ssthresh records the pipe size. Congestion avoidance then maintains the pipe near capacity, probing for changes.
In practice, if you can observe a connection's ssthresh after it stabilizes, you can estimate the path's effective BDP. Use: ip tcp_metrics show DEST on Linux to see cached ssthresh values. Multiply by MSS to get bytes. Compare with expected BDP (known_bandwidth × RTT) to assess if the connection is achieving its potential.
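A back-of-the-envelope sketch of that comparison, using assumed example numbers rather than real measurements:

```python
MSS = 1460  # assumed segment size in bytes

def bdp_in_segments(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product expressed in MSS-sized segments."""
    return bandwidth_bps / 8 * rtt_seconds / MSS

# Assumed path: 100 Mbit/s bottleneck, 40 ms RTT
bdp = bdp_in_segments(100e6, 0.040)   # ~342 segments
cached_ssthresh = 170                 # hypothetical value read from `ip tcp_metrics show`
print(f"expected BDP ~ {bdp:.0f} segments, cached ssthresh = {cached_ssthresh}")
# A cached ssthresh near BDP/2 is roughly what loss-based halving would leave behind
```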
Different TCP congestion control algorithms handle ssthresh with slight variations, affecting performance characteristics:
TCP Tahoe (Original, 1988):
On any loss:
ssthresh = cwnd / 2
cwnd = 1 MSS
Enter slow start
Simplest behavior: always reset to slow start. This is very conservative—even a single lost packet causes a complete restart.
TCP Reno (1990):
On timeout:
ssthresh = cwnd / 2
cwnd = 1 MSS
Enter slow start
On 3 duplicate ACKs:
ssthresh = cwnd / 2
cwnd = ssthresh + 3 MSS (for 3 dup ACKs in flight)
Enter fast recovery (retransmit lost, then CA)
Reno distinguishes between timeout (severe) and duplicate ACKs (mild), avoiding slow start restart when possible.
| Variant | ssthresh on Loss | Post-Loss cwnd | Key Innovation |
|---|---|---|---|
| Tahoe | cwnd/2 | 1 MSS always | Original congestion control |
| Reno | cwnd/2 | ssthresh on dup ACK, 1 on timeout | Fast recovery |
| NewReno | cwnd/2 | ssthresh, tracks partial ACKs | Handles multiple losses |
| CUBIC | 0.7 × cwnd | 0.7 × cwnd | Less aggressive reduction |
| Vegas | cwnd/2 | cwnd - (cwnd × diff/baseRTT) | RTT-based, rarely triggers |
| Compound | cwnd/2 | dwnd + cwnd/2 | Delay + loss based |
| BBR | N/A (model-based) | Calculated from model | Ignores loss for ssthresh |
CUBIC (Linux default since 2006):
CUBIC uses a modified reduction:
ssthresh = 0.7 × cwnd (reduce to 70%, not 50%)
cwnd = ssthresh
This less aggressive reduction maintains higher throughput during mild congestion. CUBIC also uses a cubic function (hence the name) instead of linear growth during congestion avoidance, allowing faster capacity recovery.
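A simplified sketch of CUBIC's reduction and post-loss window curve, following the formulas in RFC 8312 (β = 0.7, C = 0.4, windows in segments):

```python
BETA = 0.7  # multiplicative decrease factor (keep 70% of the window)
C = 0.4     # cubic scaling constant from RFC 8312

def cubic_on_loss(cwnd: float):
    """Record W_max and reduce the window to 70% of its pre-loss value."""
    w_max = cwnd
    cwnd = ssthresh = cwnd * BETA
    return w_max, ssthresh, cwnd

def cubic_window(t: float, w_max: float) -> float:
    """Window t seconds after the reduction: W(t) = C*(t - K)^3 + W_max."""
    k = (w_max * (1 - BETA) / C) ** (1.0 / 3.0)  # time until the curve returns to W_max
    return C * (t - k) ** 3 + w_max
```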
BBR (Google, 2016):
BBR fundamentally changes the model:
No traditional ssthresh
Capacity estimated from measured RTT and delivery rate
Congestion detected by RTT increase, not loss
BBR doesn't use loss as the primary congestion signal, so the concept of ssthresh (a loss-driven threshold) doesn't directly apply. Instead, BBR maintains explicit models of bottleneck bandwidth and minimum RTT.
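A conceptual sketch of that model-based approach: sending limits are derived from measured bottleneck bandwidth and minimum RTT rather than from a loss-driven threshold (gains and structure heavily simplified from the published BBR design):

```python
def bbr_limits(btl_bw_bps: float, min_rtt_seconds: float, cwnd_gain: float = 2.0):
    """Derive pacing rate and an in-flight cap from the bandwidth/RTT model."""
    bdp_bytes = btl_bw_bps / 8 * min_rtt_seconds  # estimated bandwidth-delay product
    pacing_rate = btl_bw_bps                      # pace near the estimated bottleneck rate
    inflight_cap = cwnd_gain * bdp_bytes          # cap data in flight at a multiple of BDP
    return pacing_rate, inflight_cap
```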
Implications for performance:
The choice of ssthresh reduction factor (1/2 for classic, 0.7 for CUBIC) directly affects:
- How much throughput is sacrificed on every loss event
- How many RTTs it takes to regain the lost window afterwards (see the sketch below)
- How quickly competing flows converge to a fair share; smaller reductions converge more slowly
The traditional 1/2 factor was chosen assuming every flow should reduce equally to create room for new flows. CUBIC's 0.7 factor reflects that with larger BDPs and more efficient loss recovery, less drastic reduction is needed. BBR's approach recognizes that loss may not indicate congestion at all in some modern networks with stochastic loss (e.g., wireless).
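As a rough worked example of the recovery-time effect, assuming classic additive increase of one segment per RTT after the reduction (CUBIC's actual cubic growth function recovers faster than this):

```python
def rtts_to_recover(window_segments: int, beta: float) -> int:
    """RTTs of +1-segment-per-RTT growth needed to climb from beta*W back to W."""
    return round(window_segments * (1 - beta))

print(rtts_to_recover(1000, 0.5))  # halving (classic Reno): ~500 RTTs
print(rtts_to_recover(1000, 0.7))  # 0.7 reduction (CUBIC-style): ~300 RTTs
```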
While ssthresh is essential for stable congestion control, several problems can arise:
Problem 1: Stale ssthresh estimates
If network conditions change significantly after ssthresh is set, the estimate becomes stale:
- Capacity increased (competing flows departed, routing improved): the connection lingers in slow linear growth far below what the path could carry
- Capacity decreased (new flows arrived): the estimate is now optimistic and the next probe overshoots quickly
Solution approaches:
- Keep probing in congestion avoidance so the estimate is eventually corrected (the default behavior)
- Expire or clear cached per-destination metrics that are too old
- Use algorithms (CUBIC's faster growth, BBR's explicit model) that depend less on a single loss-derived threshold
Problem 2: ssthresh too low from spurious loss
Non-congestion losses (wireless corruption, packet reordering) trigger ssthresh reduction despite no actual congestion:
Scenario:
Packet reordered (not lost)
TCP detects as loss (3 dup ACKs)
ssthresh = cwnd/2 (unnecessary reduction)
Throughput halved with no congestion
Solution approaches:
- DSACK-based undo: if the supposedly lost segment is later reported as a duplicate, revert the ssthresh and cwnd reduction
- Spurious-retransmission detection (Eifel, F-RTO): recognize that the original packet was merely delayed and undo the reaction
- Reordering tolerance: raise the duplicate-ACK threshold when persistent reordering is observed
Problem 3: ssthresh and path MTU discovery interaction
When PMTUD reduces MSS, the segment count for the same byte count increases. If ssthresh is stored in bytes, behavior is consistent. If stored in segments (some implementations), the effective capacity changes:
Before PMTUD: ssthresh = 50 × 1460 = 73,000 bytes
PMTUD reduces MSS to 512
If segments: ssthresh = 50 × 512 = 25,600 bytes (underestimate)
If bytes: ssthresh = 73,000 bytes / 512 = 142 segments (correct)
Most modern implementations store ssthresh in bytes to avoid this issue.
Problem 4: ssthresh and multi-path
When load balancing (ECMP) sends packets across multiple paths with different capacities, loss signals become confusing. ssthresh oscillates as different paths congest at different rates. MPTCP addresses this with per-subflow congestion control.
If connections to specific hosts perform poorly, check ip tcp_metrics show DEST. A very low ssthresh (e.g., 2-4 MSS) suggests past severe congestion. Clear with ip tcp_metrics delete DEST, or clear the whole cache with ip tcp_metrics flush. The next connection will relearn ssthresh from scratch.
We've comprehensively analyzed ssthresh—TCP's memory of network capacity and the gatekeeper between exponential and linear growth. Let's consolidate the essential insights:
- ssthresh marks the boundary between aggressive slow start and cautious congestion avoidance
- New connections initialize it to an effectively infinite value, so they probe aggressively until the first loss
- Every loss sets ssthresh = max(FlightSize/2, 2×MSS), recording a conservative capacity estimate
- ssthresh only decreases on loss; newly available capacity is rediscovered slowly through congestion avoidance
- Congestion control variants (Tahoe, Reno, CUBIC, BBR) differ in how, and whether, they apply this threshold
What's next:
With slow start terminated (either by reaching ssthresh or experiencing loss), TCP enters congestion avoidance. The next page explores this linear growth phase—how it cautiously probes for additional capacity while minimizing the risk of causing congestion.
You now understand ssthresh—the variable that transforms TCP from a memoryless protocol into an adaptive one. By recording where congestion occurred and using it to control future growth, ssthresh enables TCP to efficiently utilize networks without repeatedly causing collapse. Next, we'll examine the congestion avoidance phase that ssthresh guards.