Imagine driving on an unfamiliar road at night. You accelerate confidently until your headlights reveal a sharp curve ahead—you brake hard, navigate the curve, and continue. But now you know: there's a dangerous curve somewhere around this speed. You won't accelerate as aggressively the next time.
This is exactly how TCP's slow start threshold (ssthresh) works. It's the protocol's memory of where congestion was previously encountered. Once TCP discovers (through packet loss) that a certain transmission rate overwhelms the network, ssthresh records this experience. Future slow start phases will transition to cautious linear growth before repeating the same mistake.
Without ssthresh, TCP would perpetually oscillate between aggressive exponential growth and catastrophic losses. With ssthresh, TCP learns and adapts, becoming increasingly efficient at utilizing available capacity without repeatedly causing congestion.
This page explains what ssthresh represents conceptually, how it's initialized and updated, its role in controlling slow start termination, how it captures network capacity estimates, and why this single variable transforms TCP from memoryless probing to intelligent adaptation.
The slow start threshold (ssthresh) serves a critical function in TCP congestion control: it marks the boundary between aggressive probing (slow start) and conservative probing (congestion avoidance).
Why this boundary matters:
During slow start, cwnd grows exponentially—doubling every RTT. This aggressive growth is appropriate when the network's capacity is unknown and could be very large. But exponential growth has a dangerous property: it will always overshoot the actual capacity, causing packet loss.
The first time slow start overshoots, TCP learns something valuable: the network couldn't handle cwnd at the moment of loss. Rather than forgetting this and repeating the exact same overshoot next time, TCP stores an estimate of the safe limit in ssthresh.
The ssthresh contract:
If cwnd < ssthresh:
We're in slow start (exponential growth)
Network capacity is still being discovered
If cwnd >= ssthresh:
We're in congestion avoidance (linear growth)
We're near the network's estimated limit
Proceed cautiously to avoid loss
| Condition | Current Phase | Growth Behavior | Rationale |
|---|---|---|---|
| cwnd < ssthresh | Slow Start | cwnd += MSS per ACK (exponential) | Capacity unknown, probe aggressively |
| cwnd >= ssthresh | Congestion Avoidance | cwnd += MSS²/cwnd per ACK (linear) | Near capacity, probe conservatively |
| Packet loss (timeout) | Recovery | ssthresh = cwnd/2, cwnd = 1 | Severe congestion, restart cautiously |
| 3 duplicate ACKs | Fast Recovery | ssthresh = cwnd/2, cwnd = ssthresh | Mild congestion, don't reset completely |
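To make this contract concrete, here is a minimal sketch of the per-ACK update rule the table summarizes (Python-style, byte-based counters; a simplification, not any particular kernel's code):

```python
MSS = 1460  # assumed maximum segment size in bytes

def on_ack(cwnd: int, ssthresh: int) -> int:
    """Return the new cwnd (bytes) after one ACK for new data arrives."""
    if cwnd < ssthresh:
        # Slow start: add a full MSS per ACK, so cwnd roughly doubles each RTT
        return cwnd + MSS
    # Congestion avoidance: add MSS*MSS/cwnd per ACK, about one MSS per RTT
    return cwnd + max(1, MSS * MSS // cwnd)
```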
The learning cycle:
1. Slow start grows cwnd exponentially until the network drops a packet.
2. The loss sets ssthresh to roughly half the window that caused it.
3. After recovery, growth is exponential only up to ssthresh, then linear beyond it.
4. The next loss updates ssthresh again, refining the estimate.
Over time, ssthresh converges to a value that reflects the network's capacity at the bottleneck link. The connection spends most of its time in congestion avoidance, making small adjustments around this learned value.
Network conditions are dynamic—other flows start and stop, routing changes, interference varies. ssthresh captures a snapshot of past capacity, not a prediction of future capacity. This is why TCP continues probing (via congestion avoidance) even after reaching ssthresh, and why ssthresh gets updated on every loss event.
When a new TCP connection begins, it has no knowledge of the network path's capacity. How should ssthresh be initialized?
The default approach:
Most TCP implementations initialize ssthresh to a very large value:
ssthresh = 65535 bytes (traditional)
ssthresh = INT_MAX (common in modern kernels)
ssthresh = advertised window from SYN (some implementations)
With ssthresh set to a huge value, the initial slow start phase will continue until:
- Packet loss occurs (the usual terminator, which then records a real ssthresh)
- cwnd reaches the receiver's advertised window
- The application runs out of data to send
This is the correct behavior—a new connection should probe aggressively since it knows nothing about path capacity.
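A minimal sketch of that default initialization, assuming byte-based counters and the modern 10-segment initial window of RFC 6928 (names are illustrative, not taken from any specific stack):

```python
MSS = 1460                     # assumed segment size in bytes
INFINITE_SSTHRESH = 2**31 - 1  # "effectively unlimited" threshold
INITIAL_WINDOW = 10 * MSS      # common modern initial window (RFC 6928)

def new_connection_state():
    cwnd = INITIAL_WINDOW
    ssthresh = INFINITE_SSTHRESH  # no capacity estimate yet: slow start rules until loss
    return cwnd, ssthresh
```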
Why not start with a conservative ssthresh?
Starting with a low ssthresh (say, 10 segments) would cause the connection to immediately enter congestion avoidance, growing linearly from the start. The result: reaching even a moderately large window would take hundreds of RTTs instead of a handful, so short transfers would finish long before the connection approached the path's capacity.
Platform-specific initialization:
Different operating systems handle initialization slightly differently:
Linux (recent kernels):
ssthresh = TCP_INFINITE_SSTHRESH (0x7FFFFFFF)
FreeBSD:
ssthresh = min(sb_hiwat, TCP_MAXWIN) // Socket buffer high water mark
Windows:
ssthresh = 65535 (traditional) or larger with window scaling
macOS/iOS:
Similar to FreeBSD, based on socket buffer configuration
Connection reuse and ssthresh preservation:
Some TCP implementations cache ssthresh values from previous connections to the same destination. This allows subsequent connections to benefit from learned capacity without repeating the probing phase. However, this optimization is complex and rarely used due to concerns about stale estimates.
You can observe ssthresh values using: ss -i (shows ssthresh in socket info) or ip tcp_metrics show (shows cached values per destination). The latter reveals TCP's memory of past connections, including remembered ssthresh, RTT, and RTO values.
The ssthresh variable is updated whenever TCP detects congestion (packet loss). The update rule is designed to capture the last known "safe" transmission rate:
On loss detection:
ssthresh = max(FlightSize/2, 2*MSS)
Where FlightSize is the amount of data in flight at the time of loss detection. In practice, this approximates:
ssthresh = max(cwnd/2, 2*MSS)
Why halve cwnd?
The halving rule is based on a key insight: if cwnd caused congestion, then cwnd was too large. But by how much? TCP's conservative assumption is that capacity is roughly half the current window. This is because:
- In slow start, cwnd doubles every RTT, so one RTT before the loss the window was about half its current size and was delivered successfully
- Loss is only detected about one RTT after the network was first overloaded, so the current window already overshoots the true limit
- Halving leaves headroom for the bottleneck queue to drain and for competing flows to take their share
The 2*MSS minimum:
The minimum of 2*MSS ensures that ssthresh never drops so low that the connection effectively stops. Even in severe congestion, two segments can be transmitted, allowing the connection to recover.
| Event | cwnd Before | ssthresh Update | cwnd After | Next Phase |
|---|---|---|---|---|
| Timeout (RTO expires) | 100 MSS | max(50, 2) = 50 MSS | 1 MSS | Slow start to 50, then CA |
| 3 duplicate ACKs | 100 MSS | max(50, 2) = 50 MSS | 50 + 3 MSS | Fast recovery, then CA |
| Timeout at low cwnd | 4 MSS | max(2, 2) = 2 MSS | 1 MSS | Slow start to 2, then CA |
| Multiple losses in RTT | 200 MSS | max(100, 2) = 100 MSS | Varies by variant | Depends on TCP variant |
| Loss after long idle | 50 MSS | max(25, 2) = 25 MSS | 1 MSS or restart | May restart slow start |
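The rows above all follow from the RFC 5681 rule. A minimal sketch, with a hypothetical flight_size argument standing in for FlightSize (values in bytes):

```python
MSS = 1460

def on_loss(flight_size: int, timeout: bool):
    """Return (new_ssthresh, new_cwnd) after loss is detected."""
    ssthresh = max(flight_size // 2, 2 * MSS)  # RFC 5681: max(FlightSize/2, 2*MSS)
    if timeout:
        cwnd = 1 * MSS                # RTO: severe congestion, restart slow start
    else:
        cwnd = ssthresh + 3 * MSS     # 3 dup ACKs: fast recovery, inflate by the dup ACKs
    return ssthresh, cwnd
```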
ssthresh only decreases on loss:
A critical observation: ssthresh is only updated when loss occurs. It never increases during normal operation. This means:
- ssthresh always reflects the most recent congestion event, however long ago it happened
- If capacity later increases, TCP discovers this only slowly, through linear growth in congestion avoidance
- A single unlucky loss can pin ssthresh at a low value for a long time
Implications for performance:
After a period of congestion passes, ssthresh may be "stuck" at a low value from past losses. The connection continues in congestion avoidance (linear growth) even though slow start (exponential growth) would be appropriate.
Some TCP variants address this by restarting slow start after long idle periods or by allowing ssthresh to increase under certain conditions, but the core behavior remains: ssthresh is a conservative estimate that errs on the side of caution.
When loss is detected via timeout (RTO expiration) rather than fast retransmit (duplicate ACKs), TCP resets cwnd to 1 MSS and enters slow start. This is a severe penalty—throughput may drop by 99% instantly. The rationale is that timeout indicates severe congestion (the network is so overwhelmed that no ACKs are getting through), warranting the most conservative response.
The transition from slow start to congestion avoidance is one of TCP's most important state changes. Understanding exactly when and how this transition occurs is crucial for analyzing connection behavior.
The transition condition:
When cwnd >= ssthresh:
Exit slow start
Enter congestion avoidance
Change growth from exponential to linear
This condition is checked after every cwnd update. The moment cwnd equals or exceeds ssthresh, the growth rule changes from one full MSS per ACK (exponential per RTT) to roughly one MSS per RTT (linear).
Before transition (slow start):
On ACK: cwnd = cwnd + MSS
Effect: cwnd doubles per RTT
After transition (congestion avoidance):
On ACK: cwnd = cwnd + MSS * (MSS / cwnd)
Effect: cwnd increases by 1 MSS per RTT
Detailed transition example:
Let's trace a connection with MSS=1460, initial cwnd=10, and ssthresh=64 (set by previous loss):
RTT 0: cwnd=10, slow start, send 10 segments
RTT 1: cwnd=20, slow start, send 20 segments
RTT 2: cwnd=40, slow start, send 40 segments
RTT 3: doubling would take cwnd to 80, exceeding ssthresh=64
→ But wait: did we cross the threshold mid-RTT?
The transition isn't perfectly clean. If cwnd=40 at the start of RTT 3 and each of the 40 returning ACKs adds 1 MSS, then after 24 ACKs cwnd reaches 64 = ssthresh. The remaining 16 ACKs are processed under the congestion avoidance rule, so cwnd ends the RTT at roughly 64, not 80.
In practice, implementations handle this by checking the threshold continuously. The result: a smooth transition from exponential to linear growth right at ssthresh.
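A small simulation of that RTT, assuming one returning ACK per in-flight segment and windows counted in whole segments, shows the crossover:

```python
def apply_one_rtt_of_acks(cwnd: int, ssthresh: int, acks: int) -> int:
    """Apply ACKs one at a time; cwnd and ssthresh are in segments."""
    ca_credit = 0.0
    for _ in range(acks):
        if cwnd < ssthresh:
            cwnd += 1                  # slow start: +1 segment per ACK
        else:
            ca_credit += 1.0 / cwnd    # congestion avoidance: +1/cwnd per ACK
            if ca_credit >= 1.0:
                cwnd += 1
                ca_credit -= 1.0
    return cwnd

print(apply_one_rtt_of_acks(cwnd=40, ssthresh=64, acks=40))  # prints 64, not 80
```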
Modern TCP implementations like CUBIC use HyStart (Hybrid Slow Start) to detect the approaching threshold before loss occurs. By monitoring RTT increases during slow start, HyStart can transition to congestion avoidance early, avoiding the packet loss that would otherwise terminate slow start. This preserves ssthresh's role while reducing unnecessary losses.
The ssthresh variable can be viewed as TCP's internal estimate of the network's capacity (or more precisely, its safe operating window). Let's analyze how accurate this estimate is and its implications:
What ssthresh actually represents:
When loss occurs at cwnd = X, ssthresh is set to X/2. This represents:
- In slow start, roughly the window one RTT earlier, which was delivered without loss
- A deliberately conservative estimate of the safe operating window, not the exact capacity
- The point at which any subsequent slow start should hand over to linear growth
Why X/2 and not exactly the capacity?
The network's capacity (BDP) isn't precisely at the loss point. Several factors create uncertainty:
- Bottleneck buffering: loss occurs only after the queue fills, so the loss point reflects BDP plus queue size, not BDP alone
- Competing traffic: other flows consume a changing share of the bottleneck
- Burstiness: a momentary burst can trigger loss even when the average rate is sustainable
- Detection delay: by the time loss is detected, cwnd has already grown past the value that caused it
ssthresh convergence behavior:
Over time, ssthresh tends to oscillate around the actual fair-share capacity:
- If ssthresh is below the fair share, congestion avoidance grows cwnd well past it before the next loss, and that loss records a higher ssthresh
- If ssthresh is above what the path can carry, loss arrives sooner and pulls the estimate back down
- The result is a sawtooth in which ssthresh drifts toward, then hovers around, the connection's achievable share
Steady-state analysis:
In a stable network with N identical TCP flows sharing a bottleneck of capacity C:
- Each flow converges to roughly C/N of the bandwidth
- Each flow's window oscillates between about W/2 and W, where W is approximately (C/N) × RTT plus its share of the bottleneck buffer
- ssthresh sits near W/2, the bottom of the sawtooth
The "pipe" analogy:
ssthresh can be thought of as TCP's estimate of the pipe size (BDP). During slow start, TCP rapidly fills the pipe. Once full (loss), ssthresh records the pipe size. Congestion avoidance then maintains the pipe near capacity, probing for changes.
In practice, if you can observe a connection's ssthresh after it stabilizes, you can estimate the path's effective BDP. Use: ip tcp_metrics show DEST on Linux to see cached ssthresh values. Multiply by MSS to get bytes. Compare with expected BDP (known_bandwidth × RTT) to assess if the connection is achieving its potential.
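A back-of-the-envelope sketch of that comparison, using assumed example numbers rather than real measurements:

```python
MSS = 1460  # assumed segment size in bytes

def bdp_in_segments(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product expressed in MSS-sized segments."""
    return bandwidth_bps / 8 * rtt_seconds / MSS

# Assumed path: 100 Mbit/s bottleneck, 40 ms RTT
bdp = bdp_in_segments(100e6, 0.040)   # ~342 segments
cached_ssthresh = 170                 # hypothetical value read from `ip tcp_metrics show`
print(f"expected BDP ~ {bdp:.0f} segments, cached ssthresh = {cached_ssthresh}")
# A cached ssthresh near BDP/2 is roughly what loss-based halving would leave behind
```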
Different TCP congestion control algorithms handle ssthresh with slight variations, affecting performance characteristics:
TCP Tahoe (Original, 1988):
On any loss:
ssthresh = cwnd / 2
cwnd = 1 MSS
Enter slow start
Simplest behavior: always reset to slow start. This is very conservative—even a single lost packet causes a complete restart.
TCP Reno (1990):
On timeout:
ssthresh = cwnd / 2
cwnd = 1 MSS
Enter slow start
On 3 duplicate ACKs:
ssthresh = cwnd / 2
cwnd = ssthresh + 3 MSS (for 3 dup ACKs in flight)
Enter fast recovery (retransmit lost, then CA)
Reno distinguishes between timeout (severe) and duplicate ACKs (mild), avoiding slow start restart when possible.
| Variant | ssthresh on Loss | Post-Loss cwnd | Key Innovation |
|---|---|---|---|
| Tahoe | cwnd/2 | 1 MSS always | Original congestion control |
| Reno | cwnd/2 | ssthresh on dup ACK, 1 on timeout | Fast recovery |
| NewReno | cwnd/2 | ssthresh, tracks partial ACKs | Handles multiple losses |
| CUBIC | 0.7 × cwnd | 0.7 × cwnd | Less aggressive reduction |
| Vegas | cwnd/2 | cwnd - (cwnd × diff/baseRTT) | RTT-based, rarely triggers |
| Compound | cwnd/2 | dwnd + cwnd/2 | Delay + loss based |
| BBR | N/A (model-based) | Calculated from model | Ignores loss for ssthresh |
CUBIC (Linux default since 2006):
CUBIC uses a modified reduction:
ssthresh = 0.7 × cwnd (reduce to 70%, not 50%)
cwnd = ssthresh
This less aggressive reduction maintains higher throughput during mild congestion. CUBIC also uses a cubic function (hence the name) instead of linear growth during congestion avoidance, allowing faster capacity recovery.
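A simplified sketch of CUBIC's reduction and post-loss window curve, following the formulas in RFC 8312 (β = 0.7, C = 0.4, windows in segments):

```python
BETA = 0.7  # multiplicative decrease factor (keep 70% of the window)
C = 0.4     # cubic scaling constant from RFC 8312

def cubic_on_loss(cwnd: float):
    """Record W_max and reduce the window to 70% of its pre-loss value."""
    w_max = cwnd
    cwnd = ssthresh = cwnd * BETA
    return w_max, ssthresh, cwnd

def cubic_window(t: float, w_max: float) -> float:
    """Window t seconds after the reduction: W(t) = C*(t - K)^3 + W_max."""
    k = (w_max * (1 - BETA) / C) ** (1.0 / 3.0)  # time until the curve returns to W_max
    return C * (t - k) ** 3 + w_max
```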
BBR (Google, 2016):
BBR fundamentally changes the model:
No traditional ssthresh
Capacity estimated from measured RTT and delivery rate
Congestion detected by RTT increase, not loss
BBR doesn't use loss as the primary congestion signal, so the concept of ssthresh (a loss-driven threshold) doesn't directly apply. Instead, BBR maintains explicit models of bottleneck bandwidth and minimum RTT.
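A conceptual sketch of that model-based approach: sending limits are derived from measured bottleneck bandwidth and minimum RTT rather than from a loss-driven threshold (gains and structure heavily simplified from the published BBR design):

```python
def bbr_limits(btl_bw_bps: float, min_rtt_seconds: float, cwnd_gain: float = 2.0):
    """Derive pacing rate and an in-flight cap from the bandwidth/RTT model."""
    bdp_bytes = btl_bw_bps / 8 * min_rtt_seconds  # estimated bandwidth-delay product
    pacing_rate = btl_bw_bps                      # pace near the estimated bottleneck rate
    inflight_cap = cwnd_gain * bdp_bytes          # cap data in flight at a multiple of BDP
    return pacing_rate, inflight_cap
```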
Implications for performance:
The choice of ssthresh reduction factor (1/2 for classic, 0.7 for CUBIC) directly affects:
- How much throughput is sacrificed on every loss event
- How many RTTs it takes to regain the lost window afterwards (see the sketch below)
- How quickly competing flows converge to a fair share; smaller reductions converge more slowly
The traditional 1/2 factor was chosen assuming every flow should reduce equally to create room for new flows. CUBIC's 0.7 factor reflects that with larger BDPs and more efficient loss recovery, less drastic reduction is needed. BBR's approach recognizes that loss may not indicate congestion at all in some modern networks with stochastic loss (e.g., wireless).
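As a rough worked example of the recovery-time effect, assuming classic additive increase of one segment per RTT after the reduction (CUBIC's actual cubic growth function recovers faster than this):

```python
def rtts_to_recover(window_segments: int, beta: float) -> int:
    """RTTs of +1-segment-per-RTT growth needed to climb from beta*W back to W."""
    return round(window_segments * (1 - beta))

print(rtts_to_recover(1000, 0.5))  # halving (classic Reno): ~500 RTTs
print(rtts_to_recover(1000, 0.7))  # 0.7 reduction (CUBIC-style): ~300 RTTs
```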
While ssthresh is essential for stable congestion control, several problems can arise:
Problem 1: Stale ssthresh estimates
If network conditions change significantly after ssthresh is set, the estimate becomes stale:
- Capacity increased (competing flows departed, routing improved): the connection lingers in slow linear growth far below what the path could carry
- Capacity decreased (new flows arrived): the estimate is now optimistic and the next probe overshoots quickly
Solution approaches:
- Keep probing in congestion avoidance so the estimate is eventually corrected (the default behavior)
- Expire or clear cached per-destination metrics that are too old
- Use algorithms (CUBIC's faster growth, BBR's explicit model) that depend less on a single loss-derived threshold
Problem 2: ssthresh too low from spurious loss
Non-congestion losses (wireless corruption, packet reordering) trigger ssthresh reduction despite no actual congestion:
Scenario:
Packet reordered (not lost)
TCP detects as loss (3 dup ACKs)
ssthresh = cwnd/2 (unnecessary reduction)
Throughput halved with no congestion
Solution approaches:
- DSACK-based undo: if the supposedly lost segment is later reported as a duplicate, revert the ssthresh and cwnd reduction
- Spurious-retransmission detection (Eifel, F-RTO): recognize that the original packet was merely delayed and undo the reaction
- Reordering tolerance: raise the duplicate-ACK threshold when persistent reordering is observed
Problem 3: ssthresh and path MTU discovery interaction
When PMTUD reduces MSS, the segment count for the same byte count increases. If ssthresh is stored in bytes, behavior is consistent. If stored in segments (some implementations), the effective capacity changes:
Before PMTUD: ssthresh = 50 × 1460 = 73,000 bytes
PMTUD reduces MSS to 512
If segments: ssthresh = 50 × 512 = 25,600 bytes (underestimate)
If bytes: ssthresh = 73,000 bytes / 512 = 142 segments (correct)
Most modern implementations store ssthresh in bytes to avoid this issue.
Problem 4: ssthresh and multi-path
When load balancing (ECMP) sends packets across multiple paths with different capacities, loss signals become confusing. ssthresh oscillates as different paths congest at different rates. MPTCP addresses this with per-subflow congestion control.
If connections to specific hosts perform poorly, check ip tcp_metrics show DEST. A very low ssthresh (e.g., 2-4 MSS) suggests past severe congestion. Clear with ip tcp_metrics delete DEST, or clear the whole cache with ip tcp_metrics flush. The next connection will relearn ssthresh from scratch.
We've comprehensively analyzed ssthresh—TCP's memory of network capacity and the gatekeeper between exponential and linear growth. Let's consolidate the essential insights:
- ssthresh marks the boundary between aggressive slow start and cautious congestion avoidance
- New connections initialize it to an effectively infinite value, so they probe aggressively until the first loss
- Every loss sets ssthresh = max(FlightSize/2, 2×MSS), recording a conservative capacity estimate
- ssthresh only decreases on loss; newly available capacity is rediscovered slowly through congestion avoidance
- Congestion control variants (Tahoe, Reno, CUBIC, BBR) differ in how, and whether, they apply this threshold
What's next:
With slow start terminated (either by reaching ssthresh or experiencing loss), TCP enters congestion avoidance. The next page explores this linear growth phase—how it cautiously probes for additional capacity while minimizing the risk of causing congestion.
You now understand ssthresh—the variable that transforms TCP from a memoryless protocol into an adaptive one. By recording where congestion occurred and using it to control future growth, ssthresh enables TCP to efficiently utilize networks without repeatedly causing collapse. Next, we'll examine the congestion avoidance phase that ssthresh guards.