Window reduction during Fast Recovery is not a single, simple operation—it's a carefully choreographed sequence of adjustments that must maintain network stability while preserving throughput. The congestion window (cwnd) and slow start threshold (ssthresh) undergo precise transformations that reflect TCP's understanding of network conditions.
Understanding these mechanics requires appreciating the distinction between the immediate reduction (the initial response to congestion), the inflation period (maintaining flow during recovery), and the deflation (returning to stable state after recovery completes). Each phase has specific rules and serves specific purposes in TCP's overall congestion control strategy.
This page provides comprehensive coverage of window reduction mechanics, including: the exact formulas for ssthresh and cwnd adjustment, the window inflation mechanism during Fast Recovery, window deflation upon recovery completion, the relationship between flight size and window calculations, and edge cases in window management.
Before diving into reduction mechanics, we must precisely understand the window-related variables that TCP maintains and how they interact.
The Key Variables:
cwnd (Congestion Window): The sender's estimate of how many bytes can be outstanding in the network without causing congestion. This is the primary control variable for rate limiting.
ssthresh (Slow Start Threshold): The threshold that determines when TCP transitions from slow start to congestion avoidance. It represents TCP's memory of 'safe' network capacity from previous experience.
rwnd (Receiver Window): The receiver's advertised window indicating available buffer space. This provides flow control independent of congestion control.
FlightSize (Bytes in Flight): The actual bytes currently outstanding in the network, calculated as SND.NXT - SND.UNA.
Effective Window: The actual limit on transmission, calculated as min(cwnd, rwnd) - FlightSize.
| Variable | Unit | Managed By | Update Trigger | Typical Range |
|---|---|---|---|---|
| cwnd | Bytes | Sender | ACKs, Timeouts, Dup ACKs | MSS to BDP (potentially millions) |
| ssthresh | Bytes | Sender | Loss detection only | 2×MSS to previous cwnd |
| rwnd | Bytes | Receiver | Buffer availability | 0 to receiver buffer size |
| FlightSize | Bytes | Calculated | Every send/ACK | 0 to min(cwnd, rwnd) |
| recover | Seq Num | Sender | Fast Recovery entry | Highest seq sent at FR entry |
The Relationship Between Variables:
The effective sending rate is determined by the interaction of these variables:
EffectiveWindow = min(cwnd, rwnd) - FlightSize
CanSend = EffectiveWindow >= MSS
When EffectiveWindow >= MSS, the sender can transmit a new segment. When EffectiveWindow < MSS, the sender must wait for ACKs (which reduce FlightSize) or window increases.
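The two formulas above translate almost directly into code. The following is a minimal sketch, not taken from any real stack: the function names are hypothetical, and clamping the result at zero is an assumption made here so that a window that has shrunk below FlightSize reports "nothing available" rather than a negative number.

```python
def effective_window(cwnd: int, rwnd: int, flight_size: int) -> int:
    """Bytes the sender may still put on the wire.

    Clamped at zero (an assumption of this sketch) because cwnd or rwnd
    can temporarily drop below the amount already in flight.
    """
    return max(0, min(cwnd, rwnd) - flight_size)


def can_send(cwnd: int, rwnd: int, flight_size: int, mss: int = 1460) -> bool:
    """True when at least one full-sized segment fits in the effective window."""
    return effective_window(cwnd, rwnd, flight_size) >= mss
```

For example, with cwnd = 14,600, rwnd = 65,535 and 13,140 bytes in flight, exactly one 1,460-byte segment fits. If rwnd were 8,000 with 7,000 bytes in flight, the receiver window would be the binding limit and the sender would have to wait.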
During Fast Recovery:
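The variables interact differently: ssthresh holds the post-recovery target, cwnd is temporarily inflated above it by one MSS per duplicate ACK, and FlightSize does not shrink until the retransmitted segment is acknowledged.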
While conceptual discussions often describe windows in terms of 'segments,' actual implementations track bytes. This distinction matters because not all segments are MSS-sized (e.g., the last segment of a file transfer). RFC 5681 specifies all calculations in bytes, with MSS used as the increment unit.
When TCP detects congestion (third duplicate ACK received), the initial reduction occurs immediately. This reduction establishes the new operating point for post-recovery operation.
The ssthresh Calculation:
The slow start threshold is set based on the current flight size:
ssthresh = max(FlightSize / 2, 2 * MSS)
Key observations about this formula:
FlightSize, Not cwnd: The calculation uses actual bytes in flight, not the theoretical window limit. This provides a more accurate estimate of the achievable rate.
Halving (Multiplicative Decrease): Division by 2 implements the β=0.5 multiplicative decrease factor.
Minimum of 2×MSS: The floor ensures ssthresh never drops below two segments, preventing excessive reduction in low-rate connections.
Conservative Estimate: Using FlightSize accounts for the fact that cwnd might not have been fully utilized (e.g., application-limited).
```
// Initial Window Reduction on Fast Recovery Entry
// Per RFC 5681 Section 3.2

ON third duplicate ACK received:
    // Step 1: Calculate current bytes in flight
    FlightSize = SND.NXT - SND.UNA

    // Step 2: Set new slow start threshold (multiplicative decrease)
    ssthresh = max(FlightSize / 2, 2 * MSS)

    // Step 3: Perform Fast Retransmit of lost segment
    retransmit(SND.UNA)

    // Step 4: Set cwnd for Fast Recovery operation
    // Note: ssthresh reflects new "safe" rate; +3*MSS is inflation
    cwnd = ssthresh + 3 * MSS

    // Step 5: Record recovery point
    recover = SND.NXT - 1

    // Step 6: Enter Fast Recovery state
    state = FAST_RECOVERY

// Example with numbers:
// If FlightSize = 100,000 bytes and MSS = 1,460 bytes:
//   ssthresh = max(50,000, 2,920) = 50,000 bytes
//   cwnd = 50,000 + 4,380 = 54,380 bytes
```

Why FlightSize Instead of cwnd?
Using FlightSize rather than cwnd has important implications:
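Scenario 1: Fully Utilized Window. With cwnd = 100KB and FlightSize = 100KB, both calculations agree: ssthresh = 50KB.
Scenario 2: Application-Limited. With cwnd = 100KB but only 30KB in flight, FlightSize/2 gives ssthresh = 15KB, whereas cwnd/2 would give 50KB.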
In Scenario 2, the connection was clearly functioning with only 30KB in flight. Setting ssthresh to 15KB (rather than 50KB) reflects actual, proven network capacity. This conservative approach prevents over-estimation when the application wasn't fully utilizing the network.
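The figures quoted for the application-limited case can be checked with a few lines of code. This is a sketch under stated assumptions: the function names are hypothetical, 1KB is treated as 1,000 bytes for round numbers, and MSS is taken as 1,460 bytes as in the earlier example.

```python
MSS = 1460  # bytes; same value used in the worked example above


def ssthresh_on_loss(flight_size: int, mss: int = MSS) -> int:
    """ssthresh = max(FlightSize / 2, 2 * MSS), in bytes."""
    return max(flight_size // 2, 2 * mss)


def fast_recovery_entry(flight_size: int, mss: int = MSS) -> tuple[int, int]:
    """Return (ssthresh, cwnd) as set on the third duplicate ACK."""
    ssthresh = ssthresh_on_loss(flight_size, mss)
    cwnd = ssthresh + 3 * mss  # three segments have already left the network
    return ssthresh, cwnd


if __name__ == "__main__":
    print(ssthresh_on_loss(100_000))     # fully utilized window
    print(ssthresh_on_loss(30_000))      # application-limited sender
    print(fast_recovery_entry(100_000))  # matches the pseudocode example
```

The first two calls give 50,000 and 15,000 bytes, and the third reproduces the 50,000 / 54,380 pair from the entry pseudocode.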
TCP CUBIC calculates ssthresh differently, using both the current cwnd and a 'β' factor of approximately 0.7: ssthresh = cwnd × 0.7. This less aggressive reduction reflects CUBIC's design for high-BDP networks where aggressive halving causes excessive throughput loss.
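To make the difference concrete, the two reductions can be placed side by side. This is only an illustrative sketch: the function names are hypothetical, β is written as the integer ratio 7/10 to avoid floating-point rounding, and real CUBIC implementations track cwnd in segments and include further logic that is not modeled here.

```python
def reno_ssthresh(flight_size: int, mss: int) -> int:
    """Reno-style reduction: halve FlightSize, floor of two segments."""
    return max(flight_size // 2, 2 * mss)


def cubic_ssthresh(cwnd: int, mss: int, beta_num: int = 7, beta_den: int = 10) -> int:
    """CUBIC-style reduction: scale cwnd by beta (0.7 by default).

    The two-segment floor mirrors the Reno rule; byte units are a
    simplification of this sketch.
    """
    return max(cwnd * beta_num // beta_den, 2 * mss)
```

For a fully utilized 100,000-byte window the Reno rule leaves 50,000 bytes while the CUBIC rule leaves 70,000 bytes, which is the throughput gap the paragraph above refers to.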
Once TCP enters Fast Recovery, the congestion window undergoes a unique behavior known as window inflation. This mechanism allows TCP to continue transmitting new data during the recovery process, maintaining network utilization.
The Inflation Concept:
During Fast Recovery, each duplicate ACK causes cwnd to increase by one MSS:
ON duplicate ACK during Fast Recovery:
cwnd = cwnd + MSS
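This might seem counterintuitive: congestion was just detected, so why increase the window? The reasoning is subtle but essential. Each duplicate ACK means one more segment has left the network and is buffered at the receiver, yet FlightSize does not shrink because the cumulative ACK point is stuck at the lost segment. Adding one MSS to cwnd per duplicate ACK compensates for that departure so a new segment can be sent in its place.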
Mathematical Justification:
Let's trace the window values through a Fast Recovery scenario:
Initial state before loss:
Loss occurs at segment 5:
Third duplicate ACK (for seg 5) received:
Fourth duplicate ACK received:
Fifth duplicate ACK received:
Tracking FlightSize:
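The critical insight is that FlightSize doesn't automatically decrease when duplicate ACKs arrive. The receiver has buffered out-of-order segments, but they're not acknowledged until the gap is filled. From the sender's perspective, SND.UNA stays pinned at the lost segment, so SND.NXT - SND.UNA still counts segments that have already reached the receiver.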
Think of cwnd inflation as maintaining a 'loan balance.' Each duplicate ACK represents an IOU: 'I'll acknowledge this segment later when the gap is filled.' TCP increases cwnd to advance these IOUs, enabling continued transmission. When recovery completes, these IOUs are 'repaid' through deflation.
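The inflation rule is easiest to see by replaying one window of duplicate ACKs. The sketch below uses a hypothetical scenario chosen for illustration (it is not the trace from this page): a window of N full-sized segments whose first segment is lost, so the remaining N - 1 segments each trigger one duplicate ACK. The function name and the row layout are assumptions of this sketch, and duplicate ACKs triggered by newly sent segments are not modeled.

```python
MSS = 1460


def trace_inflation(segments_in_flight: int, mss: int = MSS):
    """Replay the duplicate ACKs from one window whose first segment was lost.

    Returns (ssthresh, rows); each row is
    (dup_ack_number, cwnd_in_segments, flight_in_segments, new_segments_sent).
    """
    flight = segments_in_flight * mss   # SND.NXT - SND.UNA, pinned at the hole
    dup_acks = segments_in_flight - 1   # every segment behind the hole triggers one
    ssthresh = cwnd = None
    rows = []
    for n in range(1, dup_acks + 1):
        if n < 3:
            continue                    # below the fast-retransmit threshold
        if n == 3:
            ssthresh = max(flight // 2, 2 * mss)
            cwnd = ssthresh + 3 * mss   # entry: the retransmit leaves SND.NXT unchanged
        else:
            cwnd += mss                 # inflation: one MSS per extra duplicate ACK
        sent = 0
        if cwnd - flight >= mss:        # usable window admits one new segment
            flight += mss
            sent = 1
        rows.append((n, cwnd // mss, flight // mss, sent))
    return ssthresh, rows


if __name__ == "__main__":
    ssthresh, rows = trace_inflation(10)
    print("ssthresh (segments):", ssthresh // MSS)
    for row in rows:
        print(row)
```

With ten segments in flight, the sender stays blocked for duplicate ACKs 3 through 5 while cwnd climbs from 8 to 10 segments, then sends one new segment for each of duplicate ACKs 6 through 9.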
When Fast Recovery completes successfully—upon receiving an ACK that acknowledges data beyond the recovery point—the inflated window must be deflated to its proper post-recovery value. This deflation is essential for correct subsequent operation.
The Deflation Rule (RFC 5681):
On receiving a new ACK that completes recovery:
cwnd = ssthresh
state = CONGESTION_AVOIDANCE
Why Deflation is Necessary:
Remove Artificial Inflation: The window was inflated by 1 MSS for each duplicate ACK during recovery. These inflations don't represent actual network capacity.
Stabilize at Safe Rate: ssthresh was calculated as half the pre-loss rate—the 'safe' operating point. cwnd should now operate near this value.
Prepare for Linear Growth: Congestion Avoidance uses additive increase (1 MSS per RTT). Starting from inflated cwnd would overload the network.
| Phase | cwnd Value | ssthresh Value | Purpose |
|---|---|---|---|
| Normal Operation | Growing per AIMD | From previous loss | Maximize throughput |
| 3rd Dup ACK | ssthresh + 3×MSS | FlightSize/2 | Initial Fast Recovery setup |
| Each Dup ACK | cwnd + MSS | Unchanged | Enable continued transmission |
| New ACK (Recovery Done) | ssthresh | Unchanged | Deflate and stabilize |
| Congestion Avoidance | ssthresh + growth | Unchanged | Additive increase resumes |
Deflation Example:
Continuing our earlier scenario:
At end of Fast Recovery:
Upon receiving this new ACK:
The Calculation Specifics:
RFC 5681 specifies:
cwnd = ssthresh
RFC 6582 (NewReno) also permits a more conservative alternative:
cwnd = min(ssthresh, max(FlightSize, MSS) + MSS)
This avoids a burst of transmissions when the amount of data still outstanding at the end of recovery is much smaller than ssthresh.
TCP NewReno handles partial ACKs (ACKs that advance but don't complete recovery) differently. Instead of full deflation, it performs a partial deflation: cwnd = cwnd - acknowledged_bytes + MSS. This maintains recovery mode while accounting for the acknowledged data.
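The deflation rules can be sketched as two small functions. The names and the `conservative` flag are hypothetical. The conservative branch follows the option given in RFC 6582, and the "add back one MSS only if at least one MSS was acknowledged" condition on partial ACKs is also from RFC 6582 rather than from the simplified formula above.

```python
def deflate_on_full_ack(ssthresh: int, flight_size: int, mss: int,
                        conservative: bool = False) -> int:
    """cwnd after an ACK that covers the recovery point."""
    if conservative:
        # Limits the burst when little data remains outstanding.
        return min(ssthresh, max(flight_size, mss) + mss)
    return ssthresh


def deflate_on_partial_ack(cwnd: int, acked_bytes: int, mss: int) -> int:
    """cwnd after a NewReno partial ACK; the sender stays in Fast Recovery."""
    cwnd -= acked_bytes          # remove the data that just left the network
    if acked_bytes >= mss:
        cwnd += mss              # allow one new segment to be sent
    return cwnd
```

For instance, a partial ACK covering 4,380 bytes with cwnd at 20,440 leaves cwnd at 17,520, while a full ACK with only one segment outstanding would leave cwnd at ssthresh under the basic rule and at 2,920 bytes under the conservative one (for MSS = 1,460).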
The relationship between cwnd, FlightSize, and the ability to send new data is central to TCP's operation. During Fast Recovery, this relationship becomes particularly nuanced.
The Fundamental Constraint:
UsableWindow = cwnd - FlightSize
CanSend = UsableWindow >= MSS
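The sender can transmit new data only when the usable window (cwnd minus FlightSize) is at least one MSS. This form assumes rwnd is not the tighter limit; otherwise min(cwnd, rwnd) replaces cwnd, as in the effective-window formula earlier. The constraint applies during all phases of operation.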
FlightSize Dynamics During Fast Recovery:
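On Duplicate ACK: SND.UNA does not advance, so FlightSize is unchanged; cwnd grows by one MSS, which widens the usable window.
On New Segment Sent: SND.NXT advances, so FlightSize grows by the segment size and the usable window shrinks by the same amount.
On New ACK: SND.UNA advances, so FlightSize drops by the number of bytes newly acknowledged.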
```python
class TCPFlightTracker:
    """Tracks FlightSize and window relationships during Fast Recovery."""

    def __init__(self, mss: int = 1460):
        self.mss = mss
        self.snd_una = 0          # Oldest unacknowledged byte
        self.snd_nxt = 0          # Next byte to send
        self.cwnd = 10 * mss      # Example: 10 segments
        self.ssthresh = 65535
        self.recover = 0
        self.state = 'NORMAL'

    @property
    def flight_size(self) -> int:
        """Calculate current bytes in flight."""
        return self.snd_nxt - self.snd_una

    @property
    def usable_window(self) -> int:
        """Calculate available space for new transmissions."""
        return max(0, self.cwnd - self.flight_size)

    def can_send(self) -> bool:
        """Check if at least one MSS can be sent."""
        return self.usable_window >= self.mss

    def on_segment_sent(self, size: int):
        """Called when a segment is transmitted."""
        self.snd_nxt += size
        # FlightSize automatically increases

    def on_dup_ack_in_fast_recovery(self):
        """Handle duplicate ACK during Fast Recovery."""
        if self.state == 'FAST_RECOVERY':
            # Inflate window - packet left network
            self.cwnd += self.mss

            # Check if new data can be sent
            if self.can_send():
                return 'CAN_SEND_NEW_DATA'
            else:
                return 'CANNOT_SEND_YET'

    def on_new_ack(self, acked_bytes: int, ack_seq: int):
        """Handle acknowledgment of new data."""
        self.snd_una += acked_bytes

        if self.state == 'FAST_RECOVERY':
            if ack_seq > self.recover:
                # Recovery complete - deflate
                self.cwnd = self.ssthresh
                self.state = 'CONGESTION_AVOIDANCE'
                return 'RECOVERY_COMPLETE'
            else:
                # Partial ACK - still recovering
                return 'PARTIAL_ACK'
        return 'NORMAL_ACK'

    def enter_fast_recovery(self):
        """Enter Fast Recovery state."""
        self.ssthresh = max(self.flight_size // 2, 2 * self.mss)
        self.cwnd = self.ssthresh + 3 * self.mss
        self.recover = self.snd_nxt - 1
        self.state = 'FAST_RECOVERY'

        # Return whether we can immediately send more data
        return self.can_send()
```

Conservation of Packets:
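The inflation mechanism embodies the 'packet conservation' principle articulated by Van Jacobson: in steady state, a new packet should only be injected when an old packet has exited. During Fast Recovery, each duplicate ACK signals that one segment has left the network, and the matching one-MSS inflation of cwnd admits at most one new segment in its place.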
This conservation prevents Fast Recovery from adding to congestion while still maintaining throughput.
Imagine the network as a pipe with fixed capacity N packets. Before loss, N packets fill the pipe. During recovery, for each packet that drains from the output, one new packet enters at the input. The retransmitted packet is 'extra' but replaces the lost one. Net result: pipe stays full at N packets throughout recovery.
Real-world TCP implementations must handle numerous edge cases that can complicate window reduction mechanics. Understanding these cases is essential for debugging and implementation.
Edge Case 1: Minimal FlightSize
When FlightSize is very small, the reduction calculation can produce problematic values:
FlightSize = 3,000 bytes (about 2 segments)
ssthresh = max(3,000/2, 2×1,460) = max(1,500, 2,920) = 2,920
The 2×MSS floor ensures ssthresh never drops below 2 segments, preventing degenerate behavior in low-rate connections.
Edge Case 2: Application-Limited Sender
When the application isn't sending data fast enough to fill the window:
cwnd = 100,000 bytes
FlightSize = 5,000 bytes (app only sends small bursts)
3 Dup ACKs arrive
ssthresh = max(5,000/2, 2,920) = 2,920 bytes (not 50,000!)
This case uses FlightSize rather than cwnd specifically to handle application-limited senders correctly.
| Edge Case | Challenge | Solution | RFC Reference |
|---|---|---|---|
| Very small FlightSize | ssthresh could become < 2×MSS | Floor of 2×MSS | RFC 5681 §3.2 |
| Application-limited | cwnd >> FlightSize | Use FlightSize for calculation | RFC 5681 §3.2 |
| Reordering (spurious dup ACKs) | False congestion detection | Undo mechanism, DSACK | RFC 3708 |
| Multiple losses in window | Partial ACKs cause issues | NewReno partial ACK handling | RFC 6582 |
| cwnd < ssthresh after FR | Shouldn't enter slow start | Remain in congestion avoidance | RFC 5681 |
| Near-zero rwnd | Can't send despite cwnd | Flow control takes precedence | RFC 5681 |
Edge Case 3: Spurious Fast Recovery
Sometimes TCP enters Fast Recovery incorrectly due to packet reordering rather than actual loss. Modern TCP includes undo mechanisms:
Detection: If the retransmitted segment is acknowledged with a DSACK (duplicate SACK) indicating it was unnecessary, the entry was spurious.
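Undo Recovery: Restore cwnd and ssthresh to the values saved before Fast Recovery was entered, so a reordering event does not cost the connection half its window.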
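Eifel Algorithm: Uses TCP timestamps to detect a spurious retransmission from the first ACK that arrives after it. If that ACK echoes a timestamp older than the retransmission, it was triggered by the original segment, so the sender does not need to wait for a DSACK.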
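The timestamp comparison at the heart of Eifel-style detection, combined with a simple undo, might look like the sketch below. Everything named here (`SavedState`, `eifel_is_spurious`, `maybe_undo`) is hypothetical. Timestamp wraparound is ignored, and restoring the saved values outright is a simplification; real stacks apply more careful response rules.

```python
from dataclasses import dataclass


@dataclass
class SavedState:
    """Values recorded when Fast Recovery is entered (hypothetical layout)."""
    prior_cwnd: int
    prior_ssthresh: int
    retransmit_ts: int   # timestamp value carried by the fast retransmit


def eifel_is_spurious(retransmit_ts: int, ack_tsecr: int) -> bool:
    """The ACK echoes a timestamp older than the retransmit, so it was
    triggered by the original segment and the retransmit was unnecessary.
    Wraparound of the 32-bit timestamp is ignored in this sketch."""
    return ack_tsecr < retransmit_ts


def maybe_undo(cwnd: int, ssthresh: int, saved: SavedState,
               ack_tsecr: int) -> tuple[int, int]:
    """Return (cwnd, ssthresh), reverting the reduction if it was spurious."""
    if eifel_is_spurious(saved.retransmit_ts, ack_tsecr):
        return saved.prior_cwnd, saved.prior_ssthresh
    return cwnd, ssthresh
```

If the first ACK after the retransmit echoes timestamp 990 while the retransmit carried 1000, the reduction is reverted; an echo of 1005 leaves the reduced values in place.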
Edge Case 4: Retransmission Timeout During Fast Recovery
If the retransmission timer expires while in Fast Recovery:
// More severe response takes precedence
ssthresh = max(FlightSize/2, 2×MSS) // May be redundant
cwnd = 1 × MSS // Loss window; RFC 5681 caps it at one segment regardless of the initial window
state = SLOW_START
The timeout response supersedes Fast Recovery because it indicates a more severe problem—possibly the retransmitted segment was also lost.
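The timeout response can be written as a single state transition. This is a minimal sketch with a hypothetical function name; it restarts from one MSS and does not model RTO backoff or retransmission of the lost data.

```python
def on_retransmission_timeout(flight_size: int, mss: int = 1460) -> tuple[int, int, str]:
    """Return (ssthresh, cwnd, state) after an RTO, whatever the prior state."""
    ssthresh = max(flight_size // 2, 2 * mss)  # same rule as on loss detection
    cwnd = mss                                 # restart from a single segment
    return ssthresh, cwnd, "SLOW_START"
```

With 14,600 bytes in flight when the timer fires, the sender ends up with ssthresh = 7,300 and cwnd = 1,460, discarding any inflation accumulated during Fast Recovery.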
Different operating systems implement these edge cases slightly differently. For example, Linux uses the 'PRR' (Proportional Rate Reduction) algorithm which provides more gradual window reduction. Windows may use different undo heuristics. Always verify behavior against the specific TCP stack being used.
Traditional Fast Recovery window reduction, while functional, can produce bursty behavior—the sender may alternate between being blocked and sending bursts of data. Proportional Rate Reduction (PRR), defined in RFC 6937, provides a smoother alternative.
The Problem with Classic Fast Recovery:
In classic Reno-style Fast Recovery:
This burstiness can trigger additional losses and destabilize the network.
PRR's Solution:
PRR aims to smoothly reduce the sending rate from its pre-loss value to the target (ssthresh) over the course of recovery:
For each ACK during recovery:
sndcnt = CEIL(prr_delivered × ssthresh / RecoverFS) - prr_out
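Where prr_delivered is the total number of bytes reported delivered to the receiver since recovery began, prr_out is the total number of bytes sent since recovery began, and RecoverFS is the FlightSize at the moment recovery started. This proportional formula applies while the estimated data in flight (pipe) is still above ssthresh.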
```
// Proportional Rate Reduction (RFC 6937)

ON entering Fast Recovery:
    ssthresh = max(FlightSize / 2, 2 * MSS)   // Target window after recovery
    RecoverFS = FlightSize                    // Remember initial flight size
    prr_delivered = 0                         // Bytes delivered during recovery
    prr_out = 0                               // Bytes sent during recovery

ON each ACK during Fast Recovery:
    // Count newly delivered data (cumulatively ACKed plus newly SACKed)
    DeliveredData = acked_bytes + SACKed_bytes
    prr_delivered = prr_delivered + DeliveredData

    IF pipe > ssthresh:
        // Proportional part: send in proportion to progress through recovery
        sndcnt = CEIL(prr_delivered * ssthresh / RecoverFS) - prr_out
    ELSE:
        // Reduction bound (PRR-SSRB shown): rebuild pipe toward ssthresh
        limit = max(prr_delivered - prr_out, DeliveredData) + MSS
        sndcnt = min(ssthresh - pipe, limit)

    IF sndcnt > 0:
        send(sndcnt bytes)
        prr_out = prr_out + sndcnt

ON recovery complete (new ACK covers recover point):
    cwnd = ssthresh
    Exit Fast Recovery
```

PRR Advantages:
Smooth Rate Reduction: Instead of blocking then bursting, PRR smoothly decreases the sending rate over the recovery period.
Better Pacing: New data is released in proportion to data delivered, so transmissions stay ACK-clocked and are spread across the whole recovery period instead of arriving in bursts.
Quicker Recovery Start: PRR can begin sending new data earlier than classic FR in many cases.
Reduced Burstiness: The smooth reduction is less likely to trigger additional router buffer overflows.
PRR Variants:
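PRR-SSRB (Slow Start Reduction Bound): Applies when the estimated data in flight (pipe) has fallen below ssthresh, for example after heavy losses. It lets the sender transmit one extra MSS per ACK beyond what was delivered, rebuilding the window toward ssthresh at a slow-start-like rate.
PRR-CRB (Conservative Reduction Bound): A stricter bound for the same situation that never sends more data than has been reported delivered, which enforces strict packet conservation.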
PRR is the default recovery algorithm in Linux TCP since kernel version 3.2. It has proven more stable and performant than classic Fast Recovery in both experimental and production environments.
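To see the proportional behavior in numbers, the sndcnt calculation can be wrapped in a small class and driven by a toy ACK stream. This is a sketch, not the Linux implementation: the class and function names are hypothetical, `pipe` is supplied by the caller rather than computed from a SACK scoreboard, one segment is assumed lost, each ACK is assumed to report exactly one MSS delivered, and only whole segments are sent.

```python
class PRRState:
    """Per-recovery bookkeeping in bytes, following the RFC 6937 structure."""

    def __init__(self, flight_size: int, mss: int = 1460, conservative: bool = False):
        self.mss = mss
        self.ssthresh = max(flight_size // 2, 2 * mss)
        self.recover_fs = flight_size    # FlightSize at recovery entry
        self.prr_delivered = 0           # bytes delivered during recovery
        self.prr_out = 0                 # bytes sent during recovery
        self.conservative = conservative # True selects PRR-CRB, False PRR-SSRB

    def on_ack(self, delivered: int, pipe: int) -> int:
        """Return sndcnt, the number of bytes that may be sent for this ACK."""
        self.prr_delivered += delivered
        if pipe > self.ssthresh:
            # Proportional part; integer ceiling avoids float rounding.
            num = self.prr_delivered * self.ssthresh
            sndcnt = (num + self.recover_fs - 1) // self.recover_fs - self.prr_out
        else:
            if self.conservative:
                limit = self.prr_delivered - self.prr_out
            else:
                limit = max(self.prr_delivered - self.prr_out, delivered) + self.mss
            sndcnt = min(self.ssthresh - pipe, limit)
        return max(sndcnt, 0)

    def on_sent(self, nbytes: int) -> None:
        self.prr_out += nbytes


def simulate_proportional_phase(num_acks: int, flight_size: int, mss: int) -> list[int]:
    """Segments sent in response to each of num_acks single-segment ACKs."""
    prr = PRRState(flight_size, mss)
    pipe = flight_size - mss             # assumption: exactly one segment was lost
    sent_per_ack = []
    for _ in range(num_acks):
        pipe -= mss                      # the ACK reports one segment delivered
        segments = prr.on_ack(mss, pipe) // mss
        prr.on_sent(segments * mss)
        pipe += segments * mss
        sent_per_ack.append(segments)
    return sent_per_ack


if __name__ == "__main__":
    print(simulate_proportional_phase(8, 20_000, 1_000))
```

With a 20,000-byte flight and ssthresh at half of that, the sender releases one segment for every two segments delivered, which is the smooth halving of the sending rate that classic inflation does not provide.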
Window reduction during Fast Recovery is a sophisticated choreography of adjustments that maintains network stability while preserving throughput. Let's consolidate the key concepts:
What's Next:
With window reduction mechanics thoroughly understood, we'll examine the performance benefits of Fast Recovery. The next page quantifies the throughput improvements compared to slow start fallback.
You now understand the precise mechanics of window reduction during Fast Recovery—how ssthresh is calculated, how cwnd is inflated and deflated, and how modern alternatives like PRR improve on classic behavior. This detailed knowledge is essential for TCP implementation, debugging, and performance optimization.