TCP Fast Recovery

Level: Advanced

Duration: 75 mins

Topic: Fast Recovery

Window Reduction

The Precision of Window Management

Window reduction during Fast Recovery is not a single, simple operation—it's a carefully choreographed sequence of adjustments that must maintain network stability while preserving throughput. The congestion window (cwnd) and slow start threshold (ssthresh) undergo precise transformations that reflect TCP's understanding of network conditions.

Understanding these mechanics requires appreciating the distinction between the immediate reduction (the initial response to congestion), the inflation period (maintaining flow during recovery), and the deflation (returning to stable state after recovery completes). Each phase has specific rules and serves specific purposes in TCP's overall congestion control strategy.

What You Will Learn

This page provides comprehensive coverage of window reduction mechanics, including: the exact formulas for ssthresh and cwnd adjustment, the window inflation mechanism during Fast Recovery, window deflation upon recovery completion, the relationship between flight size and window calculations, and edge cases in window management.

The Window Variables

Before diving into reduction mechanics, we must precisely understand the window-related variables that TCP maintains and how they interact.

The Key Variables:

cwnd (Congestion Window): The sender's estimate of how many bytes can be outstanding in the network without causing congestion. This is the primary control variable for rate limiting.
ssthresh (Slow Start Threshold): The threshold that determines when TCP transitions from slow start to congestion avoidance. It represents TCP's memory of 'safe' network capacity from previous experience.
rwnd (Receiver Window): The receiver's advertised window indicating available buffer space. This provides flow control independent of congestion control.
FlightSize (Bytes in Flight): The actual bytes currently outstanding in the network, calculated as SND.NXT - SND.UNA.
Effective Window: The actual limit on transmission, calculated as min(cwnd, rwnd) - FlightSize.

Window Variables in Detail
Variable	Unit	Managed By	Update Trigger	Typical Range
cwnd	Bytes	Sender	ACKs, Timeouts, Dup ACKs	MSS to BDP (potentially millions)
ssthresh	Bytes	Sender	Loss detection only	2×MSS to previous cwnd
rwnd	Bytes	Receiver	Buffer availability	0 to receiver buffer size
FlightSize	Bytes	Calculated	Every send/ACK	0 to min(cwnd, rwnd)
recover	Seq Num	Sender	Fast Recovery entry	Highest seq sent at FR entry

The Relationship Between Variables:

The effective sending rate is determined by the interaction of these variables:

EffectiveWindow = min(cwnd, rwnd) - FlightSize
CanSend = EffectiveWindow >= MSS

When EffectiveWindow >= MSS, the sender can transmit a new segment. When EffectiveWindow < MSS, the sender must wait for ACKs (which reduce FlightSize) or window increases.
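The two formulas above can be sketched as a pair of small functions. This is a minimal illustration, not a stack's actual code: the function names are hypothetical, and clamping the result at zero is an assumption made here so the value never goes negative.

```python
def effective_window(cwnd: int, rwnd: int, flight_size: int) -> int:
    """Bytes the sender may still transmit (clamped at zero in this sketch)."""
    return max(0, min(cwnd, rwnd) - flight_size)


def can_send(cwnd: int, rwnd: int, flight_size: int, mss: int = 1460) -> bool:
    """True when at least one full-sized segment fits in the effective window."""
    return effective_window(cwnd, rwnd, flight_size) >= mss


# Congestion-limited: cwnd is the smaller of the two windows.
print(effective_window(cwnd=14_600, rwnd=65_535, flight_size=11_680))  # 2920
print(can_send(cwnd=14_600, rwnd=65_535, flight_size=11_680))          # True

# Receiver-limited: a small rwnd blocks sending even though cwnd has room.
print(can_send(cwnd=14_600, rwnd=8_760, flight_size=8_760))            # False
```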

During Fast Recovery:

The variables interact differently:

cwnd is artificially inflated to permit new data transmission
ssthresh holds the 'target' window for post-recovery operation
FlightSize (SND.NXT - SND.UNA) does not shrink on duplicate ACKs, because no new data is acknowledged, and grows with each new segment sent
recover tracks when recovery is complete

Bytes vs. Segments

While conceptual discussions often describe windows in terms of 'segments,' actual implementations track bytes. This distinction matters because not all segments are MSS-sized (e.g., the last segment of a file transfer). RFC 5681 specifies all calculations in bytes, with MSS used as the increment unit.

The Initial Reduction

When TCP detects congestion (third duplicate ACK received), the initial reduction occurs immediately. This reduction establishes the new operating point for post-recovery operation.

The ssthresh Calculation:

The slow start threshold is set based on the current flight size:

ssthresh = max(FlightSize / 2, 2 * MSS)

Key observations about this formula:

FlightSize, Not cwnd: The calculation uses actual bytes in flight, not the theoretical window limit. This provides a more accurate estimate of the achievable rate.
Halving (Multiplicative Decrease): Division by 2 implements the β=0.5 multiplicative decrease factor.
Minimum of 2×MSS: The floor ensures ssthresh never drops below two segments, preventing excessive reduction in low-rate connections.
Conservative Estimate: Using FlightSize accounts for the fact that cwnd might not have been fully utilized (e.g., application-limited).

initial_reduction_calculation

Pseudocode

// Initial Window Reduction on Fast Recovery Entry
// Per RFC 5681 Section 3.2
 
ON third duplicate ACK received:
    
    // Step 1: Calculate current bytes in flight
    FlightSize = SND.NXT - SND.UNA
    
    // Step 2: Set new slow start threshold (multiplicative decrease)
    ssthresh = max(FlightSize / 2, 2 * MSS)
    
    // Step 3: Perform Fast Retransmit of lost segment
    retransmit(SND.UNA)
    
    // Step 4: Set cwnd for Fast Recovery operation
    // Note: ssthresh reflects new "safe" rate; +3*MSS is inflation
    cwnd = ssthresh + 3 * MSS
    
    // Step 5: Record recovery point (the NewReno "recover" variable, RFC 6582)
    recover = SND.NXT - 1
    
    // Step 6: Enter Fast Recovery state
    state = FAST_RECOVERY
    
    // Example with numbers:
    // If FlightSize = 100,000 bytes and MSS = 1,460 bytes:
    // ssthresh = max(50,000, 2,920) = 50,000 bytes
    // cwnd = 50,000 + 4,380 = 54,380 bytes

Why FlightSize Instead of cwnd?

Using FlightSize rather than cwnd has important implications:

Scenario 1: Fully Utilized Window

cwnd = 100,000 bytes
FlightSize = 100,000 bytes (window fully used)
Result: ssthresh = 50,000 bytes

Scenario 2: Application-Limited

cwnd = 100,000 bytes
FlightSize = 30,000 bytes (application only sent 30KB)
Result: ssthresh = 15,000 bytes

In Scenario 2, the connection was clearly functioning with only 30KB in flight. Setting ssthresh to 15KB (rather than 50KB) reflects actual, proven network capacity. This conservative approach prevents over-estimation when the application wasn't fully utilizing the network.
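The entry calculation and both scenarios can be checked with a few lines of Python. This is a sketch under the formulas given above; the function names are hypothetical and integer division stands in for FlightSize / 2.

```python
MSS = 1460


def reno_ssthresh(flight_size: int, mss: int = MSS) -> int:
    """Half of the bytes in flight, but never below two segments."""
    return max(flight_size // 2, 2 * mss)


def fast_recovery_entry(flight_size: int, mss: int = MSS) -> tuple:
    """Return (ssthresh, cwnd) as set when the third duplicate ACK arrives."""
    ssthresh = reno_ssthresh(flight_size, mss)
    cwnd = ssthresh + 3 * mss  # inflate by the three segments that left the network
    return ssthresh, cwnd


print(fast_recovery_entry(100_000))  # Scenario 1: (50000, 54380)
print(fast_recovery_entry(30_000))   # Scenario 2: (15000, 19380)
print(fast_recovery_entry(3_000))    # floor applies: (2920, 7300)
```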

The CUBIC Exception

TCP CUBIC calculates ssthresh differently, using both the current cwnd and a 'β' factor of approximately 0.7: ssthresh = cwnd × 0.7. This less aggressive reduction reflects CUBIC's design for high-BDP networks where aggressive halving causes excessive throughput loss.
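To make the difference concrete, the sketch below compares the two reductions for the same window. It assumes β = 0.7 expressed as the integer ratio 7/10 (to avoid floating-point rounding) and applies the same 2×MSS floor to both; the function names are hypothetical.

```python
MSS = 1460


def reno_ssthresh(flight_size: int, mss: int = MSS) -> int:
    """Reno-style halving based on bytes in flight."""
    return max(flight_size // 2, 2 * mss)


def cubic_ssthresh(cwnd: int, mss: int = MSS) -> int:
    """CUBIC-style reduction: keep about 70% of cwnd (beta = 7/10)."""
    return max(cwnd * 7 // 10, 2 * mss)


window = 100_000
print(reno_ssthresh(window))   # 50000
print(cubic_ssthresh(window))  # 70000
```

With a fully used 100,000-byte window, the CUBIC-style rule retains 20,000 bytes more than halving does.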

Window Inflation During Recovery

Once TCP enters Fast Recovery, the congestion window undergoes a unique behavior known as window inflation. This mechanism allows TCP to continue transmitting new data during the recovery process, maintaining network utilization.

The Inflation Concept:

During Fast Recovery, each duplicate ACK causes cwnd to increase by one MSS:

ON duplicate ACK during Fast Recovery:
    cwnd = cwnd + MSS

This might seem counterintuitive—we just detected congestion, so why increase the window? The reasoning is subtle but essential:

Each duplicate ACK indicates that one segment has left the network (arrived at receiver)
This segment's 'slot' in the network is now free
By increasing cwnd, we're not adding segments to the network
We're merely acknowledging that a slot has freed up and can be reused

Mathematical Justification:

Let's trace the window values through a Fast Recovery scenario:

Initial state before loss:

cwnd = 10 segments (14,600 bytes)
FlightSize = 10 segments
Segments 1-10 transmitted
For simplicity, this trace counts in whole segments and holds FlightSize at 10 segments while the duplicate ACKs arrive

Loss occurs at segment 5:

Segments 6-10 arrive, generating duplicate ACKs

Third duplicate ACK (still requesting segment 5) received:

ssthresh = max(10/2, 2) = 5 segments (7,300 bytes)
cwnd = 5 + 3 = 8 segments (11,680 bytes)
FastRetransmit(segment 5)

Fourth duplicate ACK received:

cwnd = 8 + 1 = 9 segments
cwnd - FlightSize = 9 - 10 < 1 MSS: still blocked, no new segment can be sent

Fifth duplicate ACK received:

cwnd = 9 + 1 = 10 segments
cwnd - FlightSize = 0: still blocked; a further duplicate ACK would open room for one new segment

Tracking FlightSize:

The critical insight is that FlightSize doesn't automatically decrease when duplicate ACKs arrive. The receiver has buffered out-of-order segments, but they're not acknowledged until the gap is filled. From the sender's perspective:

Each new segment sent: FlightSize increases by MSS
Each duplicate ACK: FlightSize unchanged (no new data acknowledged)
Each new ACK: FlightSize decreases by acknowledged bytes
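The inflation arithmetic can be replayed in a few lines. The sketch below works in whole segments and assumes FlightSize stays fixed (no new data is sent and no new ACK arrives), so it only shows when the inflated window first exceeds the data outstanding; `trace_fast_recovery` is a hypothetical helper.

```python
def trace_fast_recovery(flight_segments: int, dup_acks: int) -> list:
    """Return (dup_ack_number, cwnd, can_send) rows from the third dup ACK on.

    All values are in segments. FlightSize is held constant, which is a
    simplifying assumption of this sketch.
    """
    ssthresh = max(flight_segments // 2, 2)
    rows = []
    cwnd = 0
    for n in range(3, dup_acks + 1):
        # Third dup ACK sets cwnd = ssthresh + 3; each later one adds 1.
        cwnd = ssthresh + 3 if n == 3 else cwnd + 1
        rows.append((n, cwnd, cwnd - flight_segments >= 1))
    return rows


for row in trace_fast_recovery(flight_segments=10, dup_acks=6):
    print(row)
# (3, 8, False)
# (4, 9, False)
# (5, 10, False)
# (6, 11, True)
```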

The 'Artificial Inflation' Metaphor

Think of cwnd inflation as maintaining a 'loan balance.' Each duplicate ACK represents an IOU: 'I'll acknowledge this segment later when the gap is filled.' TCP increases cwnd to advance these IOUs, enabling continued transmission. When recovery completes, these IOUs are 'repaid' through deflation.

Window Deflation on Recovery Completion

When Fast Recovery completes successfully—upon receiving an ACK that acknowledges data beyond the recovery point—the inflated window must be deflated to its proper post-recovery value. This deflation is essential for correct subsequent operation.

The Deflation Rule (RFC 5681):

On receiving a new ACK that completes recovery:

cwnd = ssthresh
state = CONGESTION_AVOIDANCE

Why Deflation is Necessary:

Remove Artificial Inflation: The window was inflated by 1 MSS for each duplicate ACK during recovery. These inflations don't represent actual network capacity.
Stabilize at Safe Rate: ssthresh was calculated as half the pre-loss rate—the 'safe' operating point. cwnd should now operate near this value.
Prepare for Linear Growth: Congestion Avoidance uses additive increase (1 MSS per RTT). Starting from inflated cwnd would overload the network.

Window Values Through Recovery Lifecycle
Phase	cwnd Value	ssthresh Value	Purpose
Normal Operation	Growing per AIMD	From previous loss	Maximize throughput
3rd Dup ACK	ssthresh + 3×MSS	FlightSize/2	Initial Fast Recovery setup
Each Dup ACK	cwnd + MSS	Unchanged	Enable continued transmission
New ACK (Recovery Done)	ssthresh	Unchanged	Deflate and stabilize
Congestion Avoidance	ssthresh + growth	Unchanged	Additive increase resumes

Deflation Example:

Continuing our earlier scenario:

At end of Fast Recovery:

cwnd was inflated to 15 segments (8 at Fast Recovery entry, plus 1 for each of 7 further duplicate ACKs)
ssthresh = 5 segments (set at FR entry)
Retransmitted segment 5 is received
Receiver sends cumulative ACK for segments 1-10 (or 1-17 if new data was sent)

Upon receiving this new ACK:

cwnd deflates to ssthresh = 5 segments
Transition to Congestion Avoidance
Future ACKs: cwnd += MSS × MSS / cwnd (approximately 1 MSS per RTT)
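As a quick check of the post-recovery growth rate, the sketch below applies the byte-counted congestion avoidance increase from RFC 5681 (cwnd += MSS × MSS / cwnd per ACK) for one window's worth of ACKs. The function names are hypothetical, and integer division with a 1-byte minimum is an assumption of this sketch.

```python
MSS = 1460


def congestion_avoidance_ack(cwnd: int, mss: int = MSS) -> int:
    """Apply one ACK's worth of additive increase to a byte-counted cwnd."""
    # Integer form of cwnd += MSS*MSS/cwnd; grow by at least 1 byte.
    return cwnd + max(1, (mss * mss) // cwnd)


def grow_for_one_window(cwnd: int, mss: int = MSS) -> int:
    """Apply one ACK per full-sized segment in the starting window."""
    for _ in range(cwnd // mss):
        cwnd = congestion_avoidance_ack(cwnd, mss)
    return cwnd


start = 5 * MSS                      # cwnd right after deflation to ssthresh
after_rtt = grow_for_one_window(start)
print(start, after_rtt, after_rtt - start)  # 7300 8656 1356
```

Five ACKs add 1,356 bytes, a little under one MSS, because each increment is computed against a cwnd that has already grown.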

The Calculation Specifics:

RFC 5681 specifies:

cwnd = ssthresh

RFC 6582 (NewReno) also permits a more conservative alternative that guards against a burst when little data is outstanding:

cwnd = min(ssthresh, max(FlightSize, MSS) + MSS)

This ensures cwnd isn't set larger than what's actually needed based on current outstanding data.
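The two deflation choices can be compared side by side. The sketch below uses the form given in RFC 6582 for the conservative option; the function name and the `conservative` flag are hypothetical.

```python
def deflate_on_full_ack(ssthresh: int, flight_size: int,
                        mss: int = 1460, conservative: bool = True) -> int:
    """Return the post-recovery cwnd in bytes."""
    if conservative:
        # Cap cwnd near the data actually outstanding to avoid a burst.
        return min(ssthresh, max(flight_size, mss) + mss)
    return ssthresh


# Little data outstanding when recovery ends: the cap applies.
print(deflate_on_full_ack(7300, 2920, conservative=True))   # 4380
print(deflate_on_full_ack(7300, 2920, conservative=False))  # 7300

# Plenty of data outstanding: both options give ssthresh.
print(deflate_on_full_ack(7300, 14_600, conservative=True))  # 7300
```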

NewReno's Partial Deflation

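TCP NewReno handles partial ACKs (ACKs that advance but don't complete recovery) differently. Instead of full deflation, it retransmits the next unacknowledged segment and performs a partial deflation: cwnd = cwnd - acknowledged_bytes, with one MSS added back if the partial ACK covered at least one MSS of new data (RFC 6582). This maintains recovery mode while accounting for the acknowledged data.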
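A minimal sketch of the partial-ACK adjustment follows. The function name is hypothetical, the zero floor is an assumption added here for safety, and the retransmission that accompanies a partial ACK is not modeled.

```python
def deflate_on_partial_ack(cwnd: int, acked_bytes: int, mss: int = 1460) -> int:
    """Return cwnd after a NewReno partial ACK (recovery continues)."""
    # Remove the newly acknowledged bytes from the inflated window.
    cwnd = max(cwnd - acked_bytes, 0)
    # Add back one MSS only if at least one MSS of new data was acknowledged.
    if acked_bytes >= mss:
        cwnd += mss
    return cwnd


inflated = 15 * 1460  # 21,900 bytes
print(deflate_on_partial_ack(inflated, 1460))  # 21900: one segment ACKed, net zero
print(deflate_on_partial_ack(inflated, 2920))  # 20440: two segments ACKed
print(deflate_on_partial_ack(inflated, 500))   # 21400: less than one MSS ACKed
```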

The Flight Size Relationship

The relationship between cwnd, FlightSize, and the ability to send new data is central to TCP's operation. During Fast Recovery, this relationship becomes particularly nuanced.

The Fundamental Constraint:

UsableWindow = cwnd - FlightSize
CanSend = UsableWindow >= MSS

The sender can transmit new data only when the usable window (cwnd minus FlightSize) is at least one MSS. This constraint applies during all phases of operation.

FlightSize Dynamics During Fast Recovery:

On Duplicate ACK:
- FlightSize remains unchanged (no data acknowledged)
- cwnd increases by MSS
- UsableWindow increases by MSS
- May enable new transmission
On New Segment Sent:
- FlightSize increases by segment size
- cwnd unchanged
- UsableWindow decreases
- May block further transmission
On New ACK:
- FlightSize decreases by acknowledged bytes
- cwnd may deflate (if recovery complete)
- UsableWindow likely increases
- Enables new transmission

flight_size_tracking
Python
class TCPFlightTracker:
    """Tracks FlightSize and window relationships during Fast Recovery."""
    
    def __init__(self, mss: int = 1460):
        self.mss = mss
        self.snd_una = 0      # Oldest unacknowledged byte
        self.snd_nxt = 0      # Next byte to send
        self.cwnd = 10 * mss  # Example: 10 segments
        self.ssthresh = 65535
        self.recover = 0
        self.state = 'NORMAL'
    
    @property
    def flight_size(self) -> int:
        """Calculate current bytes in flight."""
        return self.snd_nxt - self.snd_una
    
    @property
    def usable_window(self) -> int:
        """Calculate available space for new transmissions."""
        return max(0, self.cwnd - self.flight_size)
    
    def can_send(self) -> bool:
        """Check if at least one MSS can be sent."""
        return self.usable_window >= self.mss
    
    def on_segment_sent(self, size: int):
        """Called when a segment is transmitted."""
        self.snd_nxt += size
        # FlightSize automatically increases
        
    def on_dup_ack_in_fast_recovery(self):
        """Handle duplicate ACK during Fast Recovery."""
        if self.state != 'FAST_RECOVERY':
            # Outside Fast Recovery this handler leaves cwnd untouched
            return 'NOT_IN_FAST_RECOVERY'

        # Inflate window - packet left network
        self.cwnd += self.mss

        # Check if new data can be sent
        if self.can_send():
            return 'CAN_SEND_NEW_DATA'
        return 'CANNOT_SEND_YET'
    
    def on_new_ack(self, acked_bytes: int, ack_seq: int):
        """Handle acknowledgment of new data."""
        self.snd_una += acked_bytes
        
        if self.state == 'FAST_RECOVERY':
            if ack_seq > self.recover:
                # Recovery complete - deflate
                self.cwnd = self.ssthresh
                self.state = 'CONGESTION_AVOIDANCE'
                return 'RECOVERY_COMPLETE'
            else:
                # Partial ACK - still recovering (NewReno's partial
                # deflation is not modeled in this sketch)
                return 'PARTIAL_ACK'
        
        return 'NORMAL_ACK'
    
    def enter_fast_recovery(self):
        """Enter Fast Recovery state."""
        self.ssthresh = max(self.flight_size // 2, 2 * self.mss)
        self.cwnd = self.ssthresh + 3 * self.mss
        self.recover = self.snd_nxt - 1
        self.state = 'FAST_RECOVERY'
        
        # Return whether we can immediately send more data
        return self.can_send()

Conservation of Packets:

The inflation mechanism embodies the 'packet conservation' principle articulated by Van Jacobson: in steady state, a new packet should only be injected when an old packet has exited. During Fast Recovery:

Each duplicate ACK proves a packet exited the network
Window inflation permits at most one new packet to enter per duplicate ACK
Once sending resumes, the total number of packets in the network remains roughly constant

This conservation prevents Fast Recovery from adding to congestion while still maintaining throughput.

Visualizing the Conservation

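Imagine the network as a pipe with fixed capacity N packets. Before loss, N packets fill the pipe. While the inflated cwnd is still below FlightSize + MSS, packets drain from the output without being replaced, so the pipe empties toward the reduced target. After that, for each packet that drains from the output, one new packet enters at the input. The retransmitted packet is 'extra' but replaces the lost one. Net result: the pipe never holds more than its pre-loss N packets, and the ACK clock keeps running throughout recovery.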

Edge Cases in Window Reduction

Real-world TCP implementations must handle numerous edge cases that can complicate window reduction mechanics. Understanding these cases is essential for debugging and implementation.

Edge Case 1: Minimal FlightSize

When FlightSize is very small, the reduction calculation can produce problematic values:

FlightSize = 3,000 bytes (about 2 segments)
ssthresh = max(3,000/2, 2×1,460) = max(1,500, 2,920) = 2,920

The 2×MSS floor ensures ssthresh never drops below 2 segments, preventing degenerate behavior in low-rate connections.

Edge Case 2: Application-Limited Sender

When the application isn't sending data fast enough to fill the window:

cwnd = 100,000 bytes
FlightSize = 5,000 bytes (app only sends small bursts)
3 Dup ACKs arrive
ssthresh = max(5,000/2, 2,920) = 2,920 bytes (not 50,000!)

RFC 5681 bases the calculation on FlightSize rather than cwnd specifically so that application-limited senders are handled correctly.

Edge Cases and Their Handling
Edge Case	Challenge	Solution	RFC Reference
Very small FlightSize	ssthresh could become < 2×MSS	Floor of 2×MSS	RFC 5681 §3.2
Application-limited	cwnd >> FlightSize	Use FlightSize for calculation	RFC 5681 §3.2
Reordering (spurious dup ACKs)	False congestion detection	Undo mechanism, DSACK	RFC 3708
Multiple losses in window	Partial ACKs cause issues	NewReno partial ACK handling	RFC 6582
cwnd < ssthresh after FR	Shouldn't enter slow start	Remain in congestion avoidance	RFC 5681
Near-zero rwnd	Can't send despite cwnd	Flow control takes precedence	RFC 5681

Edge Case 3: Spurious Fast Recovery

Sometimes TCP enters Fast Recovery incorrectly due to packet reordering rather than actual loss. Modern TCP includes undo mechanisms:

Detection: If the retransmitted segment is acknowledged with a DSACK (duplicate SACK) indicating it was unnecessary, the entry was spurious.
Undo Recovery:
- Restore ssthresh to pre-recovery value
- Restore cwnd to pre-recovery value (or higher)
- Exit Fast Recovery immediately
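Eifel Algorithm: Uses TCP timestamps to tell whether the first ACK after a retransmission was triggered by the original segment or by the retransmission, so a spurious retransmission can be detected without waiting for a DSACK.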
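The undo steps above amount to saving the window state on entry and restoring it if the retransmission turns out to be unnecessary. The sketch below illustrates only that bookkeeping: the class and method names are hypothetical, how the spurious retransmission is detected (DSACK or timestamps) is not modeled, and real stacks apply additional heuristics.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RecoveryState:
    cwnd: int
    ssthresh: int
    in_recovery: bool = False
    saved: Optional[Tuple[int, int]] = None  # (cwnd, ssthresh) before entry

    def enter_fast_recovery(self, flight_size: int, mss: int = 1460) -> None:
        # Remember the pre-recovery values so they can be restored later.
        self.saved = (self.cwnd, self.ssthresh)
        self.ssthresh = max(flight_size // 2, 2 * mss)
        self.cwnd = self.ssthresh + 3 * mss
        self.in_recovery = True

    def undo(self) -> None:
        """Call when the retransmission is found to have been spurious."""
        if self.in_recovery and self.saved is not None:
            self.cwnd, self.ssthresh = self.saved
            self.saved = None
            self.in_recovery = False


state = RecoveryState(cwnd=14_600, ssthresh=65_535)
state.enter_fast_recovery(flight_size=14_600)
print(state.cwnd, state.ssthresh)  # 11680 7300
state.undo()
print(state.cwnd, state.ssthresh)  # 14600 65535
```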

Edge Case 4: Retransmission Timeout During Fast Recovery

If the retransmission timer expires while in Fast Recovery:

// More severe response takes precedence
ssthresh = max(FlightSize/2, 2×MSS)  // May be redundant
cwnd = 1 MSS (the loss window; RFC 5681 does not permit the initial window here)
state = SLOW_START

The timeout response supersedes Fast Recovery because it indicates a more severe problem—possibly the retransmitted segment was also lost.
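For comparison with the Fast Recovery entry values, here is a minimal sketch of the timeout response. The function name and the returned tuple are hypothetical; cwnd is set to one full-sized segment, the loss window defined in RFC 5681.

```python
def on_retransmission_timeout(flight_size: int, mss: int = 1460) -> tuple:
    """Return (ssthresh, cwnd, state) after the retransmission timer fires."""
    ssthresh = max(flight_size // 2, 2 * mss)
    cwnd = mss  # loss window: one full-sized segment
    return ssthresh, cwnd, 'SLOW_START'


# Ten segments outstanding when the timer expires.
print(on_retransmission_timeout(14_600))  # (7300, 1460, 'SLOW_START')
```

With the same ten segments in flight, Fast Recovery would have started from cwnd = 11,680 bytes, while the timeout path restarts from 1,460 bytes.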

Implementation Divergence

Different operating systems implement these edge cases slightly differently. For example, Linux uses the 'PRR' (Proportional Rate Reduction) algorithm which provides more gradual window reduction. Windows may use different undo heuristics. Always verify behavior against the specific TCP stack being used.

Proportional Rate Reduction (PRR)

Traditional Fast Recovery window reduction, while functional, can produce bursty behavior—the sender may alternate between being blocked and sending bursts of data. Proportional Rate Reduction (PRR), defined in RFC 6937, provides a smoother alternative.

The Problem with Classic Fast Recovery:

In classic Reno-style Fast Recovery:

Entering FR: cwnd = ssthresh + 3×MSS
If FlightSize > ssthresh + 3×MSS, sender is blocked
Must wait for multiple dup ACKs to inflate window enough
Then may send a burst of data all at once

This burstiness can trigger additional losses and destabilize the network.
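How long the sender stays blocked grows with the window. The sketch below counts the duplicate ACKs needed after the third one before cwnd - FlightSize reaches one MSS, assuming FlightSize does not change in the meantime; `dup_acks_until_unblocked` is a hypothetical helper.

```python
def dup_acks_until_unblocked(flight_size: int, mss: int = 1460) -> int:
    """Count dup ACKs beyond the third needed before one new segment fits."""
    ssthresh = max(flight_size // 2, 2 * mss)
    cwnd = ssthresh + 3 * mss
    extra = 0
    while cwnd - flight_size < mss:
        cwnd += mss  # classic inflation: one MSS per duplicate ACK
        extra += 1
    return extra


print(dup_acks_until_unblocked(10 * 1460))   # 3
print(dup_acks_until_unblocked(100 * 1460))  # 48
```

With 100 segments in flight, roughly half a window of duplicate ACKs passes with nothing sent, which is the silent period PRR is designed to smooth out.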

PRR's Solution:

PRR aims to smoothly reduce the sending rate from its pre-loss value to the target (ssthresh) over the course of recovery:

For each ACK during recovery:
    sndcnt = CEIL(prr_delivered × ssthresh / RecoverFS) - prr_out

Where:

prr_delivered = total bytes newly delivered to the receiver during recovery (cumulatively ACKed or SACKed)
RecoverFS = FlightSize at recovery entry
prr_out = bytes sent during recovery

prr_algorithm

Pseudocode

// Proportional Rate Reduction (RFC 6937)
 
ON entering Fast Recovery:
    ssthresh = max(FlightSize/2, 2*MSS)   // Target window for the end of recovery
    RecoverFS = FlightSize                // Remember flight size at recovery entry
    prr_delivered = 0                     // Bytes delivered during recovery
    prr_out = 0                           // Bytes sent during recovery
 
ON each ACK during Fast Recovery:
    // Count newly delivered data (cumulatively ACKed plus newly SACKed)
    DeliveredData = acked_bytes + SACKed_bytes
    prr_delivered = prr_delivered + DeliveredData
    
    IF pipe > ssthresh:
        // Proportional phase: send in proportion to delivered data
        sndcnt = CEIL(prr_delivered * ssthresh / RecoverFS) - prr_out
    ELSE:
        // Reduction bound: pipe has fallen to or below the target
        IF conservative:                  // PRR-CRB
            limit = prr_delivered - prr_out
        ELSE:                             // PRR-SSRB
            limit = max(prr_delivered - prr_out, DeliveredData) + MSS
        sndcnt = min(ssthresh - pipe, limit)
    
    // sndcnt is the number of bytes this ACK allows the sender to transmit
 
ON any transmission or retransmission during recovery:
    prr_out = prr_out + bytes_sent        // bytes_sent <= sndcnt
 
ON recovery complete (new ACK covers recover point):
    cwnd = ssthresh
    Exit Fast Recovery
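The per-ACK computation can be sketched in Python, following the structure of RFC 6937: a proportional branch while pipe exceeds ssthresh, and a reduction bound otherwise. The class name, method names, and the clamp of sndcnt at zero are assumptions of this sketch; pipe is passed in by the caller rather than derived from a SACK scoreboard. The demo counts in whole segments (mss=1) to keep the numbers readable.

```python
from dataclasses import dataclass


@dataclass
class PrrState:
    ssthresh: int           # target window chosen at recovery entry
    recover_fs: int         # FlightSize at recovery entry (RecoverFS)
    mss: int = 1460
    prr_delivered: int = 0  # data delivered to the receiver during recovery
    prr_out: int = 0        # data sent during recovery

    def on_ack(self, delivered: int, pipe: int, conservative: bool = False) -> int:
        """Return sndcnt, the amount this ACK allows the sender to transmit."""
        self.prr_delivered += delivered
        if pipe > self.ssthresh:
            # Proportional phase: integer ceiling of delivered*ssthresh/RecoverFS.
            target = (self.prr_delivered * self.ssthresh
                      + self.recover_fs - 1) // self.recover_fs
            sndcnt = target - self.prr_out
        else:
            if conservative:  # PRR-CRB
                limit = self.prr_delivered - self.prr_out
            else:             # PRR-SSRB
                limit = max(self.prr_delivered - self.prr_out, delivered) + self.mss
            sndcnt = min(self.ssthresh - pipe, limit)
        return max(0, sndcnt)  # clamp is a convenience of this sketch

    def on_sent(self, amount: int) -> None:
        self.prr_out += amount


# Ten segments in flight at entry, target of five, one segment lost.
prr = PrrState(ssthresh=5, recover_fs=10, mss=1)
pipe = 9
sent = []
for _ in range(6):
    pipe -= 1                       # one segment reported delivered
    n = prr.on_ack(delivered=1, pipe=pipe)
    prr.on_sent(n)
    pipe += n
    sent.append(n)
print(sent)  # [1, 0, 1, 0, 1, 0]
```

The alternating pattern shows the proportional phase at work: with ssthresh at half of RecoverFS, about one segment is sent for every two delivered, instead of a silent period followed by a burst.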

PRR Advantages:

Smooth Rate Reduction: Instead of blocking then bursting, PRR smoothly decreases the sending rate over the recovery period.
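Better Pacing: Transmissions are clocked by incoming ACKs in proportion to delivered data (about one segment sent for every two delivered when ssthresh is half of RecoverFS), which spreads new data across the recovery period.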
Quicker Recovery Start: PRR can begin sending new data earlier than classic FR in many cases.
Reduced Burstiness: The smooth reduction is less likely to trigger additional router buffer overflows.

PRR Variants:

PRR-SSRB (Slow Start Reduction Bound): Applies when pipe has fallen below ssthresh (for example, after heavy losses). Allows one extra MSS per ACK, so the sender grows back toward ssthresh in a slow-start-like manner.
PRR-CRB (Conservative Reduction Bound): Strictly packet-conserving mode that sends no more data than has been delivered during recovery.

Linux Default

PRR is the default recovery algorithm in Linux TCP since kernel version 3.2. It has proven more stable and performant than classic Fast Recovery in both experimental and production environments.

Summary: Window Reduction

Window reduction during Fast Recovery is a sophisticated choreography of adjustments that maintains network stability while preserving throughput. Let's consolidate the key concepts:

Key Takeaways

• ssthresh is calculated from FlightSize: Using actual bytes in flight rather than cwnd provides accurate capacity estimation and handles application-limited cases correctly.
• Window inflation maintains transmission: Each duplicate ACK inflates cwnd by MSS, permitting new data transmission and maintaining the ACK clock.
• Deflation restores stable state: Upon recovery completion, cwnd deflates to ssthresh, preparing for congestion avoidance growth.
• FlightSize constrains actual sending: The usable window (cwnd - FlightSize) determines whether new data can be transmitted.
• Edge cases require careful handling: Small FlightSize, application limits, spurious recovery, and multiple losses all need special consideration.
• PRR provides smoother reduction: Proportional Rate Reduction replaces bursty classic behavior with gradual, paced reduction.
• Packet conservation is preserved: During recovery, a new packet enters the network only as an old one leaves, so the inflation/deflation cycle never adds to the load that caused the loss.

What's Next:

With window reduction mechanics thoroughly understood, we'll examine the performance benefits of Fast Recovery. The next page quantifies the throughput improvements compared to slow start fallback.

Page Complete

You now understand the precise mechanics of window reduction during Fast Recovery—how ssthresh is calculated, how cwnd is inflated and deflated, and how modern alternatives like PRR improve on classic behavior. This detailed knowledge is essential for TCP implementation, debugging, and performance optimization.

3 / 5

Loading learning content...

Computer NetworksFast Recovery

TCP Fast Recovery

LevelAdvanced

Duration75 mins

TopicFast Recovery

3 / 5

Window Reduction

The Precision of Window Management

What You Will Learn

The Window Variables

Before diving into reduction mechanics, we must precisely understand the window-related variables that TCP maintains and how they interact.

The Key Variables:

cwnd (Congestion Window): The sender's estimate of how many bytes can be outstanding in the network without causing congestion. This is the primary control variable for rate limiting.
ssthresh (Slow Start Threshold): The threshold that determines when TCP transitions from slow start to congestion avoidance. It represents TCP's memory of 'safe' network capacity from previous experience.
rwnd (Receiver Window): The receiver's advertised window indicating available buffer space. This provides flow control independent of congestion control.
FlightSize (Bytes in Flight): The actual bytes currently outstanding in the network, calculated as SND.NXT - SND.UNA.
Effective Window: The actual limit on transmission, calculated as min(cwnd, rwnd) - FlightSize.

Window Variables in Detail
Variable	Unit	Managed By	Update Trigger	Typical Range
cwnd	Bytes	Sender	ACKs, Timeouts, Dup ACKs	MSS to BDP (potentially millions)
ssthresh	Bytes	Sender	Loss detection only	2×MSS to previous cwnd
rwnd	Bytes	Receiver	Buffer availability	0 to receiver buffer size
FlightSize	Bytes	Calculated	Every send/ACK	0 to min(cwnd, rwnd)
recover	Seq Num	Sender	Fast Recovery entry	Highest seq sent at FR entry

The Relationship Between Variables:

The effective sending rate is determined by the interaction of these variables:

EffectiveWindow = min(cwnd, rwnd) - FlightSize
CanSend = EffectiveWindow >= MSS

When EffectiveWindow >= MSS, the sender can transmit a new segment. When EffectiveWindow < MSS, the sender must wait for ACKs (which reduce FlightSize) or window increases.

During Fast Recovery:

The variables interact differently:

cwnd is artificially inflated to permit new data transmission
ssthresh holds the 'target' window for post-recovery operation
FlightSize remains relatively constant (new sends balanced by ACKed data)
recover tracks when recovery is complete

Bytes vs. Segments

The Initial Reduction

When TCP detects congestion (third duplicate ACK received), the initial reduction occurs immediately. This reduction establishes the new operating point for post-recovery operation.

The ssthresh Calculation:

The slow start threshold is set based on the current flight size:

ssthresh = max(FlightSize / 2, 2 * MSS)

Key observations about this formula:

FlightSize, Not cwnd: The calculation uses actual bytes in flight, not the theoretical window limit. This provides a more accurate estimate of the achievable rate.
Halving (Multiplicative Decrease): Division by 2 implements the β=0.5 multiplicative decrease factor.
Minimum of 2×MSS: The floor ensures ssthresh never drops below two segments, preventing excessive reduction in low-rate connections.
Conservative Estimate: Using FlightSize accounts for the fact that cwnd might not have been fully utilized (e.g., application-limited).

initial_reduction_calculation

Pseudocode

// Initial Window Reduction on Fast Recovery Entry
// Per RFC 5681 Section 3.2
 
ON third duplicate ACK received:
    
    // Step 1: Calculate current bytes in flight
    FlightSize = SND.NXT - SND.UNA
    
    // Step 2: Set new slow start threshold (multiplicative decrease)
    ssthresh = max(FlightSize / 2, 2 * MSS)
    
    // Step 3: Perform Fast Retransmit of lost segment
    retransmit(SND.UNA)
    
    // Step 4: Set cwnd for Fast Recovery operation
    // Note: ssthresh reflects new "safe" rate; +3*MSS is inflation
    cwnd = ssthresh + 3 * MSS
    
    // Step 5: Record recovery point
    recover = SND.NXT - 1
    
    // Step 6: Enter Fast Recovery state
    state = FAST_RECOVERY
    
    // Example with numbers:
    // If FlightSize = 100,000 bytes and MSS = 1,460 bytes:
    // ssthresh = max(50,000, 2,920) = 50,000 bytes
    // cwnd = 50,000 + 4,380 = 54,380 bytes

Why FlightSize Instead of cwnd?

Using FlightSize rather than cwnd has important implications:

Scenario 1: Fully Utilized Window

cwnd = 100,000 bytes
FlightSize = 100,000 bytes (window fully used)
Result: ssthresh = 50,000 bytes

Scenario 2: Application-Limited

cwnd = 100,000 bytes
FlightSize = 30,000 bytes (application only sent 30KB)
Result: ssthresh = 15,000 bytes

The CUBIC Exception

Window Inflation During Recovery

The Inflation Concept:

During Fast Recovery, each duplicate ACK causes cwnd to increase by one MSS:

ON duplicate ACK during Fast Recovery:
    cwnd = cwnd + MSS

This might seem counterintuitive—we just detected congestion, so why increase the window? The reasoning is subtle but essential:

Each duplicate ACK indicates that one segment has left the network (arrived at receiver)
This segment's 'slot' in the network is now free
By increasing cwnd, we're not adding segments to the network
We're merely acknowledging that a slot has freed up and can be reused

Converting Mermaid diagram...

Mathematical Justification:

Let's trace the window values through a Fast Recovery scenario:

Initial state before loss:

cwnd = 10 segments (14,600 bytes)
FlightSize = 10 segments
Segments 1-10 transmitted

Loss occurs at segment 5:

Segments 6-10 arrive, generating duplicate ACKs

Third duplicate ACK (for seg 5) received:

ssthresh = max(10/2, 2) = 5 segments (7,300 bytes)
cwnd = 5 + 3 = 8 segments (11,680 bytes)
FastRetransmit(segment 5)

Fourth duplicate ACK received:

cwnd = 8 + 1 = 9 segments
If (cwnd - FlightSize >= MSS): can send new segment

Fifth duplicate ACK received:

cwnd = 9 + 1 = 10 segments
Continue sending new data if allowed

Tracking FlightSize:

Each new segment sent: FlightSize increases by MSS
Each duplicate ACK: FlightSize unchanged (no new data acknowledged)
Each new ACK: FlightSize decreases by acknowledged bytes

The 'Artificial Inflation' Metaphor

Window Deflation on Recovery Completion

The Deflation Rule (RFC 5681):

On receiving a new ACK that completes recovery:

cwnd = ssthresh
state = CONGESTION_AVOIDANCE

Why Deflation is Necessary:

Remove Artificial Inflation: The window was inflated by 1 MSS for each duplicate ACK during recovery. These inflations don't represent actual network capacity.
Stabilize at Safe Rate: ssthresh was calculated as half the pre-loss rate—the 'safe' operating point. cwnd should now operate near this value.
Prepare for Linear Growth: Congestion Avoidance uses additive increase (1 MSS per RTT). Starting from inflated cwnd would overload the network.

Window Values Through Recovery Lifecycle
Phase	cwnd Value	ssthresh Value	Purpose
Normal Operation	Growing per AIMD	From previous loss	Maximize throughput
3rd Dup ACK	ssthresh + 3×MSS	FlightSize/2	Initial Fast Recovery setup
Each Dup ACK	cwnd + MSS	Unchanged	Enable continued transmission
New ACK (Recovery Done)	ssthresh	Unchanged	Deflate and stabilize
Congestion Avoidance	ssthresh + growth	Unchanged	Additive increase resumes

Deflation Example:

Continuing our earlier scenario:

At end of Fast Recovery:

cwnd was inflated to 15 segments (after 7 duplicate ACKs)
ssthresh = 5 segments (set at FR entry)
Retransmitted segment 5 is received
Receiver sends cumulative ACK for segments 1-10 (or 1-17 if new data was sent)

Upon receiving this new ACK:

cwnd deflates to ssthresh = 5 segments
Transition to Congestion Avoidance
Future ACKs: cwnd += MSS/cwnd (approximately 1 MSS per RTT)

The Calculation Specifics:

RFC 5681 specifies:

cwnd = ssthresh

Some implementations use an alternative that accounts for out-of-order delivery:

cwnd = min(ssthresh, FlightSize + MSS)

This ensures cwnd isn't set larger than what's actually needed based on current outstanding data.

NewReno's Partial Deflation

The Flight Size Relationship

The relationship between cwnd, FlightSize, and the ability to send new data is central to TCP's operation. During Fast Recovery, this relationship becomes particularly nuanced.

The Fundamental Constraint:

UsableWindow = cwnd - FlightSize
CanSend = UsableWindow >= MSS

The sender can transmit new data only when the usable window (cwnd minus FlightSize) is at least one MSS. This constraint applies during all phases of operation.

FlightSize Dynamics During Fast Recovery:

On Duplicate ACK:
- FlightSize remains unchanged (no data acknowledged)
- cwnd increases by MSS
- UsableWindow increases by MSS
- May enable new transmission
On New Segment Sent:
- FlightSize increases by segment size
- cwnd unchanged
- UsableWindow decreases
- May block further transmission
On New ACK:
- FlightSize decreases by acknowledged bytes
- cwnd may deflate (if recovery complete)
- UsableWindow likely increases
- Enables new transmission

flight_size_tracking
Python
class TCPFlightTracker:
    """Tracks FlightSize and window relationships during Fast Recovery."""
    
    def __init__(self, mss: int = 1460):
        self.mss = mss
        self.snd_una = 0      # Oldest unacknowledged byte
        self.snd_nxt = 0      # Next byte to send
        self.cwnd = 10 * mss  # Example: 10 segments
        self.ssthresh = 65535
        self.recover = 0
        self.state = 'NORMAL'
    
    @property
    def flight_size(self) -> int:
        """Calculate current bytes in flight."""
        return self.snd_nxt - self.snd_una
    
    @property
    def usable_window(self) -> int:
        """Calculate available space for new transmissions."""
        return max(0, self.cwnd - self.flight_size)
    
    def can_send(self) -> bool:
        """Check if at least one MSS can be sent."""
        return self.usable_window >= self.mss
    
    def on_segment_sent(self, size: int):
        """Called when a segment is transmitted."""
        self.snd_nxt += size
        # FlightSize automatically increases
        
    def on_dup_ack_in_fast_recovery(self) -> str:
        """Handle duplicate ACK during Fast Recovery."""
        if self.state != 'FAST_RECOVERY':
            # Inflation applies only while recovering
            return 'NOT_IN_FAST_RECOVERY'

        # Inflate window: the dup ACK signals that a segment left the network
        self.cwnd += self.mss

        # Check if new data can be sent
        return 'CAN_SEND_NEW_DATA' if self.can_send() else 'CANNOT_SEND_YET'
    
    def on_new_ack(self, acked_bytes: int, ack_seq: int):
        """Handle acknowledgment of new data."""
        self.snd_una += acked_bytes
        
        if self.state == 'FAST_RECOVERY':
            if ack_seq > self.recover:
                # Recovery complete - deflate
                self.cwnd = self.ssthresh
                self.state = 'CONGESTION_AVOIDANCE'
                return 'RECOVERY_COMPLETE'
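            else:
                # Partial ACK - still recovering. A full NewReno sender would
                # also retransmit the next hole and partially deflate cwnd
                # here (RFC 6582); this tracker only reports the event.
                return 'PARTIAL_ACK'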
        
        return 'NORMAL_ACK'
    
    def enter_fast_recovery(self):
        """Enter Fast Recovery state."""
        self.ssthresh = max(self.flight_size // 2, 2 * self.mss)
        self.cwnd = self.ssthresh + 3 * self.mss
        self.recover = self.snd_nxt - 1
        self.state = 'FAST_RECOVERY'
        
        # Return whether we can immediately send more data
        return self.can_send()

Conservation of Packets:

Each duplicate ACK proves a packet exited the network
Window inflation permits exactly one new packet to enter
Total packets in network remains roughly constant

This conservation prevents Fast Recovery from adding to congestion while still maintaining throughput.
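A small simulation makes the conservation argument concrete. It is a sketch under simplifying assumptions: all quantities are whole segments, exactly one segment is lost, and every other segment in the original flight produces one duplicate ACK. The function and variable names are illustrative.

```python
def simulate_inflation(flight: int = 10, dup_acks: int = 9) -> list[tuple[int, int, int]]:
    """Return (dup_ack_number, cwnd, segments_in_network) for each dup ACK after the third."""
    ssthresh = max(flight // 2, 2)
    cwnd = ssthresh + 3              # entry: ssthresh plus the three dup ACKs
    in_network = (flight - 1) - 3    # lost segment is gone, three others were delivered
    in_network += 1                  # the fast retransmission enters the network
    trace = []
    for n in range(4, dup_acks + 1):
        in_network -= 1              # this dup ACK proves one segment left
        cwnd += 1                    # window inflation
        if cwnd - flight >= 1:       # usable window of at least one segment
            flight += 1              # send one new segment
            in_network += 1
        trace.append((n, cwnd, in_network))
    return trace

for row in simulate_inflation():
    print(row)
```

With these inputs the sender stays blocked for the first few duplicate ACKs while the in-network count drains, then holds it steady at 5 segments (the ssthresh value) by sending one new segment per duplicate ACK.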

Visualizing the Conservation

Edge Cases in Window Reduction

Real-world TCP implementations must handle numerous edge cases that can complicate window reduction mechanics. Understanding these cases is essential for debugging and implementation.

Edge Case 1: Minimal FlightSize

When FlightSize is very small, the reduction calculation can produce problematic values:

FlightSize = 3,000 bytes (about 2 segments)
ssthresh = max(3,000/2, 2×1,460) = max(1,500, 2,920) = 2,920

The 2×MSS floor ensures ssthresh never drops below 2 segments, preventing degenerate behavior in low-rate connections.

Edge Case 2: Application-Limited Sender

When the application isn't sending data fast enough to fill the window:

cwnd = 100,000 bytes
FlightSize = 5,000 bytes (app only sends small bursts)
3 Dup ACKs arrive
ssthresh = max(5,000/2, 2,920) = 2,920 bytes (not 50,000!)

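RFC 5681 bases the calculation on FlightSize rather than cwnd precisely so that application-limited senders are handled correctly: the reduction reflects the load the sender actually placed on the network, not the load cwnd would have allowed.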
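The arithmetic in Edge Cases 1 and 2 can be reproduced with a one-line helper. The function name is an assumption for illustration; the formula is the max(FlightSize/2, 2×MSS) rule used throughout this page.

```python
def ssthresh_on_loss(flight_size: int, mss: int = 1460) -> int:
    """ssthresh = max(FlightSize / 2, 2 * MSS), all values in bytes."""
    return max(flight_size // 2, 2 * mss)

print(ssthresh_on_loss(3_000))    # Edge Case 1: the 2*MSS floor applies
print(ssthresh_on_loss(5_000))    # Edge Case 2: application-limited sender
print(ssthresh_on_loss(100_000))  # what halving the 100,000-byte cwnd would give
```

The last call shows the value a cwnd-based calculation would have produced in Edge Case 2, which overstates what the sender actually had in flight.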

Edge Cases and Their Handling
Edge Case	Challenge	Solution	RFC Reference
Very small FlightSize	ssthresh could become < 2×MSS	Floor of 2×MSS	RFC 5681 §3.2
Application-limited	cwnd >> FlightSize	Use FlightSize for calculation	RFC 5681 §3.2
Reordering (spurious dup ACKs)	False congestion detection	Undo mechanism, DSACK	RFC 3708
Multiple losses in window	Partial ACKs cause issues	NewReno partial ACK handling	RFC 6582
cwnd < ssthresh after FR	Shouldn't enter slow start	Remain in congestion avoidance	RFC 5681
Near-zero rwnd	Can't send despite cwnd	Flow control takes precedence	RFC 5681

Edge Case 3: Spurious Fast Recovery

Sometimes TCP enters Fast Recovery incorrectly due to packet reordering rather than actual loss. Modern TCP includes undo mechanisms:

Detection: If the retransmitted segment is acknowledged with a DSACK (duplicate SACK) indicating it was unnecessary, the entry was spurious.
Undo Recovery:
- Restore ssthresh to pre-recovery value
- Restore cwnd to pre-recovery value (or higher)
- Exit Fast Recovery immediately
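Eifel Algorithm: Uses TCP timestamps to tell, from the first ACK that arrives after a retransmission, whether that ACK was triggered by the original segment or by the retransmission (RFC 3522). If it was the original, the retransmission was spurious.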
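The undo steps can be sketched as a save-and-restore around recovery entry. This is a simplified illustration, not how any particular stack implements it: the class, its fields, and the boolean spuriousness signal (standing in for DSACK or Eifel detection) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RecoveryState:
    cwnd: int
    ssthresh: int
    in_recovery: bool = False
    prior_cwnd: int = 0
    prior_ssthresh: int = 0

    def enter_fast_recovery(self, flight_size: int, mss: int) -> None:
        # Save the pre-recovery values so they can be restored later
        self.prior_cwnd, self.prior_ssthresh = self.cwnd, self.ssthresh
        self.ssthresh = max(flight_size // 2, 2 * mss)
        self.cwnd = self.ssthresh + 3 * mss
        self.in_recovery = True

    def undo_if_spurious(self, retransmit_was_unnecessary: bool) -> bool:
        """Restore the saved window if the retransmission turned out to be unnecessary."""
        if not (self.in_recovery and retransmit_was_unnecessary):
            return False
        self.cwnd = max(self.cwnd, self.prior_cwnd)  # pre-recovery value or higher
        self.ssthresh = self.prior_ssthresh
        self.in_recovery = False                     # exit Fast Recovery immediately
        return True

state = RecoveryState(cwnd=14_600, ssthresh=65_535)
state.enter_fast_recovery(flight_size=14_600, mss=1460)
state.undo_if_spurious(retransmit_was_unnecessary=True)
print(state.cwnd, state.ssthresh)
```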

Edge Case 4: Retransmission Timeout During Fast Recovery

If the retransmission timer expires while in Fast Recovery:

// More severe response takes precedence
ssthresh = max(FlightSize/2, 2×MSS)  // May be redundant
cwnd = 1 MSS  // Loss window: one segment, regardless of the initial window
state = SLOW_START

The timeout response supersedes Fast Recovery because it indicates a more severe problem—possibly the retransmitted segment was also lost.

Implementation Divergence

Proportional Rate Reduction (PRR)

The Problem with Classic Fast Recovery:

In classic Reno-style Fast Recovery:

Entering FR: cwnd = ssthresh + 3×MSS
If FlightSize > ssthresh + 3×MSS, sender is blocked
Must wait for multiple dup ACKs to inflate window enough
Then may send a burst of data all at once

This burstiness can trigger additional losses and destabilize the network.
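How long the sender stays blocked can be worked out from the entry formulas. The sketch below assumes FlightSize does not change until new data is sent and solves for the smallest number of additional duplicate ACKs that opens one MSS of usable window; the function name is illustrative.

```python
def extra_dup_acks_before_sending(flight_size: int, mss: int = 1460) -> int:
    """Dup ACKs needed after FR entry before one new segment fits in the window."""
    ssthresh = max(flight_size // 2, 2 * mss)
    cwnd = ssthresh + 3 * mss
    # Need cwnd + k * mss - flight_size >= mss; find the smallest k >= 0
    shortfall = flight_size + mss - cwnd
    return max(0, -(-shortfall // mss))  # integer ceiling division

for segments in (4, 10, 40):
    print(segments, extra_dup_acks_before_sending(segments * 1460))
```

The wait grows with the window: for large flights it approaches half the flight size in duplicate ACKs, which is why the blocked period becomes noticeable on high-bandwidth paths.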

PRR's Solution:

PRR aims to smoothly reduce the sending rate from its pre-loss value to the target (ssthresh) over the course of recovery:

For each ACK during recovery:
    sndcnt = CEIL(prr_delivered × ssthresh / RecoverFS) - prr_out

Where:

prr_delivered = total bytes newly delivered to the receiver during recovery (cumulatively acknowledged or SACKed)
RecoverFS = FlightSize at recovery entry
prr_out = bytes sent during recovery

prr_algorithm

Pseudocode

// Proportional Rate Reduction (RFC 6937)
// pipe = estimate of bytes currently outstanding in the network (RFC 6675)

ON entering Fast Recovery:
    ssthresh = max(FlightSize/2, 2*MSS)   // Target window after recovery
    RecoverFS = FlightSize                // Flight size at the start of recovery
    prr_delivered = 0                     // Bytes delivered during recovery
    prr_out = 0                           // Bytes sent during recovery

ON each ACK during Fast Recovery:
    // Count newly delivered data
    DeliveredData = acked_bytes + SACKed_bytes
    prr_delivered = prr_delivered + DeliveredData

    IF pipe > ssthresh:
        // Proportional part: send in proportion to progress through recovery
        sndcnt = CEIL(prr_delivered * ssthresh / RecoverFS) - prr_out
    ELSE:
        // Reduction bound: pipe has fallen to or below the target
        IF conservative:                  // PRR-CRB
            limit = prr_delivered - prr_out
        ELSE:                             // PRR-SSRB
            limit = max(prr_delivered - prr_out, DeliveredData) + MSS
        sndcnt = min(ssthresh - pipe, limit)

    IF sndcnt > 0:
        send(up to sndcnt bytes)
        prr_out = prr_out + bytes actually sent

ON recovery complete (new ACK covers recover point):
    cwnd = ssthresh
    Exit Fast Recovery
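A runnable sketch of the per-ACK computation, following the structure of RFC 6937, shows the rate reduction in action. The class and method names are hypothetical, and pipe is supplied by the caller rather than computed from a SACK scoreboard, which a real sender would need.

```python
from dataclasses import dataclass

@dataclass
class PRRState:
    ssthresh: int        # target window after recovery, in bytes
    recover_fs: int      # FlightSize when recovery began, in bytes
    mss: int = 1460
    prr_delivered: int = 0
    prr_out: int = 0

    def on_ack(self, delivered: int, pipe: int, conservative: bool = False) -> int:
        """Return how many bytes may be sent in response to this ACK."""
        self.prr_delivered += delivered
        if pipe > self.ssthresh:
            # Proportional part, using integer ceiling division
            target = -(-self.prr_delivered * self.ssthresh // self.recover_fs)
            sndcnt = target - self.prr_out
        else:
            if conservative:   # PRR-CRB
                limit = self.prr_delivered - self.prr_out
            else:              # PRR-SSRB
                limit = max(self.prr_delivered - self.prr_out, delivered) + self.mss
            sndcnt = min(self.ssthresh - pipe, limit)
        return max(0, sndcnt)

    def on_sent(self, nbytes: int) -> None:
        self.prr_out += nbytes

mss = 1460
prr = PRRState(ssthresh=10 * mss, recover_fs=20 * mss, mss=mss)
pipe = 19 * mss                      # assumption: one of 20 segments was lost
sent_per_ack = []
for _ in range(6):
    pipe -= mss                      # each ACK reports one delivered segment
    sndcnt = prr.on_ack(delivered=mss, pipe=pipe)
    segments = sndcnt // mss         # this sketch sends whole segments only
    prr.on_sent(segments * mss)
    pipe += segments * mss
    sent_per_ack.append(segments)
print(sent_per_ack)
```

With ssthresh at half of RecoverFS, the sender releases one segment for every two delivered, so the rate is halved immediately instead of stalling and then bursting.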

PRR Advantages:

Smooth Rate Reduction: Instead of blocking then bursting, PRR smoothly decreases the sending rate over the recovery period.
Better Pacing: Transmissions are spread across the incoming ACKs in proportion to the data delivered, so the ACK clock naturally paces the sender.
Quicker Recovery Start: PRR can begin sending new data earlier than classic FR in many cases.
Reduced Burstiness: The smooth reduction is less likely to trigger additional router buffer overflows.

PRR Variants:

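PRR-SSRB (Slow Start Reduction Bound): Applies when pipe has fallen to or below ssthresh, for example after heavy losses. Allows up to one extra MSS per ACK, a slow-start-like rate, so the sender can rebuild pipe toward ssthresh.
PRR-CRB (Conservative Reduction Bound): Strict packet-conservation mode for the same situation. The sender transmits no more data than has been reported delivered during recovery.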

Linux Default

PRR is the default recovery algorithm in Linux TCP since kernel version 3.2. It has proven more stable and performant than classic Fast Recovery in both experimental and production environments.

Summary: Window Reduction

Window reduction during Fast Recovery is a sophisticated choreography of adjustments that maintains network stability while preserving throughput. Let's consolidate the key concepts:

Key Takeaways

• ssthresh is calculated from FlightSize: Using actual bytes in flight rather than cwnd provides accurate capacity estimation and handles application-limited cases correctly.
• Window inflation maintains transmission: Each duplicate ACK inflates cwnd by MSS, permitting new data transmission and maintaining the ACK clock.
• Deflation restores stable state: Upon recovery completion, cwnd deflates to ssthresh, preparing for congestion avoidance growth.
• FlightSize constrains actual sending: The usable window (cwnd - FlightSize) determines whether new data can be transmitted.
• Edge cases require careful handling: Small FlightSize, application limits, spurious recovery, and multiple losses all need special consideration.
• PRR provides smoother reduction: Proportional Rate Reduction replaces bursty classic behavior with gradual, paced reduction.
• Packet conservation is preserved: The inflation/deflation cycle ensures total packets in network remains roughly constant during recovery.

What's Next:

With window reduction mechanics thoroughly understood, we'll examine the performance benefits of Fast Recovery. The next page quantifies the throughput improvements compared to slow start fallback.
