In October 1986, the Internet experienced a series of congestion collapses that brought throughput on some paths down by roughly a factor of a thousand—from an expected 32 Kbps to about 40 bps. Traffic that should have flowed smoothly became stuck in congested routers, retransmissions piled upon retransmissions, and the nascent Internet ground nearly to a halt.
This crisis catalyzed one of the most important developments in networking history. Van Jacobson, working at the Lawrence Berkeley Laboratory, analyzed the problem and developed a series of algorithms that would fundamentally transform how TCP manages network congestion. The result, published in 1988 and implemented in BSD 4.3 Tahoe, became known as TCP Tahoe—the first TCP variant with sophisticated congestion control.
TCP Tahoe wasn't just a fix for a crisis; it established the philosophical foundation for all subsequent congestion control work. Its core insight—that end hosts should infer network congestion from packet loss and respond by reducing their sending rate—remains the dominant paradigm over three decades later.
By the end of this page, you will understand TCP Tahoe's three core mechanisms (Slow Start, Congestion Avoidance, and Fast Retransmit), how they work together to prevent and recover from congestion, why these algorithms were revolutionary, and what limitations ultimately led to the development of more advanced variants like TCP Reno.
To truly understand TCP Tahoe, we must first understand the problem it was designed to solve. Before Tahoe, TCP had no congestion control whatsoever. The protocol was designed under the assumption that networks had sufficient capacity and that packet loss was primarily due to transmission errors, not congestion.
The Original TCP Behavior:
In the pre-Tahoe era, TCP senders would transmit data as fast as the receiver's advertised window allowed. If packets were lost, TCP would dutifully retransmit them. But here's the critical flaw: if packets were lost due to router buffer overflow (congestion), sending more packets only made the problem worse.
Imagine 100 senders each transmitting at full speed toward a bottleneck router. The router's buffers fill, packets are dropped, senders detect loss and retransmit—but they don't slow down. Now the router is receiving 100 original packets PLUS 100 retransmissions. More drops occur, more retransmissions, and soon the network carries almost nothing but redundant data. Useful throughput approaches zero despite high raw utilization.
Van Jacobson's Key Insights:
Jacobson observed that congestion collapse resulted from a fundamental mismatch between how TCP operated and how networks behaved. He identified three critical insights:
Conservation of Packets: A well-behaved network in equilibrium has a certain number of packets "in flight." New packets should only be injected when old packets leave (are acknowledged). If senders inject packets faster than the network can deliver them, queues build and collapse follows.
Packet Loss as a Signal: In most networks, packet loss indicates router buffer overflow—congestion. Rather than treating loss as random noise, TCP should interpret it as a signal to slow down.
Probing for Capacity: A sender cannot know in advance what rate the network can sustain. The only way to discover this is to probe by gradually increasing the sending rate until loss occurs, then backing off.
| Aspect | Pre-Tahoe TCP | TCP Tahoe |
|---|---|---|
| Initial sending rate | Full receiver window immediately | Starts at 1 segment (Slow Start) |
| Response to loss | Retransmit, but maintain rate | Reset to Slow Start, halve threshold |
| Rate adjustment | None—receiver window only | Dynamic based on inferred congestion |
| Queue behavior | Persistent high queue lengths | Self-limiting queue growth |
| Multi-flow behavior | Collapse under contention | Fair sharing emerges naturally |
Early TCP variants follow a geographic naming tradition. Tahoe refers to Lake Tahoe, via the BSD 4.3 Tahoe release where these algorithms were first shipped. Subsequent variants such as Reno, NewReno, and Vegas continued the place-name tradition (mostly in Nevada rather than California), while later algorithms like CUBIC and BBR abandoned it.
Before diving into Tahoe's algorithms, we must understand the fundamental state variables that govern TCP congestion control. These variables represent the sender's model of the network and determine how much data can be in flight at any moment.
The Congestion Window (cwnd):
The congestion window is the sender's estimate of how many bytes (or segments) the network can handle without becoming congested. Unlike the receiver's advertised window (rwnd), which reflects the receiver's buffer capacity, cwnd reflects the sender's perception of network capacity.
The effective window—the actual amount of data the sender can have outstanding—is:
Effective Window = min(cwnd, rwnd)
This means the sender is always constrained by whichever is smaller: the receiver's ability to accept data or the network's ability to carry it.
```c
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* TCP Tahoe State Variables */
typedef struct tcp_tahoe_state {
    /* Congestion Window: sender's estimate of network capacity */
    uint32_t cwnd;          /* In bytes or segments */

    /* Slow Start Threshold: boundary between phases */
    uint32_t ssthresh;      /* Initial: often 65535 bytes */

    /* Receiver's advertised window */
    uint32_t rwnd;          /* From incoming ACKs */

    /* Maximum Segment Size: negotiated during handshake */
    uint16_t mss;           /* Typically 1460 bytes for Ethernet */

    /* Duplicate ACK counter */
    uint8_t dupacks;        /* Reset on new ACK, increment on dup */
} TCPTahoeState;

/* Effective window calculation */
uint32_t get_effective_window(TCPTahoeState* state) {
    return MIN(state->cwnd, state->rwnd);
}

/* Determine current phase */
typedef enum {
    SLOW_START,
    CONGESTION_AVOIDANCE
} CongestionPhase;

CongestionPhase get_phase(TCPTahoeState* state) {
    return (state->cwnd < state->ssthresh)
        ? SLOW_START
        : CONGESTION_AVOIDANCE;
}
```

The Slow Start Threshold (ssthresh):
The slow start threshold marks the boundary between two operational phases: Slow Start and Congestion Avoidance. When cwnd is below ssthresh, the sender is in Slow Start mode (exponential growth). When cwnd reaches or exceeds ssthresh, the sender transitions to Congestion Avoidance (linear growth).
The ssthresh serves as the sender's memory of the last known safe capacity. When congestion is detected (packet loss), ssthresh is set to half the current cwnd, establishing a new safe upper bound.
Duplicate ACK Counter:
TCP uses duplicate ACKs as an early warning system for potential packet loss. When a receiver gets an out-of-order segment, it immediately sends a duplicate ACK for the last in-order segment it received. The sender tracks consecutive duplicate ACKs—three duplicates trigger Fast Retransmit.
Implementations may track cwnd in bytes or segments (multiples of MSS). Conceptually, it's easier to think in segments: 'cwnd = 4' means 4 segments can be outstanding. However, byte-based tracking is more precise for variable-size segments. The algorithms work identically either way.
Despite its name, Slow Start is actually an exponentially fast algorithm for ramping up transmission rate. The name derives from its contrast with the pre-Tahoe behavior of immediately transmitting at full receiver window speed. Slow Start is "slow" only in that it doesn't start at maximum rate.
The Core Mechanism:
Slow Start begins with cwnd set to 1 segment (Initial Window, or IW). For each ACK received that acknowledges new data, cwnd is increased by 1 segment. Because each segment generates an ACK (roughly), and each ACK increases cwnd by 1, the window doubles every round-trip time (RTT).
```c
/* Slow Start: Exponential cwnd growth */
void slow_start_on_ack(TCPTahoeState* state, uint32_t bytes_acked) {
    /*
     * For each ACK acknowledging new data:
     *   cwnd += MSS (or bytes_acked for byte-counting)
     *
     * This causes exponential growth:
     *   RTT 0: cwnd = 1, send 1 segment,  receive 1 ACK  -> cwnd = 2
     *   RTT 1: cwnd = 2, send 2 segments, receive 2 ACKs -> cwnd = 4
     *   RTT 2: cwnd = 4, send 4 segments, receive 4 ACKs -> cwnd = 8
     *   RTT 3: cwnd = 8, send 8 segments, receive 8 ACKs -> cwnd = 16
     *   ...
     */

    /* Check we're in Slow Start phase */
    if (state->cwnd < state->ssthresh) {
        /* Increase cwnd by 1 MSS per ACK (segment counting) */
        state->cwnd += state->mss;

        /* Alternative: Appropriate Byte Counting (ABC) per RFC 3465 */
        /* state->cwnd += MIN(bytes_acked, state->mss); */

        /* Check for phase transition */
        if (state->cwnd >= state->ssthresh) {
            /* Transition to Congestion Avoidance */
            /* cwnd may slightly exceed ssthresh; that's acceptable */
        }
    }
}

/* Slow Start progression example */
/*
 * Assume: MSS = 1460 bytes, ssthresh = 64KB (44 segments)
 *
 * Time    cwnd (segs)   Data in flight   Phase
 * ----    -----------   --------------   -----
 * 0 RTT        1             1 seg        SS
 * 1 RTT        2             2 segs       SS
 * 2 RTT        4             4 segs       SS
 * 3 RTT        8             8 segs       SS
 * 4 RTT       16            16 segs       SS
 * 5 RTT       32            32 segs       SS
 * 6 RTT       64            44 segs       -> CA (hit ssthresh)
 */
```

Mathematical Analysis:
Let's analyze Slow Start's growth rate precisely. If we start with cwnd = 1 segment, then after n round trips the window has doubled n times: cwnd = 2^n segments.
To reach a window of W segments, we need approximately log₂(W) round-trip times. For a high-bandwidth network where the optimal window might be 1000 segments, we can reach that size in just 10 RTTs—a few hundred milliseconds on typical Internet paths.
Why Not Start Faster?
A natural question is: why not start with a larger initial window? The answer involves the unknown network state. When a connection begins, the sender has no information about the path's capacity. Starting too aggressively could immediately cause congestion on a bottleneck link.
| Round-Trip | IW=1 | IW=2 | IW=4 | IW=10 |
|---|---|---|---|---|
| RTT 0 | 1 | 2 | 4 | 10 |
| RTT 1 | 2 | 4 | 8 | 20 |
| RTT 2 | 4 | 8 | 16 | 40 |
| RTT 3 | 8 | 16 | 32 | 80 |
| RTT 4 | 16 | 32 | 64 | 160 |
| RTT 5 | 32 | 64 | 128 | 320 |
| Time to 100 segs | ~6.6 RTT | ~5.6 RTT | ~4.6 RTT | ~3.3 RTT |
RFC 6928 (2013) increased the recommended initial window to 10 segments (IW10). This reflects modern networks with higher capacity and the desire to reduce page load latency for short flows. However, the exponential Slow Start behavior remains identical—only the starting point changed.
Once cwnd reaches ssthresh, the sender transitions from Slow Start to Congestion Avoidance. While Slow Start aggressively probes for available capacity, Congestion Avoidance takes a more conservative approach—growing the window linearly rather than exponentially.
The rationale is straightforward: once we've reached the threshold (which approximates the point where congestion previously occurred), we're operating in a region where we're likely near the network's capacity. Aggressive growth would probably trigger congestion; cautious probing is wiser.
The Core Mechanism:
In Congestion Avoidance, cwnd increases by 1 segment for every full window of data acknowledged—not for every ACK. This results in approximately 1 segment increase per RTT, regardless of the current window size.
```c
/* Congestion Avoidance: Linear cwnd growth */
void congestion_avoidance_on_ack(TCPTahoeState* state, uint32_t bytes_acked) {
    /*
     * Increase cwnd by MSS/cwnd for each ACK (fractional increase)
     * This means cwnd increases by ~1 MSS per RTT (per full window ACKed)
     *
     * For segment-counting:
     *   cwnd += MSS * MSS / cwnd   (per ACK)
     *
     * Alternative approximation:
     *   cwnd += 1                  (per RTT, i.e., per cwnd ACKs)
     */

    /* Check we're in Congestion Avoidance phase */
    if (state->cwnd >= state->ssthresh) {
        /* Precise formula: fractional increase per ACK */
        /* This accumulates to 1 MSS increase after cwnd/MSS ACKs */
        uint32_t increment = (state->mss * state->mss) / state->cwnd;

        /* Ensure at least some increment (avoid stagnation) */
        if (increment == 0) {
            increment = 1;
        }

        state->cwnd += increment;
    }
}

/* Congestion Avoidance progression example */
/*
 * Assume: MSS = 1 segment (for simplicity)
 * Starting cwnd = 32 segments (just entered CA from SS)
 *
 * RTT   cwnd start   ACKs received   cwnd end
 * ---   ----------   -------------   --------
 *  0        32             32            33
 *  1        33             33            34
 *  2        34             34            35
 *  3        35             35            36
 *  4        36             36            37
 * ...
 *
 * Linear growth: ~1 segment per RTT regardless of current cwnd
 * Compare to Slow Start: cwnd would double each RTT
 */
```

AIMD: Additive Increase, Multiplicative Decrease
Congestion Avoidance embodies the AIMD principle—a cornerstone of TCP congestion control:
Additive Increase (AI): When no congestion is detected, increase cwnd by a small, constant amount per RTT (typically 1 segment). This gentle probing gradually discovers additional capacity.
Multiplicative Decrease (MD): When congestion is detected (packet loss), reduce cwnd by a multiplicative factor (typically halving). This dramatic reduction quickly alleviates congestion.
The asymmetry is intentional. Additive increase means we approach the congestion point slowly, minimizing overshoot. Multiplicative decrease ensures we retreat rapidly when we've exceeded capacity, preventing prolonged congestion.
Under steady-state conditions, TCP's cwnd traces a characteristic 'sawtooth' pattern: slow linear increase during Congestion Avoidance, followed by a sharp decrease upon detecting loss, then linear increase again. This pattern represents TCP constantly probing for the network's capacity limit.
Prior to Tahoe, TCP relied entirely on retransmission timeouts (RTO) to detect packet loss. When a segment was lost, the sender would wait for the RTO timer to expire before retransmitting. The problem: RTO is typically set conservatively (often several RTTs) to avoid spurious retransmissions. Waiting for timeout introduces substantial delay.
Fast Retransmit provides a mechanism for detecting loss before the timer expires, using duplicate ACKs as an early warning.
How Duplicate ACKs Arise:
When a TCP receiver gets a segment with a higher-than-expected sequence number (indicating gaps in the stream), it cannot send a normal ACK for that data—it would violate TCP's cumulative ACK semantics. Instead, it immediately sends a duplicate ACK for the last contiguous byte it received.
```c
/* Fast Retransmit Algorithm */
void on_ack_received(TCPTahoeState* state, uint32_t ack_num,
                     uint32_t bytes_acked, bool is_dup) {
    if (!is_dup) {
        /* New ACK: reset duplicate counter */
        state->dupacks = 0;

        /* Normal ACK processing: update cwnd based on phase */
        if (state->cwnd < state->ssthresh) {
            slow_start_on_ack(state, bytes_acked);
        } else {
            congestion_avoidance_on_ack(state, bytes_acked);
        }
    } else {
        /* Duplicate ACK received */
        state->dupacks++;

        if (state->dupacks == 3) {
            /*
             * FAST RETRANSMIT TRIGGER
             *
             * Interpretation: 3 dupacks strongly suggest the segment
             * following the ACKed sequence number was lost.
             *
             * Why 3? Robust against reordering. Network paths can
             * deliver segments out-of-order. 1-2 dupacks might just
             * be reordering. 3 consecutive dupacks almost certainly
             * indicate loss.
             */

            /* 1. Set ssthresh to half of current flight size */
            uint32_t flight_size = get_bytes_in_flight(state);
            state->ssthresh = MAX(flight_size / 2, 2 * state->mss);

            /* 2. Retransmit the presumed lost segment immediately */
            retransmit_segment(state, ack_num);

            /* 3. TCP Tahoe: Reset cwnd to 1 MSS (back to Slow Start) */
            state->cwnd = state->mss;

            /* 4. Reset dup ACK counter */
            state->dupacks = 0;
        }
    }
}

/*
 * Fast Retransmit Example:
 *
 * Sender transmits: [1] [2] [3] [4] [5] [6] [7] ...
 *
 * Segment [3] is lost in transit.
 *
 * Receiver gets [1]: sends ACK 2 (expecting seq 2)
 * Receiver gets [2]: sends ACK 3 (expecting seq 3)
 * Receiver gets [4]: sends ACK 3 (DUPACK #1 - gap detected, still expecting 3)
 * Receiver gets [5]: sends ACK 3 (DUPACK #2 - gap persists)
 * Receiver gets [6]: sends ACK 3 (DUPACK #3 - FAST RETRANSMIT TRIGGERED)
 *
 * Sender immediately retransmits [3], doesn't wait for RTO.
 */
```

Why Three Duplicate ACKs?
The choice of three duplicate ACKs (four identical ACKs total) balances two concerns:
Robustness to Reordering: Packets can arrive out of order, especially on multi-path networks. A single out-of-order packet generates one duplicate ACK. Two out-of-order packets generate two. Three consecutive duplicates are statistically unlikely from reordering alone.
Timely Detection: Waiting for more duplicates would increase detection latency. Three is enough to be confident about loss without excessive delay.
The Critical Difference from Timeout:
Fast Retransmit typically triggers within one RTT after loss (the time for three subsequent packets to generate duplicate ACKs), compared to RTO which might be several RTTs. This difference is enormous for throughput.
| Characteristic | Fast Retransmit | RTO Timeout |
|---|---|---|
| Detection trigger | 3 duplicate ACKs | Timer expiration |
| Typical detection time | ~1 RTT after loss | RTO (often 1-3 seconds) |
| Requires ongoing traffic | Yes (need subsequent ACKs) | No (timer-based) |
| Handles tail loss | No (no subsequent ACKs) | Yes (timer always works) |
| Throughput impact | Minimal pause | Substantial gap in transmission |
If the lost packet is near the end of a burst (tail loss), there may not be enough subsequent packets to generate three duplicate ACKs. In this case, TCP must fall back to the RTO mechanism. Tail loss is particularly problematic for short flows and request-response patterns.
When TCP Tahoe detects packet loss (whether via triple duplicate ACK or timeout), it responds with a full reset to Slow Start. This aggressive response is Tahoe's defining characteristic—and also its primary limitation.
The Complete Response Sequence:
```c
/* TCP Tahoe Loss Response (both timeout and fast retransmit) */
void tahoe_on_loss_detected(TCPTahoeState* state, LossType type) {
    /*
     * TCP Tahoe treats all losses equally:
     * - 3 duplicate ACKs (fast retransmit)
     * - RTO timeout
     *
     * The response is always the same: go back to Slow Start
     */

    /* 1. Calculate new ssthresh: half of current flight size */
    uint32_t flight_size = get_bytes_in_flight(state);
    state->ssthresh = MAX(flight_size / 2, 2 * state->mss);

    /* 2. Reset cwnd to initial window (1 MSS in classic Tahoe) */
    state->cwnd = state->mss;   /* Or IW for modern implementations */

    /* 3. Retransmit the lost segment */
    if (type == TRIPLE_DUPACK) {
        /* Fast retransmit: segment after last ACKed */
        retransmit_from_last_ack(state);
    } else {  /* TIMEOUT */
        /* Go-back-N: retransmit from last ACKed segment */
        retransmit_from_last_ack(state);
    }

    /* 4. Begin Slow Start phase with new, reduced ssthresh */
    /* cwnd will grow exponentially until it reaches ssthresh */
    /* Then transition to Congestion Avoidance */
}

/*
 * Visualization of Tahoe's loss response:
 *
 * cwnd
 *  ^
 *  |        /\        /\        /\
 *  |       /  \      /  \      /  \
 *  |      /    \    /    \    /    \
 *  |     /      \  /      \  /      \
 *  |    /        \/        \/        -- ssthresh
 *  |   /
 *  |  /   <-- Slow Start
 *  | /
 *  |/____________________________________________> time
 *    ^        ^         ^         ^
 *    |        |         |         |
 *  Start    Loss1     Loss2     Loss3
 *          (reset    (reset    (reset
 *           to 1)     to 1)     to 1)
 *
 * Note: After each loss, cwnd drops to 1 and ssthresh halves.
 * This creates deep "valleys" that significantly impact throughput.
 */
```

Why Reset to One?
Tahoe's aggressive response stems from its conservative philosophy: when in doubt, back off completely. The reasoning is that a loss proves the current cwnd estimate was too large and cannot be trusted; the only provably safe re-entry point is the same one used at connection start, a single segment grown via Slow Start; and the halved ssthresh preserves what was learned, so the rebuild is exponentially fast up to the last known safe region.

This approach guarantees safety but sacrifices performance. After every loss event, no matter how minor, TCP Tahoe drops its sending rate to nearly zero and must rebuild.
The Recovery Timeline:
Consider a connection with cwnd = 32 segments when loss occurs:

Loss detected: ssthresh is set to 16 (half of 32), and cwnd resets to 1 segment.

Slow Start: cwnd doubles each RTT (1, 2, 4, 8, 16), reaching ssthresh after about 4 RTTs.

Congestion Avoidance: cwnd then grows by only 1 segment per RTT, needing 16 more RTTs to climb from 16 back to 32.

It takes approximately 4 RTTs to reach ssthresh and another 16 RTTs to climb back to the original window—20 RTTs total for full recovery.
Even after the slow start phase, Tahoe can only climb back to half its former window before hitting the new ssthresh. To exceed the pre-loss window, it must enter Congestion Avoidance and creep up one segment per RTT. Recovery from a single loss can take many seconds on high-latency paths.
Let's consolidate TCP Tahoe's behavior into a complete state machine. This unified view shows how Slow Start, Congestion Avoidance, and Fast Retransmit interact to form a coherent congestion control strategy.
```c
/* Complete TCP Tahoe Implementation */
typedef enum {
    SLOW_START,
    CONGESTION_AVOIDANCE
} Phase;

typedef struct {
    uint32_t cwnd;       /* Congestion window */
    uint32_t ssthresh;   /* Slow start threshold */
    uint32_t mss;        /* Maximum segment size */
    uint8_t  dupacks;    /* Duplicate ACK counter */
    Phase    phase;      /* Current operational phase */
} TCPTahoe;

/* Forward declarations; actual retransmission lives elsewhere */
void handle_loss(TCPTahoe* t);
void do_retransmit(void);

void initialize(TCPTahoe* t, uint32_t mss) {
    t->mss = mss;
    t->cwnd = mss;          /* IW = 1 MSS */
    t->ssthresh = 65535;    /* Effectively no initial threshold */
    t->dupacks = 0;
    t->phase = SLOW_START;
}

void on_new_ack(TCPTahoe* t, uint32_t bytes_acked) {
    t->dupacks = 0;   /* Reset on any new data ACKed */

    if (t->cwnd < t->ssthresh) {
        /* SLOW START: exponential growth */
        t->cwnd += t->mss;
        t->phase = SLOW_START;
    } else {
        /* CONGESTION AVOIDANCE: linear growth */
        t->cwnd += (t->mss * t->mss) / t->cwnd;
        t->phase = CONGESTION_AVOIDANCE;
    }
}

void on_dup_ack(TCPTahoe* t) {
    t->dupacks++;

    if (t->dupacks >= 3) {
        /* FAST RETRANSMIT trigger */
        handle_loss(t);
        do_retransmit();
    }
}

void on_timeout(TCPTahoe* t) {
    /* Timeout implies severe congestion or loss */
    handle_loss(t);
    do_retransmit();
}

void handle_loss(TCPTahoe* t) {
    /* TCP Tahoe loss response: always go back to Slow Start */

    /* 1. Set ssthresh to half of current window */
    t->ssthresh = t->cwnd / 2;
    if (t->ssthresh < 2 * t->mss) {
        t->ssthresh = 2 * t->mss;   /* Minimum threshold */
    }

    /* 2. Reset cwnd to 1 MSS */
    t->cwnd = t->mss;

    /* 3. Reset to Slow Start phase */
    t->phase = SLOW_START;

    /* 4. Clear duplicate ACK counter */
    t->dupacks = 0;
}
```

Key Observations:
Two-Phase Operation: Tahoe has only two steady-state phases (Slow Start and Congestion Avoidance). Loss triggers a reset, not a new phase.
Unified Loss Response: Whether loss is detected by timeout or triple duplicate ACK, the response is identical: halve ssthresh, reset cwnd to 1, re-enter Slow Start.
Memory in ssthresh: The ssthresh preserves information about the network's capacity. After loss, exponential growth continues only until reaching this remembered limit.
Perpetual Probing: Even in steady state, Congestion Avoidance continuously probes for additional capacity by incrementing cwnd. TCP never "settles"—it always seeks more bandwidth.
Despite being superseded by more advanced variants, TCP Tahoe introduced principles that remain foundational. Its strengths justify its historical importance and explain why its core ideas persist in every subsequent TCP variant.
Every TCP variant in use today—Reno, NewReno, CUBIC, BBR—builds upon Tahoe's foundations. Slow Start remains essentially unchanged. Congestion Avoidance's AIMD principle persists. Fast Retransmit is universal. Tahoe's conceptual framework proved so sound that 35+ years later, we still use its vocabulary and core mechanisms.
TCP Tahoe's conservative design, while ensuring safety, introduces significant performance limitations. Understanding these limitations is essential both for appreciating subsequent variants and for recognizing where Tahoe might still be appropriate.
Quantifying the Performance Gap:
Consider a 100 Mbps link with 100ms RTT (Bandwidth-Delay Product = 1.25 MB, roughly 850 segments). Under Tahoe:

After any loss, cwnd resets to 1 segment and ssthresh drops to about half the window at the time of loss.

Slow Start needs roughly 9 RTTs (about 0.9 seconds) to climb from 1 back to a ssthresh near 425 segments.

Congestion Avoidance then needs roughly 425 more RTTs (over 40 seconds) to probe from there up to the full 850-segment window.
If losses occur periodically (say, every 1000 segments), Tahoe's effective utilization could be well under 50% of the link's capacity.
Tahoe's limitations, particularly the full cwnd reset on fast retransmit, directly motivated the development of TCP Reno. Reno's key insight: if we detected loss via triple dupacks, the network is clearly still delivering packets—it's not completely congested. A full reset is unnecessarily conservative.
TCP Tahoe represents a watershed moment in networking history—the first systematic approach to congestion control that made the Internet viable at scale. The essential concepts:

Slow Start probes from one segment, doubling cwnd every RTT until it reaches ssthresh.

Congestion Avoidance grows cwnd linearly beyond ssthresh, embodying AIMD's additive increase.

Fast Retransmit uses three duplicate ACKs to detect loss within roughly one RTT instead of waiting for the RTO.

Any loss, however detected, halves ssthresh, resets cwnd to one segment, and re-enters Slow Start.
What's Next:
In the next page, we explore TCP Reno, which built upon Tahoe by introducing a critical enhancement: Fast Recovery. Instead of resetting to Slow Start on triple duplicate ACKs, Reno halves the window and continues in Congestion Avoidance mode—dramatically improving throughput in the face of single-packet losses.
You now have a comprehensive understanding of TCP Tahoe—its historical context, core mechanisms (Slow Start, Congestion Avoidance, Fast Retransmit), algorithmic behavior, strengths, and limitations. This foundation is essential for understanding how subsequent TCP variants evolved to address Tahoe's weaknesses while preserving its core principles.