In October 1986, the Internet experienced its first congestion collapse. Networks that should have been operating at megabit speeds slowed to a crawl—throughput dropped by factors of 100 or more. The cause? TCP's inability to adapt to network conditions. Among the critical weaknesses exposed was the primitive timeout calculation that caused massive spurious retransmissions, further amplifying congestion.
Van Jacobson, a researcher at Lawrence Berkeley Laboratory, responded with a series of algorithmic improvements that literally saved the early Internet. Chief among these was his variance-based timeout algorithm, published in his seminal 1988 paper "Congestion Avoidance and Control." This algorithm fundamentally changed how TCP calculates retransmission timeouts—and remains the foundation of timeout calculation in modern TCP implementations over 35 years later.
By the end of this page, you will understand the historical context that motivated Jacobson's algorithm, the key insight of tracking RTT variance, the mathematical formulation using Exponentially Weighted Moving Averages (EWMA), the clever use of mean deviation instead of standard deviation, and exactly how to implement this algorithm step by step.
The original TCP specification (RFC 793) used a simple approach to timeout calculation:
RTO = β × SRTT
Where SRTT was a simple moving average of RTT samples and β was typically 2. This approach had three critical problems:
As we explored in Page 1, networks with the same average RTT can have vastly different RTT variance. A network with mean RTT of 100ms but variance causing occasional 300ms samples needs a much higher timeout than a stable network with the same mean. The simple β multiplier couldn't distinguish these cases.
When network conditions changed, the SRTT adapted slowly because it was a heavily smoothed average (typically α = 0.875, meaning new samples only contributed 12.5% weight). If RTT suddenly doubled due to congestion, it took many samples for SRTT to catch up.
Using β = 2 worked reasonably for many networks, but failed spectacularly for high-variance networks. Yet there was no principled way to determine what β should be—it was a magic number chosen by intuition.
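The original rule fits in a few lines. Here is a minimal sketch of the RFC 793 estimator (the function name and defaults are illustrative; RFC 793 permits a range of α and β values, with 0.875 and 2 being the commonly cited choices):

```python
def update_rto_rfc793(srtt: float, sample_rtt: float,
                      alpha: float = 0.875, beta: float = 2.0) -> tuple[float, float]:
    """Return (new_srtt, new_rto) using the original fixed-multiplier rule."""
    new_srtt = alpha * srtt + (1 - alpha) * sample_rtt
    rto = beta * new_srtt  # fixed multiplier: no notion of variance
    return new_srtt, rto

# A 300ms spike on a 100ms path barely moves SRTT:
srtt, rto = update_rto_rfc793(100.0, 300.0)
# srtt = 0.875*100 + 0.125*300 = 125.0; rto = 250.0
```

Note how a single 300ms sample leaves the RTO at 250ms, below the spike itself: the fixed multiplier has no way to widen the margin when variance grows.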
| Time | Event | Impact |
|---|---|---|
| T | Network load increases, RTT grows | Actual RTT exceeds SRTT-based RTO |
| T+1 | Timeouts trigger spurious retransmissions | Duplicate packets injected |
| T+2 | Retransmissions add to congestion | RTT increases further |
| T+3 | More spurious timeouts occur | More retransmissions, more congestion |
| T+n | Positive feedback loop spirals | Congestion collapse: throughput → 0 |
The 1986 congestion collapse demonstrated that this positive feedback loop wasn't theoretical—it was a practical disaster waiting to happen whenever networks approached capacity. Jacobson's insight was that the solution wasn't a better fixed multiplier, but a dynamic one based on observed variance.
Van Jacobson introduced multiple algorithms in his 1988 paper: slow start, congestion avoidance, fast retransmit, and the variance-based RTO. All of these together transformed TCP from a protocol that couldn't survive network congestion to one that adapts gracefully. The timeout algorithm was a crucial piece of this puzzle.
Jacobson's fundamental insight was elegant: if we track how much RTT varies, we can set the timeout just high enough to accommodate normal variation while still detecting actual losses promptly.
Consider RTT as a random variable with some distribution. If we assume (approximately) that RTT follows a normal distribution with mean μ and standard deviation σ:
If we set RTO = μ + 4σ, only about 0.003% of legitimate ACKs (roughly 3 in 100,000) would be expected to arrive after the timeout. This makes spurious retransmissions extremely rare while maintaining responsiveness.
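The tail probability can be checked directly with the standard library, using P(Z > k) = ½·erfc(k/√2) for a standard normal:

```python
import math

# Upper tail probability of a standard normal distribution.
def normal_tail(k: float) -> float:
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (2, 3, 4):
    print(f"P(Z > {k} sigma) = {normal_tail(k):.6%}")
# P(Z > 4 sigma) is about 0.0032%: a few legitimate ACKs per 100,000
# would arrive after an RTO set at mean + 4 standard deviations.
```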
Calculating true standard deviation requires:
σ = √(Σ(xᵢ - μ)² / n)
This involves squaring every deviation and taking a square root—floating-point-heavy operations that were prohibitively expensive in the per-packet fast path of 1980s TCP implementations. The naive form also requires retaining past samples or maintaining running sums of squares.
Jacobson needed something that behaves like standard deviation but is much cheaper to compute. His solution: mean deviation.
Mean deviation is simply the average of the absolute differences from the mean: MD = Σ|xᵢ - μ| / n. For a normal distribution, MD ≈ 0.8σ. It provides similar information about spread but requires only subtraction and absolute value—no squares or square roots. This made it practical for the limited processors of the 1980s.
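The MD ≈ 0.8σ relationship can be verified both analytically (for a normal distribution, MD = σ·√(2/π)) and empirically on simulated samples. The mean and σ below (100ms, 20ms) are arbitrary illustration values:

```python
import math
import random

# Analytical ratio for a normal distribution: MD / sigma = sqrt(2/pi)
print(math.sqrt(2 / math.pi))  # ~0.7979

# Empirical check on simulated RTT samples (mean 100ms, sigma 20ms)
random.seed(1)
samples = [random.gauss(100, 20) for _ in range(100_000)]
mu = sum(samples) / len(samples)
md = sum(abs(x - mu) for x in samples) / len(samples)
sd = math.sqrt(sum((x - mu) ** 2 for x in samples) / len(samples))
print(md / sd)  # close to 0.8
```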
Mean deviation isn't a perfect substitute for standard deviation:
| Property | Standard Deviation | Mean Deviation |
|---|---|---|
| Mathematical properties | Well-defined statistical theory | Less common in statistical analysis |
| Sensitivity to outliers | Higher (squares amplify large deviations) | Lower (linear amplification) |
| Computational cost | High (squares, square root) | Low (absolute value only) |
| Relationship for normal dist. | σ | ≈ 0.8σ |
Jacobson compensated for the relationship by using a multiplier. If we want RTO ≈ μ + 4σ and we have MD ≈ 0.8σ, then:
RTO ≈ μ + 4σ ≈ μ + 4 × (MD/0.8) ≈ μ + 5 × MD
In practice, Jacobson used a multiplier of 4 for the deviation term, providing a robust safety margin.
Jacobson's algorithm tracks two quantities using exponentially weighted moving averages (EWMA): SRTT, the smoothed RTT estimate, and RTTVAR, the smoothed mean deviation of the RTT samples.
When a new RTT sample (R) is measured:
Step 1: Calculate the error
Err = R - SRTT
This is the difference between the new sample and our current estimate.
Step 2: Update the smoothed RTT
SRTT = SRTT + (1/8) × Err = SRTT + (Err >> 3)
This adds 1/8 of the error to the current estimate.
Step 3: Update the variance estimate
RTTVAR = RTTVAR + (1/4) × (|Err| - RTTVAR) = RTTVAR + ((|Err| - RTTVAR) >> 2)
This moves the variance estimate toward the absolute error.
Step 4: Calculate RTO
RTO = SRTT + 4 × RTTVAR = SRTT + (RTTVAR << 2)
```python
def update_rto_jacobson(srtt: float, rttvar: float, sample_rtt: float) -> tuple[float, float, float]:
    """
    Jacobson's algorithm for RTO calculation.

    Args:
        srtt: Current smoothed RTT estimate (in milliseconds)
        rttvar: Current RTT variance estimate (in milliseconds)
        sample_rtt: New RTT measurement (in milliseconds)

    Returns:
        Tuple of (new_srtt, new_rttvar, new_rto)
    """
    # RFC 6298 uses α = 1/8 for SRTT and β = 1/4 for RTTVAR
    ALPHA = 1/8  # Weight for new sample in SRTT
    BETA = 1/4   # Weight for new sample in RTTVAR
    K = 4        # Multiplier for variance term in RTO

    # Step 1: Calculate the error (difference from estimate)
    err = sample_rtt - srtt

    # Step 2: Update smoothed RTT
    # SRTT = SRTT + α * (R - SRTT) = (1-α) * SRTT + α * R
    new_srtt = srtt + ALPHA * err

    # Step 3: Update variance estimate (using absolute error)
    # RTTVAR = (1-β) * RTTVAR + β * |R - SRTT|
    abs_err = abs(err)
    new_rttvar = rttvar + BETA * (abs_err - rttvar)
    # Alternative formulation:
    # new_rttvar = (1 - BETA) * rttvar + BETA * abs_err

    # Step 4: Calculate RTO
    # RTO = SRTT + K * RTTVAR
    new_rto = new_srtt + K * new_rttvar

    return new_srtt, new_rttvar, new_rto


def initialize_rto(first_sample_rtt: float) -> tuple[float, float, float]:
    """
    Initialize RTO estimates with first RTT measurement.

    Per RFC 6298:
    - SRTT = R (first measurement)
    - RTTVAR = R/2
    - RTO = SRTT + 4*RTTVAR = R + 2R = 3R
    """
    srtt = first_sample_rtt
    rttvar = first_sample_rtt / 2
    rto = srtt + 4 * rttvar  # = 3 * first_sample_rtt
    return srtt, rttvar, rto


# Example usage:
# Connection starts, first RTT measured as 100ms
srtt, rttvar, rto = initialize_rto(100)
print(f"Initial: SRTT={srtt}ms, RTTVAR={rttvar}ms, RTO={rto}ms")
# Output: Initial: SRTT=100ms, RTTVAR=50.0ms, RTO=300.0ms

# Second sample comes in at 120ms (higher than expected)
srtt, rttvar, rto = update_rto_jacobson(srtt, rttvar, 120)
print(f"Sample 2: SRTT={srtt:.1f}ms, RTTVAR={rttvar:.1f}ms, RTO={rto:.1f}ms")
# SRTT = 100 + 0.125 * 20 = 102.5ms
# RTTVAR = 50 + 0.25 * (20 - 50) = 42.5ms
# RTO = 102.5 + 4 * 42.5 = 272.5ms
```

The algorithm shown above follows RFC 6298, "Computing TCP's Retransmission Timer". Note that RFC 6298 also specifies a minimum RTO bound (typically 1 second) and clock-granularity considerations; production implementations must handle these additional requirements.
The specific values in Jacobson's algorithm weren't arbitrary—they were carefully chosen for both mathematical properties and practical implementation.
Mathematical rationale: An EWMA with α = 1/8 gives a "time constant" of approximately 8 samples. After 8 samples, the contribution of the oldest sample has decayed to about 1/e ≈ 37% of its original weight. This provides stability while still adapting to trends.
Practical rationale: 1/8 = 2⁻³. This means the multiplication can be implemented as a right bit-shift by 3 positions:
SRTT + Err/8 = SRTT + (Err >> 3)
In 1988, this was a significant optimization. Bit shifts were single-cycle operations; division was expensive.
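The shift/division equivalence is easy to demonstrate. One caveat to flag: Python's `>>` floors (matching `//`), while C's integer division truncates toward zero, so real kernels are careful about shifting signed error terms:

```python
# Right shift by 3 equals integer (floor) division by 8 in Python,
# including for negative values.
for err in (40, 17, -24):
    assert (err >> 3) == (err // 8)

print(40 >> 3, 17 >> 3, -24 >> 3)  # 5 2 -3
```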
Mathematical rationale: The variance estimate uses a larger weight (1/4 vs 1/8) for new samples. This makes it more responsive to changes in network variability. If the network suddenly becomes more unpredictable, RTTVAR adapts faster than SRTT.
Practical rationale: 1/4 = 2⁻². Right bit-shift by 2 positions:
RTTVAR + ((|Err| - RTTVAR) >> 2)
| Smoothing Factor α | Half-Life (samples) | Effect on Estimate | Bit Shift |
|---|---|---|---|
| 1/2 | ~1 | Very responsive, noisy | 1 |
| 1/4 | ~2.4 | Moderately responsive | 2 |
| 1/8 | ~5.2 | Smooth, slower adaptation | 3 |
| 1/16 | ~10.7 | Very smooth, very slow | 4 |
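The half-life column follows from the EWMA decay rate: an old sample's weight shrinks by (1 − α) per new sample, so the half-life in samples is ln(0.5)/ln(1 − α). A quick check against the table:

```python
import math

# Number of samples until an old sample's weight decays to 50%:
# (1 - alpha)^n = 0.5  =>  n = ln(0.5) / ln(1 - alpha)
def ewma_half_life(alpha: float) -> float:
    return math.log(0.5) / math.log(1 - alpha)

for alpha in (1/2, 1/4, 1/8, 1/16):
    print(f"alpha = {alpha}: half-life = {ewma_half_life(alpha):.1f} samples")
# alpha = 1/8 gives about 5.2 samples, matching the table above
```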
Mathematical rationale: This multiplier determines how many "units" of variance we add to the mean to get the timeout. With K = 4 and RTTVAR ≈ 0.8σ:
RTO ≈ SRTT + 4 × 0.8σ ≈ SRTT + 3.2σ
For a normal distribution, roughly 99.9% of samples fall below SRTT + 3.2σ. The extra margin accounts for real RTT distributions being heavy-tailed rather than normal, for estimation error in SRTT and RTTVAR themselves, and for clock granularity.
Practical rationale: 4 = 2². Left bit-shift by 2 positions:
SRTT + 4 × RTTVAR = SRTT + (RTTVAR << 2)
Putting it all together, the entire algorithm can be implemented using only additions, subtractions, and bit shifts:
```c
/*
 * Jacobson's algorithm using fixed-point arithmetic and bit shifts.
 * SRTT and RTTVAR are kept scaled by 8 (left-shifted 3 bits) so that
 * integer arithmetic retains 3 bits of fractional precision.
 */

/* Scaled state (multiplied by 8 for precision) */
static int srtt_scaled = 0;    /* Smoothed RTT × 8 */
static int rttvar_scaled = 0;  /* RTT variance × 8 */
static int rto = 0;            /* Retransmission timeout (unscaled) */

void tcp_rto_update(int sample_rtt)
{
    int err;
    int abs_err;

    if (srtt_scaled == 0) {
        /* First measurement: SRTT = R, RTTVAR = R/2 (per RFC 6298) */
        srtt_scaled = sample_rtt << 3;    /* R × 8 */
        rttvar_scaled = sample_rtt << 1;  /* (R/2) × 8 = R × 4 */
    } else {
        /* err = (R - SRTT) × 8: both terms are in scaled units */
        err = (sample_rtt << 3) - srtt_scaled;

        /*
         * SRTT update: SRTT += (R - SRTT)/8.
         * In scaled units: SRTT×8 += ((R - SRTT)/8) × 8 = (R - SRTT),
         * and since err = (R - SRTT) × 8, that increment is err >> 3.
         */
        srtt_scaled += err >> 3;

        abs_err = (err < 0) ? -err : err;

        /*
         * RTTVAR update: RTTVAR += (|R - SRTT| - RTTVAR)/4.
         * abs_err and rttvar_scaled are both in ×8 units, so the
         * scaled update is simply (abs_err - rttvar_scaled) >> 2.
         */
        rttvar_scaled += (abs_err - rttvar_scaled) >> 2;
    }

    /*
     * RTO = SRTT + 4 × RTTVAR, converted back to unscaled units:
     * RTO = (SRTT_s + 4 × RTTVAR_s) / 8 = (SRTT_s + (RTTVAR_s << 2)) >> 3
     */
    rto = (srtt_scaled + (rttvar_scaled << 2)) >> 3;

    /* Apply minimum bound (e.g., 1 second = 1000ms in our units) */
    if (rto < 1000) {
        rto = 1000;
    }
}
```

The scaling by 8 (3 bit shifts) provides 3 bits of fractional precision while keeping all arithmetic in integers. This was essential for efficient implementation on 1980s CPUs without floating-point units. Modern implementations often use floating-point for clarity, but the principle remains the same.
To truly understand Jacobson's algorithm, we need to see how it behaves in different scenarios. Let's trace through several examples:
Consider a network with consistent RTT around 100ms:
| Sample # | Sample RTT | SRTT | RTTVAR | RTO | Notes |
|---|---|---|---|---|---|
| 1 | 100ms | 100.0 | 50.0 | 300.0 | Initial: RTO = 3 × RTT |
| 2 | 98ms | 99.75 | 38.0 | 251.75 | Slight decrease |
| 3 | 102ms | 100.03 | 29.06 | 216.3 | RTTVAR decreasing as network stabilizes |
| 4 | 99ms | 99.90 | 22.05 | 188.1 | Converging toward stable RTO |
| 5 | 101ms | 100.04 | 16.82 | 167.3 | RTO continues to tighten |
| 10 | ~100ms | ~100 | ~5 | ~120 | After 10 samples, RTO stabilizes |
| 20 | ~100ms | ~100 | ~2 | ~110 | Very stable: RTO ≈ SRTT + small margin |
Key observation: In a stable network, RTTVAR shrinks over time, and RTO converges toward SRTT plus a small safety margin. This allows tight loss detection.
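The table values can be reproduced in a few lines; `jacobson_update` restates the update rule (α = 1/8, β = 1/4, K = 4) so the snippet is self-contained:

```python
def jacobson_update(srtt: float, rttvar: float, r: float) -> tuple[float, float, float]:
    """One Jacobson update with alpha = 1/8, beta = 1/4, K = 4."""
    err = r - srtt
    srtt += err / 8
    rttvar += (abs(err) - rttvar) / 4
    return srtt, rttvar, srtt + 4 * rttvar

# Start from the state after the first 100ms sample: SRTT=100, RTTVAR=50
srtt, rttvar, rto = 100.0, 50.0, 300.0
for r in (98, 102, 99, 101):
    srtt, rttvar, rto = jacobson_update(srtt, rttvar, r)
    print(f"R={r}ms -> SRTT={srtt:.2f} RTTVAR={rttvar:.2f} RTO={rto:.2f}")
# First line: R=98ms -> SRTT=99.75 RTTVAR=38.00 RTO=251.75
```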
Now consider what happens when RTT suddenly jumps (e.g., route change or congestion):
| Sample # | Sample RTT | SRTT | RTTVAR | RTO | Notes |
|---|---|---|---|---|---|
| Steady | 100ms | 100.0 | 5.0 | 120.0 | Stabilized at low variance |
| Jump | 200ms | 112.5 | 28.75 | 227.5 | First 200ms sample, big jump in RTTVAR |
| +1 | 200ms | 123.44 | 43.44 | 297.2 | RTTVAR continues increasing |
| +2 | 200ms | 133.01 | 51.72 | 339.9 | RTO comfortably above the new 200ms RTT |
| +5 | 200ms | 155.12 | 55.05 | 375.3 | RTTVAR peaks; SRTT still catching up |
| +10 | 200ms | 176.98 | 37.80 | 328.2 | RTTVAR decreasing as SRTT approaches 200ms |
| +20 | 200ms | ~194 | ~12 | ~241 | Nearly converged to new stable state |
Key observation: When RTT jumps, both SRTT and RTTVAR increase. RTTVAR increases faster (larger α), immediately expanding RTO to prevent spurious timeouts during the transition. After the network stabilizes at the new level, RTTVAR shrinks again.
Consider a network with highly variable RTT (e.g., mobile or satellite):
| Sample # | Sample RTT | SRTT | RTTVAR | RTO | Notes |
|---|---|---|---|---|---|
| 1 | 100ms | 100.0 | 50.0 | 300.0 | Initial |
| 2 | 80ms | 97.5 | 42.5 | 267.5 | Below average |
| 3 | 120ms | 100.31 | 37.5 | 250.3 | Above average |
| 4 | 80ms | 97.77 | 33.2 | 230.6 | Below average |
| 5 | 120ms | 100.55 | 30.5 | 222.4 | Above average |
| 10 | ~100 | ~100 | ~23 | ~190 | RTTVAR stays elevated due to variance |
| 20 | ~100 | ~100 | ~21 | ~185 | Sustained variance keeps RTTVAR from shrinking |
Notice how the algorithm naturally maintains a higher RTTVAR (and thus higher RTO) in the variable network. It doesn't need to be told the network is variable—it observes the variation in samples and adapts accordingly. This is the power of variance tracking.
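This adaptation can be seen side by side by running the same estimator over two sample streams with identical 100ms means but different spread (40 synthetic samples each; the specific jitter values are illustrative):

```python
def jacobson_update(srtt: float, rttvar: float, r: float) -> tuple[float, float]:
    """One Jacobson update with alpha = 1/8, beta = 1/4."""
    err = r - srtt
    srtt += err / 8
    rttvar += (abs(err) - rttvar) / 4
    return srtt, rttvar

def run(samples, srtt=100.0, rttvar=50.0):
    """Feed all samples through the estimator; return final state and RTO."""
    for r in samples:
        srtt, rttvar = jacobson_update(srtt, rttvar, r)
    return srtt, rttvar, srtt + 4 * rttvar

# Same 100ms mean: ±1ms jitter vs ±20ms swings
stable = [99 if i % 2 == 0 else 101 for i in range(40)]
oscillating = [80 if i % 2 == 0 else 120 for i in range(40)]

results = {name: run(s) for name, s in
           (("stable", stable), ("oscillating", oscillating))}
for name, (srtt, rttvar, rto) in results.items():
    print(f"{name:12s} SRTT={srtt:6.1f} RTTVAR={rttvar:5.1f} RTO={rto:6.1f}")
# The oscillating path ends with RTTVAR (and thus RTO) many times higher,
# despite both streams sharing the same mean RTT.
```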
To appreciate Jacobson's contribution, let's compare the performance of the original simple averaging approach versus the variance-based approach:
Studies following the 1988 publication showed dramatic improvements:
| Metric | Before (Original TCP) | After (Jacobson's TCP) |
|---|---|---|
| Spurious retransmission rate | 30-50% under load | < 1% |
| Network utilization | 10-20% during congestion | 80-95% |
| Congestion collapse occurrences | Frequent | Rare |
| Timeout accuracy | Poor (fixed β) | Excellent (adaptive) |
These weren't incremental improvements—they were order-of-magnitude changes that made TCP viable as an Internet protocol.
It's worth noting that Jacobson's algorithm was developed for networks with relatively modest bandwidth (measured in kilobits or low megabits) and significant RTT variance. Modern high-speed networks sometimes face different challenges, leading to continued research on timeout algorithms. However, the basic variance-tracking insight remains fundamental.
Real-world implementations of Jacobson's algorithm must handle several edge cases and practical considerations:
Before any RTT samples are collected, the algorithm has no data to work with. RFC 6298 specifies:
Before first measurement:
- RTO = 1 second (initial, conservative default)
After first measurement R:
- SRTT = R
- RTTVAR = R/2
- RTO = SRTT + 4 × RTTVAR = 3R
The initial RTTVAR = R/2 is a heuristic: assume variance is proportional to the mean until we have more data. This gives a conservative initial RTO of 3× the first measurement.
RFC 6298 specifies a minimum RTO of 1 second to prevent pathologically small timeouts when both SRTT and RTTVAR become tiny, as on a fast local network.
The algorithm's accuracy depends on the granularity G of the system clock. RFC 6298 folds G directly into the formula:
RTO = SRTT + max(G, K × RTTVAR)
With modern systems providing microsecond or better timing, this is rarely a concern. However, older systems with 500ms granularity required special handling.
While not specified in RFC 6298, implementations typically enforce a maximum RTO (often 60-120 seconds) to ensure connections eventually attempt recovery even in pathological conditions.
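A sketch of how these bounds wrap the raw Jacobson result. The 1-second floor follows RFC 6298; the 60-second ceiling is an illustrative implementation choice, not an RFC requirement:

```python
MIN_RTO_MS = 1_000   # RFC 6298 minimum bound
MAX_RTO_MS = 60_000  # illustrative ceiling; implementations vary (60-120s)

def clamp_rto(raw_rto_ms: float) -> float:
    """Clamp a raw Jacobson RTO to the configured bounds."""
    return min(max(raw_rto_ms, MIN_RTO_MS), MAX_RTO_MS)

print(clamp_rto(272.5))    # 1000 -- floor applies on fast LANs
print(clamp_rto(5_000))    # 5000 -- within bounds, unchanged
print(clamp_rto(240_000))  # 60000 -- ceiling after repeated backoff
```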
Jacobson's algorithm only tells us how to update estimates given an RTT sample. It doesn't tell us which samples to use. That's where Karn's algorithm (covered in the next page) comes in—specifying that retransmitted segments should not generate RTT samples due to the ambiguity problem.
A complete RTO implementation combines Jacobson's algorithm (how to compute RTO from samples) with Karn's algorithm (which samples to use) and exponential backoff (how to adjust RTO after timeout). Each piece is essential; together they form the foundation of reliable TCP retransmission.
Jacobson's algorithm represents one of the most elegant and impactful contributions to Internet protocol design. Let's consolidate the key takeaways:
What's next:
Jacobson's algorithm tells us how to update our RTT estimates given samples, but we've glossed over a critical question: which samples should we use? The next page explores Karn's algorithm, which addresses the retransmission ambiguity problem by specifying that samples from retransmitted segments should not update the estimator.
You now understand Jacobson's variance-based timeout algorithm—the mathematical formulation, the intuition behind the constants, and how it adapts to different network conditions. Next, we'll learn Karn's algorithm for handling the retransmission ambiguity that can corrupt our RTT estimates.