In October 1986, the Internet experienced its first congestion collapse. Networks that should have been operating at megabit speeds slowed to a crawl—throughput dropped by factors of 100 or more. The cause? TCP's inability to adapt to network conditions. Among the critical weaknesses exposed was the primitive timeout calculation that caused massive spurious retransmissions, further amplifying congestion.
Van Jacobson, a researcher at Lawrence Berkeley Laboratory, responded with a series of algorithmic improvements that literally saved the early Internet. Chief among these was his variance-based timeout algorithm, published in his seminal 1988 paper "Congestion Avoidance and Control." This algorithm fundamentally changed how TCP calculates retransmission timeouts—and remains the foundation of timeout calculation in modern TCP implementations over 35 years later.
By the end of this page, you will understand the historical context that motivated Jacobson's algorithm, the key insight of tracking RTT variance, the mathematical formulation using Exponentially Weighted Moving Averages (EWMA), the clever use of mean deviation instead of standard deviation, and exactly how to implement this algorithm step by step.
The original TCP specification (RFC 793) used a simple approach to timeout calculation:
RTO = β × SRTT
Where SRTT was a simple moving average of RTT samples and β was typically 2. This approach had three critical problems:
As we explored in Page 1, networks with the same average RTT can have vastly different RTT variance. A network with mean RTT of 100ms but variance causing occasional 300ms samples needs a much higher timeout than a stable network with the same mean. The simple β multiplier couldn't distinguish these cases.
When network conditions changed, the SRTT adapted slowly because it was a heavily smoothed average (typically α = 0.875, meaning new samples only contributed 12.5% weight). If RTT suddenly doubled due to congestion, it took many samples for SRTT to catch up.
Using β = 2 worked reasonably for many networks, but failed spectacularly for high-variance networks. Yet there was no principled way to determine what β should be—it was a magic number chosen by intuition.
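The original rule fits in a few lines. Here is a minimal sketch of the RFC 793 estimator (the function name and defaults are illustrative; RFC 793 permits a range of α and β values, with 0.875 and 2 being the commonly cited choices):

```python
def update_rto_rfc793(srtt: float, sample_rtt: float,
                      alpha: float = 0.875, beta: float = 2.0) -> tuple[float, float]:
    """Return (new_srtt, new_rto) using the original fixed-multiplier rule."""
    new_srtt = alpha * srtt + (1 - alpha) * sample_rtt
    rto = beta * new_srtt  # fixed multiplier: no notion of variance
    return new_srtt, rto

# A 300ms spike on a 100ms path barely moves SRTT:
srtt, rto = update_rto_rfc793(100.0, 300.0)
# srtt = 0.875*100 + 0.125*300 = 125.0; rto = 250.0
```

Note how a single 300ms sample leaves the RTO at 250ms, below the spike itself: the fixed multiplier has no way to widen the margin when variance grows.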
| Time | Event | Impact |
|---|---|---|
| T | Network load increases, RTT grows | Actual RTT exceeds SRTT-based RTO |
| T+1 | Timeouts trigger spurious retransmissions | Duplicate packets injected |
| T+2 | Retransmissions add to congestion | RTT increases further |
| T+3 | More spurious timeouts occur | More retransmissions, more congestion |
| T+n | Positive feedback loop spirals | Congestion collapse: throughput → 0 |
The 1986 congestion collapse demonstrated that this positive feedback loop wasn't theoretical—it was a practical disaster waiting to happen whenever networks approached capacity. Jacobson's insight was that the solution wasn't a better fixed multiplier, but a dynamic one based on observed variance.
Van Jacobson introduced multiple algorithms in his 1988 paper: slow start, congestion avoidance, fast retransmit, and the variance-based RTO. All of these together transformed TCP from a protocol that couldn't survive network congestion to one that adapts gracefully. The timeout algorithm was a crucial piece of this puzzle.
Jacobson's fundamental insight was elegant: if we track how much RTT varies, we can set the timeout just high enough to accommodate normal variation while still detecting actual losses promptly.
Consider RTT as a random variable with some distribution. If we assume (approximately) that RTT follows a normal distribution with mean μ and standard deviation σ:
If we set RTO = μ + 4σ, only about 0.003% of legitimate ACKs (roughly 3 in 100,000) would be expected to arrive after the timeout. This makes spurious retransmissions extremely rare while maintaining responsiveness.
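The tail probability can be checked directly with the standard library, using P(Z > k) = ½·erfc(k/√2) for a standard normal:

```python
import math

# Upper tail probability of a standard normal distribution.
def normal_tail(k: float) -> float:
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (2, 3, 4):
    print(f"P(Z > {k} sigma) = {normal_tail(k):.6%}")
# P(Z > 4 sigma) is about 0.0032%: a few legitimate ACKs per 100,000
# would arrive after an RTO set at mean + 4 standard deviations.
```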
Calculating true standard deviation requires:
σ = √(Σ(xᵢ - μ)² / n)
This involves squaring every deviation and taking a square root—floating-point-heavy operations that were prohibitively expensive in the per-packet fast path of 1980s TCP implementations. The naive form also requires retaining past samples or maintaining running sums of squares.
Jacobson needed something that behaves like standard deviation but is much cheaper to compute. His solution: mean deviation.
Mean deviation is simply the average of the absolute differences from the mean: MD = Σ|xᵢ - μ| / n. For a normal distribution, MD ≈ 0.8σ. It provides similar information about spread but requires only subtraction and absolute value—no squares or square roots. This made it practical for the limited processors of the 1980s.
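The MD ≈ 0.8σ relationship can be verified both analytically (for a normal distribution, MD = σ·√(2/π)) and empirically on simulated samples. The mean and σ below (100ms, 20ms) are arbitrary illustration values:

```python
import math
import random

# Analytical ratio for a normal distribution: MD / sigma = sqrt(2/pi)
print(math.sqrt(2 / math.pi))  # ~0.7979

# Empirical check on simulated RTT samples (mean 100ms, sigma 20ms)
random.seed(1)
samples = [random.gauss(100, 20) for _ in range(100_000)]
mu = sum(samples) / len(samples)
md = sum(abs(x - mu) for x in samples) / len(samples)
sd = math.sqrt(sum((x - mu) ** 2 for x in samples) / len(samples))
print(md / sd)  # close to 0.8
```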
Mean deviation isn't a perfect substitute for standard deviation:
| Property | Standard Deviation | Mean Deviation |
|---|---|---|
| Mathematical properties | Well-defined statistical theory | Less common in statistical analysis |
| Sensitivity to outliers | Higher (squares amplify large deviations) | Lower (linear amplification) |
| Computational cost | High (squares, square root) | Low (absolute value only) |
| Relationship for normal dist. | σ | ≈ 0.8σ |
Jacobson compensated for the relationship by using a multiplier. If we want RTO ≈ μ + 4σ and we have MD ≈ 0.8σ, then:
RTO ≈ μ + 4σ ≈ μ + 4 × (MD/0.8) ≈ μ + 5 × MD
In practice, Jacobson used a multiplier of 4 for the deviation term, providing a robust safety margin.
Jacobson's algorithm tracks two quantities using exponentially weighted moving averages (EWMA): SRTT, the smoothed RTT estimate, and RTTVAR, the smoothed mean deviation of the RTT samples.
When a new RTT sample (R) is measured:
Step 1: Calculate the error
Err = R - SRTT
This is the difference between the new sample and our current estimate.
Step 2: Update the smoothed RTT
SRTT = SRTT + (1/8) × Err = SRTT + (Err >> 3)
This adds 1/8 of the error to the current estimate.
Step 3: Update the variance estimate
RTTVAR = RTTVAR + (1/4) × (|Err| - RTTVAR) = RTTVAR + ((|Err| - RTTVAR) >> 2)
This moves the variance estimate toward the absolute error.
Step 4: Calculate RTO
RTO = SRTT + 4 × RTTVAR = SRTT + (RTTVAR << 2)
```python
def update_rto_jacobson(srtt: float, rttvar: float, sample_rtt: float) -> tuple[float, float, float]:
    """
    Jacobson's algorithm for RTO calculation.

    Args:
        srtt: Current smoothed RTT estimate (in milliseconds)
        rttvar: Current RTT variance estimate (in milliseconds)
        sample_rtt: New RTT measurement (in milliseconds)

    Returns:
        Tuple of (new_srtt, new_rttvar, new_rto)
    """
    # RFC 6298 uses α = 1/8 for SRTT and β = 1/4 for RTTVAR
    ALPHA = 1/8  # Weight for new sample in SRTT
    BETA = 1/4   # Weight for new sample in RTTVAR
    K = 4        # Multiplier for variance term in RTO

    # Step 1: Calculate the error (difference from estimate)
    err = sample_rtt - srtt

    # Step 2: Update smoothed RTT
    # SRTT = SRTT + α * (R - SRTT) = (1-α) * SRTT + α * R
    new_srtt = srtt + ALPHA * err

    # Step 3: Update variance estimate (using absolute error)
    # RTTVAR = (1-β) * RTTVAR + β * |R - SRTT|
    abs_err = abs(err)
    new_rttvar = rttvar + BETA * (abs_err - rttvar)
    # Alternative formulation:
    # new_rttvar = (1 - BETA) * rttvar + BETA * abs_err

    # Step 4: Calculate RTO
    # RTO = SRTT + K * RTTVAR
    new_rto = new_srtt + K * new_rttvar

    return new_srtt, new_rttvar, new_rto


def initialize_rto(first_sample_rtt: float) -> tuple[float, float, float]:
    """
    Initialize RTO estimates with first RTT measurement.

    Per RFC 6298:
    - SRTT = R (first measurement)
    - RTTVAR = R/2
    - RTO = SRTT + 4*RTTVAR = R + 2R = 3R
    """
    srtt = first_sample_rtt
    rttvar = first_sample_rtt / 2
    rto = srtt + 4 * rttvar  # = 3 * first_sample_rtt
    return srtt, rttvar, rto


# Example usage:
# Connection starts, first RTT measured as 100ms
srtt, rttvar, rto = initialize_rto(100)
print(f"Initial: SRTT={srtt}ms, RTTVAR={rttvar}ms, RTO={rto}ms")
# Output: Initial: SRTT=100ms, RTTVAR=50.0ms, RTO=300.0ms

# Second sample comes in at 120ms (higher than expected)
srtt, rttvar, rto = update_rto_jacobson(srtt, rttvar, 120)
print(f"Sample 2: SRTT={srtt:.1f}ms, RTTVAR={rttvar:.1f}ms, RTO={rto:.1f}ms")
# SRTT = 100 + 0.125 * 20 = 102.5ms
# RTTVAR = 50 + 0.25 * (20 - 50) = 42.5ms
# RTO = 102.5 + 4 * 42.5 = 272.5ms
```

The algorithm shown above follows RFC 6298, "Computing TCP's Retransmission Timer". Note that RFC 6298 also specifies a minimum RTO bound (typically 1 second) and clock-granularity considerations; production implementations must handle these additional requirements.
The specific values in Jacobson's algorithm weren't arbitrary—they were carefully chosen for both mathematical properties and practical implementation.
Mathematical rationale: An EWMA with α = 1/8 gives a "time constant" of approximately 8 samples. After 8 samples, the contribution of the oldest sample has decayed to about 1/e ≈ 37% of its original weight. This provides stability while still adapting to trends.
Practical rationale: 1/8 = 2⁻³. This means the multiplication can be implemented as a right bit-shift by 3 positions:
SRTT + Err/8 = SRTT + (Err >> 3)
In 1988, this was a significant optimization. Bit shifts were single-cycle operations; division was expensive.
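The shift/division equivalence is easy to demonstrate. One caveat to flag: Python's `>>` floors (matching `//`), while C's integer division truncates toward zero, so real kernels are careful about shifting signed error terms:

```python
# Right shift by 3 equals integer (floor) division by 8 in Python,
# including for negative values.
for err in (40, 17, -24):
    assert (err >> 3) == (err // 8)

print(40 >> 3, 17 >> 3, -24 >> 3)  # 5 2 -3
```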
Mathematical rationale: The variance estimate uses a larger weight (1/4 vs 1/8) for new samples. This makes it more responsive to changes in network variability. If the network suddenly becomes more unpredictable, RTTVAR adapts faster than SRTT.
Practical rationale: 1/4 = 2⁻². Right bit-shift by 2 positions:
RTTVAR + ((|Err| - RTTVAR) >> 2)
| Smoothing Factor α | Half-Life (samples) | Effect on Estimate | Bit Shift |
|---|---|---|---|
| 1/2 | ~1 | Very responsive, noisy | 1 |
| 1/4 | ~2.4 | Moderately responsive | 2 |
| 1/8 | ~5.2 | Smooth, slower adaptation | 3 |
| 1/16 | ~10.7 | Very smooth, very slow | 4 |
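The half-life column follows from the EWMA decay rate: an old sample's weight shrinks by (1 − α) per new sample, so the half-life in samples is ln(0.5)/ln(1 − α). A quick check against the table:

```python
import math

# Number of samples until an old sample's weight decays to 50%:
# (1 - alpha)^n = 0.5  =>  n = ln(0.5) / ln(1 - alpha)
def ewma_half_life(alpha: float) -> float:
    return math.log(0.5) / math.log(1 - alpha)

for alpha in (1/2, 1/4, 1/8, 1/16):
    print(f"alpha = {alpha}: half-life = {ewma_half_life(alpha):.1f} samples")
# alpha = 1/8 gives about 5.2 samples, matching the table above
```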
Mathematical rationale: This multiplier determines how many "units" of variance we add to the mean to get the timeout. With K = 4 and RTTVAR ≈ 0.8σ:
RTO ≈ SRTT + 4 × 0.8σ ≈ SRTT + 3.2σ
For a normal distribution, roughly 99.9% of samples fall below SRTT + 3.2σ. The extra margin accounts for real RTT distributions being heavy-tailed rather than normal, for estimation error in SRTT and RTTVAR themselves, and for clock granularity.
Practical rationale: 4 = 2². Left bit-shift by 2 positions:
SRTT + 4 × RTTVAR = SRTT + (RTTVAR << 2)
Putting it all together, the entire algorithm can be implemented using only additions, subtractions, and bit shifts:
```c
/*
 * Jacobson's algorithm using fixed-point arithmetic and bit shifts.
 * SRTT and RTTVAR are kept scaled by 8 (left-shifted 3 bits) so that
 * integer arithmetic retains 3 bits of fractional precision.
 */

/* Scaled state (multiplied by 8 for precision) */
static int srtt_scaled = 0;    /* Smoothed RTT × 8 */
static int rttvar_scaled = 0;  /* RTT variance × 8 */
static int rto = 0;            /* Retransmission timeout (unscaled) */

void tcp_rto_update(int sample_rtt)
{
    int err;
    int abs_err;

    if (srtt_scaled == 0) {
        /* First measurement: SRTT = R, RTTVAR = R/2 (per RFC 6298) */
        srtt_scaled = sample_rtt << 3;    /* R × 8 */
        rttvar_scaled = sample_rtt << 1;  /* (R/2) × 8 = R × 4 */
    } else {
        /* err = (R - SRTT) × 8: both terms are in scaled units */
        err = (sample_rtt << 3) - srtt_scaled;

        /*
         * SRTT update: SRTT += (R - SRTT)/8.
         * In scaled units: SRTT×8 += ((R - SRTT)/8) × 8 = (R - SRTT),
         * and since err = (R - SRTT) × 8, that increment is err >> 3.
         */
        srtt_scaled += err >> 3;

        abs_err = (err < 0) ? -err : err;

        /*
         * RTTVAR update: RTTVAR += (|R - SRTT| - RTTVAR)/4.
         * abs_err and rttvar_scaled are both in ×8 units, so the
         * scaled update is simply (abs_err - rttvar_scaled) >> 2.
         */
        rttvar_scaled += (abs_err - rttvar_scaled) >> 2;
    }

    /*
     * RTO = SRTT + 4 × RTTVAR, converted back to unscaled units:
     * RTO = (SRTT_s + 4 × RTTVAR_s) / 8 = (SRTT_s + (RTTVAR_s << 2)) >> 3
     */
    rto = (srtt_scaled + (rttvar_scaled << 2)) >> 3;

    /* Apply minimum bound (e.g., 1 second = 1000ms in our units) */
    if (rto < 1000) {
        rto = 1000;
    }
}
```

The scaling by 8 (3 bit shifts) provides 3 bits of fractional precision while keeping all arithmetic in integers. This was essential for efficient implementation on 1980s CPUs without floating-point units. Modern implementations often use floating-point for clarity, but the principle remains the same.
To truly understand Jacobson's algorithm, we need to see how it behaves in different scenarios. Let's trace through several examples:
Consider a network with consistent RTT around 100ms:
| Sample # | Sample RTT | SRTT | RTTVAR | RTO | Notes |
|---|---|---|---|---|---|
| 1 | 100ms | 100.0 | 50.0 | 300.0 | Initial: RTO = 3 × RTT |
| 2 | 98ms | 99.75 | 38.0 | 251.75 | Slight decrease |
| 3 | 102ms | 100.03 | 29.06 | 216.3 | RTTVAR decreasing as network stabilizes |
| 4 | 99ms | 99.90 | 22.05 | 188.1 | Converging toward stable RTO |
| 5 | 101ms | 100.04 | 16.82 | 167.3 | RTO continues to tighten |
| 10 | ~100ms | ~100 | ~5 | ~120 | After 10 samples, RTO stabilizes |
| 20 | ~100ms | ~100 | ~2 | ~110 | Very stable: RTO ≈ SRTT + small margin |
Key observation: In a stable network, RTTVAR shrinks over time, and RTO converges toward SRTT plus a small safety margin. This allows tight loss detection.
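The table values can be reproduced in a few lines; `jacobson_update` restates the update rule (α = 1/8, β = 1/4, K = 4) so the snippet is self-contained:

```python
def jacobson_update(srtt: float, rttvar: float, r: float) -> tuple[float, float, float]:
    """One Jacobson update with alpha = 1/8, beta = 1/4, K = 4."""
    err = r - srtt
    srtt += err / 8
    rttvar += (abs(err) - rttvar) / 4
    return srtt, rttvar, srtt + 4 * rttvar

# Start from the state after the first 100ms sample: SRTT=100, RTTVAR=50
srtt, rttvar, rto = 100.0, 50.0, 300.0
for r in (98, 102, 99, 101):
    srtt, rttvar, rto = jacobson_update(srtt, rttvar, r)
    print(f"R={r}ms -> SRTT={srtt:.2f} RTTVAR={rttvar:.2f} RTO={rto:.2f}")
# First line: R=98ms -> SRTT=99.75 RTTVAR=38.00 RTO=251.75
```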
Now consider what happens when RTT suddenly jumps (e.g., route change or congestion):
| Sample # | Sample RTT | SRTT | RTTVAR | RTO | Notes |
|---|---|---|---|---|---|
| Steady | 100ms | 100.0 | 5.0 | 120.0 | Stabilized at low variance |
| Jump | 200ms | 112.5 | 28.75 | 227.5 | First 200ms sample, big jump in RTTVAR |
| +1 | 200ms | 123.44 | 43.44 | 297.2 | RTTVAR continues increasing |
| +2 | 200ms | 133.01 | 51.72 | 339.9 | RTO comfortably above the new 200ms RTT |
| +5 | 200ms | 155.12 | 55.05 | 375.3 | RTTVAR peaks; SRTT still catching up |
| +10 | 200ms | 176.98 | 37.80 | 328.2 | RTTVAR decreasing as SRTT approaches 200ms |
| +20 | 200ms | ~194 | ~12 | ~241 | Nearly converged to new stable state |
Key observation: When RTT jumps, both SRTT and RTTVAR increase. RTTVAR increases faster (larger α), immediately expanding RTO to prevent spurious timeouts during the transition. After the network stabilizes at the new level, RTTVAR shrinks again.
Consider a network with highly variable RTT (e.g., mobile or satellite):
| Sample # | Sample RTT | SRTT | RTTVAR | RTO | Notes |
|---|---|---|---|---|---|
| 1 | 100ms | 100.0 | 50.0 | 300.0 | Initial |
| 2 | 80ms | 97.5 | 42.5 | 267.5 | Below average |
| 3 | 120ms | 100.31 | 37.5 | 250.3 | Above average |
| 4 | 80ms | 97.77 | 33.2 | 230.6 | Below average |
| 5 | 120ms | 100.55 | 30.5 | 222.4 | Above average |
| 10 | ~100 | ~100 | ~23 | ~190 | RTTVAR stays elevated due to variance |
| 20 | ~100 | ~100 | ~21 | ~185 | Sustained variance keeps RTTVAR from shrinking |
Notice how the algorithm naturally maintains a higher RTTVAR (and thus higher RTO) in the variable network. It doesn't need to be told the network is variable—it observes the variation in samples and adapts accordingly. This is the power of variance tracking.
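This adaptation can be seen side by side by running the same estimator over two sample streams with identical 100ms means but different spread (40 synthetic samples each; the specific jitter values are illustrative):

```python
def jacobson_update(srtt: float, rttvar: float, r: float) -> tuple[float, float]:
    """One Jacobson update with alpha = 1/8, beta = 1/4."""
    err = r - srtt
    srtt += err / 8
    rttvar += (abs(err) - rttvar) / 4
    return srtt, rttvar

def run(samples, srtt=100.0, rttvar=50.0):
    """Feed all samples through the estimator; return final state and RTO."""
    for r in samples:
        srtt, rttvar = jacobson_update(srtt, rttvar, r)
    return srtt, rttvar, srtt + 4 * rttvar

# Same 100ms mean: ±1ms jitter vs ±20ms swings
stable = [99 if i % 2 == 0 else 101 for i in range(40)]
oscillating = [80 if i % 2 == 0 else 120 for i in range(40)]

results = {name: run(s) for name, s in
           (("stable", stable), ("oscillating", oscillating))}
for name, (srtt, rttvar, rto) in results.items():
    print(f"{name:12s} SRTT={srtt:6.1f} RTTVAR={rttvar:5.1f} RTO={rto:6.1f}")
# The oscillating path ends with RTTVAR (and thus RTO) many times higher,
# despite both streams sharing the same mean RTT.
```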
To appreciate Jacobson's contribution, let's compare the performance of the original simple averaging approach versus the variance-based approach:
Studies following the 1988 publication showed dramatic improvements:
| Metric | Before (Original TCP) | After (Jacobson's TCP) |
|---|---|---|
| Spurious retransmission rate | 30-50% under load | < 1% |
| Network utilization | 10-20% during congestion | 80-95% |
| Congestion collapse occurrences | Frequent | Rare |
| Timeout accuracy | Poor (fixed β) | Excellent (adaptive) |
These weren't incremental improvements—they were order-of-magnitude changes that made TCP viable as an Internet protocol.
It's worth noting that Jacobson's algorithm was developed for networks with relatively modest bandwidth (measured in kilobits or low megabits) and significant RTT variance. Modern high-speed networks sometimes face different challenges, leading to continued research on timeout algorithms. However, the basic variance-tracking insight remains fundamental.
Real-world implementations of Jacobson's algorithm must handle several edge cases and practical considerations:
Before any RTT samples are collected, the algorithm has no data to work with. RFC 6298 specifies:
Before first measurement:
- RTO = 1 second (initial, conservative default)
After first measurement R:
- SRTT = R
- RTTVAR = R/2
- RTO = SRTT + 4 × RTTVAR = 3R
The initial RTTVAR = R/2 is a heuristic: assume variance is proportional to the mean until we have more data. This gives a conservative initial RTO of 3× the first measurement.
RFC 6298 specifies a minimum RTO of 1 second to prevent pathologically small timeouts when both SRTT and RTTVAR become tiny, as on a fast local network.
The algorithm's accuracy depends on the granularity G of the system clock. RFC 6298 folds G directly into the formula:
RTO = SRTT + max(G, K × RTTVAR)
With modern systems providing microsecond or better timing, this is rarely a concern. However, older systems with 500ms granularity required special handling.
While not specified in RFC 6298, implementations typically enforce a maximum RTO (often 60-120 seconds) to ensure connections eventually attempt recovery even in pathological conditions.
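A sketch of how these bounds wrap the raw Jacobson result. The 1-second floor follows RFC 6298; the 60-second ceiling is an illustrative implementation choice, not an RFC requirement:

```python
MIN_RTO_MS = 1_000   # RFC 6298 minimum bound
MAX_RTO_MS = 60_000  # illustrative ceiling; implementations vary (60-120s)

def clamp_rto(raw_rto_ms: float) -> float:
    """Clamp a raw Jacobson RTO to the configured bounds."""
    return min(max(raw_rto_ms, MIN_RTO_MS), MAX_RTO_MS)

print(clamp_rto(272.5))    # 1000 -- floor applies on fast LANs
print(clamp_rto(5_000))    # 5000 -- within bounds, unchanged
print(clamp_rto(240_000))  # 60000 -- ceiling after repeated backoff
```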
Jacobson's algorithm only tells us how to update estimates given an RTT sample. It doesn't tell us which samples to use. That's where Karn's algorithm (covered in the next page) comes in—specifying that retransmitted segments should not generate RTT samples due to the ambiguity problem.
A complete RTO implementation combines Jacobson's algorithm (how to compute RTO from samples) with Karn's algorithm (which samples to use) and exponential backoff (how to adjust RTO after timeout). Each piece is essential; together they form the foundation of reliable TCP retransmission.
Jacobson's algorithm represents one of the most elegant and impactful contributions to Internet protocol design. Let's consolidate the key takeaways:
What's next:
Jacobson's algorithm tells us how to update our RTT estimates given samples, but we've glossed over a critical question: which samples should we use? The next page explores Karn's algorithm, which addresses the retransmission ambiguity problem by specifying that samples from retransmitted segments should not update the estimator.
You now understand Jacobson's variance-based timeout algorithm—the mathematical formulation, the intuition behind the constants, and how it adapts to different network conditions. Next, we'll learn Karn's algorithm for handling the retransmission ambiguity that can corrupt our RTT estimates.