In the realm of computer networking, reliability is not a given—it is an achievement. The Internet Protocol (IP) layer beneath TCP makes no guarantees about packet delivery. Packets can be lost due to router buffer overflow, corrupted by transmission errors, duplicated by network anomalies, or simply vanish into the void of a congested network path. Yet TCP promises reliable, in-order delivery to applications. How does it keep this promise?
The answer lies in one of TCP's most critical mechanisms: the retransmission timer. This timer is the sentinel that watches over every segment TCP sends, ready to trigger retransmission if acknowledgment doesn't arrive in time. Understanding this timer is essential to understanding TCP's reliability guarantee—and to diagnosing and optimizing TCP performance in production systems.
By the end of this page, you will understand: why retransmission timers are necessary, how TCP determines the optimal timeout value, the mathematical foundations of RTT estimation and RTO calculation, the mechanisms that handle retransmission, and the real-world implications of timer misconfiguration. You'll gain the expertise to diagnose retransmission-related performance issues and understand the tradeoffs inherent in TCP's design.
To understand why retransmission timers are essential, we must first understand the problem TCP is solving. Consider what happens when a host sends a segment across the Internet:
Scenario 1: Segment Loss Host A sends a segment to Host B. The segment traverses multiple routers, but one router's buffer overflows due to congestion. The segment is dropped. Host B never receives it and therefore never sends an acknowledgment. Without a mechanism to detect this loss, Host A would wait forever.
Scenario 2: ACK Loss Host A sends a segment to Host B. Host B receives it correctly and sends an acknowledgment. However, this ACK packet is lost in transit. From Host A's perspective, this is indistinguishable from Scenario 1—it sent a segment and received no acknowledgment.
Scenario 3: Excessive Delay Host A sends a segment. Due to network congestion, routing changes, or link problems, the segment or its acknowledgment is severely delayed (but not lost). Host A must decide: is this packet lost and needs retransmission, or is it just delayed?
The retransmission timer provides TCP's answer to all three scenarios: if an acknowledgment isn't received within a timeout period, assume the segment was lost and retransmit it.
The retransmission timer embodies a critical tradeoff. If the timeout is too short, TCP will retransmit segments unnecessarily (the acknowledgment was just delayed), wasting bandwidth and potentially worsening congestion. If the timeout is too long, TCP will wait too long before recovering from actual loss, reducing throughput. Finding the optimal timeout value is one of TCP's most delicate balancing acts.
| RTO Characteristic | Consequence | Performance Impact |
|---|---|---|
| Too Short | Spurious retransmissions (retransmitting data that isn't lost) | Wasted bandwidth, potential congestion amplification, unfair use of network resources |
| Too Long | Delayed recovery from actual packet loss | Reduced throughput, increased latency for applications, poor user experience |
| Well-Tuned | Retransmit only when necessary, with minimal delay | Optimal throughput, efficient bandwidth utilization, fair network sharing |
| Static (not adaptive) | Appropriate for some paths, wrong for others | Inconsistent performance across different network conditions |
| Adaptive (dynamic) | Adjusts to actual network conditions | Consistent performance as network conditions change |
The retransmission timeout (RTO) must be based on the actual time it takes for a segment to reach its destination and for the acknowledgment to return. This time is called the Round-Trip Time (RTT). But RTT is not a fixed value—it varies continuously with network congestion, queueing delays at routers, routing changes, and how quickly the receiver generates acknowledgments.
Because RTT varies, TCP cannot use a single fixed timeout value. Instead, it must measure RTT and adapt the timeout accordingly. This is the essence of TCP's dynamic timeout calculation.
TCP measures RTT by recording the time when a segment is sent and noting when its acknowledgment arrives. However, this measurement has complexities: if a segment is retransmitted, TCP cannot determine if the received ACK is for the original or retransmitted segment (Karn's problem). We'll address this later when discussing Karn's algorithm.
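To make that bookkeeping concrete, here is a minimal sketch (illustrative only; the helper names are hypothetical, not a real TCP stack's API): record a send timestamp per sequence number, then compute the sample when a covering ACK arrives.

```python
import time

# Illustrative sketch: per-segment send timestamps keyed by sequence number.
send_times: dict[int, float] = {}

def on_segment_sent(seq: int) -> None:
    """Record when a segment with this sequence number was first sent."""
    send_times[seq] = time.monotonic()

def on_ack_received(ack: int) -> float | None:
    """Return an RTT sample (in seconds) for the newest segment this ACK covers."""
    acked = [s for s in send_times if s < ack]
    if not acked:
        return None  # Duplicate or out-of-window ACK: no new sample
    rtt = time.monotonic() - send_times[max(acked)]
    for s in acked:  # All covered segments are now acknowledged
        del send_times[s]
    return rtt
```

Real stacks refine this idea with the TCP timestamps option (RFC 7323), which allows an RTT sample to be taken from nearly every ACK rather than one outstanding segment at a time.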
RTT Measurement Visualization:
Consider a TCP connection between New York and London:
```
Time  0ms: Host A sends segment (Seq=1000)
           └── segment begins traversing the network ──────────┐
                                                                │
               [Router 1] → [Router 2] → ... → [Router N]       │
                                                                │
Time 35ms: Host B receives the segment ◄────────────────────────┘
           └── Host B generates ACK (Ack=1500) ─────────────────┐
                                                                │
Time 70ms: Host A receives the ACK ◄────────────────────────────┘
           └── Measured RTT = 70ms
```
In this example, the measured RTT is 70ms. But the next RTT sample might be 65ms, 80ms, or 120ms depending on current network conditions. TCP must smooth these measurements to produce a stable timeout value.
Given the inherent variability of RTT, TCP cannot simply use the most recent RTT measurement as the timeout value. A single high RTT sample (perhaps due to temporary congestion) would set an overly conservative timeout, while a single low sample would set an aggressive timeout that triggers spurious retransmissions.
TCP addresses this through Exponential Weighted Moving Average (EWMA), which smooths RTT measurements over time. The key insight is that recent measurements should carry more weight than older ones, but the estimate should not react too aggressively to any single sample.
The Original TCP Specification (RFC 793) Approach:
The original TCP specification used a simple smoothed RTT (SRTT) calculation:
SRTT = α × SRTT + (1 - α) × RTT_sample
Where α is typically 0.875 (7/8), meaning the new estimate is 87.5% based on the previous estimate and 12.5% on the new sample. This creates a smooth, stable estimate that gradually adapts to changing conditions.
```python
# Original TCP SRTT calculation (RFC 793 style)
def update_srtt_original(srtt: float, rtt_sample: float, alpha: float = 0.875) -> float:
    """
    Update the Smoothed Round-Trip Time using EWMA.

    Args:
        srtt: Current smoothed RTT estimate (in ms)
        rtt_sample: New RTT measurement (in ms)
        alpha: Smoothing factor (typically 0.875 = 7/8)

    Returns:
        Updated SRTT estimate

    Example:
        >>> srtt = 100  # Initial estimate: 100ms
        >>> srtt = update_srtt_original(srtt, 80)  # New sample: 80ms
        >>> print(f"New SRTT: {srtt:.2f}ms")  # SRTT moves slightly toward 80
        New SRTT: 97.50ms
        >>> srtt = update_srtt_original(srtt, 120)  # New sample: 120ms
        >>> print(f"New SRTT: {srtt:.2f}ms")  # SRTT moves toward 120
        New SRTT: 100.31ms
    """
    return alpha * srtt + (1 - alpha) * rtt_sample


# Simulation: How SRTT evolves over time
def simulate_srtt_evolution():
    """Demonstrate SRTT smoothing over varying RTT samples."""
    # Initial SRTT estimate
    srtt = 100.0  # 100ms initial estimate

    # Simulate RTT samples with some variation and a spike
    samples = [95, 102, 98, 105, 97, 150, 103, 99, 101, 98]

    print("RTT Measurement Smoothing Simulation")
    print("=" * 50)
    print(f"Initial SRTT estimate: {srtt:.2f}ms")
    print("Smoothing factor (α): 0.875 (7/8)")
    print()

    for i, sample in enumerate(samples):
        old_srtt = srtt
        srtt = update_srtt_original(srtt, sample)
        delta = srtt - old_srtt
        print(f"Sample {i+1}: RTT={sample:3d}ms → SRTT: {old_srtt:.2f} → {srtt:.2f}ms (Δ={delta:+.2f})")

    print()
    print("Notice how the spike (150ms) affects SRTT only gradually,")
    print("preventing overreaction to temporary conditions.")


if __name__ == "__main__":
    simulate_srtt_evolution()
```

Why Smoothing Matters:
Consider what happens without smoothing. If TCP simply used the last RTT sample as its timeout, a single delayed ACK would inflate the timeout far beyond typical conditions, while a single unusually fast ACK would shrink it enough to trigger spurious retransmissions on the very next segments. With EWMA smoothing (α = 0.875), each new sample moves the estimate by only 12.5% of the difference, so the smoothed estimate remains stable despite individual sample variations.
The original SRTT approach had a critical flaw: it didn't account for RTT variance. Consider two scenarios:
Scenario A: Low-Variance Network. RTT samples cluster tightly around 100ms (say, 98 to 103ms), so the measured RTT is highly predictable.
Scenario B: High-Variance Network. RTT samples also average 100ms, but individual samples swing widely (say, anywhere from 50ms to 200ms).
Both scenarios have the same average RTT, but they require very different timeout values. In Scenario A, an RTO of 110ms would almost never trigger spuriously. In Scenario B, the same RTO would cause many spurious retransmissions.
In 1988, Van Jacobson published his seminal paper addressing this problem. Jacobson's algorithm computes both the smoothed RTT and the RTT deviation (variance), using both to calculate a more appropriate RTO.
The timeout should be set high enough to accommodate normal RTT variance but not so high that it delays loss recovery. By tracking RTT deviation, TCP can set aggressive timeouts when the network is stable and conservative timeouts when the network is variable—automatically adapting to current conditions.
Jacobson's Algorithm (RFC 6298):
The algorithm maintains two state variables: SRTT, the smoothed RTT estimate, and RTTVAR, a smoothed measure of how much individual samples deviate from SRTT.
On the first RTT measurement:
SRTT = R (the first RTT sample)
RTTVAR = R / 2
RTO = SRTT + max(G, K × RTTVAR)
Where G is the clock granularity and K = 4.
On subsequent RTT measurements:
RTTVAR = (1 - β) × RTTVAR + β × |SRTT - R|
SRTT = (1 - α) × SRTT + α × R
RTO = SRTT + max(G, K × RTTVAR)
Where α = 1/8, β = 1/4, and K = 4.
Why K = 4?
For a normal distribution, ~99.99% of samples fall within μ ± 4σ. Using K = 4 means the timeout should accommodate approximately 99.99% of legitimate delays, making spurious retransmissions extremely rare while still detecting actual loss promptly.
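As a quick sanity check on that figure (a sketch assuming normally distributed delays, which real RTTs only approximate), the standard normal CDF gives the fraction of samples within k standard deviations of the mean:

```python
import math

def fraction_within(k: float) -> float:
    """P(|X - mu| <= k*sigma) for a normal distribution, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 4):
    print(f"within {k} sigma: {fraction_within(k) * 100:.4f}%")
# k = 4 prints ~99.9937%, i.e. roughly one legitimate delay in 16,000
# would exceed the 4-sigma bound if delays were truly normal.
```

Note that RTTVAR is a smoothed mean deviation rather than a true standard deviation, so K = 4 is a pragmatic engineering margin rather than an exact percentile. The fuller implementation that follows applies the same constants across several simulated network conditions.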
```python
class JacobsonRTOEstimator:
    """
    Implements Jacobson's algorithm for RTO estimation (RFC 6298).

    This is the algorithm used by modern TCP implementations to calculate
    the retransmission timeout based on RTT measurements.
    """

    def __init__(self, clock_granularity: float = 1.0, min_rto: float = 1000.0):
        """
        Initialize the RTO estimator.

        Args:
            clock_granularity: Timer granularity in ms (G in RFC 6298)
            min_rto: Minimum RTO value in ms (RFC 6298 recommends 1 second)
        """
        self.srtt: float | None = None    # Smoothed RTT
        self.rttvar: float | None = None  # RTT variance
        self.rto: float = 1000.0          # Current RTO (default 1 second)
        self.clock_granularity = clock_granularity
        self.min_rto = min_rto

        # Constants from RFC 6298
        self.alpha = 1 / 8  # SRTT smoothing factor
        self.beta = 1 / 4   # RTTVAR smoothing factor
        self.K = 4          # Variance multiplier

    def update(self, rtt_sample: float) -> float:
        """
        Update RTO estimate with a new RTT measurement.

        Args:
            rtt_sample: A new RTT measurement in milliseconds

        Returns:
            The updated RTO value in milliseconds
        """
        if self.srtt is None:
            # First measurement - initialize state
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            # Subsequent measurements - apply Jacobson's algorithm
            # Note: RTTVAR must be updated BEFORE SRTT (uses old SRTT value)

            # Calculate absolute deviation from current estimate
            deviation = abs(self.srtt - rtt_sample)

            # Update RTTVAR: exponentially weighted deviation
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * deviation

            # Update SRTT: exponentially weighted average
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt_sample

        # Calculate new RTO
        # RTO = SRTT + max(G, K * RTTVAR)
        self.rto = self.srtt + max(self.clock_granularity, self.K * self.rttvar)

        # Apply minimum RTO (RFC 6298: SHOULD be 1 second)
        self.rto = max(self.rto, self.min_rto)

        return self.rto

    def get_state(self) -> dict:
        """Return current estimator state for debugging/monitoring."""
        return {
            "srtt": self.srtt,
            "rttvar": self.rttvar,
            "rto": self.rto,
        }


def demonstrate_jacobson():
    """Demonstrate Jacobson's algorithm with real-world scenarios."""
    print("=" * 70)
    print("Jacobson's Algorithm Demonstration")
    print("=" * 70)

    # Scenario 1: Stable network
    print("\n📊 SCENARIO 1: Stable Low-Latency Network")
    print("-" * 50)
    estimator = JacobsonRTOEstimator(min_rto=200)  # Lower min for demo
    stable_samples = [100, 102, 99, 101, 98, 103, 100, 97, 102, 99]

    for i, sample in enumerate(stable_samples):
        rto = estimator.update(sample)
        state = estimator.get_state()
        print(f"Sample {i+1:2d}: RTT={sample:3d}ms | "
              f"SRTT={state['srtt']:.1f}ms | "
              f"RTTVAR={state['rttvar']:.1f}ms | "
              f"RTO={rto:.1f}ms")

    print(f"\n✓ Final RTO for stable network: {estimator.rto:.1f}ms")
    print("  Notice: Low RTTVAR leads to tight RTO close to SRTT")

    # Scenario 2: Variable network
    print("\n📊 SCENARIO 2: High-Variance Network")
    print("-" * 50)
    estimator = JacobsonRTOEstimator(min_rto=200)
    variable_samples = [100, 150, 80, 130, 60, 140, 90, 120, 70, 110]

    for i, sample in enumerate(variable_samples):
        rto = estimator.update(sample)
        state = estimator.get_state()
        print(f"Sample {i+1:2d}: RTT={sample:3d}ms | "
              f"SRTT={state['srtt']:.1f}ms | "
              f"RTTVAR={state['rttvar']:.1f}ms | "
              f"RTO={rto:.1f}ms")

    print(f"\n✓ Final RTO for variable network: {estimator.rto:.1f}ms")
    print("  Notice: High RTTVAR leads to more conservative RTO")

    # Scenario 3: Network with sudden congestion
    print("\n📊 SCENARIO 3: Sudden Congestion Event")
    print("-" * 50)
    estimator = JacobsonRTOEstimator(min_rto=200)
    # Start stable, then congestion spike, then recovery
    congestion_samples = [100, 98, 102, 99, 250, 300, 280, 150, 110, 102]

    for i, sample in enumerate(congestion_samples):
        rto = estimator.update(sample)
        state = estimator.get_state()
        marker = " ← congestion!" if 250 <= sample <= 300 else ""
        print(f"Sample {i+1:2d}: RTT={sample:3d}ms | "
              f"SRTT={state['srtt']:.1f}ms | "
              f"RTTVAR={state['rttvar']:.1f}ms | "
              f"RTO={rto:.1f}ms{marker}")

    print("\n✓ Algorithm adapts to congestion and recovers gradually")


if __name__ == "__main__":
    demonstrate_jacobson()
```

The complete RTO calculation, as specified in RFC 6298, involves several rules and boundary conditions that ensure robust behavior:
Rule 1: Initial RTO
Before any RTT measurements are taken (such as when opening a new connection), the RTO should be set to a conservative default value. RFC 6298 recommends 1 second, though some implementations use 3 seconds or allow configuration.
Rule 2: First RTT Measurement
When the first RTT sample R is collected:
SRTT ← R
RTTVAR ← R / 2
RTO ← SRTT + max(G, K × RTTVAR)
Rule 3: Subsequent Measurements
For each new RTT sample R':
RTTVAR ← (1 - β) × RTTVAR + β × |SRTT - R'|
SRTT ← (1 - α) × SRTT + α × R'
RTO ← SRTT + max(G, K × RTTVAR)
Rule 4: RTO Bounds
The computed RTO must be bounded: whenever it falls below the minimum (RFC 6298 specifies 1 second), it is rounded up to that minimum, and implementations may place an upper bound on it of at least 60 seconds.
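As a worked example with illustrative numbers (G = 1ms, the RFC 6298 constants, and the 1-second minimum): suppose the first RTT sample is R = 120ms. Then SRTT = 120ms, RTTVAR = 60ms, and RTO = 120 + max(1, 4 × 60) = 360ms, which the minimum raises to 1000ms. If the next sample is R' = 100ms: RTTVAR = 0.75 × 60 + 0.25 × |120 − 100| = 50ms, SRTT = 0.875 × 120 + 0.125 × 100 = 117.5ms, and RTO = 117.5 + max(1, 4 × 50) = 317.5ms, again clamped to the 1-second minimum on a standard stack.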
| Constant | Value | Purpose | Rationale |
|---|---|---|---|
| α (alpha) | 1/8 = 0.125 | SRTT smoothing factor | Balances responsiveness vs stability; 1/8 chosen for efficient binary arithmetic |
| β (beta) | 1/4 = 0.25 | RTTVAR smoothing factor | Variance needs faster adaptation than mean; 1/4 provides good tracking |
| K | 4 | Variance multiplier | 4 standard deviations covers 99.99% of normal distribution |
| G | Clock granularity | Timer resolution | Accounts for system timer limitations; typically 1-10ms on modern systems |
| MinRTO | 1 second | Minimum timeout | Prevents retransmission storms in local/fast networks |
| MaxRTO | 60+ seconds | Maximum timeout | Prevents indefinite waiting; allows eventual failure detection |
The RFC 6298 recommendation of a 1-second minimum RTO is controversial. In modern data center networks with RTTs under 1ms, this creates massive inefficiency—if a packet is lost, TCP must wait 1000x the RTT before retransmitting. Some data center TCP variants (like DCTCP) reduce this minimum, but doing so on the Internet could cause congestion collapse due to spurious retransmissions.
Practical Considerations:
Timer Granularity (G):
Historically, Unix systems had timer granularity of 500ms, meaning the minimum measurable time was half a second. Modern systems have granularity in the 1-10ms range, allowing much finer-grained timeouts. The max(G, K × RTTVAR) term ensures that even with zero variance, the timeout includes at least one clock tick.
Integer Arithmetic:
The choice of α = 1/8 and β = 1/4 is deliberate: these allow efficient computation using bit shifts instead of floating-point division:
```c
// Efficient SRTT update using bit shifts (fixed-point arithmetic)
new_srtt = old_srtt + ((rtt_sample - old_srtt) >> 3);          // add delta / 8

// Efficient RTTVAR update
new_rttvar = old_rttvar + ((abs(deviation) - old_rttvar) >> 2); // add delta / 4
```
This was crucial for early TCP implementations on limited hardware and remains efficient today.
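The sketch below (illustrative, not kernel code) compares the shift-based fixed-point update with the floating-point EWMA it approximates; the two track each other to within the rounding error of integer arithmetic.

```python
def srtt_shift(srtt: int, sample: int) -> int:
    """Fixed-point update: srtt += (sample - srtt) / 8, using an arithmetic shift."""
    return srtt + ((sample - srtt) >> 3)

def srtt_float(srtt: float, sample: float) -> float:
    """Floating-point EWMA with alpha = 1/8."""
    return 0.875 * srtt + 0.125 * sample

srtt_i, srtt_f = 100, 100.0
for sample in (120, 95, 130, 100, 110):
    srtt_i = srtt_shift(srtt_i, sample)
    srtt_f = srtt_float(srtt_f, sample)
    print(f"sample={sample:3d}  fixed-point SRTT={srtt_i:3d}  float SRTT={srtt_f:6.2f}")
```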
When a retransmission timeout occurs and TCP retransmits a segment, it doesn't simply use the same RTO for the retransmitted segment. Instead, TCP applies exponential backoff: each successive retransmission doubles the RTO.
Why Exponential Backoff?
If a segment times out, there are two possibilities:
In case 1, the network might be congested. Retransmitting aggressively could worsen congestion. In case 2, the network is slower than estimated—retransmitting at the same rate would cause repeated spurious retransmissions.
Exponential backoff addresses both scenarios by progressively backing off, giving the network time to recover.
```python
def calculate_backoff_schedule(initial_rto: float,
                               max_retries: int = 6,
                               max_rto: float = 120000.0) -> list[dict]:
    """
    Calculate the exponential backoff schedule for TCP retransmissions.

    Args:
        initial_rto: Initial RTO value in milliseconds
        max_retries: Maximum number of retransmission attempts
        max_rto: Maximum RTO cap in milliseconds (default 120 seconds)

    Returns:
        List of dictionaries with retry info
    """
    schedule = []
    current_rto = initial_rto
    cumulative_time = 0

    for retry in range(max_retries + 1):
        schedule.append({
            "attempt": retry,
            "rto_ms": current_rto,
            "rto_seconds": current_rto / 1000,
            "cumulative_ms": cumulative_time,
            "cumulative_seconds": cumulative_time / 1000,
        })
        cumulative_time += current_rto
        current_rto = min(current_rto * 2, max_rto)  # Double with cap

    return schedule


def demonstrate_backoff():
    """Show TCP exponential backoff in action."""
    print("TCP Retransmission Exponential Backoff")
    print("=" * 70)
    print()

    # Typical internet RTO scenario
    initial_rto = 1000  # 1 second (RFC 6298 minimum)
    schedule = calculate_backoff_schedule(initial_rto, max_retries=6)

    print("Scenario: Typical Internet connection (RTO starts at 1 second)")
    print("-" * 70)
    print(f"{'Attempt':<10} {'RTO':<15} {'Cumulative Wait':<20} {'Status'}")
    print("-" * 70)

    for entry in schedule:
        attempt = entry['attempt']
        rto = entry['rto_seconds']
        cumulative = entry['cumulative_seconds']

        if attempt == 0:
            status = "Initial transmission"
        elif attempt <= 2:
            status = "Probably recoverable"
        elif attempt <= 4:
            status = "Significant delay"
        else:
            status = "Connection likely dead"

        print(f"{attempt:<10} {rto:>10.1f}s     {cumulative:>10.1f}s           {status}")

    total_time = schedule[-1]['cumulative_seconds'] + schedule[-1]['rto_seconds']
    print("-" * 70)
    print(f"Total time before giving up: {total_time:.0f} seconds ({total_time/60:.1f} minutes)")
    print()
    print("Key Observations:")
    print("• Each retry doubles the timeout (exponential growth)")
    print("• After 6 retries, total wait time exceeds 2 minutes")
    print("• This prevents TCP from hammering a failing path")
    print("• Applications should implement their own timeouts for responsiveness")


if __name__ == "__main__":
    demonstrate_backoff()
```

When TCP successfully receives an acknowledgment, it resets the backoff. The RTO returns to its calculated value based on SRTT and RTTVAR. However, after a retransmission, Karn's algorithm (discussed next) prevents using the ambiguous RTT sample for updating the estimates.
Maximum Retransmissions:
TCP implementations typically limit the number of retransmission attempts. After exhausting its retries, TCP gives up and reports a connection failure to the application; the exact limits are implementation-specific and usually configurable.
Linux, for example, uses tcp_retries1 (typically 3) for warning/soft error and tcp_retries2 (typically 15) for hard failure. The actual timeout depends on the RTO at each step.
There's a subtle but critical problem with measuring RTT when retransmissions occur. Suppose Host A sends a segment, no acknowledgment arrives before the RTO expires, so the segment is retransmitted, and shortly afterward an ACK arrives.
Question: What is the RTT? If the ACK corresponds to the original transmission, the sample should span the full interval since the first send; if it corresponds to the retransmission, it should span only the much shorter interval since the resend.
TCP cannot distinguish which segment the ACK acknowledges. This is the retransmission ambiguity problem.
Karn's Algorithm (1987):
Phil Karn and Craig Partridge proposed a simple but effective solution:
Rule 1: When a timeout occurs and a segment is retransmitted, do not update SRTT or RTTVAR based on any acknowledgment for that segment.
Rule 2: Keep the backed-off RTO (doubled value) in effect until a non-ambiguous acknowledgment is received.
Rule 3: Only update RTT estimates based on segments that were acknowledged on their first transmission.
This ensures that RTT estimates are never contaminated by ambiguous measurements.
```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SegmentState(Enum):
    SENT = "sent"
    RETRANSMITTED = "retransmitted"
    ACKNOWLEDGED = "acknowledged"


@dataclass
class TrackedSegment:
    """Track state of a sent segment for RTT measurement."""
    sequence_number: int
    first_send_time: float
    retransmit_count: int = 0
    state: SegmentState = SegmentState.SENT


class TCPRTOManagerWithKarn:
    """
    TCP RTO manager implementing Karn's algorithm.

    Demonstrates how RTT measurements are handled with proper
    ambiguity resolution.
    """

    def __init__(self):
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None
        self.rto: float = 1000.0             # Initial 1 second
        self.backed_off_rto: float = 1000.0  # Current operational RTO

        # Track outstanding segments
        self.segments: dict[int, TrackedSegment] = {}

        # Statistics
        self.rtt_samples_used = 0
        self.rtt_samples_discarded = 0

    def segment_sent(self, seq: int, send_time: float):
        """Record a segment being sent for the first time."""
        self.segments[seq] = TrackedSegment(
            sequence_number=seq,
            first_send_time=send_time
        )

    def segment_retransmitted(self, seq: int, retransmit_time: float):
        """Record a segment being retransmitted (timeout occurred)."""
        if seq in self.segments:
            seg = self.segments[seq]
            seg.retransmit_count += 1
            seg.state = SegmentState.RETRANSMITTED

            # Apply exponential backoff to operational RTO
            self.backed_off_rto = min(self.backed_off_rto * 2, 120000)

            print(f"⚠️  Segment {seq} retransmitted (attempt #{seg.retransmit_count})")
            print(f"   Backed-off RTO: {self.backed_off_rto:.0f}ms")

    def ack_received(self, ack: int, recv_time: float) -> Optional[float]:
        """
        Process acknowledgment. Returns RTT if measurement was valid.

        Implements Karn's algorithm for ambiguity handling.
        """
        # Find the acknowledged segments
        acked_seqs = [s for s in self.segments if s < ack]
        if not acked_seqs:
            return None

        measured_rtt = None
        for seq in acked_seqs:
            seg = self.segments[seq]

            if seg.retransmit_count > 0:
                # KARN'S ALGORITHM: Do not use this RTT sample
                self.rtt_samples_discarded += 1
                print(f"📊 Segment {seq}: ACK received but RTT DISCARDED (ambiguous)")
                print(f"   Segment was retransmitted - cannot determine which copy was ACKed")
            else:
                # First transmission was acknowledged - safe to measure
                rtt = recv_time - seg.first_send_time
                measured_rtt = rtt
                self._update_rtt_estimates(rtt)
                self.rtt_samples_used += 1

                # KARN'S ALGORITHM: Reset backed-off RTO only on clean measurement
                self.backed_off_rto = self.rto
                print(f"✓ Segment {seq}: RTT measured = {rtt:.1f}ms (used for estimate)")

            del self.segments[seq]

        return measured_rtt

    def _update_rtt_estimates(self, rtt_sample: float):
        """Update SRTT/RTTVAR using Jacobson's algorithm."""
        if self.srtt is None:
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            deviation = abs(self.srtt - rtt_sample)
            self.rttvar = 0.75 * self.rttvar + 0.25 * deviation
            self.srtt = 0.875 * self.srtt + 0.125 * rtt_sample

        self.rto = self.srtt + max(1, 4 * self.rttvar)
        self.rto = max(self.rto, 200)  # Minimum 200ms for demo

    def get_current_rto(self) -> float:
        """Get the current operational RTO (includes backoff)."""
        return self.backed_off_rto

    def print_statistics(self):
        """Print measurement statistics."""
        print("\n📈 RTT Measurement Statistics:")
        print(f"   Samples used: {self.rtt_samples_used}")
        print(f"   Samples discarded (Karn): {self.rtt_samples_discarded}")
        if self.srtt is not None:
            print(f"   Current SRTT: {self.srtt:.1f}ms")
        else:
            print("   SRTT: Not yet measured")
        print(f"   Current RTO: {self.rto:.1f}ms")


def demonstrate_karns_algorithm():
    """Demonstrate Karn's algorithm with a realistic scenario."""
    print("=" * 70)
    print("Karn's Algorithm Demonstration")
    print("=" * 70)
    print()

    manager = TCPRTOManagerWithKarn()

    # Scenario: Mix of successful and retransmitted segments
    print("Timeline of events:")
    print("-" * 50)

    # Segment 1: Successful first transmission
    print("\nT=0ms: Send Segment 1000")
    manager.segment_sent(1000, 0)

    # Segment 2: Will need retransmission
    print("T=50ms: Send Segment 1500")
    manager.segment_sent(1500, 50)

    # ACK for segment 1 arrives
    print("T=120ms: ACK 1500 received")
    manager.ack_received(1500, 120)

    # Segment 2 times out
    print("T=1100ms: Timeout for Segment 1500")
    manager.segment_retransmitted(1500, 1100)

    # Segment 3 sent during backoff period
    print("T=1200ms: Send Segment 2000")
    manager.segment_sent(2000, 1200)

    # ACK for retransmitted segment 2 (ambiguous - discarded)
    print("T=1400ms: ACK 2000 received")
    manager.ack_received(2000, 1400)

    # ACK for segment 3 (clean measurement)
    print("T=1450ms: ACK 2500 received")
    manager.ack_received(2500, 1450)

    manager.print_statistics()


if __name__ == "__main__":
    demonstrate_karns_algorithm()
```

Understanding how retransmission timers behave in real systems is crucial for diagnosing network performance issues. Let's examine practical considerations and how to observe timer behavior.
Linux TCP Stack Configurables:
The Linux kernel exposes several parameters that affect retransmission behavior:
# View current settings
sysctl net.ipv4.tcp_retries1 # Threshold for "soft" errors (default: 3)
sysctl net.ipv4.tcp_retries2 # Max data retransmissions (default: 15)
sysctl net.ipv4.tcp_syn_retries # SYN retransmissions (default: 6)
sysctl net.ipv4.tcp_synack_retries # SYN-ACK retries (default: 5)
# RTO-related (may not be directly settable on all systems)
# Initial RTO is typically hardcoded or calculated
Observing Retransmissions:
Tools for monitoring retransmission behavior:
# Netstat (shows retransmission statistics)
netstat -s | grep -i retrans
# SS (socket statistics - shows per-connection RTO)
ss -ti dst 192.168.1.1
# Tcpdump (this filter captures SYNs, useful for spotting SYN retransmissions;
# general retransmissions are easiest to identify by opening a capture in Wireshark)
tcpdump -n "tcp[tcpflags] & tcp-syn != 0"
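For quick scripting, the same counters that netstat reports can be read directly from the kernel. The sketch below is Linux-specific and illustrative: it parses the Tcp line of /proc/net/snmp by header name and reports the cumulative retransmission rate (RetransSegs as a fraction of OutSegs) since boot; sample it twice and take the difference to get a rate over a time window.

```python
def tcp_counters(path: str = "/proc/net/snmp") -> dict[str, int]:
    """Parse the two 'Tcp:' lines (header + values) into a name -> counter map."""
    with open(path) as f:
        tcp_lines = [line.split() for line in f if line.startswith("Tcp:")]
    header, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return dict(zip(header, (int(v) for v in values)))

if __name__ == "__main__":
    c = tcp_counters()
    out_segs, retrans = c["OutSegs"], c["RetransSegs"]
    rate = 100.0 * retrans / out_segs if out_segs else 0.0
    print(f"OutSegs={out_segs}  RetransSegs={retrans}  retransmission rate={rate:.3f}%")
```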
| Symptom | Possible Cause | Diagnostic Approach | Resolution |
|---|---|---|---|
| High retransmission rate (>1%) | Network congestion or link errors | Check router buffer utilization, link error counters | Add capacity, fix faulty hardware, implement QoS |
| Spurious retransmissions | RTO too aggressive or RTT spike | Compare RTO vs actual RTT in captures | Check for competing traffic, verify min RTO settings |
| Very long recovery times | RTO too conservative or continued loss | Monitor RTO progression during outage | Check for persistent path problems, verify backoff logic |
| SYN retransmissions only | Firewall blocking, service down, or SYN flood mitigation | Check server SYN queue, firewall rules | Increase SYN queue, whitelist legitimate sources, tune SYN cookies |
| Connection timeouts | Path is dead or experiencing severe packet loss | Traceroute, check for routing blackholes | Route around problem, contact network operator |
Wireshark's TCP analysis automatically identifies retransmissions and categorizes them. Look for '[TCP Retransmission]' (segment sent again after timeout) and '[TCP Spurious Retransmission]' (original was actually ACKed, we retransmitted unnecessarily). The 'tcp.analysis.retransmission' display filter shows all retransmitted segments.
Performance Implications:
Retransmission timer behavior has profound performance implications:
Data Center Networks: With RTTs of 100μs to 1ms, the 1-second minimum RTO means that a single packet loss causes 1000-10000× the RTT in delay. This is why data center TCP variants like DCTCP and specialized switches with explicit congestion notification (ECN) are important—they prevent loss in the first place.
Wide Area Networks: For intercontinental connections with 100-300ms RTTs, the RTO calculation works well. The adaptive algorithm tracks varying conditions, and the 1-second minimum rarely applies since K × RTTVAR typically exceeds it.
Satellite Links: With RTTs of 600ms+, TCP must wait substantial time for ACKs. Aggressive retransmission would be catastrophic. Performance Enhancing Proxies (PEPs) sometimes terminate TCP connections and use link-optimized protocols over the satellite hop.
Lossy Wireless Links: Random wireless loss (not congestion) triggers TCP retransmission and congestion control, reducing throughput unnecessarily. Link-layer retransmission (e.g., WiFi ARQ) often handles this faster than TCP can.
We've taken a comprehensive journey through TCP's retransmission timer mechanism. The essentials: the timer is what turns IP's best-effort delivery into TCP's reliability guarantee; the RTO is computed adaptively from the smoothed RTT (SRTT) and its deviation (RTTVAR) using Jacobson's algorithm, RTO = SRTT + max(G, 4 × RTTVAR), bounded below by a 1-second minimum; each retransmission doubles the RTO (exponential backoff) until an unambiguous acknowledgment arrives; and Karn's algorithm keeps ambiguous RTT samples from retransmitted segments out of the estimates.
What's Next:
The retransmission timer is just one of several timers that govern TCP's behavior. In the next page, we'll explore the persistence timer, which addresses a completely different problem: preventing deadlock when the receiver's window closes. Together, these timers form a robust system that handles the myriad edge cases of network communication.
You now have a deep understanding of TCP's retransmission timer—from the mathematical foundations of RTT estimation to practical debugging techniques. This knowledge is essential for diagnosing TCP performance issues and understanding how TCP achieves reliable delivery over an unreliable network.