Every time TCP sends a segment, it faces a critical decision: how long should it wait for an acknowledgment before assuming the segment was lost? Set the timer too short, and you'll waste bandwidth retransmitting segments that were actually on their way. Set it too long, and you'll sit idle while valuable transmission time ticks away.
This decision—the Retransmission Timeout (RTO)—is the culmination of everything we've studied: RTT sampling, Jacobson's variance-based estimation, and Karn's algorithm for handling ambiguity. RFC 6298, "Computing TCP's Retransmission Timer," codifies the complete algorithm that modern TCP implementations follow.
In this page, we'll dissect RFC 6298 step by step, understanding not just what the algorithm specifies, but why each element exists.
By the end of this page, you will understand the complete RTO calculation algorithm as specified in RFC 6298, all the constants and bounds involved, the clock granularity consideration, how to initialize the algorithm, detailed update procedures, and how RTO fits into TCP's larger retransmission framework.
RFC 6298, published in 2011, obsoletes the earlier RFC 2988 and provides the current standard for TCP RTO computation. It consolidates decades of research and operational experience into a precise specification.
The fundamental RTO calculation it defines is:
RTO = SRTT + max(G, K × RTTVAR)
Where:
| Constant | Value | Purpose | Implementation Note |
|---|---|---|---|
| α (SRTT smoothing) | 1/8 | Weight for new RTT sample in mean estimation | Implemented as >> 3 |
| β (RTTVAR smoothing) | 1/4 | Weight for new deviation in variance estimation | Implemented as >> 2 |
| K (variance multiplier) | 4 | Safety margin multiplier for variance term | Implemented as << 2 |
| Minimum RTO | ≥ 1 second | Lower bound on RTO to prevent spurious retransmissions | Some systems use 200ms |
| G (granularity) | System-dependent | Clock timer resolution | Often negligible on modern systems |
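The shift notes in the table can be made concrete with integer arithmetic. The following is a minimal sketch, assuming integer millisecond units and a 1ms granularity; the name `update_rto_int` is hypothetical, and production stacks typically keep scaled copies of SRTT and RTTVAR to limit rounding loss, which this sketch does not attempt.

```python
G_MS = 1  # Assumed clock granularity in milliseconds


def update_rto_int(srtt: int, rttvar: int, sample: int) -> tuple[int, int, int]:
    """One estimator update using shifts in place of the 1/8, 1/4 and 4 factors."""
    err = sample - srtt
    # RTTVAR += (|Err| - RTTVAR) / 4, using the old SRTT for Err.
    # Python's >> floors toward negative infinity for negative values.
    rttvar += (abs(err) - rttvar) >> 2
    # SRTT += Err / 8
    srtt += err >> 3
    # RTO = SRTT + max(G, 4 * RTTVAR); min/max bounds are applied separately
    rto = srtt + max(G_MS, rttvar << 2)
    return srtt, rttvar, rto


print(update_rto_int(100, 50, 140))  # (105, 47, 293)
```

The integer result differs slightly from the floating-point one (RTTVAR would be 47.5) because each shift discards the fractional part.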
RFC 6298 made one significant change from RFC 2988: it lowered the initial RTO from 3 seconds to 1 second. The minimum RTO recommendation is unchanged: a computed RTO below 1 second SHOULD be rounded up to 1 second. Some modern implementations (especially in data centers) use lower minimums like 200ms, which can improve latency but requires careful consideration of delayed ACK timers.
When a TCP connection is first established, there's no RTT history to work with. RFC 6298 specifies a two-phase initialization:
Until the first RTT sample is obtained (typically from the SYN-ACK during connection establishment):
RTO = 1 second
This is a conservative initial value. It's long enough to accommodate most networks while short enough not to stall connection setup excessively.
Rationale: Setting RTO too low initially could cause spurious retransmissions of SYN packets, potentially preventing connection establishment on high-latency links. Setting it too high delays connection setup if the first SYN is lost.
When the first RTT measurement R is made:
SRTT = R
RTTVAR = R / 2
RTO = SRTT + max(G, K × RTTVAR) = R + max(G, 4 × R/2) = R + max(G, 2R)
Assuming G < 2R (which is almost always true):
RTO = R + 2R = 3R
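The computed RTO after the first sample is thus 3× the first RTT measurement. The minimum bound still applies, so a 100ms first sample gives a computed value of 300ms but an effective RTO of 1 second.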
```python
class TCPRTO:
    """TCP RTO calculator following RFC 6298."""

    # Constants from RFC 6298
    ALPHA = 1 / 8          # SRTT smoothing factor
    BETA = 1 / 4           # RTTVAR smoothing factor
    K = 4                  # Variance multiplier
    MIN_RTO = 1000         # Minimum RTO in milliseconds (1 second)
    INITIAL_RTO = 1000     # Initial RTO before any measurements
    MAX_RTO = 60000        # Maximum RTO in milliseconds (60 seconds)
    CLOCK_GRANULARITY = 1  # G: assume 1ms granularity on modern systems

    def __init__(self):
        # State: None indicates no measurements yet
        self.srtt = None
        self.rttvar = None
        self.rto = self.INITIAL_RTO  # Start with 1 second
        self._first_measurement = True

    def on_first_measurement(self, R: float):
        """
        Handle the first RTT measurement.

        Per RFC 6298 Section 2.2:
        - SRTT <- R
        - RTTVAR <- R/2
        - RTO <- SRTT + max(G, K*RTTVAR)
        """
        self.srtt = R
        self.rttvar = R / 2

        # Calculate RTO with granularity consideration
        variance_term = max(self.CLOCK_GRANULARITY, self.K * self.rttvar)
        computed_rto = self.srtt + variance_term

        # Apply minimum bound
        self.rto = max(computed_rto, self.MIN_RTO)
        self._first_measurement = False

        print(f"First measurement: R={R}ms")
        print(f"  SRTT={self.srtt}ms, RTTVAR={self.rttvar}ms")
        print(f"  Computed RTO={computed_rto}ms (= {R} + 4×{R/2} = 3×{R}ms)")
        print(f"  RTO after minimum bound={self.rto}ms")


# Example: Connection to a server with 100ms RTT
rto_calc = TCPRTO()
print(f"Initial RTO (no measurements): {rto_calc.rto}ms")

# First RTT measurement from SYN-ACK
rto_calc.on_first_measurement(100)
# Computed RTO is 300ms (3 × first measurement); the 1-second minimum raises it to 1000ms
```

Setting the initial variance to half the first measurement is a heuristic that errs on the side of caution.
This heuristic has proven robust across decades of Internet operation.
The first RTT measurement typically comes from the three-way handshake: the time between sending SYN and receiving SYN-ACK. This provides an RTT estimate before any application data is transmitted, allowing data segments to use a properly calibrated RTO from the start.
After the first measurement, each new RTT sample R' updates the estimator using Jacobson's algorithm. RFC 6298 Section 2.3 specifies:
Err = R' - SRTT
The error is the difference between the new sample and the current estimate.
RTTVAR = (1 - β) × RTTVAR + β × |Err|
RTTVAR = (1 - 1/4) × RTTVAR + (1/4) × |Err|
RTTVAR = 3/4 × RTTVAR + 1/4 × |Err|
Important: RTTVAR is updated before SRTT. This ensures we use the old SRTT value (not yet updated) when calculating the error magnitude.
SRTT = (1 - α) × SRTT + α × R'
SRTT = (1 - 1/8) × SRTT + (1/8) × R'
SRTT = 7/8 × SRTT + 1/8 × R'
RTO = SRTT + max(G, K × RTTVAR)
RTO = SRTT + max(G, 4 × RTTVAR)
RTO = max(RTO, MinRTO)
RTO = min(RTO, MaxRTO) (optional but common)
```python
# This method belongs in the body of the TCPRTO class shown earlier.
def on_subsequent_measurement(self, R_prime: float) -> float:
    """
    Handle subsequent RTT measurements.

    Per RFC 6298 Section 2.3, the update order matters:
    1. Update RTTVAR using OLD SRTT
    2. Update SRTT
    3. Recompute RTO
    """
    # Step 1: Compute error using OLD SRTT
    err = R_prime - self.srtt

    # Step 2: Update RTTVAR (using OLD SRTT for error calculation)
    # RTTVAR = (1 - β) * RTTVAR + β * |err|
    # With β = 1/4: RTTVAR = 3/4 * RTTVAR + 1/4 * |err|
    abs_err = abs(err)
    self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs_err
    # Alternative formulation:
    # self.rttvar = self.rttvar + self.BETA * (abs_err - self.rttvar)

    # Step 3: Update SRTT
    # SRTT = (1 - α) * SRTT + α * R'
    # With α = 1/8: SRTT = 7/8 * SRTT + 1/8 * R'
    self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * R_prime
    # Alternative formulation:
    # self.srtt = self.srtt + self.ALPHA * err  # Uses err from step 1

    # Step 4: Compute RTO
    variance_term = max(self.CLOCK_GRANULARITY, self.K * self.rttvar)
    self.rto = self.srtt + variance_term

    # Step 5: Apply bounds
    self.rto = max(self.rto, self.MIN_RTO)
    self.rto = min(self.rto, self.MAX_RTO)

    return self.rto


# Attach the method to the class so the trace below runs as written.
TCPRTO.on_subsequent_measurement = on_subsequent_measurement

# Example trace:
rto_calc = TCPRTO()
rto_calc.on_first_measurement(100)  # Computed RTO = 300ms, raised to the 1000ms minimum

measurements = [105, 95, 102, 98, 100]  # Stable network
for r in measurements:
    rto = rto_calc.on_subsequent_measurement(r)
    print(f"R={r}ms -> SRTT={rto_calc.srtt:.1f}, RTTVAR={rto_calc.rttvar:.1f}, RTO={rto:.1f}")

# Output shows RTTVAR decreasing as the network proves stable;
# RTO itself stays at the 1000ms minimum throughout.
```

This ordering is critical and often implemented incorrectly. Consider what happens if we update SRTT first:
The error calculated with the new SRTT is artificially reduced because SRTT has already moved toward R'. This dampens RTTVAR incorrectly.
By updating RTTVAR first, we measure the deviation from what we expected (old SRTT), not from an already-adjusted value.
Updating SRTT before RTTVAR is a common implementation bug. It appears to work in testing because the error is subtle—RTTVAR decreases slightly faster than it should. The problem manifests as overly aggressive RTO in variable networks, leading to occasional spurious retransmissions.
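The effect of the ordering can be seen with a single sample. This is a small sketch with illustrative numbers (SRTT = 100ms, RTTVAR = 10ms, new sample 180ms); the function names `update_rttvar_first` and `update_srtt_first` are hypothetical labels for the correct and the buggy order.

```python
ALPHA, BETA = 1 / 8, 1 / 4


def update_rttvar_first(srtt: float, rttvar: float, r: float) -> tuple[float, float]:
    """RFC 6298 order: the deviation is measured against the old SRTT."""
    rttvar = (1 - BETA) * rttvar + BETA * abs(r - srtt)
    srtt = (1 - ALPHA) * srtt + ALPHA * r
    return srtt, rttvar


def update_srtt_first(srtt: float, rttvar: float, r: float) -> tuple[float, float]:
    """Buggy order: SRTT has already moved toward r, so |Err| is understated."""
    srtt = (1 - ALPHA) * srtt + ALPHA * r
    rttvar = (1 - BETA) * rttvar + BETA * abs(r - srtt)
    return srtt, rttvar


print(update_rttvar_first(100, 10, 180))  # (110.0, 27.5)
print(update_srtt_first(100, 10, 180))    # (110.0, 25.0)
```

Both orders produce the same SRTT, but the buggy order yields a smaller RTTVAR (25.0 instead of 27.5), which translates into a 10ms shorter variance term after multiplying by K = 4.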
RFC 6298 includes the granularity term G in the RTO calculation:
RTO = SRTT + max(G, K × RTTVAR)
This term ensures that RTO never depends solely on a variance estimate that might be smaller than the clock resolution.
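In the early days of TCP, system clocks had coarse granularity, often 500ms or even 1-second ticks. With ticks that coarse, many RTT samples land on the same tick value, so the measured deviation shrinks toward zero and the computed RTO approaches SRTT with no safety margin.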
Including max(G, 4×RTTVAR) ensures RTO is at least one clock tick more than SRTT, even if RTTVAR is tiny.
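A short simulation illustrates why the floor matters on a coarse clock. This sketch assumes a 500ms tick on which every sample is measured as exactly 500ms; the function name `variance_term_after` and the sample count are illustrative choices, and the minimum RTO bound is deliberately left out to isolate the effect of G.

```python
ALPHA, BETA, K = 1 / 8, 1 / 4, 4


def variance_term_after(samples: int, measured_rtt: float, g: float) -> tuple[float, float]:
    """Return (K*RTTVAR, max(G, K*RTTVAR)) after feeding identical RTT samples."""
    srtt = measured_rtt
    rttvar = measured_rtt / 2  # First-measurement initialization
    for _ in range(samples):
        rttvar = (1 - BETA) * rttvar + BETA * abs(measured_rtt - srtt)
        srtt = (1 - ALPHA) * srtt + ALPHA * measured_rtt
    return K * rttvar, max(g, K * rttvar)


raw, floored = variance_term_after(samples=20, measured_rtt=500, g=500)
print(f"K*RTTVAR alone: {raw:.2f}ms, with granularity floor: {floored:.0f}ms")
```

Because every sample equals SRTT, RTTVAR decays by a factor of 3/4 per update and the raw variance term falls to a few milliseconds, while the floor keeps the margin at one full tick.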
| Era | Typical Granularity | Impact on RTO |
|---|---|---|
| 1980s Unix | 500ms - 1s | Significant: Many RTTs fall within one tick |
| 1990s Systems | 10ms - 100ms | Moderate: Important for LAN connections |
| 2000s Systems | 1ms - 10ms | Minor: Mostly affects high-speed LANs |
| Modern Systems | 1μs - 1ms | Negligible: G term rarely dominates |
On modern systems with microsecond or better clock resolution, the G term is effectively negligible. However, it remains part of the specification, so a conforming implementation still includes it.
```c
/* RFC 6298 RTO calculation with granularity */

/* All values are in the same units as SRTT/RTTVAR (here, milliseconds) */
#define CLOCK_G 1      /* 1ms granularity on modern systems */
#define MIN_RTO 1000   /* Minimum RTO: 1 second */
#define MAX_RTO 60000  /* Maximum RTO: 60 seconds (implementation-specific) */

/* Calculate RTO from SRTT and RTTVAR */
unsigned int calculate_rto(unsigned int srtt, unsigned int rttvar) {
    unsigned int variance_term;
    unsigned int rto;

    /* RFC 6298: RTO = SRTT + max(G, K*RTTVAR) */
    variance_term = 4 * rttvar;  /* K = 4 */

    if (variance_term < CLOCK_G) {
        variance_term = CLOCK_G;  /* max(G, K*RTTVAR) */
    }

    rto = srtt + variance_term;

    /* Apply minimum bound from RFC 6298 */
    if (rto < MIN_RTO) {
        rto = MIN_RTO;
    }

    /* Apply maximum bound (implementation-specific) */
    if (rto > MAX_RTO) {
        rto = MAX_RTO;
    }

    return rto;
}

/*
 * Note: In practice, the MIN_RTO bound (typically 1 second)
 * is usually larger than any value that would result from
 * the G term dominating, so G is often irrelevant.
 */
```

In most practical scenarios, the minimum RTO bound (1 second per RFC 6298) is much larger than the granularity term. The granularity consideration is mainly relevant for high-speed, low-latency networks where the calculated RTO might otherwise be in the single-digit milliseconds, which is itself below the minimum RTO. The G term thus rarely affects actual RTO values in conforming implementations.
RFC 6298 specifies bounds on the RTO value, and these bounds have important rationale:
RFC 6298 states:
Whenever RTO is computed, if it is less than 1 second then the RTO SHOULD be rounded up to 1 second.
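Because this is a SHOULD rather than a MUST, implementations may deviate with good reason, and the RFC acknowledges that future research may show a smaller minimum to be acceptable. It nevertheless keeps 1 second as the conservative default.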
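RFC 6298 does not mandate a maximum RTO, but it permits one provided the cap is at least 60 seconds; implementations typically enforce one (often 60-120 seconds). A cap keeps exponential backoff from growing the timeout without limit after repeated losses.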
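The interaction between backoff and the maximum bound can be sketched in a few lines. This assumes the doubling-on-timeout rule used elsewhere on this page and a 60-second cap; `backoff_sequence` is a hypothetical helper name.

```python
def backoff_sequence(rto_ms: float, max_rto_ms: float, timeouts: int) -> list[float]:
    """RTO values after each of `timeouts` consecutive timer expirations."""
    sequence = []
    for _ in range(timeouts):
        rto_ms = min(rto_ms * 2, max_rto_ms)  # Double, then clamp to the cap
        sequence.append(rto_ms)
    return sequence


print(backoff_sequence(1000, 60000, 8))
# [2000, 4000, 8000, 16000, 32000, 60000, 60000, 60000]
```

Starting from the 1-second minimum, the cap is reached on the sixth consecutive timeout; without it, the eighth would already be 256 seconds.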
In controlled environments (data centers, private networks), operators sometimes relax the minimum RTO to values like 200ms or even lower. Typical settings by environment:
| Environment | Typical MinRTO | Typical MaxRTO | Notes |
|---|---|---|---|
| Public Internet | 1 second | 60 seconds or more | RFC 6298: 1s minimum; any cap must be at least 60s |
| Enterprise LAN | 200ms - 1s | 30-60 seconds | Often configurable |
| Data Center | 20ms - 200ms | 10-30 seconds | Optimized for low latency |
| High-Frequency Trading | <1ms | 100ms - 1s | Extreme tuning, specialized stacks |
Lowering minimum RTO below 1 second in uncontrolled environments is dangerous. A network hiccup or delayed ACK can trigger spurious retransmissions, potentially leading to: (1) wasted bandwidth, (2) unnecessary congestion response, (3) degraded throughput due to cwnd reduction. Only relax MinRTO when you fully control both endpoints and the network path.
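The delayed-ACK hazard comes down to simple arithmetic: if the receiver may hold an ACK longer than the sender's timer allows, the timer fires even though nothing was lost. The check below is a rough sketch; the function name `ack_beats_timer` and the 5ms RTT and 40ms delayed-ACK values are illustrative assumptions, not measurements.

```python
def ack_beats_timer(rtt_ms: float, delayed_ack_ms: float, rto_ms: float) -> bool:
    """True if a maximally delayed ACK still arrives before the RTO fires."""
    return rtt_ms + delayed_ack_ms < rto_ms


# Assumed path: 5ms RTT, receiver may delay ACKs by up to 40ms
print(ack_beats_timer(5, 40, rto_ms=20))    # False: spurious timeout
print(ack_beats_timer(5, 40, rto_ms=200))   # True
print(ack_beats_timer(5, 40, rto_ms=1000))  # True
```

Under these assumptions a 20ms minimum RTO is only safe if the receiver's delayed-ACK timer is shortened as well, which is why such tuning requires control of both endpoints.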
The RTO calculation tells us what the timeout should be, but RFC 6298 also specifies when to set and reset the timer:
When a segment containing data is sent (including a retransmission), if the timer is not running, start it running so that it will expire after RTO seconds.
The timer starts when the first unacknowledged segment is sent. It doesn't restart for every segment—only if it's not already running.
When all outstanding data has been acknowledged, turn off the retransmission timer.
When the receiver has acknowledged everything, there's nothing to time out on.
When an ACK is received that acknowledges new data, restart the retransmission timer so that it will expire after RTO seconds.
Each ACK that makes progress restarts the timer with the current RTO value. This ensures the timer covers the data still outstanding rather than a segment that has already been acknowledged.
```python
import time


def current_time() -> float:
    """Monotonic clock reading in milliseconds."""
    return time.monotonic() * 1000


class TCPRetransmissionTimer:
    """Manages the retransmission timer per RFC 6298 Section 5."""

    def __init__(self):
        self.timer_running = False
        self.timer_expiry = None
        self.rto_calculator = TCPRTO()

        # Track unacknowledged data
        self.snd_una = 0  # Oldest unacknowledged byte
        self.snd_nxt = 0  # Next byte to send

    def send_data(self, segment):
        """Called when data is transmitted."""
        segment_end = segment.sequence + len(segment.data)
        self.snd_nxt = max(self.snd_nxt, segment_end)

        # Rule 5.1: Start timer if not already running
        if not self.timer_running and self.snd_una < self.snd_nxt:
            self._start_timer()

    def receive_ack(self, ack_num, segment):
        """Called when an ACK is received for `segment`."""
        # Only an ACK for new data affects the estimator and the timer
        if ack_num <= self.snd_una:
            return
        self.snd_una = ack_num

        # Karn's algorithm: never sample a segment that was retransmitted
        if not segment.was_retransmitted:
            sample_rtt = self._get_rtt_for_segment(segment)
            if self.rto_calculator.srtt is None:
                self.rto_calculator.on_first_measurement(sample_rtt)
            else:
                self.rto_calculator.on_subsequent_measurement(sample_rtt)

        if self.snd_una >= self.snd_nxt:
            # Rule 5.2: All data acknowledged, stop the timer
            self._stop_timer()
        else:
            # Rule 5.3: Data still outstanding, restart the timer
            self._restart_timer()

    def handle_timeout(self):
        """Called when the retransmission timer expires."""
        # RFC 6298 Section 5.4: Retransmit earliest unacknowledged segment
        self._retransmit_segment(self.snd_una)

        # RFC 6298 Section 5.5: Back off the RTO (double it, up to the cap)
        calc = self.rto_calculator
        calc.rto = min(calc.rto * 2, calc.MAX_RTO)

        # RFC 6298 Section 5.6: Restart timer with backed-off RTO
        self._start_timer()

    def _get_rtt_for_segment(self, segment) -> float:
        """RTT sample in ms; assumes the segment records its send_time in ms."""
        return current_time() - segment.send_time

    def _retransmit_segment(self, seq):
        """Placeholder: a real stack would resend the segment starting at seq."""
        print(f"Retransmitting segment starting at byte {seq}")

    def _start_timer(self):
        """Start the retransmission timer."""
        self.timer_running = True
        rto = self.rto_calculator.rto
        self.timer_expiry = current_time() + rto
        print(f"Timer started: expires in {rto}ms")

    def _stop_timer(self):
        """Stop the retransmission timer."""
        self.timer_running = False
        self.timer_expiry = None
        print("Timer stopped: all data acknowledged")

    def _restart_timer(self):
        """Restart the timer with current RTO."""
        self.timer_running = True
        rto = self.rto_calculator.rto
        self.timer_expiry = current_time() + rto
        print(f"Timer restarted: expires in {rto}ms")
```

Restarting the timer when new data is acknowledged prevents a subtle problem: without the restart, the timer would keep counting from the transmission of a segment that has already been acknowledged, and could expire too early for data sent after it.
By restarting on each progressive ACK, we ensure the timer tracks the actual oldest unacknowledged data.
RFC 6298 recommends a single retransmission timer per connection, not per-segment timers. This simplifies implementation and aligns with the cumulative ACK nature of TCP. The timer always tracks the oldest unacknowledged segment; when it's acknowledged, the timer restarts for the next oldest.
Let's bring everything together into a complete, RFC 6298-compliant implementation:
```python
"""
Complete RFC 6298 RTO Implementation

This implementation includes:
- Jacobson's algorithm for SRTT/RTTVAR estimation
- Karn's algorithm for retransmission handling
- All bounds and constraints from RFC 6298
- Timer management per RFC 6298 Section 5
"""

from dataclasses import dataclass
from typing import Optional
import time


@dataclass
class RTOState:
    """Complete RTO calculator state."""
    srtt: Optional[float] = None    # Smoothed RTT (ms)
    rttvar: Optional[float] = None  # RTT variance (ms)
    rto: float = 1000               # Retransmission timeout (ms)

    # Constants (RFC 6298)
    ALPHA: float = 1 / 8
    BETA: float = 1 / 4
    K: float = 4
    G: float = 1            # Clock granularity (ms)
    MIN_RTO: float = 1000   # 1 second
    MAX_RTO: float = 60000  # 60 seconds

    def on_first_rtt(self, R: float) -> float:
        """Handle first RTT measurement. Section 2.2."""
        self.srtt = R
        self.rttvar = R / 2
        self._update_rto()
        return self.rto

    def on_rtt_measurement(self, R: float) -> float:
        """Handle subsequent RTT measurement. Section 2.3."""
        if self.srtt is None:
            return self.on_first_rtt(R)

        # IMPORTANT: Calculate err using OLD SRTT
        err = R - self.srtt

        # Update RTTVAR first (uses old SRTT)
        self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(err)

        # Update SRTT
        self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * R

        self._update_rto()
        return self.rto

    def on_timeout(self) -> float:
        """Handle retransmission timeout. Section 5.5."""
        # Exponential backoff
        self.rto = min(self.rto * 2, self.MAX_RTO)
        return self.rto

    def _update_rto(self):
        """Compute RTO from current SRTT and RTTVAR."""
        # RTO = SRTT + max(G, K * RTTVAR)
        variance_term = max(self.G, self.K * self.rttvar)
        self.rto = self.srtt + variance_term

        # Apply bounds
        self.rto = max(self.rto, self.MIN_RTO)
        self.rto = min(self.rto, self.MAX_RTO)


class TCPConnectionRTO:
    """
    Complete TCP connection RTO management.
    Includes segment tracking and Karn's algorithm.
    """

    def __init__(self):
        self.state = RTOState()

        # Segment tracking: seq_num -> (send_time, was_retransmitted)
        self.pending = {}

        # Timer state
        self.timer_expiry: Optional[float] = None

    def segment_sent(self, seq_num: int, is_retransmit: bool = False):
        """Record segment transmission."""
        now = time.time() * 1000  # Current time in ms

        if seq_num in self.pending:
            # Mark as retransmitted (for Karn's algorithm)
            self.pending[seq_num] = (self.pending[seq_num][0], True)
        else:
            self.pending[seq_num] = (now, is_retransmit)

        # Rule 5.1: start the timer only if it is not already running
        if self.timer_expiry is None:
            self.timer_expiry = now + self.state.rto

    def ack_received(self, ack_num: int):
        """Process acknowledgment."""
        now = time.time() * 1000

        # Find acknowledged segments
        acked_seqs = [s for s in self.pending.keys() if s < ack_num]
        if not acked_seqs:
            return  # No new data acknowledged: leave the timer untouched

        for seq in acked_seqs:
            send_time, was_retransmitted = self.pending.pop(seq)

            # Karn's Rule 1: Only use clean samples
            if not was_retransmitted:
                sample_rtt = now - send_time
                self.state.on_rtt_measurement(sample_rtt)

        # Timer management
        if not self.pending:
            # All acknowledged: stop timer
            self.timer_expiry = None
        else:
            # Restart timer for remaining data
            self.timer_expiry = now + self.state.rto

    def check_timeout(self) -> bool:
        """Check if timeout has occurred."""
        if self.timer_expiry is None:
            return False

        now = time.time() * 1000
        if now >= self.timer_expiry:
            # Timeout occurred
            # Karn's Rule 2: Back off
            self.state.on_timeout()
            # Section 5.6: restart the timer with the backed-off RTO
            self.timer_expiry = now + self.state.rto
            return True

        return False

    def get_current_rto(self) -> float:
        return self.state.rto


# === Demonstration ===
if __name__ == "__main__":
    conn = TCPConnectionRTO()

    print("=== RFC 6298 RTO Calculation Demo ===\n")

    # Simulate connection establishment
    print("Connection established, first RTT measurement: 100ms")
    conn.state.on_first_rtt(100)
    print(f"  SRTT={conn.state.srtt}ms, RTTVAR={conn.state.rttvar}ms, RTO={conn.state.rto}ms")
    print("  (Initial RTO = 3 × first RTT = 300ms, bounded to min 1000ms)\n")

    # Simulate some data transfer
    measurements = [95, 105, 98, 102, 100, 97, 103]

    print("Subsequent RTT measurements:")
    for i, rtt in enumerate(measurements, 2):
        conn.state.on_rtt_measurement(rtt)
        print(f"  Sample {i}: R={rtt}ms -> SRTT={conn.state.srtt:.1f}, "
              f"RTTVAR={conn.state.rttvar:.1f}, RTO={conn.state.rto:.1f}")

    print("\n(Note: RTO stays at minimum 1000ms even though calculated value is lower)")
```

Real TCP implementations add further complexity: handling out-of-order segments, SACK-based retransmission, multiple segment tracking, and integration with congestion control. This implementation captures the core RFC 6298 algorithm; production stacks build significant infrastructure around it.
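We've now covered the complete RTO calculation as specified in RFC 6298. The key takeaways: RTO starts at 1 second before any measurement; the first sample sets SRTT = R and RTTVAR = R/2; each later sample updates RTTVAR before SRTT; the result is rounded up to the 1-second minimum and optionally capped; and a single timer per connection is started, restarted, and stopped according to the Section 5 rules.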
What's next:
We've seen how RTO is calculated and how timeouts trigger backoff. But what happens after a timeout? The next page explores Exponential Backoff in depth—the mechanism that progressively increases RTO after repeated timeouts, preventing network overload during congestion events.
You now understand the complete RFC 6298 RTO calculation algorithm—initialization, updates, bounds, timer management, and implementation details. This is the culmination of RTT estimation, Jacobson's algorithm, and Karn's algorithm into a practical, standardized procedure. Next, we'll dive deep into exponential backoff behavior.