Loading content...
Consider a fundamental problem in measurement: you're timing how long it takes for a message to reach its destination and for a reply to return. Simple enough. But what if the message gets lost and you must send it again? When the reply finally arrives, how do you know which message it's responding to—the original or the retransmission?
This is the retransmission ambiguity problem, and it plagued early TCP implementations. Worse, naively using RTT samples from retransmitted segments led to a pernicious failure mode: the RTT estimator would become corrupted, causing either massive over-timeouts or catastrophic under-timeouts that triggered retransmission storms.
Phil Karn, in collaboration with Craig Partridge, published an elegant solution in 1987. Karn's algorithm addresses this problem with a simple but powerful insight: if you can't make a clean measurement, don't make the measurement at all.
By the end of this page, you will understand the retransmission ambiguity problem in depth, why incorrect handling can corrupt RTT estimates, Karn's two-part solution (don't measure + back off), how Karn's algorithm complements Jacobson's algorithm, and how modern TCP timestamps eliminate the ambiguity entirely.
We introduced this problem briefly in Page 1, but let's examine it thoroughly. The problem occurs in any scenario where:
The fundamental question: Which transmission does the ACK acknowledge?
Scenario A: Original was delivered, ACK acknowledges original
Scenario B: Original was lost, ACK acknowledges retransmission
The critical problem: Standard TCP acknowledgments carry no information to distinguish these scenarios. The ACK simply says "I've received all bytes up to X." It doesn't say "I received them from the transmission you sent at time Y."
The problem isn't merely that we don't know the true RTT. Either interpretation we choose has a 50% chance of being wrong, and the wrong choice corrupts our RTT estimator. Repeated wrong choices cause the estimator to drift, potentially by orders of magnitude.
Let's trace through what happens when TCP implementations guess incorrectly.
This strategy timestamps RTT from the first transmission:
SampleRTT = T₃ - T₁
| Actual Scenario | Measured RTT | Actual RTT | Effect on Estimator |
|---|---|---|---|
| A (Original delivered) | T₃ - T₁ | T₃ - T₁ | Correct |
| B (Original lost) | T₃ - T₁ (long) | T₃ - T₂ (short) | Inflated: SRTT and RTTVAR increase |
When Scenario B occurs but we use T₁:
Failure mode: "RTO inflation" — Every timeout leads to a larger RTO, creating a positive feedback loop where retransmission becomes increasingly conservative. Throughput drops to near zero on lossy links.
This strategy timestamps RTT from the retransmission:
SampleRTT = T₃ - T₂
| Actual Scenario | Measured RTT | Actual RTT | Effect on Estimator |
|---|---|---|---|
| A (Original delivered) | T₃ - T₂ (short) | T₃ - T₁ (long) | Deflated: SRTT and RTTVAR decrease |
| B (Original lost) | T₃ - T₂ | T₃ - T₂ | Correct |
When Scenario A occurs but we use T₂:
Failure mode: "Retransmission storm" — Deflated RTO causes spurious retransmissions, which cause more ambiguous samples, which further deflate RTO. This creates a positive feedback loop that floods the network with duplicate packets, worsening congestion.
Neither guessing strategy is safe. One leads to unbounded RTO growth (connection stalls). The other leads to RTO collapse and retransmission storms (congestion amplification). The only safe approach is to not use ambiguous samples at all.
Phil Karn and Craig Partridge published their solution in 1987 in a paper titled "Improving Round-Trip Time Estimates in Reliable Transport Protocols." The algorithm consists of two synergistic rules:
When a segment is retransmitted, do not use the resulting ACK to update the RTT estimator (SRTT and RTTVAR).
This rule is elegantly simple: if we can't be sure which transmission the ACK corresponds to, we simply don't use it. We wait for an unambiguous measurement—one where the segment was not retransmitted.
When a timeout occurs and triggers retransmission, back off the RTO (typically doubling it) and keep it at that backed-off value until a successful (non-retransmitted) transmission completes.
This rule is essential because Rule 1 alone would be dangerous. Consider:
The solution: exponential backoff. Each timeout doubles the RTO. This backed-off value is maintained until we get a clean sample from a non-retransmitted segment.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778
class TCPConnection: def __init__(self): # RTT estimator state self.srtt = None # Smoothed RTT self.rttvar = None # RTT Variance self.rto = 1000 # Initial RTO: 1 second (ms) # Segment tracking self.pending_segments = {} # seq_num -> (send_time, retransmitted_flag) self.num_retransmits = {} # seq_num -> count def send_segment(self, segment): """Record send time and track if this is a retransmission.""" seq_num = segment.sequence_number send_time = current_timestamp() if seq_num in self.pending_segments: # This is a retransmission old_send_time, _ = self.pending_segments[seq_num] self.pending_segments[seq_num] = (old_send_time, True) # Mark as retransmitted self.num_retransmits[seq_num] = self.num_retransmits.get(seq_num, 0) + 1 else: # First transmission self.pending_segments[seq_num] = (send_time, False) self.transmit(segment) def receive_ack(self, ack): """Process ACK and potentially update RTT estimate.""" ack_num = ack.acknowledgment_number receive_time = current_timestamp() for seq_num in list(self.pending_segments.keys()): if seq_num < ack_num: send_time, was_retransmitted = self.pending_segments.pop(seq_num) # KARN'S RULE 1: Only update estimator if NOT retransmitted if not was_retransmitted: sample_rtt = receive_time - send_time self.update_rtt_estimator(sample_rtt) # KARN'S RULE 2 (reset): Successful non-retransmit # means we can now trust our estimates again # RTO is recalculated from the estimator, not held at backed-off value else: # Retransmitted segment acknowledged # DO NOT update SRTT/RTTVAR # Keep the backed-off RTO pass # Clean up retransmit count if seq_num in self.num_retransmits: del self.num_retransmits[seq_num] def handle_timeout(self, seq_num): """Handle retransmission timeout.""" # KARN'S RULE 2: Exponential backoff self.rto = min(self.rto * 2, MAX_RTO) # Double RTO, up to maximum # Retransmit the segment (will be marked as retransmitted) segment = self.get_segment(seq_num) self.send_segment(segment) def update_rtt_estimator(self, sample_rtt): """Jacobson's algorithm for updating SRTT/RTTVAR.""" if self.srtt is None: # First sample self.srtt = sample_rtt self.rttvar = sample_rtt / 2 else: # Subsequent samples err = sample_rtt - self.srtt self.srtt = self.srtt + (err / 8) self.rttvar = self.rttvar + (abs(err) - self.rttvar) / 4 # Recalculate RTO self.rto = self.srtt + 4 * self.rttvar self.rto = max(self.rto, MIN_RTO) # Apply minimum boundKarn's algorithm doesn't try to guess which scenario occurred. It accepts that ambiguous measurements are worse than no measurements. By combining "don't measure" with "back off," it preserves the integrity of the RTT estimator while still adapting to worsening network conditions through the backoff mechanism.
It's tempting to simplify Karn's algorithm to just one of the rules. Let's see why both are essential:
If we still update SRTT/RTTVAR using ambiguous samples but apply backoff:
Result: The estimator corruption problem remains; backoff just delays its effects.
If we refuse ambiguous samples but don't back off:
| Event | RTO | Actual RTT | Outcome |
|---|---|---|---|
| Initial state | 200ms | 100ms | Working fine |
| Network degrades | 200ms | 500ms | Timeout (expected) |
| Retransmit | 200ms (unchanged) | 500ms | Segment sent; don't measure |
| ACK arrives | 200ms (no update) | 500ms | Timeout again on next segment |
| Retransmit again | 200ms (still) | 500ms | Stuck in perpetual timeout loop |
Without backoff, if the network genuinely slows down and causes a timeout, we can't adapt. We refuse to update from the resulting sample (correctly, due to ambiguity), but we also don't increase RTO. Every subsequent segment times out.
Result: The connection becomes stuck in a perpetual timeout-retransmit loop, never adapting to the new network conditions.
Together, the rules form a complete solution: Rule 1 protects the long-term estimator from corruption, while Rule 2 provides short-term adaptation to worsening conditions. When conditions stabilize, clean samples eventually arrive, the estimator updates correctly, and the backed-off RTO is replaced with an accurate calculation.
Karn's algorithm and Jacobson's algorithm are designed to work together as complementary pieces of the RTO calculation:
| Algorithm | Responsibility |
|---|---|
| Jacobson's | How to update SRTT and RTTVAR given an RTT sample; how to calculate RTO from these estimates |
| Karn's | Which samples to feed into Jacobson's algorithm; how to adjust RTO when samples are unavailable |
Neither algorithm is complete without the other:
Combining both algorithms gives us the complete TCP RTO calculation:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101
"""Complete TCP RTO Calculation: Jacobson + Karn This implementation follows RFC 6298 which incorporates both algorithms.""" class TCPTimer: # Constants (from RFC 6298) ALPHA = 1/8 # SRTT smoothing factor BETA = 1/4 # RTTVAR smoothing factor K = 4 # Variance multiplier MIN_RTO = 1000 # 1 second minimum (ms) MAX_RTO = 60000 # 60 second maximum (ms) def __init__(self): self.srtt = None self.rttvar = None self.rto = self.MIN_RTO # Initial RTO self._rto_backed_off = False def process_ack(self, sample_rtt: float, was_retransmitted: bool): """ Process an acknowledgment and potentially update RTO. Args: sample_rtt: Measured RTT for this sample (ms) was_retransmitted: True if the segment was retransmitted """ # KARN'S RULE 1: Don't use retransmitted samples if was_retransmitted: # Do not update SRTT/RTTVAR # Keep current (possibly backed-off) RTO return # Unambiguous sample: apply Jacobson's algorithm if self.srtt is None: # First measurement (RFC 6298 Section 2.2) self.srtt = sample_rtt self.rttvar = sample_rtt / 2 else: # Subsequent measurements (RFC 6298 Section 2.3) # Note: Use SRTT from before update for RTTVAR calculation err = sample_rtt - self.srtt # Update RTTVAR first (uses old SRTT) self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(err) # Update SRTT self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample_rtt # Calculate new RTO self.rto = self.srtt + self.K * self.rttvar # Apply bounds self.rto = max(self.rto, self.MIN_RTO) self.rto = min(self.rto, self.MAX_RTO) # Clear backoff flag - we have a fresh measurement self._rto_backed_off = False def handle_timeout(self): """ Handle a retransmission timeout. KARN'S RULE 2: Back off the RTO. """ # Exponential backoff: double the RTO self.rto = min(self.rto * 2, self.MAX_RTO) self._rto_backed_off = True # The backed-off RTO is used until a successful # non-retransmitted segment is acknowledged def get_rto(self) -> float: """Get the current RTO value for segment transmission.""" return self.rto # Example: Simulating TCP behaviortimer = TCPTimer() # First segment sent and acknowledged (RTT = 100ms)timer.process_ack(sample_rtt=100, was_retransmitted=False)print(f"After first sample: RTO = {timer.rto:.1f}ms")# RTO = 100 + 4*50 = 300ms # Second segment acknowledged (RTT = 120ms)timer.process_ack(sample_rtt=120, was_retransmitted=False)print(f"After second sample: RTO = {timer.rto:.1f}ms") # Timeout occurs on third segmenttimer.handle_timeout()print(f"After timeout (backoff): RTO = {timer.rto:.1f}ms") # Retransmission acknowledged - don't update estimatortimer.process_ack(sample_rtt=150, was_retransmitted=True)print(f"After retransmit ACK (no update): RTO = {timer.rto:.1f}ms") # Next segment succeeds without retransmissiontimer.process_ack(sample_rtt=110, was_retransmitted=False)print(f"After clean sample: RTO = {timer.rto:.1f}ms")RFC 6298 (which obsoletes RFC 2988) provides the standard specification for TCP RTO computation. It explicitly incorporates both Jacobson's variance-based estimation and Karn's handling of retransmission ambiguity. Implementations should follow RFC 6298 for interoperability.
Karn's algorithm solves the ambiguity problem by refusing to measure ambiguous samples. But what if we could eliminate the ambiguity entirely? That's what TCP timestamps (RFC 7323) accomplish.
The TCP Timestamps option adds two 32-bit fields to the TCP header:
When a sender transmits a segment, it includes its current timestamp in TSval. When the receiver acknowledges, it echoes that timestamp back in TSecr.
With timestamps, the sender knows exactly which transmission the ACK corresponds to:
Even though the segment was retransmitted, we can now safely use the RTT sample because we know which transmission it corresponds to.
With timestamps, Karn's Rule 1 becomes less critical—we can safely use samples from retransmitted segments. However, Karn's Rule 2 (exponential backoff) remains important:
Modern TCPs with timestamp support thus:
TCP timestamps are widely supported and often enabled by default in modern operating systems. They add 12 bytes to each TCP header (10 bytes option + 2 bytes padding for alignment), but the benefits for RTT measurement and PAWS protection far outweigh this modest overhead in most scenarios.
Implementing Karn's algorithm correctly requires attention to several practical details:
If a segment is retransmitted multiple times before an ACK arrives:
T₁: Send segment (seq=1000)
T₂: Timeout → retransmit
T₃: Timeout → retransmit again
T₄: ACK arrives
Without timestamps, all three transmissions are ambiguous. The segment is marked as retransmitted, and no sample is taken.
With timestamps, the TSecr in the ACK indicates which transmission was received. We can calculate RTT relative to that specific transmission.
Backoff should be bounded to prevent RTO from growing unboundedly:
12345678910111213141516171819202122
# Typical backoff implementation with limits MAX_RTO = 60000 # 60 seconds maximumMAX_RETRIES = 15 # Maximum retransmission attempts def handle_timeout(self, retransmit_count): """Apply exponential backoff with limits.""" # Check if we've exceeded maximum retries if retransmit_count >= MAX_RETRIES: # Connection presumed dead, abort self.abort_connection() return # Apply exponential backoff # Note: each timeout doubles RTO from its CURRENT value # not from the original calculated value self.rto = min(self.rto * 2, MAX_RTO) # Some implementations also track backoff depth # for diagnostic purposes self.backoff_depth = retransmit_countConsider:
In this case:
Modern implementations track per-segment retransmission status to maximize usable samples while still protecting the estimator.
If a timeout was spurious (ACK was just delayed, not lost), the estimator should ideally not be penalized. Some TCP extensions (like F-RTO, RFC 5682) attempt to detect spurious timeouts and undo the backoff. However, the basic Karn's algorithm doesn't distinguish spurious from genuine timeouts—it backs off either way, which is conservative but safe.
Different TCP stacks implement variations on Karn's algorithm. Some are more aggressive in keeping the backed-off RTO; others more quickly return to the calculated value. When diagnosing network performance issues, consulting the specific implementation's behavior is important.
Karn's algorithm embodies a profound engineering principle: when you can't make a clean measurement, it's better to not measure than to measure incorrectly. Let's consolidate the key takeaways:
What's next:
We've now covered how to measure RTT (Page 1), how to compute smooth estimates and variance (Page 2 - Jacobson's algorithm), and which samples to use (this page - Karn's algorithm). The next page brings it all together: RTO Calculation—the complete formula for the Retransmission Timeout, including all the bounds, adjustments, and practical considerations specified in RFC 6298.
You now understand Karn's algorithm for handling retransmission ambiguity—the problem it solves, why both rules are necessary, how it interacts with Jacobson's algorithm, and how TCP timestamps provide a modern solution. Next, we'll see the complete RTO calculation as specified in RFC 6298.