In the realm of computer networking, reliability is not a given—it is an achievement. The Internet Protocol (IP) layer beneath TCP makes no guarantees about packet delivery. Packets can be lost due to router buffer overflow, corrupted by transmission errors, duplicated by network anomalies, or simply vanish into the void of a congested network path. Yet TCP promises reliable, in-order delivery to applications. How does it keep this promise?
The answer lies in one of TCP's most critical mechanisms: the retransmission timer. This timer is the sentinel that watches over every segment TCP sends, ready to trigger retransmission if acknowledgment doesn't arrive in time. Understanding this timer is essential to understanding TCP's reliability guarantee—and to diagnosing and optimizing TCP performance in production systems.
By the end of this page, you will understand: why retransmission timers are necessary, how TCP determines the optimal timeout value, the mathematical foundations of RTT estimation and RTO calculation, the mechanisms that handle retransmission, and the real-world implications of timer misconfiguration. You'll gain the expertise to diagnose retransmission-related performance issues and understand the tradeoffs inherent in TCP's design.
To understand why retransmission timers are essential, we must first understand the problem TCP is solving. Consider what happens when a host sends a segment across the Internet:
Scenario 1: Segment Loss Host A sends a segment to Host B. The segment traverses multiple routers, but one router's buffer overflows due to congestion. The segment is dropped. Host B never receives it and therefore never sends an acknowledgment. Without a mechanism to detect this loss, Host A would wait forever.
Scenario 2: ACK Loss Host A sends a segment to Host B. Host B receives it correctly and sends an acknowledgment. However, this ACK packet is lost in transit. From Host A's perspective, this is indistinguishable from Scenario 1—it sent a segment and received no acknowledgment.
Scenario 3: Excessive Delay Host A sends a segment. Due to network congestion, routing changes, or link problems, the segment or its acknowledgment is severely delayed (but not lost). Host A must decide: is this packet lost and needs retransmission, or is it just delayed?
The retransmission timer provides TCP's answer to all three scenarios: if an acknowledgment isn't received within a timeout period, assume the segment was lost and retransmit it.
The retransmission timer embodies a critical tradeoff. If the timeout is too short, TCP will retransmit segments unnecessarily (the acknowledgment was just delayed), wasting bandwidth and potentially worsening congestion. If the timeout is too long, TCP will wait too long before recovering from actual loss, reducing throughput. Finding the optimal timeout value is one of TCP's most delicate balancing acts.
| RTO Characteristic | Consequence | Performance Impact |
|---|---|---|
| Too Short | Spurious retransmissions (retransmitting data that isn't lost) | Wasted bandwidth, potential congestion amplification, unfair use of network resources |
| Too Long | Delayed recovery from actual packet loss | Reduced throughput, increased latency for applications, poor user experience |
| Well-Tuned | Retransmit only when necessary, with minimal delay | Optimal throughput, efficient bandwidth utilization, fair network sharing |
| Static (not adaptive) | Appropriate for some paths, wrong for others | Inconsistent performance across different network conditions |
| Adaptive (dynamic) | Adjusts to actual network conditions | Consistent performance as network conditions change |
The retransmission timeout (RTO) must be based on the actual time it takes for a segment to reach its destination and for the acknowledgment to return. This time is called the Round-Trip Time (RTT). But RTT is not a fixed value—it varies continuously with network congestion, queueing delays at routers, routing changes, and how quickly the receiver generates acknowledgments.
Because RTT varies, TCP cannot use a single fixed timeout value. Instead, it must measure RTT and adapt the timeout accordingly. This is the essence of TCP's dynamic timeout calculation.
TCP measures RTT by recording the time when a segment is sent and noting when its acknowledgment arrives. However, this measurement has complexities: if a segment is retransmitted, TCP cannot determine if the received ACK is for the original or retransmitted segment (Karn's problem). We'll address this later when discussing Karn's algorithm.
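To make that bookkeeping concrete, here is a minimal sketch (illustrative only; the helper names are hypothetical, not a real TCP stack's API): record a send timestamp per sequence number, then compute the sample when a covering ACK arrives.

```python
import time

# Illustrative sketch: per-segment send timestamps keyed by sequence number.
send_times: dict[int, float] = {}

def on_segment_sent(seq: int) -> None:
    """Record when a segment with this sequence number was first sent."""
    send_times[seq] = time.monotonic()

def on_ack_received(ack: int) -> float | None:
    """Return an RTT sample (in seconds) for the newest segment this ACK covers."""
    acked = [s for s in send_times if s < ack]
    if not acked:
        return None  # Duplicate or out-of-window ACK: no new sample
    rtt = time.monotonic() - send_times[max(acked)]
    for s in acked:  # All covered segments are now acknowledged
        del send_times[s]
    return rtt
```

Real stacks refine this idea with the TCP timestamps option (RFC 7323), which allows an RTT sample to be taken from nearly every ACK rather than one outstanding segment at a time.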
RTT Measurement Visualization:
Consider a TCP connection between New York and London:
```
Time  0ms: Host A sends segment (Seq=1000)
           └── segment begins traversing the network ──────────┐
                                                                │
               [Router 1] → [Router 2] → ... → [Router N]       │
                                                                │
Time 35ms: Host B receives the segment ◄────────────────────────┘
           └── Host B generates ACK (Ack=1500) ─────────────────┐
                                                                │
Time 70ms: Host A receives the ACK ◄────────────────────────────┘
           └── Measured RTT = 70ms
```
In this example, the measured RTT is 70ms. But the next RTT sample might be 65ms, 80ms, or 120ms depending on current network conditions. TCP must smooth these measurements to produce a stable timeout value.
Given the inherent variability of RTT, TCP cannot simply use the most recent RTT measurement as the timeout value. A single high RTT sample (perhaps due to temporary congestion) would set an overly conservative timeout, while a single low sample would set an aggressive timeout that triggers spurious retransmissions.
TCP addresses this through Exponential Weighted Moving Average (EWMA), which smooths RTT measurements over time. The key insight is that recent measurements should carry more weight than older ones, but the estimate should not react too aggressively to any single sample.
The Original TCP Specification (RFC 793) Approach:
The original TCP specification used a simple smoothed RTT (SRTT) calculation:
SRTT = α × SRTT + (1 - α) × RTT_sample
Where α is typically 0.875 (7/8), meaning the new estimate is 87.5% based on the previous estimate and 12.5% on the new sample. This creates a smooth, stable estimate that gradually adapts to changing conditions.
```python
# Original TCP SRTT calculation (RFC 793 style)
def update_srtt_original(srtt: float, rtt_sample: float, alpha: float = 0.875) -> float:
    """
    Update the Smoothed Round-Trip Time using EWMA.

    Args:
        srtt: Current smoothed RTT estimate (in ms)
        rtt_sample: New RTT measurement (in ms)
        alpha: Smoothing factor (typically 0.875 = 7/8)

    Returns:
        Updated SRTT estimate

    Example:
        >>> srtt = 100  # Initial estimate: 100ms
        >>> srtt = update_srtt_original(srtt, 80)  # New sample: 80ms
        >>> print(f"New SRTT: {srtt:.2f}ms")  # SRTT moves slightly toward 80
        New SRTT: 97.50ms
        >>> srtt = update_srtt_original(srtt, 120)  # New sample: 120ms
        >>> print(f"New SRTT: {srtt:.2f}ms")  # SRTT moves toward 120
        New SRTT: 100.31ms
    """
    return alpha * srtt + (1 - alpha) * rtt_sample


# Simulation: How SRTT evolves over time
def simulate_srtt_evolution():
    """Demonstrate SRTT smoothing over varying RTT samples."""
    # Initial SRTT estimate
    srtt = 100.0  # 100ms initial estimate

    # Simulate RTT samples with some variation and a spike
    samples = [95, 102, 98, 105, 97, 150, 103, 99, 101, 98]

    print("RTT Measurement Smoothing Simulation")
    print("=" * 50)
    print(f"Initial SRTT estimate: {srtt:.2f}ms")
    print("Smoothing factor (α): 0.875 (7/8)")
    print()

    for i, sample in enumerate(samples):
        old_srtt = srtt
        srtt = update_srtt_original(srtt, sample)
        delta = srtt - old_srtt
        print(f"Sample {i+1}: RTT={sample:3d}ms → SRTT: {old_srtt:.2f} → {srtt:.2f}ms (Δ={delta:+.2f})")

    print()
    print("Notice how the spike (150ms) affects SRTT only gradually,")
    print("preventing overreaction to temporary conditions.")


if __name__ == "__main__":
    simulate_srtt_evolution()
```

Why Smoothing Matters:
Consider what happens without smoothing. If TCP simply used the last RTT sample as its timeout, a single delayed ACK would inflate the timeout far beyond typical conditions, while a single unusually fast ACK would shrink it enough to trigger spurious retransmissions on the very next segments. With EWMA smoothing (α = 0.875), each new sample moves the estimate by only 12.5% of the difference, so the smoothed estimate remains stable despite individual sample variations.
The original SRTT approach had a critical flaw: it didn't account for RTT variance. Consider two scenarios:
Scenario A: Low-Variance Network. RTT samples cluster tightly around 100ms (say, 98 to 103ms), so the measured RTT is highly predictable.
Scenario B: High-Variance Network. RTT samples also average 100ms, but individual samples swing widely (say, anywhere from 50ms to 200ms).
Both scenarios have the same average RTT, but they require very different timeout values. In Scenario A, an RTO of 110ms would almost never trigger spuriously. In Scenario B, the same RTO would cause many spurious retransmissions.
In 1988, Van Jacobson published his seminal paper addressing this problem. Jacobson's algorithm computes both the smoothed RTT and the RTT deviation (variance), using both to calculate a more appropriate RTO.
The timeout should be set high enough to accommodate normal RTT variance but not so high that it delays loss recovery. By tracking RTT deviation, TCP can set aggressive timeouts when the network is stable and conservative timeouts when the network is variable—automatically adapting to current conditions.
Jacobson's Algorithm (RFC 6298):
The algorithm maintains two state variables: SRTT, the smoothed RTT estimate, and RTTVAR, a smoothed measure of how much individual samples deviate from SRTT.
On the first RTT measurement:
SRTT = R (the first RTT sample)
RTTVAR = R / 2
RTO = SRTT + max(G, K × RTTVAR)
Where G is the clock granularity and K = 4.
On subsequent RTT measurements:
RTTVAR = (1 - β) × RTTVAR + β × |SRTT - R|
SRTT = (1 - α) × SRTT + α × R
RTO = SRTT + max(G, K × RTTVAR)
Where α = 1/8, β = 1/4, and K = 4.
Why K = 4?
For a normal distribution, ~99.99% of samples fall within μ ± 4σ. Using K = 4 means the timeout should accommodate approximately 99.99% of legitimate delays, making spurious retransmissions extremely rare while still detecting actual loss promptly.
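As a quick sanity check on that figure (a sketch assuming normally distributed delays, which real RTTs only approximate), the standard normal CDF gives the fraction of samples within k standard deviations of the mean:

```python
import math

def fraction_within(k: float) -> float:
    """P(|X - mu| <= k*sigma) for a normal distribution, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 4):
    print(f"within {k} sigma: {fraction_within(k) * 100:.4f}%")
# k = 4 prints ~99.9937%, i.e. roughly one legitimate delay in 16,000
# would exceed the 4-sigma bound if delays were truly normal.
```

Note that RTTVAR is a smoothed mean deviation rather than a true standard deviation, so K = 4 is a pragmatic engineering margin rather than an exact percentile. The fuller implementation that follows applies the same constants across several simulated network conditions.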
```python
class JacobsonRTOEstimator:
    """
    Implements Jacobson's algorithm for RTO estimation (RFC 6298).

    This is the algorithm used by modern TCP implementations to calculate
    the retransmission timeout based on RTT measurements.
    """

    def __init__(self, clock_granularity: float = 1.0, min_rto: float = 1000.0):
        """
        Initialize the RTO estimator.

        Args:
            clock_granularity: Timer granularity in ms (G in RFC 6298)
            min_rto: Minimum RTO value in ms (RFC 6298 recommends 1 second)
        """
        self.srtt: float | None = None    # Smoothed RTT
        self.rttvar: float | None = None  # RTT variance
        self.rto: float = 1000.0          # Current RTO (default 1 second)
        self.clock_granularity = clock_granularity
        self.min_rto = min_rto

        # Constants from RFC 6298
        self.alpha = 1 / 8  # SRTT smoothing factor
        self.beta = 1 / 4   # RTTVAR smoothing factor
        self.K = 4          # Variance multiplier

    def update(self, rtt_sample: float) -> float:
        """
        Update RTO estimate with a new RTT measurement.

        Args:
            rtt_sample: A new RTT measurement in milliseconds

        Returns:
            The updated RTO value in milliseconds
        """
        if self.srtt is None:
            # First measurement - initialize state
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            # Subsequent measurements - apply Jacobson's algorithm
            # Note: RTTVAR must be updated BEFORE SRTT (uses old SRTT value)

            # Calculate absolute deviation from current estimate
            deviation = abs(self.srtt - rtt_sample)

            # Update RTTVAR: exponentially weighted deviation
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * deviation

            # Update SRTT: exponentially weighted average
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt_sample

        # Calculate new RTO
        # RTO = SRTT + max(G, K * RTTVAR)
        self.rto = self.srtt + max(self.clock_granularity, self.K * self.rttvar)

        # Apply minimum RTO (RFC 6298: SHOULD be 1 second)
        self.rto = max(self.rto, self.min_rto)

        return self.rto

    def get_state(self) -> dict:
        """Return current estimator state for debugging/monitoring."""
        return {
            "srtt": self.srtt,
            "rttvar": self.rttvar,
            "rto": self.rto,
        }


def demonstrate_jacobson():
    """Demonstrate Jacobson's algorithm with real-world scenarios."""
    print("=" * 70)
    print("Jacobson's Algorithm Demonstration")
    print("=" * 70)

    # Scenario 1: Stable network
    print("\n📊 SCENARIO 1: Stable Low-Latency Network")
    print("-" * 50)
    estimator = JacobsonRTOEstimator(min_rto=200)  # Lower min for demo
    stable_samples = [100, 102, 99, 101, 98, 103, 100, 97, 102, 99]

    for i, sample in enumerate(stable_samples):
        rto = estimator.update(sample)
        state = estimator.get_state()
        print(f"Sample {i+1:2d}: RTT={sample:3d}ms | "
              f"SRTT={state['srtt']:.1f}ms | "
              f"RTTVAR={state['rttvar']:.1f}ms | "
              f"RTO={rto:.1f}ms")

    print(f"\n✓ Final RTO for stable network: {estimator.rto:.1f}ms")
    print("  Notice: Low RTTVAR leads to tight RTO close to SRTT")

    # Scenario 2: Variable network
    print("\n📊 SCENARIO 2: High-Variance Network")
    print("-" * 50)
    estimator = JacobsonRTOEstimator(min_rto=200)
    variable_samples = [100, 150, 80, 130, 60, 140, 90, 120, 70, 110]

    for i, sample in enumerate(variable_samples):
        rto = estimator.update(sample)
        state = estimator.get_state()
        print(f"Sample {i+1:2d}: RTT={sample:3d}ms | "
              f"SRTT={state['srtt']:.1f}ms | "
              f"RTTVAR={state['rttvar']:.1f}ms | "
              f"RTO={rto:.1f}ms")

    print(f"\n✓ Final RTO for variable network: {estimator.rto:.1f}ms")
    print("  Notice: High RTTVAR leads to more conservative RTO")

    # Scenario 3: Network with sudden congestion
    print("\n📊 SCENARIO 3: Sudden Congestion Event")
    print("-" * 50)
    estimator = JacobsonRTOEstimator(min_rto=200)
    # Start stable, then congestion spike, then recovery
    congestion_samples = [100, 98, 102, 99, 250, 300, 280, 150, 110, 102]

    for i, sample in enumerate(congestion_samples):
        rto = estimator.update(sample)
        state = estimator.get_state()
        marker = " ← congestion!" if 250 <= sample <= 300 else ""
        print(f"Sample {i+1:2d}: RTT={sample:3d}ms | "
              f"SRTT={state['srtt']:.1f}ms | "
              f"RTTVAR={state['rttvar']:.1f}ms | "
              f"RTO={rto:.1f}ms{marker}")

    print("\n✓ Algorithm adapts to congestion and recovers gradually")


if __name__ == "__main__":
    demonstrate_jacobson()
```

The complete RTO calculation, as specified in RFC 6298, involves several rules and boundary conditions that ensure robust behavior:
Rule 1: Initial RTO
Before any RTT measurements are taken (such as when opening a new connection), the RTO should be set to a conservative default value. RFC 6298 recommends 1 second, though some implementations use 3 seconds or allow configuration.
Rule 2: First RTT Measurement
When the first RTT sample R is collected:
SRTT ← R
RTTVAR ← R / 2
RTO ← SRTT + max(G, K × RTTVAR)
Rule 3: Subsequent Measurements
For each new RTT sample R':
RTTVAR ← (1 - β) × RTTVAR + β × |SRTT - R'|
SRTT ← (1 - α) × SRTT + α × R'
RTO ← SRTT + max(G, K × RTTVAR)
Rule 4: RTO Bounds
The computed RTO must be bounded: whenever it falls below the minimum (RFC 6298 specifies 1 second), it is rounded up to that minimum, and implementations may place an upper bound on it of at least 60 seconds.
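As a worked example with illustrative numbers (G = 1ms, the RFC 6298 constants, and the 1-second minimum): suppose the first RTT sample is R = 120ms. Then SRTT = 120ms, RTTVAR = 60ms, and RTO = 120 + max(1, 4 × 60) = 360ms, which the minimum raises to 1000ms. If the next sample is R' = 100ms: RTTVAR = 0.75 × 60 + 0.25 × |120 − 100| = 50ms, SRTT = 0.875 × 120 + 0.125 × 100 = 117.5ms, and RTO = 117.5 + max(1, 4 × 50) = 317.5ms, again clamped to the 1-second minimum on a standard stack.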
| Constant | Value | Purpose | Rationale |
|---|---|---|---|
| α (alpha) | 1/8 = 0.125 | SRTT smoothing factor | Balances responsiveness vs stability; 1/8 chosen for efficient binary arithmetic |
| β (beta) | 1/4 = 0.25 | RTTVAR smoothing factor | Variance needs faster adaptation than mean; 1/4 provides good tracking |
| K | 4 | Variance multiplier | 4 standard deviations covers 99.99% of normal distribution |
| G | Clock granularity | Timer resolution | Accounts for system timer limitations; typically 1-10ms on modern systems |
| MinRTO | 1 second | Minimum timeout | Prevents retransmission storms in local/fast networks |
| MaxRTO | 60+ seconds | Maximum timeout | Prevents indefinite waiting; allows eventual failure detection |
The RFC 6298 recommendation of a 1-second minimum RTO is controversial. In modern data center networks with RTTs under 1ms, this creates massive inefficiency—if a packet is lost, TCP must wait 1000x the RTT before retransmitting. Some data center TCP variants (like DCTCP) reduce this minimum, but doing so on the Internet could cause congestion collapse due to spurious retransmissions.
Practical Considerations:
Timer Granularity (G):
Historically, Unix systems had timer granularity of 500ms, meaning the minimum measurable time was half a second. Modern systems have granularity in the 1-10ms range, allowing much finer-grained timeouts. The max(G, K × RTTVAR) term ensures that even with zero variance, the timeout includes at least one clock tick.
Integer Arithmetic:
The choice of α = 1/8 and β = 1/4 is deliberate: these allow efficient computation using bit shifts instead of floating-point division:
```c
// Efficient SRTT update using bit shifts (fixed-point arithmetic)
new_srtt = old_srtt + ((rtt_sample - old_srtt) >> 3);          // add delta / 8

// Efficient RTTVAR update
new_rttvar = old_rttvar + ((abs(deviation) - old_rttvar) >> 2); // add delta / 4
```
This was crucial for early TCP implementations on limited hardware and remains efficient today.
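The sketch below (illustrative, not kernel code) compares the shift-based fixed-point update with the floating-point EWMA it approximates; the two track each other to within the rounding error of integer arithmetic.

```python
def srtt_shift(srtt: int, sample: int) -> int:
    """Fixed-point update: srtt += (sample - srtt) / 8, using an arithmetic shift."""
    return srtt + ((sample - srtt) >> 3)

def srtt_float(srtt: float, sample: float) -> float:
    """Floating-point EWMA with alpha = 1/8."""
    return 0.875 * srtt + 0.125 * sample

srtt_i, srtt_f = 100, 100.0
for sample in (120, 95, 130, 100, 110):
    srtt_i = srtt_shift(srtt_i, sample)
    srtt_f = srtt_float(srtt_f, sample)
    print(f"sample={sample:3d}  fixed-point SRTT={srtt_i:3d}  float SRTT={srtt_f:6.2f}")
```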
When a retransmission timeout occurs and TCP retransmits a segment, it doesn't simply use the same RTO for the retransmitted segment. Instead, TCP applies exponential backoff: each successive retransmission doubles the RTO.
Why Exponential Backoff?
If a segment times out, there are two possibilities:
In case 1, the network might be congested. Retransmitting aggressively could worsen congestion. In case 2, the network is slower than estimated—retransmitting at the same rate would cause repeated spurious retransmissions.
Exponential backoff addresses both scenarios by progressively backing off, giving the network time to recover.
```python
def calculate_backoff_schedule(initial_rto: float,
                               max_retries: int = 6,
                               max_rto: float = 120000.0) -> list[dict]:
    """
    Calculate the exponential backoff schedule for TCP retransmissions.

    Args:
        initial_rto: Initial RTO value in milliseconds
        max_retries: Maximum number of retransmission attempts
        max_rto: Maximum RTO cap in milliseconds (default 120 seconds)

    Returns:
        List of dictionaries with retry info
    """
    schedule = []
    current_rto = initial_rto
    cumulative_time = 0

    for retry in range(max_retries + 1):
        schedule.append({
            "attempt": retry,
            "rto_ms": current_rto,
            "rto_seconds": current_rto / 1000,
            "cumulative_ms": cumulative_time,
            "cumulative_seconds": cumulative_time / 1000,
        })
        cumulative_time += current_rto
        current_rto = min(current_rto * 2, max_rto)  # Double with cap

    return schedule


def demonstrate_backoff():
    """Show TCP exponential backoff in action."""
    print("TCP Retransmission Exponential Backoff")
    print("=" * 70)
    print()

    # Typical internet RTO scenario
    initial_rto = 1000  # 1 second (RFC 6298 minimum)
    schedule = calculate_backoff_schedule(initial_rto, max_retries=6)

    print("Scenario: Typical Internet connection (RTO starts at 1 second)")
    print("-" * 70)
    print(f"{'Attempt':<10} {'RTO':<15} {'Cumulative Wait':<20} {'Status'}")
    print("-" * 70)

    for entry in schedule:
        attempt = entry['attempt']
        rto = entry['rto_seconds']
        cumulative = entry['cumulative_seconds']

        if attempt == 0:
            status = "Initial transmission"
        elif attempt <= 2:
            status = "Probably recoverable"
        elif attempt <= 4:
            status = "Significant delay"
        else:
            status = "Connection likely dead"

        print(f"{attempt:<10} {rto:>10.1f}s     {cumulative:>10.1f}s           {status}")

    total_time = schedule[-1]['cumulative_seconds'] + schedule[-1]['rto_seconds']
    print("-" * 70)
    print(f"Total time before giving up: {total_time:.0f} seconds ({total_time/60:.1f} minutes)")
    print()
    print("Key Observations:")
    print("• Each retry doubles the timeout (exponential growth)")
    print("• After 6 retries, total wait time exceeds 2 minutes")
    print("• This prevents TCP from hammering a failing path")
    print("• Applications should implement their own timeouts for responsiveness")


if __name__ == "__main__":
    demonstrate_backoff()
```

When TCP successfully receives an acknowledgment, it resets the backoff. The RTO returns to its calculated value based on SRTT and RTTVAR. However, after a retransmission, Karn's algorithm (discussed next) prevents using the ambiguous RTT sample for updating the estimates.
Maximum Retransmissions:
TCP implementations typically limit the number of retransmission attempts. After exhausting its retries, TCP gives up and reports a connection failure to the application; the exact limits are implementation-specific and usually configurable.
Linux, for example, uses tcp_retries1 (typically 3) for warning/soft error and tcp_retries2 (typically 15) for hard failure. The actual timeout depends on the RTO at each step.
There's a subtle but critical problem with measuring RTT when retransmissions occur. Suppose Host A sends a segment, no acknowledgment arrives before the RTO expires, so the segment is retransmitted, and shortly afterward an ACK arrives.
Question: What is the RTT? If the ACK corresponds to the original transmission, the sample should span the full interval since the first send; if it corresponds to the retransmission, it should span only the much shorter interval since the resend.
TCP cannot distinguish which segment the ACK acknowledges. This is the retransmission ambiguity problem.
Karn's Algorithm (1987):
Phil Karn and Craig Partridge proposed a simple but effective solution:
Rule 1: When a timeout occurs and a segment is retransmitted, do not update SRTT or RTTVAR based on any acknowledgment for that segment.
Rule 2: Keep the backed-off RTO (doubled value) in effect until a non-ambiguous acknowledgment is received.
Rule 3: Only update RTT estimates based on segments that were acknowledged on their first transmission.
This ensures that RTT estimates are never contaminated by ambiguous measurements.
```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SegmentState(Enum):
    SENT = "sent"
    RETRANSMITTED = "retransmitted"
    ACKNOWLEDGED = "acknowledged"


@dataclass
class TrackedSegment:
    """Track state of a sent segment for RTT measurement."""
    sequence_number: int
    first_send_time: float
    retransmit_count: int = 0
    state: SegmentState = SegmentState.SENT


class TCPRTOManagerWithKarn:
    """
    TCP RTO manager implementing Karn's algorithm.

    Demonstrates how RTT measurements are handled with proper
    ambiguity resolution.
    """

    def __init__(self):
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None
        self.rto: float = 1000.0             # Initial 1 second
        self.backed_off_rto: float = 1000.0  # Current operational RTO

        # Track outstanding segments
        self.segments: dict[int, TrackedSegment] = {}

        # Statistics
        self.rtt_samples_used = 0
        self.rtt_samples_discarded = 0

    def segment_sent(self, seq: int, send_time: float):
        """Record a segment being sent for the first time."""
        self.segments[seq] = TrackedSegment(
            sequence_number=seq,
            first_send_time=send_time
        )

    def segment_retransmitted(self, seq: int, retransmit_time: float):
        """Record a segment being retransmitted (timeout occurred)."""
        if seq in self.segments:
            seg = self.segments[seq]
            seg.retransmit_count += 1
            seg.state = SegmentState.RETRANSMITTED

            # Apply exponential backoff to operational RTO
            self.backed_off_rto = min(self.backed_off_rto * 2, 120000)

            print(f"⚠️  Segment {seq} retransmitted (attempt #{seg.retransmit_count})")
            print(f"   Backed-off RTO: {self.backed_off_rto:.0f}ms")

    def ack_received(self, ack: int, recv_time: float) -> Optional[float]:
        """
        Process acknowledgment. Returns RTT if measurement was valid.

        Implements Karn's algorithm for ambiguity handling.
        """
        # Find the acknowledged segments
        acked_seqs = [s for s in self.segments if s < ack]
        if not acked_seqs:
            return None

        measured_rtt = None
        for seq in acked_seqs:
            seg = self.segments[seq]

            if seg.retransmit_count > 0:
                # KARN'S ALGORITHM: Do not use this RTT sample
                self.rtt_samples_discarded += 1
                print(f"📊 Segment {seq}: ACK received but RTT DISCARDED (ambiguous)")
                print(f"   Segment was retransmitted - cannot determine which copy was ACKed")
            else:
                # First transmission was acknowledged - safe to measure
                rtt = recv_time - seg.first_send_time
                measured_rtt = rtt
                self._update_rtt_estimates(rtt)
                self.rtt_samples_used += 1

                # KARN'S ALGORITHM: Reset backed-off RTO only on clean measurement
                self.backed_off_rto = self.rto
                print(f"✓ Segment {seq}: RTT measured = {rtt:.1f}ms (used for estimate)")

            del self.segments[seq]

        return measured_rtt

    def _update_rtt_estimates(self, rtt_sample: float):
        """Update SRTT/RTTVAR using Jacobson's algorithm."""
        if self.srtt is None:
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            deviation = abs(self.srtt - rtt_sample)
            self.rttvar = 0.75 * self.rttvar + 0.25 * deviation
            self.srtt = 0.875 * self.srtt + 0.125 * rtt_sample

        self.rto = self.srtt + max(1, 4 * self.rttvar)
        self.rto = max(self.rto, 200)  # Minimum 200ms for demo

    def get_current_rto(self) -> float:
        """Get the current operational RTO (includes backoff)."""
        return self.backed_off_rto

    def print_statistics(self):
        """Print measurement statistics."""
        print("\n📈 RTT Measurement Statistics:")
        print(f"   Samples used: {self.rtt_samples_used}")
        print(f"   Samples discarded (Karn): {self.rtt_samples_discarded}")
        if self.srtt is not None:
            print(f"   Current SRTT: {self.srtt:.1f}ms")
        else:
            print("   SRTT: Not yet measured")
        print(f"   Current RTO: {self.rto:.1f}ms")


def demonstrate_karns_algorithm():
    """Demonstrate Karn's algorithm with a realistic scenario."""
    print("=" * 70)
    print("Karn's Algorithm Demonstration")
    print("=" * 70)
    print()

    manager = TCPRTOManagerWithKarn()

    # Scenario: Mix of successful and retransmitted segments
    print("Timeline of events:")
    print("-" * 50)

    # Segment 1: Successful first transmission
    print("\nT=0ms: Send Segment 1000")
    manager.segment_sent(1000, 0)

    # Segment 2: Will need retransmission
    print("T=50ms: Send Segment 1500")
    manager.segment_sent(1500, 50)

    # ACK for segment 1 arrives
    print("T=120ms: ACK 1500 received")
    manager.ack_received(1500, 120)

    # Segment 2 times out
    print("T=1100ms: Timeout for Segment 1500")
    manager.segment_retransmitted(1500, 1100)

    # Segment 3 sent during backoff period
    print("T=1200ms: Send Segment 2000")
    manager.segment_sent(2000, 1200)

    # ACK for retransmitted segment 2 (ambiguous - discarded)
    print("T=1400ms: ACK 2000 received")
    manager.ack_received(2000, 1400)

    # ACK for segment 3 (clean measurement)
    print("T=1450ms: ACK 2500 received")
    manager.ack_received(2500, 1450)

    manager.print_statistics()


if __name__ == "__main__":
    demonstrate_karns_algorithm()
```

Understanding how retransmission timers behave in real systems is crucial for diagnosing network performance issues. Let's examine practical considerations and how to observe timer behavior.
Linux TCP Stack Configurables:
The Linux kernel exposes several parameters that affect retransmission behavior:
# View current settings
sysctl net.ipv4.tcp_retries1 # Threshold for "soft" errors (default: 3)
sysctl net.ipv4.tcp_retries2 # Max data retransmissions (default: 15)
sysctl net.ipv4.tcp_syn_retries # SYN retransmissions (default: 6)
sysctl net.ipv4.tcp_synack_retries # SYN-ACK retries (default: 5)
# RTO-related (may not be directly settable on all systems)
# Initial RTO is typically hardcoded or calculated
Observing Retransmissions:
Tools for monitoring retransmission behavior:
# Netstat (shows retransmission statistics)
netstat -s | grep -i retrans
# SS (socket statistics - shows per-connection RTO)
ss -ti dst 192.168.1.1
# Tcpdump (this filter captures SYNs, useful for spotting SYN retransmissions;
# general retransmissions are easiest to identify by opening a capture in Wireshark)
tcpdump -n "tcp[tcpflags] & tcp-syn != 0"
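For quick scripting, the same counters that netstat reports can be read directly from the kernel. The sketch below is Linux-specific and illustrative: it parses the Tcp line of /proc/net/snmp by header name and reports the cumulative retransmission rate (RetransSegs as a fraction of OutSegs) since boot; sample it twice and take the difference to get a rate over a time window.

```python
def tcp_counters(path: str = "/proc/net/snmp") -> dict[str, int]:
    """Parse the two 'Tcp:' lines (header + values) into a name -> counter map."""
    with open(path) as f:
        tcp_lines = [line.split() for line in f if line.startswith("Tcp:")]
    header, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return dict(zip(header, (int(v) for v in values)))

if __name__ == "__main__":
    c = tcp_counters()
    out_segs, retrans = c["OutSegs"], c["RetransSegs"]
    rate = 100.0 * retrans / out_segs if out_segs else 0.0
    print(f"OutSegs={out_segs}  RetransSegs={retrans}  retransmission rate={rate:.3f}%")
```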
| Symptom | Possible Cause | Diagnostic Approach | Resolution |
|---|---|---|---|
| High retransmission rate (>1%) | Network congestion or link errors | Check router buffer utilization, link error counters | Add capacity, fix faulty hardware, implement QoS |
| Spurious retransmissions | RTO too aggressive or RTT spike | Compare RTO vs actual RTT in captures | Check for competing traffic, verify min RTO settings |
| Very long recovery times | RTO too conservative or continued loss | Monitor RTO progression during outage | Check for persistent path problems, verify backoff logic |
| SYN retransmissions only | Firewall blocking, service down, or SYN flood mitigation | Check server SYN queue, firewall rules | Increase SYN queue, whitelist legitimate sources, tune SYN cookies |
| Connection timeouts | Path is dead or experiencing severe packet loss | Traceroute, check for routing blackholes | Route around problem, contact network operator |
Wireshark's TCP analysis automatically identifies retransmissions and categorizes them. Look for '[TCP Retransmission]' (segment sent again after timeout) and '[TCP Spurious Retransmission]' (original was actually ACKed, we retransmitted unnecessarily). The 'tcp.analysis.retransmission' display filter shows all retransmitted segments.
Performance Implications:
Retransmission timer behavior has profound performance implications:
Data Center Networks: With RTTs of 100μs to 1ms, the 1-second minimum RTO means that a single packet loss causes 1000-10000× the RTT in delay. This is why data center TCP variants like DCTCP and specialized switches with explicit congestion notification (ECN) are important—they prevent loss in the first place.
Wide Area Networks: For intercontinental connections with 100-300ms RTTs, the RTO calculation works well. The adaptive algorithm tracks varying conditions, and the 1-second minimum rarely applies since K × RTTVAR typically exceeds it.
Satellite Links: With RTTs of 600ms+, TCP must wait substantial time for ACKs. Aggressive retransmission would be catastrophic. Performance Enhancing Proxies (PEPs) sometimes terminate TCP connections and use link-optimized protocols over the satellite hop.
Lossy Wireless Links: Random wireless loss (not congestion) triggers TCP retransmission and congestion control, reducing throughput unnecessarily. Link-layer retransmission (e.g., WiFi ARQ) often handles this faster than TCP can.
We've taken a comprehensive journey through TCP's retransmission timer mechanism. The essentials: the timer is what turns IP's best-effort delivery into TCP's reliability guarantee; the RTO is computed adaptively from the smoothed RTT (SRTT) and its deviation (RTTVAR) using Jacobson's algorithm, RTO = SRTT + max(G, 4 × RTTVAR), bounded below by a 1-second minimum; each retransmission doubles the RTO (exponential backoff) until an unambiguous acknowledgment arrives; and Karn's algorithm keeps ambiguous RTT samples from retransmitted segments out of the estimates.
What's Next:
The retransmission timer is just one of several timers that govern TCP's behavior. In the next page, we'll explore the persistence timer, which addresses a completely different problem: preventing deadlock when the receiver's window closes. Together, these timers form a robust system that handles the myriad edge cases of network communication.
You now have a deep understanding of TCP's retransmission timer—from the mathematical foundations of RTT estimation to practical debugging techniques. This knowledge is essential for diagnosing TCP performance issues and understanding how TCP achieves reliable delivery over an unreliable network.