Acknowledgments tell the sender that data arrived successfully. But what happens when the ACK never comes? The sender could wait forever, stuck in eternal limbo—a situation clearly unacceptable for any practical system.
Timeouts solve this dilemma. By setting a timer when each frame is transmitted, the sender establishes a deadline. If the deadline passes without acknowledgment, the sender takes action—typically retransmitting the frame.
This simple concept, "wait, but not forever," underlies every reliable protocol. Yet the implementation details are surprisingly nuanced.
In this section, we explore all dimensions of timeout and retransmission in Stop-and-Wait ARQ, from theoretical foundations to practical implementation.
By the end of this page, you will understand how to calculate appropriate timeout values, implement robust timer mechanisms, handle retransmission correctly, and avoid the pitfalls that lead to protocol failure or inefficiency.
Consider what happens without timeouts:
Scenario: Silent Failure
The protocol has deadlocked. Neither party can proceed. The sender waits for an ACK that will never come. The receiver waits for a frame that already vanished.
The Fundamental Problem:
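In an unreliable channel, silence is ambiguous. When the sender doesn't receive an ACK, it might mean that the frame was lost or corrupted on the way to the receiver, that the ACK was lost or corrupted on the way back, or that the ACK is simply delayed.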
Without additional information, the sender cannot distinguish these cases. Timeouts provide a resolution: "I don't know what happened, but I've waited long enough. I'll try again."
Notice that retransmission handles all failure types identically. The sender doesn't need to diagnose the problem—it just retries. This uniform approach simplifies the protocol dramatically while maintaining correctness.
Setting the timeout correctly is one of the most important and subtle aspects of protocol design. Set it too short, and you'll retransmit unnecessarily, wasting bandwidth and potentially causing confusion. Set it too long, and error recovery takes forever.
The Ideal Timeout:
The timeout should be just longer than the maximum expected round-trip time (RTT). This allows legitimate ACKs to arrive while still detecting true failures promptly.
Round-Trip Time Components:
RTT = T_frame + T_prop(forward) + T_process + T_ack + T_prop(return)
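Where:
- T_frame: time to transmit the data frame onto the link
- T_prop(forward), T_prop(return): one-way propagation delay in each direction
- T_process: time the receiver needs to process the frame
- T_ack: time to transmit the ACK onto the link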
Simplified Formula:
Since T_prop(forward) = T_prop(return) = T_prop, and T_process is typically negligible:
RTT ≈ T_frame + 2 × T_prop + T_ack
And since T_ack is usually much smaller than T_frame:
RTT ≈ T_frame + 2 × T_prop
Example Calculation:
Consider a 1 Mbps link spanning 200 km, carrying 1000-bit data frames and 50-bit ACKs, with a signal propagation speed of 2 × 10⁸ m/s:
Step 1: Calculate transmission times
T_frame = 1000 bits / 1,000,000 bps = 1 ms
T_ack = 50 bits / 1,000,000 bps = 0.05 ms
Step 2: Calculate propagation delay
T_prop = 200,000 m / (2 × 10⁸ m/s) = 1 ms
Step 3: Calculate RTT
RTT = T_frame + 2 × T_prop + T_ack
RTT = 1 + 2(1) + 0.05 = 3.05 ms
Step 4: Set timeout
Timeout = RTT + Safety_Margin
Timeout = 3.05 + 1.0 = 4.05 ms (rounded to ~5 ms)
The safety margin accounts for variation in processing times, link conditions, and clock precision.
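The worked example above can be reproduced with a few lines of Python. This is only a sketch: the function names are illustrative, the 2 × 10⁸ m/s propagation speed and the 1 ms safety margin are taken from the example, and receiver processing time is ignored.

```python
def transmission_time_ms(bits: int, rate_bps: float) -> float:
    """Time to clock `bits` onto the link, in milliseconds."""
    return bits * 1000.0 / rate_bps


def propagation_delay_ms(distance_m: float, speed_mps: float = 2e8) -> float:
    """One-way propagation delay in milliseconds."""
    return distance_m * 1000.0 / speed_mps


def round_trip_time_ms(frame_bits: int, ack_bits: int,
                       rate_bps: float, distance_m: float) -> float:
    """RTT = T_frame + 2 * T_prop + T_ack (T_process assumed negligible)."""
    t_frame = transmission_time_ms(frame_bits, rate_bps)
    t_ack = transmission_time_ms(ack_bits, rate_bps)
    t_prop = propagation_delay_ms(distance_m)
    return t_frame + 2 * t_prop + t_ack


rtt = round_trip_time_ms(frame_bits=1000, ack_bits=50,
                         rate_bps=1_000_000, distance_m=200_000)
timeout = rtt + 1.0  # 1 ms safety margin, as in the worked example
print(f"RTT = {rtt:.2f} ms, timeout = {timeout:.2f} ms")
```

Changing the distance or frame size shows how quickly propagation delay comes to dominate the RTT on longer links.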
A timeout too short (say, 2 ms in the example) would cause premature retransmission—the ACK is still in transit when the timer fires. A timeout too long (say, 500 ms) would delay error recovery dramatically—half a second wasted waiting when the frame was lost on the first bit.
| Timeout Setting | Consequence | Example |
|---|---|---|
| Too Short | Premature retransmission, duplicates, wasted bandwidth | Timeout = 2ms when RTT = 3ms |
| Too Long | Slow error recovery, poor responsiveness | Timeout = 500ms when RTT = 3ms |
| Optimal | Fast recovery without false positives | Timeout = RTT + small margin |
When the timeout fires, the sender must execute the retransmission process correctly. This involves more than simply re-sending the frame.
Step-by-Step Retransmission:
function on_timeout_expired():
    // Step 1: Count this retry and check the limit (implementation specific)
    retransmission_count++
    if retransmission_count > MAX_RETRIES:
        signal_failure_to_upper_layer()
        abort_transmission()
        return
    // Step 2: Retrieve the stored frame
    frame = stored_frame          // Kept since original transmission
    // Step 3: Retransmit the frame
    transmit(frame)
    // Step 4: Restart the timer
    start_timer(timeout_value)
Critical Points:
Use the stored frame: You must transmit the exact same frame, not generate a new one. The sequence number, the data, everything must match the original.
Restart the timer: After retransmitting, start the timer again. The ACK for the retransmission might also be lost.
Track retries: Infinite retransmission isn't practical. If the channel is permanently broken, eventually give up and report failure upward.
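A minimal Python sketch of these three points follows. The `send_and_wait` hook is hypothetical: it stands in for "transmit, start the timer, and block until an ACK or a timeout". For brevity, an ACK with the wrong sequence number is treated like a timeout; a fuller implementation would keep waiting instead.

```python
from typing import Callable, Optional

MAX_RETRIES = 3  # assumed default; real systems make this configurable


def send_with_retries(
    frame: bytes,
    seq: int,
    send_and_wait: Callable[[bytes, int], Optional[int]],
    max_retries: int = MAX_RETRIES,
) -> int:
    """Send one frame and return the number of transmissions it took.

    `send_and_wait` transmits the frame, waits one timeout period, and
    returns the ACK's sequence number, or None if the timer expired.
    """
    # One original transmission plus up to `max_retries` retransmissions.
    for attempt in range(1, max_retries + 2):
        ack = send_and_wait(frame, seq)  # always the same stored frame
        if ack == seq:
            return attempt
    raise TimeoutError(f"no ACK for seq {seq} after {max_retries} retries")


# Simulated channel that loses the first two transmissions.
outcomes = iter([None, None, 0])
sent = []


def lossy_channel(frame: bytes, seq: int) -> Optional[int]:
    sent.append((frame, seq))
    return next(outcomes)


attempts = send_with_retries(b"hello", 0, lossy_channel)
print(attempts)  # 3
```

Every entry in `sent` is identical, which is the point: the receiver can only recognize a retransmission as a duplicate if the sequence number and payload are unchanged.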
Why Retransmit the Same Frame?
This might seem obvious, but it's worth emphasizing:
The Retry Limit:
In practice, Stop-and-Wait implementations include a maximum retry count:
| System | Typical Retry Limit | After Exceeding |
|---|---|---|
| HDLC | 3-10 retries | Report error to upper layer |
| Modems | 3 retries | Connection failure |
| Custom | Configurable | Application-specific handling |
Exponential Backoff (Advanced):
Some implementations increase the timeout after each retry:
timeout = base_timeout × 2^(retry_count)
This "exponential backoff" helps when network congestion is causing losses—backing off reduces load and allows the network to recover. While not standard in basic Stop-and-Wait, the concept appears in many protocols (Ethernet, TCP).
Pure Stop-and-Wait typically uses a fixed timeout, not exponential backoff. However, understanding backoff is valuable because it appears in CSMA/CD (Ethernet collision handling) and TCP (congestion control). The principle is the same: if something failed, wait longer before trying again.
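The backoff formula translates directly into code. In the sketch below, the function name and the upper cap are assumptions added for illustration; the text above does not specify a cap, but practical implementations usually bound the timeout somehow.

```python
def backoff_timeout(base_timeout_ms: float, retry_count: int,
                    cap_ms: float = 60_000.0) -> float:
    """timeout = base * 2**retry_count, limited to cap_ms (cap is an assumption)."""
    return min(base_timeout_ms * (2 ** retry_count), cap_ms)


schedule = [backoff_timeout(5.0, n) for n in range(5)]
print(schedule)  # [5.0, 10.0, 20.0, 40.0, 80.0]
```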
Implementing timers correctly is crucial for protocol reliability. There are several approaches, each with trade-offs.
Hardware Timer Approach:
Dedicated hardware timer that generates an interrupt when it expires:
Advantages:
- Precise timing
- Low CPU overhead
- Guaranteed interrupt delivery
Disadvantages:
- Limited number of hardware timers
- Hardware-dependent implementation
- Interrupt handling complexity
Software Timer Approach:
Periodic polling or software scheduling:
Advantages:
- Flexible, unlimited virtual timers
- Platform-independent
- Easier debugging
Disadvantages:
- Less precise (depends on polling frequency)
- CPU overhead for timer management
- May miss exact expiration time
Timer State Management:
The timer has discrete states that must be managed carefully:
Timer States:
┌─────────┐     start()     ┌─────────┐
│  IDLE   │ ───────────────>│ RUNNING │
└─────────┘                 └─────────┘
     ↑                           │
     │          stop()           │
     └───────────────────────────┤
                                 │
                             (timeout)
                                 │
                                 ↓
                          ┌─────────────┐
                          │   EXPIRED   │
                          │ (callback)  │
                          └─────────────┘
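One way to make these states explicit is a small polled software timer. The sketch below is illustrative only: the class and method names are invented here, and the caller passes in the current time so the behavior is deterministic and easy to test.

```python
from enum import Enum, auto


class TimerState(Enum):
    IDLE = auto()
    RUNNING = auto()
    EXPIRED = auto()


class SoftwareTimer:
    """Polled one-shot timer; the caller supplies the current time."""

    def __init__(self, on_expire):
        self.state = TimerState.IDLE
        self.deadline = None
        self.on_expire = on_expire

    def start(self, now: float, timeout: float) -> None:
        self.deadline = now + timeout
        self.state = TimerState.RUNNING

    def stop(self) -> None:
        # Stopping an idle or already-expired timer is a harmless no-op.
        if self.state is TimerState.RUNNING:
            self.state = TimerState.IDLE
            self.deadline = None

    def poll(self, now: float) -> None:
        # The callback fires at most once per start().
        if self.state is TimerState.RUNNING and now >= self.deadline:
            self.state = TimerState.EXPIRED
            self.on_expire()


fired = []
timer = SoftwareTimer(on_expire=lambda: fired.append("timeout"))
timer.start(now=0.0, timeout=5.0)
timer.poll(now=3.0)  # still RUNNING
timer.poll(now=5.0)  # deadline reached: EXPIRED, callback runs
timer.poll(now=9.0)  # already EXPIRED: callback does not run again
```

Because expiry is only detected inside `poll`, the precision of this timer depends on how often it is polled, which is the trade-off noted for software timers.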
Race Conditions to Avoid:
Timer management has subtle race conditions:
ACK arrives just as timer fires: The timer callback might execute after the ACK was received but before it was fully processed. Solution: Disable timer interrupt during ACK processing, or use careful state checking.
Stop called on already-expired timer: If the timer has already fired, stopping it does nothing. Ensure the timeout handler checks whether an ACK has arrived.
Multiple timer starts without stops: Each new frame transmission should stop any existing timer before starting a new one (though in Stop-and-Wait, there's only one frame at a time, so this is less of an issue).
In interrupt-driven implementations, the timeout handler runs as an interrupt service routine while the main code might be processing an ACK. Use mutual exclusion (disable interrupts briefly, or use locks) to prevent concurrent access to shared state like the current sequence number and stored frame.
Pseudocode for Safe Timer Handling:
// Shared state (protected by mutex/interrupt disable)
shared_state = {
    timer_armed: false,
    awaiting_ack: false,
    current_seq: 0,
    stored_frame: null
}
function send_frame(data):
    disable_interrupts()
    frame = create_frame(data, shared_state.current_seq)
    shared_state.stored_frame = frame
    shared_state.awaiting_ack = true
    shared_state.timer_armed = true
    transmit(frame)
    start_hardware_timer(timeout_value)
    enable_interrupts()
function timer_interrupt_handler():    // ISR
    disable_interrupts()               // Usually automatic in ISR
    if shared_state.timer_armed and shared_state.awaiting_ack:
        // Legitimate timeout
        transmit(shared_state.stored_frame)
        start_hardware_timer(timeout_value)    // Restart
    // else: Timer fired after ACK received, ignore
    enable_interrupts()
function ack_received(ack):
    disable_interrupts()
    if verify_ack(ack) and ack.seq == shared_state.current_seq:
        stop_hardware_timer()
        shared_state.timer_armed = false
        shared_state.awaiting_ack = false
        shared_state.current_seq = 1 - shared_state.current_seq
        shared_state.stored_frame = null
        signal_ready_for_next()
    enable_interrupts()
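In a threaded rather than interrupt-driven environment, the same idea can be sketched in Python with a lock in place of interrupt masking. The class below is an illustration, not a complete sender: `transmit` is a hypothetical callable, and starting and stopping the actual timer is left out so that the example focuses on the state check that makes a late timer callback harmless.

```python
import threading


class StopAndWaitSender:
    """Sender state guarded by a lock instead of interrupt masking."""

    def __init__(self, transmit):
        self._lock = threading.Lock()
        self._transmit = transmit  # hypothetical: puts a frame on the wire
        self.current_seq = 0
        self.stored_frame = None
        self.awaiting_ack = False

    def send(self, data: bytes) -> None:
        with self._lock:
            self.stored_frame = (self.current_seq, data)
            self.awaiting_ack = True
            self._transmit(self.stored_frame)

    def on_timeout(self) -> bool:
        """Timer callback. Returns True if a retransmission was sent."""
        with self._lock:
            if not self.awaiting_ack:
                return False  # timer fired after the ACK was processed
            self._transmit(self.stored_frame)
            return True

    def on_ack(self, ack_seq: int) -> bool:
        """Returns True if the ACK was accepted."""
        with self._lock:
            if not self.awaiting_ack or ack_seq != self.current_seq:
                return False  # stale or unexpected ACK
            self.awaiting_ack = False
            self.current_seq = 1 - self.current_seq
            self.stored_frame = None
            return True


wire = []
sender = StopAndWaitSender(transmit=wire.append)
sender.send(b"A")
accepted = sender.on_ack(0)      # the ACK is processed first...
late_fire = sender.on_timeout()  # ...then the already-scheduled timer fires
print(accepted, late_fire, wire)
```

Since both handlers take the same lock and re-check `awaiting_ack`, the order in which the ACK and the timer callback arrive cannot produce a retransmission of a frame that has already been acknowledged.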
What happens when the timeout fires prematurely—before the ACK could possibly arrive? This is a correctness challenge that Stop-and-Wait must handle gracefully.
Scenario: Premature Timeout
Sender                                Receiver
|                                            |
|-------- Frame 0 -------------------------->|
|                                            |
| [Timer starts]                             |
|                                 [Received] |
|                                [Send ACK0] |
|                                            |
| [Timer expires - TOO EARLY!]               |
|                                            |
|-------- Frame 0 (retransmit) ------------->|
|                                            |
|<-------- ACK0 (original) ------------------|
|                                            |
| [Got ACK0, advance to seq=1]               |
|                                            |
|                    [Received - duplicate!] |
|                                [Send ACK0] |
|                                            |
|-------- Frame 1 -------------------------->|
|                                            |
|<-------- ACK0 (from retransmit) -----------|
| [Wrong seq! Expected ACK1, got ACK0]       |
| [Ignore]                                   |
Analysis:
The premature timeout caused an unnecessary retransmission, but the protocol still works correctly:
Stop-and-Wait tolerates premature timeouts—they cause wasted bandwidth but not protocol failure. The sequence number mechanism ensures correctness. This robustness is intentional: timers can never be perfectly set, especially on variable-latency networks.
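The receiver side of this robustness is small enough to sketch. The class below is a simplified illustration with invented names: it delivers a frame only when the sequence number matches what it expects, but acknowledges every arrival, following the ACK numbering used in the scenario above (ACK0 acknowledges Frame 0).

```python
class Receiver:
    """Alternating-bit receiver: delivers each frame once, ACKs every arrival."""

    def __init__(self):
        self.expected_seq = 0
        self.delivered = []

    def on_frame(self, seq: int, data: str) -> int:
        if seq == self.expected_seq:
            self.delivered.append(data)
            self.expected_seq = 1 - self.expected_seq
        # A duplicate is not delivered again, but it is still acknowledged
        # so the sender can make progress if the earlier ACK was lost.
        return seq  # the ACK carries the sequence number of the frame received


rx = Receiver()
acks = [
    rx.on_frame(0, "first"),   # original Frame 0
    rx.on_frame(0, "first"),   # premature retransmission: duplicate
    rx.on_frame(1, "second"),  # Frame 1
]
print(rx.delivered)  # ['first', 'second']
print(acks)          # [0, 0, 1]
```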
The Cost of Premature Timeouts:
While correctness is preserved, premature timeouts are expensive:
| Cost | Description |
|---|---|
| Bandwidth waste | Duplicate frame consumes channel capacity |
| Processing waste | Receiver must process duplicate, generate extra ACK |
| Potential confusion | Late ACKs require careful handling |
| Reduced throughput | Time spent on unnecessary work |
Adaptive Timeout (Preview):
To minimize premature timeouts, some protocols dynamically adjust the timeout based on observed RTT:
Estimated_RTT = α × Estimated_RTT + (1 - α) × Measured_RTT
Timeout = Estimated_RTT + 4 × Deviation
This adaptive approach is used in TCP and reduces both premature timeouts (from underestimating RTT) and slow recovery (from overestimating RTT). While not standard in basic Stop-and-Wait, understanding the concept is valuable.
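A sketch of such an estimator is shown below. The two formulas above do not say how the deviation is maintained or what constants to use, so the deviation update, the initial values, and the defaults α = 0.875 and β = 0.25 are assumptions modeled on the approach TCP uses; treat them as illustrative.

```python
class RttEstimator:
    """Smoothed RTT plus a deviation term, used to derive a timeout."""

    def __init__(self, alpha: float = 0.875, beta: float = 0.25):
        self.alpha = alpha          # weight kept on the old RTT estimate
        self.beta = beta            # weight given to the newest deviation sample
        self.estimated_rtt = None   # no estimate until the first sample
        self.deviation = 0.0

    def update(self, measured_rtt: float) -> float:
        """Fold in one RTT sample and return the new timeout."""
        if self.estimated_rtt is None:
            # First sample: seed the estimate (assumed initialization).
            self.estimated_rtt = measured_rtt
            self.deviation = measured_rtt / 2
        else:
            error = abs(measured_rtt - self.estimated_rtt)
            self.deviation = (1 - self.beta) * self.deviation + self.beta * error
            self.estimated_rtt = (self.alpha * self.estimated_rtt
                                  + (1 - self.alpha) * measured_rtt)
        return self.estimated_rtt + 4 * self.deviation


est = RttEstimator()
timeouts = [est.update(sample) for sample in (100.0, 100.0, 180.0)]
print(timeouts)  # [300.0, 250.0, 302.5]
```

The timeout shrinks while samples are stable and widens again when one sample jumps, which is the behavior the deviation term is there to provide.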
When a single retransmission isn't enough, the sender may experience multiple consecutive timeouts. This situation requires special consideration.
Scenario: Persistent Channel Failure
Sender Receiver
| |
|-------- Frame 0 -----X (lost) |
| [Timer expires] |
|-------- Frame 0 -----X (lost again) |
| [Timer expires] |
|-------- Frame 0 -----X (lost again) |
| [Timer expires] |
... |
If the channel is persistently broken (e.g., cable cut, severe interference), retransmissions will fail indefinitely.
The Retry Limit:
Practical implementations define a maximum retry count:
MAX_RETRIES = 5    // Configuration parameter
retry_count = 0
function on_timeout():
    retry_count++
    if retry_count > MAX_RETRIES:
        // Channel appears dead
        report_error("Maximum retries exceeded")
        reset_protocol()    // Prepare for recovery
        return
    // Still within limit, retry
    retransmit(stored_frame)
    start_timer(timeout_value)
| Strategy | Behavior | Use Case |
|---|---|---|
| Fixed timeout, fixed limit | Same timeout each retry, give up after N | Simple, predictable |
| Fixed timeout, no limit | Retry forever (dangerous!) | Never—can hang forever |
| Exponential backoff, fixed limit | Increasing timeout, give up after N | Congestion-prone networks |
| Exponential backoff, time limit | Increasing timeout, give up after T seconds | User-facing applications |
What to Do After Retry Limit:
When the retry limit is exceeded:
function on_max_retries_exceeded():
    // Notify upper layer
    network_layer.frame_undeliverable(stored_frame.data)
    // Clean up
    stored_frame = null
    retry_count = 0
    awaiting_ack = false
    // Depending on protocol, do ONE of the following:
    // Option A: Reset connection
    initiate_connection_reset()
    // Option B: Just move on
    // signal_ready_for_next_frame()
Different applications have different requirements. For file transfer, retry until success or explicit abort. For real-time audio, old data is useless—stop retrying and move on. Ensure your retry policy matches application semantics.
Different network environments have vastly different characteristics, affecting timeout configuration.
Local Area Network (LAN):
Wide Area Network (WAN):
Satellite Link (GEO):
| Network Type | Typical RTT | Recommended Timeout | Key Consideration |
|---|---|---|---|
| LAN (Ethernet) | < 1 ms | 5-10 ms | Very fast, tight timing possible |
| WAN (Internet) | 20-200 ms | 500 ms - 2 s | Variable, need margin for congestion |
| GEO Satellite | 500+ ms | 600+ ms | Long baseline, less variability |
| LEO Satellite | 20-40 ms | 100-200 ms | Lower than GEO, constellation variations |
| Deep Space | 3-20 minutes | Special protocols | Light-speed limit, different paradigm |
Satellite Stop-and-Wait Problem:
Satellite links expose Stop-and-Wait's efficiency problem dramatically:
GEO Satellite Example:
- RTT = 500 ms
- Frame size = 1000 bytes = 8000 bits
- Bandwidth = 1 Mbps
T_frame = 8000 / 1,000,000 = 8 ms
Channel utilization = T_frame / RTT = 8 / 500 = 1.6%
The sender transmits for 8 ms, then waits 492 ms for the ACK. Over 98% of the time, the channel sits idle. This is why Stop-and-Wait is rarely used on high-latency links—the efficiency is unacceptable.
For satellite links, protocols use sliding window (Go-Back-N or Selective Repeat) to keep multiple frames in flight. With a window of about 63 frames (500 ms / 8 ms per frame), the sender can keep transmitting continuously during the 500 ms RTT, achieving much higher utilization.
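The arithmetic behind these figures can be checked with a short sketch. The function names are illustrative, and RTT is taken to include the frame transmission time, as in the example above.

```python
import math


def stop_and_wait_utilization(t_frame_ms: float, rtt_ms: float) -> float:
    """Fraction of each cycle spent transmitting (RTT includes T_frame)."""
    return t_frame_ms / rtt_ms


def frames_to_fill_pipe(t_frame_ms: float, rtt_ms: float) -> int:
    """Smallest window that keeps the sender busy for a full RTT."""
    return math.ceil(rtt_ms / t_frame_ms)


t_frame = 8000 * 1000 / 1_000_000  # 8000 bits at 1 Mbps = 8 ms
utilization = stop_and_wait_utilization(t_frame, rtt_ms=500.0)
window = frames_to_fill_pipe(t_frame, rtt_ms=500.0)
print(f"utilization = {utilization:.1%}, window needed = {window} frames")
```

With these numbers the sender needs roughly 63 frames in flight to avoid idling, which is why the window size has to scale with the bandwidth-delay product of the link.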
Timeout and retransmission form the recovery backbone of Stop-and-Wait ARQ. Without them, any loss would cause permanent deadlock. Let's consolidate our understanding:
The Timeout Triad:
Three numbers define timeout behavior:
Getting these right requires understanding the network environment and application requirements.
Looking Ahead:
With timeout and retransmission mastered, we move to a subtle but crucial topic: sequence numbers. Why does Stop-and-Wait use only 1 bit? What problems would arise with no sequence numbers? How does this scale to more complex protocols?
You now understand the theory and practice of timeout and retransmission in Stop-and-Wait ARQ. This knowledge extends directly to all reliable protocols—the principles of timeout calculation, timer management, and retry handling apply universally.