Stop-and-Wait ARQ - Learning Module

Timeout and Retransmission in Stop-and-Wait ARQ

The Necessity of Timeouts

Acknowledgments tell the sender that data arrived successfully. But what happens when the ACK never comes? The sender could wait forever, stuck in eternal limbo—a situation clearly unacceptable for any practical system.

Timeouts solve this dilemma. By setting a timer when each frame is transmitted, the sender establishes a deadline. If the deadline passes without acknowledgment, the sender takes action—typically retransmitting the frame.

This simple concept—"wait, but not forever"—underlies every reliable protocol. Yet the implementation details are surprisingly nuanced:

•How long should the timeout be?
•What triggers a timeout?
•How should the sender respond?
•What if network conditions change?

In this section, we explore all dimensions of timeout and retransmission in Stop-and-Wait ARQ, from theoretical foundations to practical implementation.

Learning Objectives

By the end of this page, you will understand how to calculate appropriate timeout values, implement robust timer mechanisms, handle retransmission correctly, and avoid the pitfalls that lead to protocol failure or inefficiency.

Why Timeouts Are Essential

Consider what happens without timeouts:

Scenario: Silent Failure

Sender transmits Frame₀
Frame₀ is lost due to electromagnetic interference
Receiver never sees Frame₀, never sends ACK₀
Sender waits for ACK₀...
...and waits...
...forever.

The protocol has deadlocked. Neither party can proceed. The sender waits for an ACK that will never come. The receiver waits for a frame that already vanished.

The Fundamental Problem:

In an unreliable channel, silence is ambiguous. When the sender doesn't receive an ACK, it might mean:

•The frame was lost
•The frame was corrupted (receiver discarded it)
•The ACK was lost
•The ACK was corrupted (sender discarded it)
•Everything is fine, but the ACK is still in transit

Without additional information, the sender cannot distinguish these cases. Timeouts provide a resolution: "I don't know what happened, but I've waited long enough. I'll try again."

What Timeouts Recover From

•Lost frames: Data frame vanishes in transit—retransmission resends it
•Corrupted frames: Receiver discards corrupt frame—retransmission provides fresh copy
•Lost ACKs: ACK vanishes—retransmission triggers re-ACK from receiver
•Corrupted ACKs: Sender discards corrupt ACK—retransmission triggers re-ACK
•Temporary outages: Brief channel failure—retransmission succeeds when channel recovers

The Unified Recovery

Notice that retransmission handles all failure types identically. The sender doesn't need to diagnose the problem—it just retries. This uniform approach simplifies the protocol dramatically while maintaining correctness.

Calculating the Timeout Value

Setting the timeout correctly is one of the most important and subtle aspects of protocol design. Set it too short, and you'll retransmit unnecessarily, wasting bandwidth and forcing the receiver to handle duplicates. Set it too long, and every real loss stalls the link for the full timeout before recovery begins.

The Ideal Timeout:

The timeout should be just longer than the maximum expected round-trip time (RTT). This allows legitimate ACKs to arrive while still detecting true failures promptly.

Round-Trip Time Components:

RTT = T_frame + T_prop(forward) + T_process + T_ack + T_prop(return)

Where:

T_frame: Time to transmit the data frame = Frame_Size / Bandwidth
T_prop(forward): Propagation delay from sender to receiver = Distance / Signal_Speed
T_process: Time for receiver to process frame and generate ACK (usually negligible)
T_ack: Time to transmit the ACK frame = ACK_Size / Bandwidth
T_prop(return): Propagation delay from receiver to sender = Distance / Signal_Speed

Simplified Formula:

Since T_prop(forward) = T_prop(return) = T_prop, and T_process is typically negligible:

RTT ≈ T_frame + 2 × T_prop + T_ack

And since T_ack is usually much smaller than T_frame:

RTT ≈ T_frame + 2 × T_prop

Example Calculation:

Consider a 1 Mbps link spanning 200 km:

Frame size: 1000 bits
ACK size: 50 bits
Propagation speed: 2 × 10⁸ m/s (fiber/copper)

Step 1: Calculate transmission times

T_frame = 1000 bits / 1,000,000 bps = 1 ms
T_ack = 50 bits / 1,000,000 bps = 0.05 ms

Step 2: Calculate propagation delay

T_prop = 200,000 m / (2 × 10⁸ m/s) = 1 ms

Step 3: Calculate RTT

RTT = T_frame + 2 × T_prop + T_ack
RTT = 1 + 2(1) + 0.05 = 3.05 ms

Step 4: Set timeout

Timeout = RTT + Safety_Margin
Timeout = 3.05 + 1.0 = 4.05 ms (rounded to ~5 ms)

The safety margin accounts for variation in processing times, link conditions, and clock precision.
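
The four steps above can be reproduced in a few lines of Python. This is a minimal sketch, assuming a symmetric link and negligible receiver processing time; the function name `stop_and_wait_timeout` and its parameter names are illustrative and not part of any protocol standard.

```python
def stop_and_wait_timeout(frame_bits, ack_bits, bandwidth_bps,
                          distance_m, signal_speed_mps, margin_s):
    """Return (rtt, timeout) in seconds for a symmetric point-to-point link."""
    t_frame = frame_bits / bandwidth_bps    # time to clock the data frame onto the link
    t_ack = ack_bits / bandwidth_bps        # time to clock the ACK onto the link
    t_prop = distance_m / signal_speed_mps  # one-way propagation delay
    rtt = t_frame + 2 * t_prop + t_ack      # receiver processing assumed negligible
    return rtt, rtt + margin_s


# Values from the worked example: 1 Mbps, 200 km, 1000-bit frame, 50-bit ACK.
rtt, timeout = stop_and_wait_timeout(
    frame_bits=1000, ack_bits=50, bandwidth_bps=1_000_000,
    distance_m=200_000, signal_speed_mps=2e8, margin_s=1e-3,
)
print(f"RTT = {rtt * 1e3:.2f} ms, timeout = {timeout * 1e3:.2f} ms")
# prints: RTT = 3.05 ms, timeout = 4.05 ms
```

Changing `distance_m` or `bandwidth_bps` shows quickly which term dominates: on short, slow links the transmission time matters most, while on long links propagation delay takes over.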

Timeout Setting Trade-offs

A timeout too short (say, 2 ms in the example) would cause premature retransmission—the ACK is still in transit when the timer fires. A timeout too long (say, 500 ms) would delay error recovery dramatically—half a second wasted waiting when the frame was lost on the first bit.

Timeout Value Impact
Timeout Setting	Consequence	Example
Too Short	Premature retransmission, duplicates, wasted bandwidth	Timeout = 2ms when RTT = 3ms
Too Long	Slow error recovery, poor responsiveness	Timeout = 500ms when RTT = 3ms
Optimal	Fast recovery without false positives	Timeout = RTT + small margin

The Retransmission Process

When the timeout fires, the sender must execute the retransmission process correctly. This involves more than simply re-sending the frame.

Step-by-Step Retransmission:

function on_timeout_expired():
    // Step 1: Count this timeout and check the retry limit first,
    // so a dead channel does not trigger one more transmission and timer
    retransmission_count++
    if retransmission_count > MAX_RETRIES:
        signal_failure_to_upper_layer()
        abort_transmission()
        return

    // Step 2: Retrieve the stored frame
    frame = stored_frame  // Kept since original transmission

    // Step 3: Retransmit the frame
    transmit(frame)

    // Step 4: Restart the timer
    start_timer(timeout_value)

Critical Points:

Use the stored frame: You must transmit the exact same frame, not generate a new one. The sequence number, the data, everything must match the original.
Restart the timer: After retransmitting, start the timer again. The ACK for the retransmission might also be lost.
Track retries: Infinite retransmission isn't practical. If the channel is permanently broken, eventually give up and report failure upward.

Why Retransmit the Same Frame?

This might seem obvious, but it's worth emphasizing:

•The sequence number must match what the receiver expects
•The data must be identical to ensure consistency
•A different frame would confuse the receiver's state machine

The Retry Limit:

In practice, Stop-and-Wait implementations include a maximum retry count:

System	Typical Retry Limit	After Exceeding
HDLC	3-10 retries	Report error to upper layer
Modems	3 retries	Connection failure
Custom	Configurable	Application-specific handling

Exponential Backoff (Advanced):

Some implementations increase the timeout after each retry:

timeout = base_timeout × 2^(retry_count)

This "exponential backoff" helps when network congestion is causing losses—backing off reduces load and allows the network to recover. While not standard in basic Stop-and-Wait, the concept appears in many protocols (Ethernet, TCP).

Backoff in Stop-and-Wait

Pure Stop-and-Wait typically uses a fixed timeout, not exponential backoff. However, understanding backoff is valuable because it appears in CSMA/CD (Ethernet collision handling) and TCP (doubling the retransmission timer after each timeout). The principle is the same: if something failed, wait longer before trying again.
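
The backoff formula can be turned into a schedule of timeout values, one per attempt. The sketch below is illustrative only: `backoff_timeouts` is a hypothetical helper, and the optional `cap` parameter is an added assumption (many real protocols bound the timeout, but the formula in this section does not).

```python
def backoff_timeouts(base_timeout, max_retries, cap=None):
    """Timeout used for each attempt: base_timeout * 2**retry_count."""
    schedule = []
    for retry_count in range(max_retries + 1):   # retry_count 0 is the first send
        timeout = base_timeout * 2 ** retry_count
        if cap is not None:
            timeout = min(timeout, cap)          # assumed upper bound, not in the formula
        schedule.append(timeout)
    return schedule


print(backoff_timeouts(5, 4))          # [5, 10, 20, 40, 80]
print(backoff_timeouts(5, 4, cap=30))  # [5, 10, 20, 30, 30]
```

With a 5 ms base timeout, the fourth retry already waits 80 ms, which is why uncapped backoff is usually paired with a retry limit.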

Timer Implementation Details

Implementing timers correctly is crucial for protocol reliability. There are several approaches, each with trade-offs.

Hardware Timer Approach:

Dedicated hardware timer that generates an interrupt when it expires:

Advantages:
- Precise timing
- Low CPU overhead
- Guaranteed interrupt delivery

Disadvantages:
- Limited number of hardware timers
- Hardware-dependent implementation
- Interrupt handling complexity

Software Timer Approach:

Periodic polling or software scheduling:

Advantages:
- Flexible, unlimited virtual timers
- Platform-independent
- Easier debugging

Disadvantages:
- Less precise (depends on polling frequency)
- CPU overhead for timer management
- May miss exact expiration time

Timer State Management:

The timer has discrete states that must be managed carefully:

Timer States:
┌─────────┐     start()     ┌─────────┐
│  IDLE   │ ───────────────>│ RUNNING │
└─────────┘                  └─────────┘
     ↑                            │
     │         stop()             │
     └────────────────────────────┤
                                  │
                            (timeout)
                                  │
                                  ↓
                          ┌─────────────┐
                          │  EXPIRED    │
                          │ (callback)  │
                          └─────────────┘
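
The state diagram can be expressed as a small polled software timer. This is a sketch under stated assumptions: the class `SoftTimer`, its method names, and the choice to pass the current time in explicitly (rather than reading a clock) are all illustrative, chosen so the behavior is easy to follow and test.

```python
from enum import Enum, auto


class TimerState(Enum):
    IDLE = auto()
    RUNNING = auto()
    EXPIRED = auto()


class SoftTimer:
    """Polled software timer; the caller supplies the current time."""

    def __init__(self, on_expire):
        self.state = TimerState.IDLE
        self.deadline = None
        self.on_expire = on_expire

    def start(self, now, duration):
        # Starting always (re)arms the timer, whatever state it was in.
        self.state = TimerState.RUNNING
        self.deadline = now + duration

    def stop(self):
        # Stopping a timer that is not running is a harmless no-op.
        if self.state is TimerState.RUNNING:
            self.state = TimerState.IDLE
            self.deadline = None

    def poll(self, now):
        # Called periodically; fires the callback at most once per start().
        if self.state is TimerState.RUNNING and now >= self.deadline:
            self.state = TimerState.EXPIRED
            self.on_expire()


fired = []
timer = SoftTimer(on_expire=lambda: fired.append("timeout"))
timer.start(now=0.0, duration=5.0)
timer.poll(now=3.0)   # deadline not reached, still RUNNING
timer.poll(now=5.0)   # deadline reached: state becomes EXPIRED, callback runs
timer.poll(now=6.0)   # already EXPIRED, so no second callback
print(timer.state.name, fired)
```

Because the state is set to EXPIRED before the callback runs, a callback that retransmits and calls `start()` again moves the timer straight back to RUNNING.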

Race Conditions to Avoid:

Timer management has subtle race conditions:

ACK arrives just as timer fires: The timer callback might execute after the ACK was received but before it was fully processed. Solution: Disable timer interrupt during ACK processing, or use careful state checking.
Stop called on already-expired timer: If the timer has already fired, stopping it does nothing. Ensure the timeout handler checks whether an ACK has arrived.
Multiple timer starts without stops: Each new frame transmission should stop any existing timer before starting a new one (though in Stop-and-Wait, there's only one frame at a time, so this is less of an issue).

Concurrency Hazards

In interrupt-driven implementations, the timeout handler runs as an interrupt service routine while the main code might be processing an ACK. Use mutual exclusion (disable interrupts briefly, or use locks) to prevent concurrent access to shared state like the current sequence number and stored frame.

Pseudocode for Safe Timer Handling:

// Shared state (protected by mutex/interrupt disable)
shared_state = {
    timer_armed: false,
    awaiting_ack: false,
    current_seq: 0,
    stored_frame: null
}

function send_frame(data):
    disable_interrupts()
    frame = create_frame(data, shared_state.current_seq)
    shared_state.stored_frame = frame
    shared_state.awaiting_ack = true
    shared_state.timer_armed = true
    transmit(frame)
    start_hardware_timer(timeout_value)
    enable_interrupts()

function timer_interrupt_handler():  // ISR
    disable_interrupts()  // Usually automatic in ISR
    if shared_state.timer_armed and shared_state.awaiting_ack:
        // Legitimate timeout
        transmit(shared_state.stored_frame)
        start_hardware_timer(timeout_value)  // Restart
    // else: Timer fired after ACK received, ignore
    enable_interrupts()

function ack_received(ack):
    disable_interrupts()
    if verify_ack(ack) and ack.seq == shared_state.current_seq:
        stop_hardware_timer()
        shared_state.timer_armed = false
        shared_state.awaiting_ack = false
        shared_state.current_seq = 1 - shared_state.current_seq
        shared_state.stored_frame = null
        signal_ready_for_next()
    enable_interrupts()
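
On a system with threads rather than interrupts, the same pattern can be sketched with Python's standard `threading.Timer` and `threading.Lock`. This is an illustrative translation, not a reference implementation: the class `StopAndWaitSender`, the `transmit` callable, and the `_generation` counter (used to discard callbacks from superseded timers) are assumptions introduced here.

```python
import threading


class StopAndWaitSender:
    """Sender half only; a lock stands in for disable/enable_interrupts."""

    def __init__(self, transmit, timeout_s):
        self._lock = threading.Lock()
        self._transmit = transmit      # hypothetical callable that puts a frame on the link
        self._timeout_s = timeout_s
        self._timer = None
        self._generation = 0           # identifies the most recently armed timer
        self.awaiting_ack = False
        self.current_seq = 0
        self.stored_frame = None

    def _arm_timer(self):
        # Caller holds the lock. Stop any old timer before starting a new one.
        if self._timer is not None:
            self._timer.cancel()
        self._generation += 1
        self._timer = threading.Timer(
            self._timeout_s, self._on_timeout, args=(self._generation,))
        self._timer.daemon = True
        self._timer.start()

    def send(self, data):
        with self._lock:
            self.stored_frame = (self.current_seq, data)
            self.awaiting_ack = True
            self._transmit(self.stored_frame)
            self._arm_timer()

    def _on_timeout(self, generation):
        with self._lock:
            # cancel() cannot recall a callback that has already started, so
            # the handler re-checks the state instead of trusting the timer.
            if not self.awaiting_ack or generation != self._generation:
                return
            self._transmit(self.stored_frame)
            self._arm_timer()

    def on_ack(self, ack_seq):
        with self._lock:
            if not self.awaiting_ack or ack_seq != self.current_seq:
                return False           # stale or unexpected ACK: ignore it
            self._timer.cancel()
            self.awaiting_ack = False
            self.current_seq = 1 - self.current_seq
            self.stored_frame = None
            return True


sender = StopAndWaitSender(transmit=print, timeout_s=30.0)
sender.send("hello")       # transmits the frame and arms the timer
print(sender.on_ack(0))    # matching ACK: timer cancelled, sequence number flips
```

The state check inside `_on_timeout` is what closes the "ACK arrives just as timer fires" race: whichever handler takes the lock first wins, and the other sees consistent state.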

Handling Premature Timeouts

What happens when the timeout fires prematurely—before the ACK could possibly arrive? This is a correctness challenge that Stop-and-Wait must handle gracefully.

Scenario: Premature Timeout

Sender                                     Receiver
   |                                           |
   |-------- Frame 0 --------->                |
   |                                           |
   | [Timer starts]                            |
   |                              [Received]   |
   |                              [Send ACK0]  |
   |                                           |
   | [Timer expires - TOO EARLY!]              |
   |                                           |
   |-------- Frame 0 --------->  (retransmit)  |
   |                                           |
   |<--------- ACK0 -----------  (original)    |
   |                                           |
   | [Got ACK0, advance to seq=1]              |
   |                                           |
   |                              [Received - duplicate!]
   |                              [Send ACK0]  |
   |                                           |
   |-------- Frame 1 --------->                |
   |                                           |
   |<--------- ACK0 -----------  (from retransmit)
   | [Wrong seq! Expected ACK1, got ACK0]      |
   | [Ignore]                                  |

Analysis:

The premature timeout caused an unnecessary retransmission, but the protocol still works correctly:

•The original ACK₀ arrives and is processed correctly
•The duplicate Frame₀ is recognized and re-ACKed (but not delivered twice)
•The late ACK₀ (from the retransmission) is ignored because the sender now expects ACK₁

Robustness Despite Imperfection

Stop-and-Wait tolerates premature timeouts—they cause wasted bandwidth but not protocol failure. The sequence number mechanism ensures correctness. This robustness is intentional: timers can never be perfectly set, especially on variable-latency networks.
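
The premature-timeout scenario can be replayed with two tiny state machines to check that nothing is delivered twice. This is a simplified sketch: the `Receiver` and `Sender` classes are hypothetical, timers and the channel are left out, and ACK n is taken to acknowledge frame n, as in the diagram.

```python
class Receiver:
    def __init__(self):
        self.expected_seq = 0
        self.delivered = []

    def on_frame(self, seq, data):
        """Process a frame and return the ACK number to send back."""
        if seq == self.expected_seq:
            self.delivered.append(data)              # new frame: deliver exactly once
            self.expected_seq = 1 - self.expected_seq
        # A duplicate is not delivered again, but it is still re-ACKed.
        return seq


class Sender:
    def __init__(self):
        self.seq = 0

    def on_ack(self, ack_seq):
        """Return True if the ACK matches the outstanding frame."""
        if ack_seq != self.seq:
            return False                             # late ACK for an earlier frame
        self.seq = 1 - self.seq
        return True


rx, tx = Receiver(), Sender()
ack_original = rx.on_frame(0, "payload-0")   # original Frame 0
ack_duplicate = rx.on_frame(0, "payload-0")  # premature retransmission of Frame 0
print(tx.on_ack(ack_original))               # True: sender advances to seq 1
ack_frame1 = rx.on_frame(1, "payload-1")     # Frame 1
print(tx.on_ack(ack_duplicate))              # False: late ACK0 is ignored
print(tx.on_ack(ack_frame1))                 # True: ACK1 accepted
print(rx.delivered)                          # ['payload-0', 'payload-1']
```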

The Cost of Premature Timeouts:

While correctness is preserved, premature timeouts are expensive:

Cost	Description
Bandwidth waste	Duplicate frame consumes channel capacity
Processing waste	Receiver must process duplicate, generate extra ACK
Potential confusion	Late ACKs require careful handling
Reduced throughput	Time spent on unnecessary work

Adaptive Timeout (Preview):

To minimize premature timeouts, some protocols dynamically adjust the timeout based on observed RTT:

Estimated_RTT = α × Estimated_RTT + (1 - α) × Measured_RTT
Timeout = Estimated_RTT + 4 × Deviation

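Here α is a smoothing weight between 0 and 1 (values close to 1 favor history over the newest sample), and Deviation is a running estimate of how far measured RTTs stray from Estimated_RTT. This adaptive approach is used in TCP and reduces both premature timeouts (from underestimating RTT) and slow recovery (from overestimating RTT). While not standard in basic Stop-and-Wait, understanding the concept is valuable.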
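
The two formulas above can be sketched as a small estimator. Several details here are assumptions, since the section does not specify them: the weights α = 0.875 and β = 0.75, the way Deviation is smoothed, and the initialization on the first sample are borrowed from the way TCP's retransmission timer is commonly computed. The class name `RttEstimator` is illustrative.

```python
class RttEstimator:
    """Exponentially weighted RTT estimate with a deviation-based timeout."""

    def __init__(self, alpha=0.875, beta=0.75):
        self.alpha = alpha            # weight on the previous RTT estimate (assumed value)
        self.beta = beta              # weight on the previous deviation (assumed value)
        self.estimated_rtt = None
        self.deviation = None

    def update(self, measured_rtt):
        """Fold in one RTT sample and return the new timeout."""
        if self.estimated_rtt is None:
            # First sample: no history yet, so start with a wide margin.
            self.estimated_rtt = measured_rtt
            self.deviation = measured_rtt / 2
        else:
            # Deviation is updated first, using the previous estimate.
            error = abs(measured_rtt - self.estimated_rtt)
            self.deviation = self.beta * self.deviation + (1 - self.beta) * error
            self.estimated_rtt = (self.alpha * self.estimated_rtt
                                  + (1 - self.alpha) * measured_rtt)
        return self.timeout()

    def timeout(self):
        return self.estimated_rtt + 4 * self.deviation


estimator = RttEstimator()
for sample_ms in (100.0, 100.0, 140.0, 100.0):
    print(f"sample {sample_ms:.0f} ms -> timeout {estimator.update(sample_ms):.1f} ms")
```

With steady samples the deviation term decays and the timeout tightens toward the RTT; a single slow sample widens it again, which is the behavior that reduces premature timeouts on variable links.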

Multiple Consecutive Timeouts

When a single retransmission isn't enough, the sender may experience multiple consecutive timeouts. This situation requires special consideration.

Scenario: Persistent Channel Failure

Sender                                     Receiver
   |                                           |
   |-------- Frame 0 -----X    (lost)          |
   | [Timer expires]                           |
   |-------- Frame 0 -----X    (lost again)    |
   | [Timer expires]                           |
   |-------- Frame 0 -----X    (lost again)    |
   | [Timer expires]                           |
   ...                                        |

If the channel is persistently broken (e.g., cable cut, severe interference), retransmissions will fail indefinitely.

The Retry Limit:

Practical implementations define a maximum retry count:

MAX_RETRIES = 5  // Configuration parameter
retry_count = 0

function on_timeout():
    retry_count++
    if retry_count > MAX_RETRIES:
        // Channel appears dead
        report_error("Maximum retries exceeded")
        reset_protocol()  // Prepare for recovery
        return
    
    // Still within limit, retry
    retransmit(stored_frame)
    start_timer(timeout_value)
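
The same retry logic can be written as a blocking loop and exercised against a stub channel. This is a sketch only: `try_once` is a hypothetical callable that transmits the frame, waits up to one timeout period, and reports whether the matching ACK arrived, and `make_channel` is a test stub that loses a fixed number of transmissions.

```python
def send_with_retries(frame, try_once, max_retries=5):
    """Send one frame, retransmitting on timeout up to max_retries times.

    Returns the number of retransmissions used; raises TimeoutError
    once the retry limit is exceeded.
    """
    retry_count = 0
    while True:
        if try_once(frame):
            return retry_count
        retry_count += 1                       # the timeout fired
        if retry_count > max_retries:
            raise TimeoutError("Maximum retries exceeded")


def make_channel(losses):
    """Stub channel that loses the first `losses` transmissions."""
    state = {"attempts": 0}

    def try_once(frame):
        state["attempts"] += 1
        return state["attempts"] > losses

    return try_once, state


try_once, state = make_channel(losses=2)
print(send_with_retries("frame-0", try_once))  # 2 retransmissions were needed
print(state["attempts"])                       # 3 transmissions in total
```

Note that a limit of N retries means up to N + 1 transmissions: the original plus N retransmissions.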

Retry Strategy Comparison
Strategy	Behavior	Use Case
Fixed timeout, fixed limit	Same timeout each retry, give up after N	Simple, predictable
Fixed timeout, no limit	Retry forever (dangerous!)	Never—can hang forever
Exponential backoff, fixed limit	Increasing timeout, give up after N	Congestion-prone networks
Exponential backoff, time limit	Increasing timeout, give up after T seconds	User-facing applications

What to Do After Retry Limit:

When the retry limit is exceeded:

Report to upper layer: The network layer should know the frame couldn't be delivered
Clear pending state: Discard the stored frame, reset sequence number if needed
Connection reset: In connection-oriented protocols, may need to re-establish the link
Logging/alerting: Record the failure for diagnostics

function on_max_retries_exceeded():
    // Notify upper layer
    network_layer.frame_undeliverable(stored_frame.data)

    // Clean up
    stored_frame = null
    retry_count = 0
    awaiting_ack = false

    // Then take exactly ONE of the following paths, depending on the protocol
    // (protocol_is_connection_oriented is an illustrative flag):
    if protocol_is_connection_oriented:
        // Option A: Reset connection
        initiate_connection_reset()
    else:
        // Option B: Just move on
        signal_ready_for_next_frame()

Retry Semantics

Different applications have different requirements. For file transfer, retry until success or explicit abort. For real-time audio, old data is useless—stop retrying and move on. Ensure your retry policy matches application semantics.

Timeout in Different Network Environments

Different network environments have vastly different characteristics, affecting timeout configuration.

Local Area Network (LAN):

Propagation delay: ~5 μs for 1 km
Bandwidth: 100 Mbps to 100 Gbps
RTT: Often < 1 ms
Timeout: A few milliseconds
Characteristics: Short delays, rare losses

Wide Area Network (WAN):

Propagation delay: ~5 ms per 1000 km
Bandwidth: Variable (10 Mbps to 100 Gbps)
RTT: 20-200 ms typical
Timeout: Hundreds of milliseconds
Characteristics: Longer delays, more variable

Satellite Link (GEO):

Propagation delay: ~120 ms ground to satellite, so ~240 ms one way between two ground stations (geostationary orbit at about 36,000 km)
Bandwidth: A few Mbps typically
RTT: ~480 ms minimum
Timeout: 600 ms or more
Characteristics: High latency, expensive bandwidth

Timeout Requirements by Network Type
Network Type	Typical RTT	Recommended Timeout	Key Consideration
LAN (Ethernet)	< 1 ms	5-10 ms	Very fast, tight timing possible
WAN (Internet)	20-200 ms	500 ms - 2 s	Variable, need margin for congestion
GEO Satellite	480+ ms	600+ ms	Long baseline, less variability
LEO Satellite	20-40 ms	100-200 ms	Lower than GEO, constellation variations
Deep Space	6-44 minutes (Earth-Mars)	Special protocols	Light-speed limit, different paradigm

Satellite Stop-and-Wait Problem:

Satellite links expose Stop-and-Wait's efficiency problem dramatically:

GEO Satellite Example:
- RTT = 500 ms
- Frame size = 1000 bytes = 8000 bits
- Bandwidth = 1 Mbps

T_frame = 8000 / 1,000,000 = 8 ms
Channel utilization = T_frame / RTT = 8 / 500 = 1.6%

The sender transmits for 8 ms, then waits 492 ms for the ACK. Over 98% of the time, the channel sits idle. This is why Stop-and-Wait is rarely used on high-latency links—the efficiency is unacceptable.
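
The utilization figure is easy to recompute for other links. The sketch below uses this section's definition (RTT already includes the frame transmission time) and compares the GEO example with the 200 km link from the timeout calculation earlier; the function name `channel_utilization` is illustrative.

```python
def channel_utilization(t_frame_s, rtt_s):
    """Fraction of each send-and-wait cycle spent transmitting data."""
    return t_frame_s / rtt_s


# GEO example: 8000-bit frame at 1 Mbps, 500 ms RTT.
geo = channel_utilization(t_frame_s=8000 / 1_000_000, rtt_s=0.500)
# Earlier example: 1000-bit frame at 1 Mbps over 200 km, 3.05 ms RTT.
terrestrial = channel_utilization(t_frame_s=1000 / 1_000_000, rtt_s=3.05e-3)

print(f"GEO satellite: {geo:.1%}")          # 1.6%
print(f"200 km link:   {terrestrial:.1%}")  # 32.8%
```

Even the short terrestrial link leaves the channel idle about two thirds of the time, so the inefficiency is inherent to Stop-and-Wait and only grows with delay.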

Satellite Solutions

For satellite links, protocols use sliding window (Go-Back-N or Selective Repeat) to keep multiple frames in flight. With 8 ms frames and a 500 ms RTT, a window of about 63 frames (500 / 8, rounded up) lets the sender transmit continuously, achieving much higher utilization.

Summary: Mastering Timeout and Retransmission

Timeout and retransmission form the recovery backbone of Stop-and-Wait ARQ. Without them, any loss would cause permanent deadlock. Let's consolidate our understanding:

Key Takeaways

•Timeouts prevent deadlock: Without them, lost frames or ACKs would block the protocol forever
•Timeout = RTT + safety margin: Set slightly longer than maximum expected round-trip time
•Too short causes unnecessary retransmissions: Wastes bandwidth but preserves correctness
•Too long delays error recovery: Hurts performance when losses actually occur
•Retransmit the exact same frame: Sequence number, data, everything must match the original
•Restart timer after retransmission: The retransmission might also be lost
•Use retry limits in practice: Don't retry forever—report failure after reasonable attempts
•Network environment determines timeout scale: A few milliseconds for LAN, hundreds of milliseconds or more for WAN and satellite links

The Timeout Triad:

Three numbers define timeout behavior:

Timeout Value: How long to wait before retransmitting
Retry Limit: How many attempts before giving up
Safety Margin: Buffer added to RTT estimate

Getting these right requires understanding the network environment and application requirements.

Looking Ahead:

With timeout and retransmission mastered, we move to a subtle but crucial topic: sequence numbers. Why does Stop-and-Wait use only 1 bit? What problems would arise with no sequence numbers? How does this scale to more complex protocols?

Timeout Fundamentals Complete

You now understand the theory and practice of timeout and retransmission in Stop-and-Wait ARQ. This knowledge extends directly to all reliable protocols—the principles of timeout calculation, timer management, and retry handling apply universally.

Timeout and Retransmission in Stop-and-Wait ARQ

The Necessity of Timeouts

This simple concept—"wait, but not forever"—underlies every reliable protocol. Yet the implementation details are surprisingly nuanced:

How long should the timeout be?
What triggers a timeout?
How should the sender respond?
What if network conditions change?

In this section, we explore all dimensions of timeout and retransmission in Stop-and-Wait ARQ, from theoretical foundations to practical implementation.

Learning Objectives

Why Timeouts Are Essential

Consider what happens without timeouts:

Scenario: Silent Failure

Sender transmits Frame₀
Frame₀ is lost due to electromagnetic interference
Receiver never sees Frame₀, never sends ACK₀
Sender waits for ACK₀...
...and waits...
...forever.

The protocol has deadlocked. Neither party can proceed. The sender waits for an ACK that will never come. The receiver waits for a frame that already vanished.

The Fundamental Problem:

In an unreliable channel, silence is ambiguous. When the sender doesn't receive an ACK, it might mean:

The frame was lost
The frame was corrupted (receiver discarded it)
The ACK was lost
The ACK was corrupted (sender discarded it)
Everything is fine, but the ACK is still in transit

Without additional information, the sender cannot distinguish these cases. Timeouts provide a resolution: "I don't know what happened, but I've waited long enough. I'll try again."

What Timeouts Recover From

•Lost frames: Data frame vanishes in transit—retransmission resends it
•Corrupted frames: Receiver discards corrupt frame—retransmission provides fresh copy
•Lost ACKs: ACK vanishes—retransmission triggers re-ACK from receiver
•Corrupted ACKs: Sender discards corrupt ACK—retransmission triggers re-ACK
•Temporary outages: Brief channel failure—retransmission succeeds when channel recovers

The Unified Recovery

Calculating the Timeout Value

The Ideal Timeout:

The timeout should be just longer than the maximum expected round-trip time (RTT). This allows legitimate ACKs to arrive while still detecting true failures promptly.

Round-Trip Time Components:

RTT = T_frame + T_prop(forward) + T_process + T_ack + T_prop(return)

Where:

T_frame: Time to transmit the data frame = Frame_Size / Bandwidth
T_prop(forward): Propagation delay from sender to receiver = Distance / Signal_Speed
T_process: Time for receiver to process frame and generate ACK (usually negligible)
T_ack: Time to transmit the ACK frame = ACK_Size / Bandwidth
T_prop(return): Propagation delay from receiver to sender = Distance / Signal_Speed

Simplified Formula:

Since T_prop(forward) = T_prop(return) = T_prop, and T_process is typically negligible:

RTT ≈ T_frame + 2 × T_prop + T_ack

And since T_ack is usually much smaller than T_frame:

RTT ≈ T_frame + 2 × T_prop

Example Calculation:

Consider a 1 Mbps link spanning 200 km:

Frame size: 1000 bits
ACK size: 50 bits
Propagation speed: 2 × 10⁸ m/s (fiber/copper)

Step 1: Calculate transmission times

T_frame = 1000 bits / 1,000,000 bps = 1 ms
T_ack = 50 bits / 1,000,000 bps = 0.05 ms

Step 2: Calculate propagation delay

T_prop = 200,000 m / (2 × 10⁸ m/s) = 1 ms

Step 3: Calculate RTT

RTT = T_frame + 2 × T_prop + T_ack
RTT = 1 + 2(1) + 0.05 = 3.05 ms

Step 4: Set timeout

Timeout = RTT + Safety_Margin
Timeout = 3.05 + 1.0 = 4.05 ms (rounded to ~5 ms)

The safety margin accounts for variation in processing times, link conditions, and clock precision.

Timeout Setting Trade-offs

Timeout Value Impact
Timeout Setting	Consequence	Example
Too Short	Premature retransmission, duplicates, wasted bandwidth	Timeout = 2ms when RTT = 3ms
Too Long	Slow error recovery, poor responsiveness	Timeout = 500ms when RTT = 3ms
Optimal	Fast recovery without false positives	Timeout = RTT + small margin

The Retransmission Process

When the timeout fires, the sender must execute the retransmission process correctly. This involves more than simply re-sending the frame.

Step-by-Step Retransmission:

function on_timeout_expired():
    // Step 1: Retrieve the stored frame
    frame = stored_frame  // Kept since original transmission
    
    // Step 2: Retransmit the frame
    transmit(frame)
    
    // Step 3: Restart the timer
    start_timer(timeout_value)
    
    // Step 4: Update statistics (optional but useful)
    retransmission_count++
    
    // Step 5: Check retry limit (implementation specific)
    if retransmission_count > MAX_RETRIES:
        signal_failure_to_upper_layer()
        abort_transmission()

Critical Points:

Use the stored frame: You must transmit the exact same frame, not generate a new one. The sequence number, the data, everything must match the original.
Restart the timer: After retransmitting, start the timer again. The ACK for the retransmission might also be lost.
Track retries: Infinite retransmission isn't practical. If the channel is permanently broken, eventually give up and report failure upward.

Why Retransmit the Same Frame?

This might seem obvious, but it's worth emphasizing:

The sequence number must match what the receiver expects
The data must be identical to ensure consistency
A different frame would confuse the receiver's state machine

The Retry Limit:

In practice, Stop-and-Wait implementations include a maximum retry count:

System	Typical Retry Limit	After Exceeding
HDLC	3-10 retries	Report error to upper layer
Modems	3 retries	Connection failure
Custom	Configurable	Application-specific handling

Exponential Backoff (Advanced):

Some implementations increase the timeout after each retry:

timeout = base_timeout × 2^(retry_count)

Backoff in Stop-and-Wait

Timer Implementation Details

Implementing timers correctly is crucial for protocol reliability. There are several approaches, each with trade-offs.

Hardware Timer Approach:

Dedicated hardware timer that generates an interrupt when it expires:

Advantages:
- Precise timing
- Low CPU overhead
- Guaranteed interrupt delivery

Disadvantages:
- Limited number of hardware timers
- Hardware-dependent implementation
- Interrupt handling complexity

Software Timer Approach:

Periodic polling or software scheduling:

Advantages:
- Flexible, unlimited virtual timers
- Platform-independent
- Easier debugging

Disadvantages:
- Less precise (depends on polling frequency)
- CPU overhead for timer management
- May miss exact expiration time

Timer State Management:

The timer has discrete states that must be managed carefully:

Timer States:
┌─────────┐     start()     ┌─────────┐
│  IDLE   │ ───────────────>│ RUNNING │
└─────────┘                  └─────────┘
     ↑                            │
     │         stop()             │
     └────────────────────────────┤
                                  │
                            (timeout)
                                  │
                                  ↓
                          ┌─────────────┐
                          │  EXPIRED    │
                          │ (callback)  │
                          └─────────────┘

Race Conditions to Avoid:

Timer management has subtle race conditions:

ACK arrives just as timer fires: The timer callback might execute after the ACK was received but before it was fully processed. Solution: Disable timer interrupt during ACK processing, or use careful state checking.
Stop called on already-expired timer: If the timer has already fired, stopping it does nothing. Ensure the timeout handler checks whether an ACK has arrived.
Multiple timer starts without stops: Each new frame transmission should stop any existing timer before starting a new one (though in Stop-and-Wait, there's only one frame at a time, so this is less of an issue).

Concurrency Hazards

Pseudocode for Safe Timer Handling:

// Shared state (protected by mutex/interrupt disable)
shared_state = {
    timer_armed: false,
    awaiting_ack: false,
    current_seq: 0,
    stored_frame: null
}

function send_frame(data):
    disable_interrupts()
    frame = create_frame(data, shared_state.current_seq)
    shared_state.stored_frame = frame
    shared_state.awaiting_ack = true
    shared_state.timer_armed = true
    transmit(frame)
    start_hardware_timer(timeout_value)
    enable_interrupts()

function timer_interrupt_handler():  // ISR
    disable_interrupts()  // Usually automatic in ISR
    if shared_state.timer_armed and shared_state.awaiting_ack:
        // Legitimate timeout
        transmit(shared_state.stored_frame)
        start_hardware_timer(timeout_value)  // Restart
    // else: Timer fired after ACK received, ignore
    enable_interrupts()

function ack_received(ack):
    disable_interrupts()
    if verify_ack(ack) and ack.seq == shared_state.current_seq:
        stop_hardware_timer()
        shared_state.timer_armed = false
        shared_state.awaiting_ack = false
        shared_state.current_seq = 1 - shared_state.current_seq
        shared_state.stored_frame = null
        signal_ready_for_next()
    enable_interrupts()

Handling Premature Timeouts

What happens when the timeout fires prematurely—before the ACK could possibly arrive? This is a correctness challenge that Stop-and-Wait must handle gracefully.

Scenario: Premature Timeout

Sender                                     Receiver
   |                                           |
   |-------- Frame 0 --------->                |
   |                                           |
   | [Timer starts]                            |
   |                              [Received]   |
   |                              [Send ACK0]  |
   |                                           |
   | [Timer expires - TOO EARLY!]              |
   |                                           |
   |-------- Frame 0 --------->  (retransmit)  |
   |                                           |
   |<--------- ACK0 -----------  (original)    |
   |                                           |
   | [Got ACK0, advance to seq=1]              |
   |                                           |
   |                              [Received - duplicate!]
   |                              [Send ACK0]  |
   |                                           |
   |-------- Frame 1 --------->                |
   |                                           |
   |<--------- ACK0 -----------  (from retransmit)
   | [Wrong seq! Expected ACK1, got ACK0]      |
   | [Ignore]                                  |

Analysis:

The premature timeout caused an unnecessary retransmission, but the protocol still works correctly:

The original ACK₀ arrives and is processed correctly
The duplicate Frame₀ is recognized and re-ACKed (but not delivered twice)
The late ACK₀ (from the retransmission) is ignored because the sender now expects ACK₁

Robustness Despite Imperfection

The Cost of Premature Timeouts:

While correctness is preserved, premature timeouts are expensive:

Cost	Description
Bandwidth waste	Duplicate frame consumes channel capacity
Processing waste	Receiver must process duplicate, generate extra ACK
Potential confusion	Late ACKs require careful handling
Reduced throughput	Time spent on unnecessary work

Adaptive Timeout (Preview):

To minimize premature timeouts, some protocols dynamically adjust the timeout based on observed RTT:

Estimated_RTT = α × Estimated_RTT + (1 - α) × Measured_RTT
Timeout = Estimated_RTT + 4 × Deviation

Multiple Consecutive Timeouts

When a single retransmission isn't enough, the sender may experience multiple consecutive timeouts. This situation requires special consideration.

Scenario: Persistent Channel Failure

Sender                                     Receiver
   |                                           |
   |-------- Frame 0 -----X    (lost)          |
   | [Timer expires]                           |
   |-------- Frame 0 -----X    (lost again)    |
   | [Timer expires]                           |
   |-------- Frame 0 -----X    (lost again)    |
   | [Timer expires]                           |
   ...                                        |

If the channel is persistently broken (e.g., cable cut, severe interference), retransmissions will fail indefinitely.

The Retry Limit:

Practical implementations define a maximum retry count:

MAX_RETRIES = 5  // Configuration parameter
retry_count = 0

function on_timeout():
    retry_count++
    if retry_count > MAX_RETRIES:
        // Channel appears dead
        report_error("Maximum retries exceeded")
        reset_protocol()  // Prepare for recovery
        return
    
    // Still within limit, retry
    retransmit(stored_frame)
    start_timer(timeout_value)

Retry Strategy Comparison
Strategy	Behavior	Use Case
Fixed timeout, fixed limit	Same timeout each retry, give up after N	Simple, predictable
Fixed timeout, no limit	Retry forever (dangerous!)	Never—can hang forever
Exponential backoff, fixed limit	Increasing timeout, give up after N	Congestion-prone networks
Exponential backoff, time limit	Increasing timeout, give up after T seconds	User-facing applications
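The difference between the fixed and exponential strategies is easiest to see as a schedule of timeouts. The helper below is a hypothetical sketch: the function name `backoff_schedule`, the doubling factor, and the optional cap are illustrative choices, not part of any particular protocol.

```python
def backoff_schedule(base_timeout, max_retries, factor=2.0, cap=None):
    """Return the timeout used for the initial send and for each retry.

    factor=1.0 reproduces the "fixed timeout, fixed limit" strategy.
    """
    timeouts = []
    current = base_timeout
    for _ in range(max_retries + 1):     # initial transmission + retries
        timeouts.append(current)
        current *= factor
        if cap is not None:
            current = min(current, cap)  # keep waits bounded
    return timeouts


fixed = backoff_schedule(0.5, 5, factor=1.0)
exponential = backoff_schedule(0.5, 5, cap=8.0)
print(fixed)             # [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
print(exponential)       # [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
print(sum(exponential))  # 23.5 seconds of waiting before giving up
```

Summing the schedule gives the worst-case time before failure is reported, which is the quantity a time-limit strategy bounds directly.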

What to Do After Retry Limit:

When the retry limit is exceeded:

Report to upper layer: The network layer should know the frame couldn't be delivered
Clear pending state: Discard the stored frame, reset sequence number if needed
Connection reset: In connection-oriented protocols, may need to re-establish the link
Logging/alerting: Record the failure for diagnostics

function on_max_retries_exceeded():
    // Notify upper layer before discarding the frame
    network_layer.frame_undeliverable(stored_frame.data)
    
    // Clean up
    stored_frame = null
    retry_count = 0
    awaiting_ack = false
    
    // Depending on protocol, do exactly one of the following:
    if protocol_is_connection_oriented:
        // Option A: Reset connection
        initiate_connection_reset()
    else:
        // Option B: Just move on
        signal_ready_for_next_frame()

Retry Semantics

Timeout in Different Network Environments

Different network environments have vastly different characteristics, affecting timeout configuration.

Local Area Network (LAN):

Propagation delay: ~5 μs for 1 km
Bandwidth: 100 Mbps to 100 Gbps
RTT: Often < 1 ms
Timeout: A few milliseconds
Characteristics: Short delays, rare losses

Wide Area Network (WAN):

Propagation delay: ~5 ms per 1000 km
Bandwidth: Variable (10 Mbps to 100 Gbps)
RTT: 20-200 ms typical
Timeout: Hundreds of milliseconds
Characteristics: Longer delays, more variable

Satellite Link (GEO):

Propagation delay: ~120 ms per ground-to-satellite hop (geostationary orbit at ~36,000 km), so ~240 ms one way between ground stations
Bandwidth: A few Mbps typically
RTT: ~480 ms minimum
Timeout: 600 ms or more
Characteristics: High latency, expensive bandwidth

Timeout Requirements by Network Type
Network Type	Typical RTT	Recommended Timeout	Key Consideration
LAN (Ethernet)	< 1 ms	5-10 ms	Very fast, tight timing possible
WAN (Internet)	20-200 ms	500 ms - 2 s	Variable, need margin for congestion
GEO Satellite	500+ ms	600+ ms	Long baseline, less variability
LEO Satellite	20-40 ms	100-200 ms	Lower than GEO, constellation variations
Deep Space	3-20 minutes	Special protocols	Light-speed limit, different paradigm
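The propagation figures quoted above follow directly from distance and signal speed. The sketch below assumes roughly 200,000 km/s in cable or fiber and roughly 300,000 km/s for radio links; the function names and the 50% margin are hypothetical. It gives only a propagation-based floor, since queuing, transmission, and processing delays are ignored, which is why the recommended timeouts in the table are larger.

```python
FIBER_KM_PER_S = 200_000.0    # about 2/3 of c, i.e. ~5 microseconds per km
VACUUM_KM_PER_S = 300_000.0   # approximately c, for radio links to satellites


def one_way_delay_s(distance_km, speed_km_per_s):
    """Propagation delay in seconds for a single traversal of the path."""
    return distance_km / speed_km_per_s


def suggested_timeout_s(one_way_s, margin_fraction=0.5):
    """Propagation-only RTT plus a proportional safety margin (assumption)."""
    rtt = 2 * one_way_s
    return rtt * (1 + margin_fraction)


paths = [
    ("1 km LAN cable", 1, FIBER_KM_PER_S),
    ("1000 km WAN path", 1000, FIBER_KM_PER_S),
    ("GEO hop (36,000 km)", 36_000, VACUUM_KM_PER_S),
]
for name, km, speed in paths:
    d = one_way_delay_s(km, speed)
    print(f"{name}: one-way {d * 1e3:.3f} ms, "
          f"timeout floor {suggested_timeout_s(d) * 1e3:.3f} ms")
```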

Satellite Stop-and-Wait Problem:

Satellite links expose Stop-and-Wait's efficiency problem dramatically:

GEO Satellite Example:
- RTT = 500 ms
- Frame size = 1000 bytes = 8000 bits
- Bandwidth = 1 Mbps

T_frame = 8000 / 1,000,000 = 8 ms
Channel utilization = T_frame / (T_frame + RTT) = 8 / 508 ≈ 1.6%
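The same arithmetic as a reusable function, so other frame sizes and links can be tried. The function name is hypothetical, and the sketch assumes RTT covers everything between the end of transmission and the arrival of the ACK. The 1 ms comparison case is an illustrative value, not taken from the example above.

```python
def stop_and_wait_utilization(frame_bits, bandwidth_bps, rtt_s):
    """Fraction of time the sender spends transmitting in Stop-and-Wait."""
    t_frame = frame_bits / bandwidth_bps   # time to put the frame on the wire
    return t_frame / (t_frame + rtt_s)     # the sender idles for the whole RTT


geo = stop_and_wait_utilization(frame_bits=8000, bandwidth_bps=1_000_000, rtt_s=0.5)
short = stop_and_wait_utilization(frame_bits=8000, bandwidth_bps=1_000_000, rtt_s=0.001)
print(f"GEO link (RTT 500 ms): {geo:.1%}")
print(f"Short link (RTT 1 ms): {short:.1%}")
```

Holding the frame size and bandwidth fixed, shrinking the RTT from 500 ms to 1 ms moves utilization from under 2% to nearly 90%, which shows that the idle wait, not the link speed, is what limits Stop-and-Wait.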

Satellite Solutions

Summary: Mastering Timeout and Retransmission

Timeout and retransmission form the recovery backbone of Stop-and-Wait ARQ. Without them, any loss would cause permanent deadlock. Let's consolidate our understanding:

Key Takeaways

• Timeouts prevent deadlock: Without them, lost frames or ACKs would block the protocol forever
• Timeout = RTT + safety margin: Set slightly longer than maximum expected round-trip time
• Too short causes unnecessary retransmissions: Wastes bandwidth but preserves correctness
• Too long delays error recovery: Hurts performance when losses actually occur
• Retransmit the exact same frame: Sequence number, data, everything must match the original
• Restart timer after retransmission: The retransmission might also be lost
• Use retry limits in practice: Don't retry forever; report failure after a reasonable number of attempts
• Network environment determines timeout scale: Milliseconds for LAN, hundreds of milliseconds or more for WAN and satellite links

The Timeout Triad:

Three numbers define timeout behavior:

Timeout Value: How long to wait before retransmitting
Retry Limit: How many attempts before giving up
Safety Margin: Buffer added to RTT estimate

Getting these right requires understanding the network environment and application requirements.
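One way to keep the three numbers together is a small configuration object. This is a hypothetical sketch: the class and field names are illustrative, and the worst-case calculation assumes a fixed timeout with the retry rule used earlier (give up when the retry count exceeds the limit).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeoutConfig:
    rtt_estimate_s: float    # expected round-trip time
    safety_margin_s: float   # buffer added to the RTT estimate
    retry_limit: int         # retransmissions allowed before giving up

    @property
    def timeout_s(self):
        """Timeout value = RTT estimate + safety margin."""
        return self.rtt_estimate_s + self.safety_margin_s

    def worst_case_give_up_s(self):
        """Time until failure is reported if every transmission is lost.

        The initial send and each of the retry_limit retransmissions
        wait one full timeout before the sender gives up.
        """
        return (self.retry_limit + 1) * self.timeout_s


geo = TimeoutConfig(rtt_estimate_s=0.5, safety_margin_s=0.1, retry_limit=5)
print(f"timeout: {geo.timeout_s:.1f} s")
print(f"gives up after: {geo.worst_case_give_up_s():.1f} s")
```

Seeing the worst-case figure next to the per-frame timeout makes the trade-off concrete: raising either the margin or the retry limit lengthens the time an upper layer waits to learn that the link is down.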

Looking Ahead:

Timeout Fundamentals Complete