In the physical world, communication channels are inherently unreliable. Electromagnetic interference, signal attenuation, thermal noise, cosmic radiation, and countless other phenomena conspire to corrupt, delay, or entirely destroy data as it traverses the network. The Data Link Layer faces a profound challenge: how do we build reliable communication on top of unreliable physical infrastructure?
This is not merely an academic concern. Every file you download, every message you send, every financial transaction you complete depends on mechanisms that detect when data has been corrupted or lost, and automatically recover from these failures—all without human intervention.
The Stop-and-Wait Automatic Repeat reQuest (ARQ) protocol represents the simplest, most intuitive solution to this problem. Understanding Stop-and-Wait is crucial because:
By the end of this page, you will thoroughly understand how Stop-and-Wait ARQ operates, including the precise sequence of events during successful transmission, the role of each protocol component, and why this simple protocol forms the conceptual foundation for all reliable data transfer mechanisms.
Before diving into Stop-and-Wait specifically, let's establish what ARQ means and why it exists.
Automatic Repeat reQuest (ARQ) is a family of error-control protocols that rely on two key mechanisms:
The "automatic" in ARQ is significant—the protocol handles error recovery without human intervention. The "request" originally referred to the receiver explicitly asking for retransmission, though modern implementations often use implicit signaling.
The Core Contract:
ARQ protocols establish a contract between sender and receiver:
ARQ protocols contrast with Forward Error Correction (FEC), where the sender includes enough redundant information that the receiver can correct errors without retransmission. ARQ is more bandwidth-efficient when errors are rare; FEC is better when feedback is impractical (satellite, broadcast) or latency is critical. Many modern systems use hybrid approaches.
The Three Pillars of ARQ:
Every ARQ protocol rests on three fundamental mechanisms:
| Mechanism | Purpose | Implementation |
|---|---|---|
| Error Detection | Identify corrupted frames | CRC, checksum, parity |
| Acknowledgment | Confirm successful receipt | ACK frames, piggybacking |
| Retransmission | Recover from errors/loss | Timeouts, explicit requests |
Stop-and-Wait is the simplest implementation of these mechanisms. It sends one frame, waits for its acknowledgment, then sends the next. There is no pipelining, no multi-frame buffering, and no complex state management: just the essential elements of reliable transmission.
Stop-and-Wait ARQ derives its name from its fundamental behavior: after sending a frame, the sender stops and waits for an acknowledgment before proceeding to the next frame.
This paradigm has profound implications:
The Sender's Perspective:
The Receiver's Perspective:
Why "Stop-and-Wait" Works:
The elegance of Stop-and-Wait lies in its simplicity. At any moment:
This simplicity eliminates entire classes of problems that plague more complex protocols: buffer management, out-of-order delivery, flow control complexity.
Think of Stop-and-Wait like a polite conversation where you say one sentence, then wait for the other person to nod before continuing. It's slow but ensures mutual understanding. You wouldn't speak for 10 minutes without checking if your listener is still with you.
The Simple State Machine:
Stop-and-Wait can be modeled as a minimal finite state machine:
Sender States:
Receiver States:
The transitions between these states follow predictable rules, making the protocol easy to implement and verify.
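The sender's side of this state machine can be written as a transition function. The sketch below is one possible encoding, not a canonical model. It uses two phases (waiting for data, waiting for an ACK) plus the current sequence bit. The event names (`"data"`, `"ack"`, `"timeout"`, `"corrupt_ack"`) and the action strings are hypothetical labels chosen for illustration.

```python
from enum import Enum, auto


class Phase(Enum):
    WAIT_FOR_DATA = auto()  # idle: ready to accept the next packet from above
    WAIT_FOR_ACK = auto()   # one frame outstanding, retransmission timer running


def sender_step(phase, seq, event, ack_seq=None):
    """Return (next_phase, next_seq, action) for one sender event.

    Events are hypothetical names: "data" from the network layer,
    "ack" carrying ack_seq, "timeout", and "corrupt_ack".
    """
    if phase is Phase.WAIT_FOR_DATA:
        if event == "data":
            return Phase.WAIT_FOR_ACK, seq, f"send frame {seq}, start timer"
        return phase, seq, "ignore"  # ACKs or timeouts while idle are stale
    # Otherwise we are in WAIT_FOR_ACK.
    if event == "ack" and ack_seq == seq:
        return Phase.WAIT_FOR_DATA, 1 - seq, "stop timer"  # toggle sequence bit
    if event == "timeout":
        return phase, seq, f"resend frame {seq}, restart timer"
    return phase, seq, "ignore"  # wrong-sequence or corrupted ACK
```

The sequence bit is part of the state, so this encoding is equivalent to the four-state sender often drawn in textbooks: waiting for data or for an ACK, with sequence 0 or 1. The receiver's machine is smaller still. It only tracks which sequence number it expects next.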
Let's trace through the complete operation of Stop-and-Wait ARQ in a successful transmission scenario. Understanding this sequence precisely is essential for analyzing the protocol's behavior.
Scenario Setup:
Critical Timing Relationships:
The time to complete one frame transmission consists of:
Total Time = T_frame + T_prop + T_process + T_ack + T_prop
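Here T_frame is the time to transmit the data frame onto the link (frame length divided by bit rate). T_prop is the one-way propagation delay; it is incurred once in each direction, hence the two T_prop terms. T_process is the receiver's time to check and process the frame, and T_ack is the time to transmit the ACK.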
This timing is fundamental to understanding Stop-and-Wait's efficiency, which we'll analyze in detail in a later section.
Notice that during T_prop (both directions), the sender sits idle. If propagation delay is large relative to transmission time (as in satellite links or long-haul fiber), the channel utilization becomes devastatingly poor. This is the fundamental efficiency limitation of Stop-and-Wait.
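The timing formula can be evaluated directly. The sketch below assumes a 1,000-byte data frame and a 50-byte ACK on a 1 Mbps link, with a 270 ms one-way propagation delay (roughly a geostationary satellite hop) and 0.1 ms of receiver processing. These figures are illustrative assumptions, not measurements.

```python
def transmission_time(bits, rate_bps):
    """Time (s) to push `bits` onto a link running at `rate_bps`."""
    return bits / rate_bps


def cycle_time(t_frame, t_prop, t_process, t_ack):
    """One Stop-and-Wait cycle: frame out, propagate, process, ACK out, propagate back."""
    return t_frame + t_prop + t_process + t_ack + t_prop


# Illustrative assumptions, not measurements.
rate = 1_000_000                               # 1 Mbps link
t_frame = transmission_time(1000 * 8, rate)    # 1,000-byte frame -> 8 ms
t_ack = transmission_time(50 * 8, rate)        # 50-byte ACK -> 0.4 ms
t_prop = 0.270                                 # one-way propagation delay (s)
t_process = 0.0001                             # receiver processing (s)

total = cycle_time(t_frame, t_prop, t_process, t_ack)
idle = total - t_frame  # time in each cycle when no data frame is being sent
print(f"cycle = {total * 1000:.1f} ms, idle = {idle * 1000:.1f} ms")
```

With these figures one cycle takes about 548.5 ms, and only 8 ms of it is spent transmitting data.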
Each component of the Stop-and-Wait protocol serves a specific purpose. Let's examine them systematically.
1. The Data Frame Structure:
A Stop-and-Wait data frame contains:
| Field | Size (typical) | Purpose |
|---|---|---|
| Start Delimiter | 1 byte | Frame boundary identification |
| Sequence Number | 1 bit | Distinguish consecutive frames (0 or 1) |
| Data Payload | Variable | Actual data from network layer |
| CRC | 2-4 bytes | Error detection code |
| End Delimiter | 1 byte | Frame boundary identification |
Why Only 1 Bit for Sequence Number?
This is a key insight. Since Stop-and-Wait only has one outstanding frame at a time, we only need to distinguish between "this frame" and "next frame." A single bit (0 or 1) suffices: frames alternate 0, 1, 0, 1, and so on. This minimal sequence number has profound implications for the protocol's simplicity.
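To make the frame layout concrete, here is a minimal encoder/decoder sketch under a hypothetical layout. One header byte carries the sequence number in its low bit, the payload follows, and a 4-byte CRC-32 computed with Python's `zlib.crc32` comes last. The start and end delimiters (and any byte stuffing) are omitted, on the assumption that a lower framing layer handles them.

```python
import struct
import zlib


def encode_frame(seq, payload):
    """Build header | payload | CRC-32 (big-endian). Only the header's low bit is used."""
    body = bytes([seq & 1]) + payload
    return body + struct.pack(">I", zlib.crc32(body))


def decode_frame(frame):
    """Return (seq, payload), or None if the frame is too short or fails the CRC."""
    if len(frame) < 5:  # header byte + 4-byte CRC is the minimum
        return None
    body, (received_crc,) = frame[:-4], struct.unpack(">I", frame[-4:])
    if zlib.crc32(body) != received_crc:
        return None  # corrupted: the receiver silently discards it
    return body[0] & 1, body[1:]


# Sequence numbers simply alternate 0, 1, 0, 1, ...
seq = 0
for message in (b"alpha", b"beta", b"gamma"):
    print(decode_frame(encode_frame(seq, message)))
    seq ^= 1  # toggle the single sequence bit
```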
2. The Acknowledgment Frame:
ACK frames are deliberately minimal:
| Field | Size | Purpose |
|---|---|---|
| Type | 1 bit | Distinguish ACK from data |
| Sequence Number | 1 bit | Identify which frame is being acknowledged |
| CRC | 2 bytes | Error detection for ACK itself |
ACKs must be small because they're administrative overhead. A large ACK would waste bandwidth on every single data frame.
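An ACK can be encoded the same way, only smaller. This sketch assumes a hypothetical one-byte header in which bit 1 is the type flag (set for an ACK) and bit 0 is the sequence number. A 2-byte CRC follows, here the CRC-CCITT polynomial computed with Python's `binascii.crc_hqx` and an initial value of 0xFFFF. The whole ACK is 3 bytes.

```python
import binascii
import struct

ACK_TYPE_BIT = 0b10  # hypothetical: bit 1 marks the frame as an ACK


def encode_ack(seq):
    """Build a 3-byte ACK: header (type + seq bit) followed by a 16-bit CRC."""
    header = bytes([ACK_TYPE_BIT | (seq & 1)])
    return header + struct.pack(">H", binascii.crc_hqx(header, 0xFFFF))


def decode_ack(ack):
    """Return the acknowledged sequence number, or None if this is not a valid ACK."""
    if len(ack) != 3:
        return None
    header, (crc,) = ack[:1], struct.unpack(">H", ack[1:])
    if binascii.crc_hqx(header, 0xFFFF) != crc:
        return None  # corrupted ACK: ignore it and let the timer handle recovery
    if not header[0] & ACK_TYPE_BIT:
        return None  # type bit clear: a data frame, not an ACK
    return header[0] & 1
```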
3. The Retransmission Timer:
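The timer is perhaps the most subtle component, and it must be set appropriately. Too short, and the sender retransmits frames whose ACKs are still in transit, producing needless duplicates. Too long, and the sender sits idle long after a frame has actually been lost.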
In practice, the timeout is set to:
Timeout = 2 × T_prop + T_frame + T_ack + Safety_Margin
The safety margin accounts for variations in processing time and network conditions.
In real implementations, timeouts are adaptive. TCP is the standard example: its connection setup retransmits an unacknowledged SYN in Stop-and-Wait fashion, and throughout a connection it measures actual round-trip times and adjusts its retransmission timeout dynamically. Adaptivity is especially important on variable-latency networks like the Internet.
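As one concrete example of an adaptive timeout, TCP's retransmission timer (specified in RFC 6298) keeps a smoothed RTT and an RTT-variation estimate. The sketch below follows that RFC's update rules and recommended constants: alpha = 1/8, beta = 1/4, K = 4, and a 1-second floor. It omits the clock-granularity term G and Karn's rule against sampling retransmitted segments, so treat it as an illustration rather than a conforming implementation.

```python
class RtoEstimator:
    """Adaptive retransmission timeout in the style of RFC 6298 (simplified)."""

    ALPHA, BETA, K = 1 / 8, 1 / 4, 4
    MIN_RTO = 1.0  # seconds; RFC 6298 rounds smaller values up to 1 s

    def __init__(self):
        self.srtt = None    # smoothed round-trip time
        self.rttvar = None  # round-trip time variation
        self.rto = 1.0      # initial timeout before any measurement (seconds)

    def on_rtt_sample(self, r):
        """Update the estimates with a new RTT measurement r (seconds)."""
        if self.srtt is None:  # first measurement
            self.srtt, self.rttvar = r, r / 2
        else:  # RTTVAR uses the old SRTT, so update it first
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = max(self.MIN_RTO, self.srtt + self.K * self.rttvar)
        return self.rto
```

A stable RTT shrinks RTTVAR and pulls the timeout toward the measured delay. A sudden jump in RTT inflates RTTVAR and widens the safety margin automatically.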
4. The CRC (Cyclic Redundancy Check):
The CRC serves as the guardian of data integrity. For Stop-and-Wait:
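CRC can detect all single-bit errors and all burst errors no longer than the CRC's width (16 bits for a CRC-16, 32 bits for a CRC-32). With a suitably chosen generator polynomial, it also catches all double-bit errors (up to a maximum frame length) and all errors affecting an odd number of bits.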
This high detection capability is why CRC, rather than simple checksums, is preferred at the data link layer.
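One of CRC's basic guarantees is that it catches every single-bit error. The sketch below flips each bit of a sample frame in turn and confirms that CRC-32 (via Python's `zlib.crc32`) changes every time. It then contrasts CRC with a naive additive checksum, which cannot notice two swapped bytes. This is a spot check of two cases, not a proof of CRC's general properties.

```python
import zlib


def additive_checksum(data):
    """A naive 8-bit checksum: sum of bytes modulo 256 (for comparison only)."""
    return sum(data) % 256


def crc_catches_all_single_bit_flips(data):
    """Flip each bit of `data` in turn; report whether CRC-32 changed every time."""
    original = zlib.crc32(data)
    for i in range(len(data) * 8):
        corrupted = bytearray(data)
        corrupted[i // 8] ^= 1 << (i % 8)
        if zlib.crc32(bytes(corrupted)) == original:
            return False
    return True


frame = b"Stop-and-Wait frame payload"
print(crc_catches_all_single_bit_flips(frame))

# Swapping two bytes fools the additive checksum but not the CRC.
print(additive_checksum(b"ab") == additive_checksum(b"ba"))  # same sum
print(zlib.crc32(b"ab") == zlib.crc32(b"ba"))                # different CRC
```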
5. Protocol Buffers:
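Even simple Stop-and-Wait requires some buffering, but only a little. The sender keeps one copy of the outstanding frame until it is acknowledged, so that it can retransmit, and the receiver needs space for one incoming frame.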
This minimal memory requirement is a key advantage of Stop-and-Wait in resource-constrained environments.
Understanding Stop-and-Wait requires visualizing the message flow over time. Let's examine a successful transmission followed by various error scenarios.
Successful Transmission Timeline:
Sender A                        Receiver B
|                                        |
| -------- Frame 0 (Seq=0) ------------> |
|                                        |
| [Timer starts]              [Receives] |
|                               [CRC OK] |
|                              [Deliver] |
|                                        |
| <------------ ACK (Seq=0) ------------ |
|                                        |
| [Timer stops]                          |
| [Advance to Seq=1]                     |
|                                        |
| -------- Frame 1 (Seq=1) ------------> |
|                                        |
...                                    ...
Notice the characteristic "staircase" pattern—each frame must complete its full round trip before the next begins.
Time flows downward in these diagrams. The horizontal dimension represents the physical separation between sender and receiver. The diagonal lines represent message propagation—their slope indicates the propagation delay.
Key Observations from the Diagram:
Idle Periods: The sender spends significant time waiting. The channel is "empty" during this time—no useful data flows.
Round-Trip Dependency: Each frame's completion depends on the full round-trip time (RTT = 2 × propagation delay + transmission times).
Strict Ordering: Frames are processed in strict sequence. There's no possibility of out-of-order delivery or receiver confusion.
Deterministic Behavior: Given the timings, you can predict exactly when each event occurs. This determinism makes analysis straightforward.
The Efficiency Visual:
If we shade the diagram to show when the channel is carrying useful data versus when it's idle:
|████|░░░░░░░░░░░░░░░░░░░░░░░░░░░░|████|░░░░░░░░...
  ^ Frame transmission               ^ Next frame
      ←―――――― Idle waiting ――――――→
The ratio of shaded (busy) to total time is the channel utilization—and for Stop-and-Wait on high-latency channels, this ratio can be devastatingly low.
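If ACK transmission and processing time are ignored, utilization reduces to U = T_frame / (T_frame + 2 × T_prop). Here T_frame is the frame's transmission time and T_prop is the one-way propagation delay. The formula is often written as 1 / (1 + 2a), with a = T_prop / T_frame.

The sketch compares two illustrative links, both carrying a 1,000-byte frame. The first is a 10 Mbps LAN with 1 km of cable, assuming signals travel at about 2 × 10^8 m/s. The second is a 1 Mbps satellite hop with a 270 ms one-way delay. These parameters are assumptions chosen for contrast.

```python
def utilization(t_frame, t_prop):
    """Stop-and-Wait channel utilization, ignoring ACK and processing time."""
    a = t_prop / t_frame
    return 1 / (1 + 2 * a)


FRAME_BITS = 1000 * 8

# Illustrative link parameters (assumptions, not measurements).
lan = utilization(t_frame=FRAME_BITS / 10e6, t_prop=1_000 / 2e8)  # 10 Mbps, 1 km
sat = utilization(t_frame=FRAME_BITS / 1e6, t_prop=0.270)        # 1 Mbps, GEO hop

print(f"LAN: {lan:.1%}  satellite: {sat:.1%}")
```

The same protocol keeps the LAN busy roughly 98.8% of the time but uses only about 1.5% of the satellite link. The value of a alone determines the outcome.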
A well-designed protocol maintains certain invariants—properties that remain true throughout execution. Understanding these invariants helps verify correctness and debug implementations.
Stop-and-Wait Invariants:
Correctness Reasoning:
Why do these guarantees hold? Let's reason through key scenarios:
Claim: Frames are never delivered out of order.
Proof Sketch: Since the sender only transmits one frame at a time and waits for its acknowledgment, the receiver sees at most one new frame at a time. The receiver delivers frames immediately upon acceptance. With only one "in-flight" frame, reordering is impossible—there's nothing to reorder against.
Claim: Frames are never delivered twice (no duplicates).
Proof Sketch: The receiver checks the sequence number. If it matches the expected number, the frame is delivered and the expected number is toggled. If a duplicate arrives (same sequence number as already delivered), the receiver recognizes it, discards it, but still sends an ACK (to prevent infinite retransmission by the sender).
Claim: No frame is lost (assuming each transmission attempt succeeds with nonzero probability).
Proof Sketch: If a frame or its ACK is lost, the sender's timer expires and the frame is retransmitted. This repeats until an attempt succeeds. If every attempt succeeds with some probability greater than 0, success eventually occurs with probability 1, although the expected time can be long on high-loss channels.
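The eventual-success argument can be quantified. Suppose each attempt fails independently: the frame is lost or corrupted with probability p_frame, or its ACK is lost with probability p_ack. Then the number of attempts per frame is geometrically distributed with mean 1 / ((1 - p_frame)(1 - p_ack)). The sketch checks that closed form against a seeded simulation. The independence assumption is a simplification, since real errors often arrive in bursts.

```python
import random


def expected_attempts(p_frame, p_ack):
    """Mean transmissions per frame when each attempt fails independently."""
    return 1 / ((1 - p_frame) * (1 - p_ack))


def simulate_attempts(p_frame, p_ack, frames=100_000, seed=1):
    """Average attempts per frame in a simple Monte Carlo run."""
    rng = random.Random(seed)
    total = 0
    for _ in range(frames):
        attempts = 1
        # Retry while the frame is lost, or the frame arrives but its ACK is lost.
        while rng.random() < p_frame or rng.random() < p_ack:
            attempts += 1
        total += attempts
    return total / frames


print(expected_attempts(0.2, 0.1), simulate_attempts(0.2, 0.1))
```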
These guarantees assume: (1) errors are transient, not permanent, (2) the timeout is set correctly, (3) the CRC detects all corruption. If these assumptions fail—say, the channel is permanently broken or the CRC misses an error—the guarantees don't hold. Real systems include higher-level mechanisms (end-to-end checksums, application-level acknowledgments) as safety nets.
Implementing Stop-and-Wait correctly requires attention to several practical details that academic descriptions sometimes gloss over.
Timer Implementation:
The retransmission timer is critical and tricky:
// Sender logic
current_seq = 0

function send_frame(data):
    frame = create_frame(data, current_seq)
    stored_frame = frame                // Keep a copy before sending, for retransmission
    transmit(frame)
    start_timer(timeout_value)

function on_timer_expired():
    transmit(stored_frame)              // Retransmit the same frame
    start_timer(timeout_value)          // Restart the timer

function on_ack_received(ack):
    if crc_check(ack) fails:
        return                          // Corrupted ACK: ignore; the timer will recover
    if ack.seq == current_seq:
        stop_timer()
        current_seq = 1 - current_seq   // Toggle: 0→1 or 1→0
        notify_ready_for_next_frame()
    // Else: ignore (unexpected or duplicate ACK)
Duplicate Detection at Receiver:
// Receiver logic
expected_seq = 0

function on_frame_received(frame):
    if crc_check(frame) fails:
        return                              // Silently discard corrupted frame
    if frame.seq == expected_seq:
        deliver_to_network_layer(frame.data)
        expected_seq = 1 - expected_seq     // Toggle
    // Always ACK the received seq (even duplicates);
    // this handles the case where our previous ACK was lost
    send_ack(frame.seq)
The receiver sends an ACK even for duplicate frames. Why? If frame 0 arrived and we sent ACK0, but ACK0 was lost, the sender will retransmit frame 0. We must re-send ACK0 to break the deadlock. Ignoring the duplicate frame while still acknowledging it achieves both duplicate suppression and forward progress.
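The sender and receiver pseudocode can be combined into a small runnable simulation. This is a sketch, not a link driver. Time is abstracted into rounds, and a lost or corrupted frame or ACK is modeled as a random drop, with CRC failures folded into the drop rate. A drop on either leg stands in for a timer expiry. The channel model and names such as `run_stop_and_wait` are assumptions for illustration.

```python
import random


class Receiver:
    def __init__(self):
        self.expected_seq = 0
        self.delivered = []

    def on_frame(self, seq, data):
        if seq == self.expected_seq:      # new frame: deliver and toggle
            self.delivered.append(data)
            self.expected_seq = 1 - self.expected_seq
        return seq                        # always ACK what arrived, even duplicates


def run_stop_and_wait(messages, p_loss, seed=0):
    """Send `messages` over a channel that drops each frame or ACK with prob. p_loss.

    Returns (messages delivered to the receiver's upper layer, total transmissions).
    """
    rng = random.Random(seed)
    receiver = Receiver()
    current_seq, transmissions = 0, 0
    for data in messages:
        while True:                       # one iteration = one (re)transmission
            transmissions += 1
            if rng.random() < p_loss:
                continue                  # frame lost: timer expires, resend
            ack_seq = receiver.on_frame(current_seq, data)
            if rng.random() < p_loss:
                continue                  # ACK lost: timer expires, resend (a duplicate)
            if ack_seq == current_seq:    # matching ACK: stop timer, toggle seq
                current_seq = 1 - current_seq
                break
    return receiver.delivered, transmissions


msgs = [f"packet-{i}" for i in range(20)]
delivered, sent = run_stop_and_wait(msgs, p_loss=0.3)
print(delivered == msgs, sent)
```

Even with 30% loss on each leg, the receiver's delivered list matches the input exactly: in order, with no gaps and no duplicates. The cost shows up only in the number of transmissions.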
Edge Cases to Handle:
| Scenario | Correct Behavior |
|---|---|
| ACK arrives after timeout | Compare seq; if matches, accept (late ACK) |
| ACK for wrong sequence | Ignore; likely a delayed duplicate |
| Corrupted ACK received | Ignore; timer will handle retransmission |
| Multiple timeouts for same frame | Keep retransmitting, typically backing off (for example, doubling) the timeout; report failure after a retry limit |
| Receiver gets same frame repeatedly | Discard data, but always acknowledge |
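For repeated timeouts on the same frame, one common policy is exponential backoff: double the timeout after each expiry, up to a ceiling. TCP does this under RFC 6298. The sketch below assumes a 60-second cap and a hypothetical retry limit, after which the sender would report the link as failed. Both numbers are illustrative.

```python
def backoff_schedule(initial, max_timeout=60.0, max_retries=8):
    """Timeouts to use for the first send and each retransmission (seconds).

    Doubles after every expiry, capped at `max_timeout`; after `max_retries`
    retransmissions the caller should give up and report failure.
    """
    timeouts, t = [], initial
    for _ in range(max_retries + 1):  # original send + max_retries resends
        timeouts.append(t)
        t = min(t * 2, max_timeout)
    return timeouts


print(backoff_schedule(1.0))
```

Backing off keeps a struggling link from being flooded with retransmissions. The retry limit turns a permanent failure into an error that higher layers can handle.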
Common Implementation Bugs:
We've established the complete operational picture of Stop-and-Wait ARQ. Let's consolidate our understanding:
The Foundation for What's Ahead:
Stop-and-Wait represents the conceptual kernel from which all ARQ protocols derive. The mechanisms we've studied—acknowledgments, timeouts, sequence numbers, retransmission—appear in every reliable protocol, from the Data Link Layer to TCP at the Transport Layer and beyond.
In subsequent pages, we'll explore:
You now understand the complete basic operation of Stop-and-Wait ARQ—the simplest and most fundamental reliable transmission protocol. This foundation is essential for understanding the more sophisticated protocols (Go-Back-N, Selective Repeat) and real-world protocols like TCP that build upon these principles.