Imagine sending a critical financial transaction across a network—a wire transfer of $1,000,000. The data traverses copper cables, fiber optic strands, and wireless radio waves, each medium introducing its own sources of noise, interference, and signal degradation. A single flipped bit could transform $1,000,000 into $9,000,000 (in ASCII, the digits 1 and 9 differ by exactly one bit) or, worse, corrupt the recipient's account number entirely. Without robust error control mechanisms, modern digital communication would be fundamentally untrustworthy.
This page addresses a profound question that underpins all reliable networking: Why do we need error control at the Data Link Layer, and what makes it so critically important?
By the end of this page, you will understand the fundamental sources of transmission errors, why physical channels are inherently unreliable, how errors manifest at the bit level, the catastrophic consequences of unchecked errors, and why the Data Link Layer bears primary responsibility for error management in layered network architectures.
To understand why error control is necessary, we must first confront the fundamental reality that all physical communication channels are noisy and imperfect. This isn't a flaw that can be engineered away—it's a consequence of the laws of physics themselves.
The Signal-to-Noise Challenge:
When data is transmitted across any physical medium, the electrical, optical, or electromagnetic signals that represent bits encounter a hostile environment. The transmitted signal—whether a voltage level on a copper wire, a light pulse in fiber, or a radio wave through air—doesn't travel in isolation. It interacts with the environment, picking up unwanted energy (noise) that distorts the original waveform.
The receiver must interpret these corrupted signals and reconstruct the original binary data. When noise energy becomes comparable to signal energy, bit decisions become unreliable, and errors occur.
No amount of engineering can eliminate transmission errors entirely. We can reduce error rates through better cables, stronger signals, and sophisticated modulation, but errors will always occur at some non-zero rate. The question is not if errors happen, but how often and how we handle them.
Understanding error control requires a quantitative framework for describing how errors occur. The primary metric is the Bit Error Rate (BER), defined as the probability that any individual transmitted bit will be received incorrectly.
Formal Definition:
$$\text{BER} = \frac{\text{Number of Erroneous Bits}}{\text{Total Bits Transmitted}}$$
Typical BER Values by Medium:
| Transmission Medium | Typical BER Range | Approx. Errors per 1 TB (8 × 10¹² bits) |
|---|---|---|
| Fiber Optic (long-haul) | 10⁻¹² to 10⁻¹⁵ | 8 to 0.008 errors |
| Fiber Optic (metro) | 10⁻⁹ to 10⁻¹² | 8,000 to 8 errors |
| Ethernet (Cat 6) | 10⁻¹⁰ to 10⁻¹² | 800 to 8 errors |
| Wireless LAN (802.11) | 10⁻⁵ to 10⁻⁸ | 80 million to 80,000 errors |
| Satellite Link | 10⁻⁵ to 10⁻⁷ | 80 million to 800,000 errors |
| Mobile Cellular (4G/5G) | 10⁻³ to 10⁻⁶ | 8 billion to 8 million errors |
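Figures like the right-hand column follow from a one-line expected-value calculation. The sketch below assumes 1 TB = 10¹² bytes = 8 × 10¹² bits and independent bit errors; the BER values are illustrative:

```python
# Expected bit errors when transferring 1 TB at a given bit error rate.
# Assumes 1 TB = 10**12 bytes = 8 * 10**12 bits and independent bit errors.
TB_BITS = 8 * 10**12

def expected_errors(ber: float) -> float:
    return ber * TB_BITS

# A long-haul fiber link vs. a noisy wireless LAN (illustrative BERs):
print(f"fiber @ BER 1e-12: ~{expected_errors(1e-12):,.0f} errors per TB")
print(f"WLAN  @ BER 1e-5:  ~{expected_errors(1e-5):,.0f} errors per TB")
```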
Single-Bit vs. Burst Errors:
Errors don't occur uniformly. Understanding error patterns is as important as understanding error rates:
Single-Bit Errors:
Exactly one bit in the frame is flipped. Single-bit errors are typically caused by brief random noise (such as thermal noise), strike independently, and are equally likely anywhere in the frame.
Burst Errors:
Two or more bits in a cluster are corrupted. Bursts are typically caused by impulse noise, wireless fading, or crosstalk, whose disturbances last longer than one bit time. In practice, bursts are far more common than isolated single-bit errors.
A burst error that corrupts 16 consecutive bits could span the entire 16-bit TCP checksum field, or much of the 32-bit sequence number. If not detected, this could cause data to be delivered to the wrong socket or out of order—catastrophic failures masked as valid data. Error control mechanisms must be designed to handle bursts, not just random single-bit errors.
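To see why bursts defeat naive schemes, consider a frame protected by a single even-parity bit: any error that flips an even number of bits passes the check. A minimal sketch (the frame contents are illustrative):

```python
# Sketch: a 2-bit burst slips past single-bit even parity.
frame = [1, 0, 1, 1, 0, 0, 1, 0]   # illustrative frame bits

def even_parity(bits):
    """Parity bit that makes the total number of 1s even."""
    return sum(bits) % 2

sent_parity = even_parity(frame)

# A 2-bit burst flips two adjacent bits; the count of 1s changes by 0 or 2,
# so the parity check still passes and the corruption goes undetected.
corrupted = frame.copy()
corrupted[3] ^= 1
corrupted[4] ^= 1

assert even_parity(corrupted) == sent_parity   # burst invisible to parity
```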
What happens when transmission errors go undetected and uncorrected? The consequences range from minor inconveniences to catastrophic system failures. Understanding these consequences illuminates why error control cannot be optional.
Data Integrity Corruption:
Without error control, corrupted data would be delivered to applications as if it were correct. Consider the implications across different domains:
| Domain | Type of Data | Error Impact | Real-World Consequence |
|---|---|---|---|
| Finance | Transaction amounts | Changed digits | $10,000 becomes $90,000; incorrect balances |
| Healthcare | Patient records | Corrupted values | Wrong dosage administered; misdiagnosis |
| Aviation | Flight control data | Sensor readings flipped | Aircraft system malfunctions |
| Database | Index structures | Pointer corruption | Data loss, database corruption |
| Firmware Updates | Executable code | Instruction changes | Device bricking, security vulnerabilities |
| Scientific Research | Experimental data | Measurement alterations | Invalid conclusions, wasted research |
Protocol Breakdown:
Beyond application data, errors can corrupt protocol headers themselves, causing systematic communication failures:
The most dangerous errors are those that go undetected. A detected error can be handled—retransmitted, flagged, or escalated. An undetected error propagates silently through the system, corrupting caches, databases, and application state. By the time symptoms appear, the root cause may be impossible to trace. Error detection is therefore even more critical than error correction.
Error control could theoretically occur at any layer of the protocol stack. Transport layer protocols like TCP include error detection and retransmission. Application protocols can add their own integrity checks. So why is the Data Link Layer specifically tasked with error control?
The Principle of Local Recovery:
Errors are best handled close to where they occur. The Data Link Layer operates on individual network hops—the direct connection between adjacent nodes (host-to-switch, switch-to-switch, switch-to-host). When an error occurs on a single link, the DLL can detect and potentially correct it before the damage propagates through the network.
Efficiency Considerations:
Consider a 10-hop path where each link has a 1% frame error rate (typical of noisy wireless). Without per-hop error control:
- A frame survives all 10 hops with probability 0.99¹⁰ ≈ 90.4%, so nearly 1 frame in 10 is lost somewhere along the path.
- Every loss forces an end-to-end retransmission that re-crosses all 10 links, including the links that already delivered the frame successfully.
With per-hop error control and local retransmission:
- Each error is repaired on the single link where it occurred, costing on average 1/(1 − 0.01) ≈ 1.01 transmissions per hop.
- Total overhead is roughly 1% per link instead of a ~10% end-to-end retransmission rate, and recovery takes one link round-trip rather than a full path round-trip.
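The comparison can be checked with a few lines of arithmetic. This sketch treats frame errors as independent and charges each failed end-to-end attempt the full 10 link transmissions (a slight overestimate, since a frame lost at hop 3 never crosses the remaining hops):

```python
# Expected link transmissions per delivered frame on a 10-hop path
# with a 1% per-link frame error rate (illustrative numbers).
p, hops = 0.01, 10

# End-to-end only: a frame survives all hops with probability (1-p)**hops,
# and every failure repeats the whole path.
p_e2e = (1 - p) ** hops          # ~0.904
expected_e2e = hops / p_e2e      # link transmissions per delivered frame

# Per-hop ARQ: each link retransmits until success, costing 1/(1-p)
# transmissions per hop on average (geometric distribution).
expected_perhop = hops / (1 - p)

print(f"end-to-end success probability: {p_e2e:.3f}")
print(f"link transmissions, end-to-end ARQ: {expected_e2e:.2f}")
print(f"link transmissions, per-hop ARQ:    {expected_perhop:.2f}")
```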
The Layered Responsibility Model:
Data Link Layer error control isn't meant to replace transport-layer reliability—it supplements it. The DLL handles the high-frequency, low-latency errors from physical transmission, while TCP handles end-to-end semantics, congestion control, and application-visible reliability. This division of labor optimizes both performance and reliability.
DLL error control handles hop-by-hop reliability for physical transmission errors. Transport-layer error control (TCP) handles end-to-end reliability including packet loss from congestion, routing failures, and buffering. Both are necessary; neither alone is sufficient. The interaction between these layers is a fundamental topic in network design.
Error control encompasses two fundamentally different approaches, each with distinct capabilities, costs, and appropriate use cases:
Error Detection:
The receiver determines whether an error has occurred without knowing what the error was or how to fix it. Upon detecting an error, the receiver typically requests retransmission of the corrupted data.
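As a concrete sketch of detection-only error control, here is a 16-bit ones'-complement checksum in the style used by IP, TCP, and UDP (the payload is illustrative). A receiver that computes a different value knows only that something is wrong; it must request a retransmission:

```python
# 16-bit ones'-complement checksum (Internet-checksum style).
def checksum16(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"                          # pad to a 16-bit boundary
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16) # fold the carry back in
    return ~total & 0xFFFF

payload = b"TRANSFER $1,000,000"
tag = checksum16(payload)                        # sent alongside the data

# A single flipped bit in transit always changes this checksum,
# so the receiver detects the error and asks for a resend.
corrupted = bytes([payload[0] ^ 0x40]) + payload[1:]
assert checksum16(corrupted) != tag
```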
Error Correction (Forward Error Correction - FEC):
The transmitter adds sufficient redundancy that the receiver can not only detect errors but also determine the correct original data without retransmission.
| Characteristic | Error Detection | Error Correction (FEC) |
|---|---|---|
| Redundancy Required | Low (typically 2-4 bytes) | High (10-30%+ of data) |
| Bandwidth Efficiency | High when errors rare | Lower due to added redundancy |
| Latency Impact | Adds RTT for retransmission | No retransmission delay |
| Best For | Reliable channels (wired) | Unreliable/high-latency (satellite, wireless) |
| Real-Time Suitability | Poor (retransmission delay) | Excellent (immediate correction) |
| Complexity | Simple to implement | Mathematically sophisticated |
| Examples | CRC, Checksum, Parity | Hamming Codes, Reed-Solomon, LDPC |
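A minimal FEC sketch: the classic Hamming(7,4) code adds three parity bits to four data bits so that any single flipped bit can be located and repaired at the receiver, with no retransmission needed. Bit layout follows the standard convention (parity bits at positions 1, 2, and 4):

```python
def hamming74_encode(d1, d2, d3, d4):
    """Encode 4 data bits into the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(cw):
    """Return a copy of the codeword with any single-bit error repaired."""
    cw = list(cw)
    p1, p2, d1, p3, d2, d3, d4 = cw
    s1 = p1 ^ d1 ^ d2 ^ d4
    s2 = p2 ^ d1 ^ d3 ^ d4
    s3 = p3 ^ d2 ^ d3 ^ d4
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the bad bit; 0 = clean
    if syndrome:
        cw[syndrome - 1] ^= 1
    return cw

sent = hamming74_encode(1, 0, 1, 1)
garbled = list(sent)
garbled[4] ^= 1                              # channel flips one bit in transit
assert hamming74_correct(garbled) == sent    # receiver repairs it locally
```

Note the cost: three check bits for four data bits, far heavier than the 2–4 byte CRC a detection-only scheme would append to an entire frame.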
When to Use Each Approach:
Detection with Retransmission (ARQ) is preferred when:
- The channel is relatively reliable, so retransmissions are rare and a low-redundancy code (CRC) keeps throughput high.
- Round-trip times are short and a feedback channel exists, making the retransmission delay tolerable.
Forward Error Correction is preferred when:
- Retransmission is expensive or impossible: long-latency satellite links, one-way broadcasts, or deep-space communication.
- Real-time traffic such as voice or video cannot wait a round-trip for a resend.
Hybrid Approaches:
Modern systems often combine both: FEC corrects the common minor errors without delay, while ARQ handles the rare cases where FEC fails. This provides both low latency and high reliability.
There's a fundamental tradeoff: error correction requires more redundancy than error detection. A code that can detect 4 errors might only correct 2. Designers must balance the costs of redundancy (reduced throughput) against the costs of retransmission (increased latency). This tradeoff drives much of the sophistication in modern link-layer protocols.
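This tradeoff is governed by a code's minimum Hamming distance d: a code can detect up to d − 1 errors but correct only ⌊(d − 1)/2⌋ of them. A sketch using a 5-bit repetition code, which has minimum distance 5 and therefore detects 4 errors yet corrects only 2:

```python
from itertools import combinations

def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance(code) -> int:
    """Minimum pairwise Hamming distance over all codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))

code = ["00000", "11111"]      # 5-bit repetition code
d = min_distance(code)         # 5
detectable = d - 1             # can detect up to 4 errors...
correctable = (d - 1) // 2     # ...but correct only 2
print(f"d = {d}: detect {detectable}, correct {correctable}")
```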
Having established why error control is necessary, let's preview how real protocols implement it. This sets the stage for the detailed mechanisms we'll explore in subsequent pages.
Error Detection Mechanisms:
- Parity bits, checksums, and cyclic redundancy checks (CRCs) all append redundant check bits computed from the frame's contents, letting the receiver verify that the frame arrived intact.
Error Handling Strategies:
Once an error is detected, the protocol must decide what to do:
- Discard the frame silently and let higher layers recover.
- Request retransmission of the corrupted frame (ARQ).
- Correct the error in place using FEC redundancy, when the code allows it.
The detailed mechanisms for coordinating retransmissions—Automatic Repeat Request (ARQ) protocols—form a major topic in error control. We'll explore Stop-and-Wait, Go-Back-N, and Selective Repeat ARQ protocols in depth, including their efficiency analysis, sequence number requirements, and implementation tradeoffs.
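As a taste of that efficiency analysis, Stop-and-Wait utilization is commonly modeled as U = 1/(1 + 2a), where a is the ratio of one-way propagation delay to frame transmission time. The link parameters below are illustrative:

```python
# Stop-and-Wait ARQ utilization, U = 1/(1 + 2a), on an error-free link.
frame_bits = 12_000        # one 1500-byte frame
rate_bps = 100e6           # 100 Mbps link
prop_delay_s = 1e-3        # 1 ms one-way propagation delay

t_frame = frame_bits / rate_bps      # 0.12 ms to clock the frame out
a = prop_delay_s / t_frame           # propagation dominates transmission
utilization = 1 / (1 + 2 * a)

print(f"utilization: {utilization:.1%}")   # ~5.7%: the sender mostly idles
```

The poor result for a long, fast link is exactly what motivates the pipelined Go-Back-N and Selective Repeat protocols previewed above.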
Error control at the Data Link Layer doesn't exist in isolation—it interacts with mechanisms at other layers in complex ways. Understanding these interactions is crucial for designing efficient, robust protocols.
Interaction with Physical Layer:
The physical layer provides the raw bit pipe, and its characteristics directly determine DLL error control requirements:
- The medium's raw BER dictates how much redundancy the link layer must add and whether detection alone suffices.
- Burst-prone media (wireless fading, crosstalk on copper) call for burst-capable codes such as CRCs rather than simple parity.
Interaction with Network Layer:
The network layer assumes a reasonably reliable hop-by-hop transmission:
- Routing and forwarding logic trusts that headers delivered by the link layer are intact; a corrupted destination address could silently misdeliver a packet.
- IP itself offers little protection (the IPv4 checksum covers only its own header, and IPv6 drops even that), so it depends on link-layer integrity checks.
Interaction with Transport Layer:
TCP's design assumes certain behaviors from lower layers:
- TCP treats packet loss as a congestion signal, so links should discard corrupted frames promptly rather than delay them with unbounded retransmission attempts.
- TCP's 16-bit checksum is comparatively weak, so it relies on stronger link-layer CRCs to catch the vast majority of corruption.
Over-engineering DLL reliability can backfire. If the DLL retransmits aggressively while TCP simultaneously retransmits, you get redundant copies consuming bandwidth. If DLL buffers frames during retransmission, you increase latency and jitter. Modern design carefully limits DLL retransmission attempts to avoid interference with TCP congestion control.
We've established the fundamental rationale for error control at the Data Link Layer. Let's consolidate the key insights:
- All physical channels are noisy; errors occur at some non-zero rate no matter how good the engineering.
- Error rates vary enormously by medium, and errors arrive both as isolated bit flips and as bursts.
- Undetected errors are the most dangerous, silently corrupting application data and protocol state.
- Handling errors locally, hop by hop, is far more efficient than relying solely on end-to-end recovery.
- Designers choose between detection-plus-retransmission (ARQ) and forward error correction, or combine both, trading redundancy against latency.
What's Next:
With the foundation established, we'll dive into the mechanics of error detection. The next page explores how techniques like parity, checksums, and CRC compute redundancy codes that reveal the presence of errors—the first step toward reliable communication.
You now understand why error control is fundamental to the Data Link Layer. The noisy nature of physical transmission, the devastating consequences of unchecked errors, and the efficiency of local error handling all motivate the sophisticated mechanisms we'll explore throughout this module. Next, we'll examine error detection techniques in detail.