Binary Exponential Backoff provides an elegant mechanism for collision resolution—stations back off for increasingly long periods, spreading their retransmission attempts across time. But what if, despite this mechanism, a station continues to experience collisions? Should it retry forever?
The answer is definitively no. Ethernet imposes a maximum retry limit of 16 collisions. After experiencing 16 consecutive collisions on a single frame, the station abandons its transmission attempt and reports failure to the upper layer. This limit is not a design flaw—it's a crucial safety mechanism that prevents pathological behavior, bounds worst-case latency, and maintains network health under extreme conditions.
This page explores why the 16-retry limit exists, the mathematical probability of reaching this limit, what happens when a frame is discarded, how errors propagate to upper layers, and the network conditions that might cause excessive collisions.
IEEE 802.3 specifies that a station must attempt transmission up to 16 times (one initial transmission plus 15 retries, or equivalently, up to 16 collision events). If all 16 attempts result in collisions, the frame is discarded and the station reports an Excessive Collision Error to the MAC client (typically the network layer).
Specification (pseudocode summary):
attemptLimit = 16
IF collision_count >= attemptLimit THEN
    Discard frame
    Report excessiveCollisionError to MAC client
    Reset collision_count to 0
    Proceed to next frame in queue (if any)
END IF
| Collision Count | Action | Backoff Window | Status |
|---|---|---|---|
| 0 | Initial transmission | N/A | Transmitting |
| 1-10 | Retry with growing window | [0, 2^n - 1] | Normal retry |
| 11-15 | Retry with max window | [0, 1023] | Extended retry; the retry after collision 15 is the final attempt |
| 16 | ABORT | N/A | Frame discarded |
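The table above can be sketched as a small transmit loop. This is only an illustrative model, not the standard's own code: `attempt_collides` is a hypothetical callback standing in for the physical medium, and the constant names `ATTEMPT_LIMIT` and `BACKOFF_LIMIT` are chosen here for readability.

```python
import random

ATTEMPT_LIMIT = 16   # total transmission attempts allowed per frame
BACKOFF_LIMIT = 10   # the backoff exponent stops growing after 10 collisions


def backoff_slots(collisions, rng=random):
    """Pick a backoff delay, in slot times, after the given collision count."""
    k = min(collisions, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)  # uniform over [0, 2^k - 1]


def transmit(attempt_collides, rng=random):
    """Try to send one frame.

    attempt_collides is a hypothetical callback that returns True when the
    current attempt ends in a collision.
    Returns (status, collisions, total_backoff_slots).
    """
    collisions = 0
    waited = 0
    while True:
        if not attempt_collides():
            return "SUCCESS", collisions, waited
        collisions += 1
        if collisions >= ATTEMPT_LIMIT:
            # The 16th collision aborts the frame; no further backoff is taken.
            return "EXCESSIVE_COLLISION_ERROR", collisions, waited
        waited += backoff_slots(collisions, rng)
```

Calling `transmit(lambda: True)` models a medium where every attempt collides, so the frame is abandoned after 16 collisions and 15 backoff periods.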
Why 16 Specifically?
The choice of 16 as the maximum retry count balances several factors:
Statistical Adequacy: After 16 attempts with a backoff window that grows to 1024 slots, a two-station collision has been given ample opportunity to resolve. The probability of 16 consecutive collisions between two stations is astronomically small under normal conditions.
Bounded Worst-Case Delay: 16 retries bounds the maximum time a frame can consume the station's transmission resources. This prevents a single problematic frame from blocking the queue indefinitely.
Network Health Signal: Reaching 16 collisions is so improbable under normal operation that it signals severe network problems (faulty equipment, gross misconfiguration, or malicious behavior). Continuing to retry would likely be futile.
Historical Consistency: The limit of 16 has been part of Ethernet since the earliest DIX standards and represents decades of practical validation.
If you observe excessive collision errors in your network, do not simply dismiss them. This error indicates something is fundamentally wrong—possibly a malfunctioning NIC, cable issues, network design violations, or even a denial-of-service attack.
Understanding the probability of 16 consecutive collisions reveals why the retry limit is rarely reached in healthy networks—and why reaching it is a red flag.
Two-Station Collision Probability:
Consider the simplest case: two stations colliding repeatedly. After each collision n, both select a random value from [0, 2^min(n,10) - 1]. They collide again only if they select the same value.
P(16 consecutive collisions | two stations)

| After collision | P(re-collision) |
|---|---|
| 1 | 1/2 |
| 2 | 1/4 |
| 3 | 1/8 |
| 4 | 1/16 |
| 5 | 1/32 |
| 6 | 1/64 |
| 7 | 1/128 |
| 8 | 1/256 |
| 9 | 1/512 |
| 10 | 1/1024 |
| 11-16 | 1/1024 (truncated window) |

$$
P(\text{16 consecutive collisions})
= \frac{1}{2}\cdot\frac{1}{4}\cdot\frac{1}{8}\cdots\frac{1}{512}\cdot\left(\frac{1}{1024}\right)^{7}
= \frac{1}{2^{1+2+\cdots+9}\cdot 2^{10\cdot 7}}
= \frac{1}{2^{45}\cdot 2^{70}}
= \frac{1}{2^{115}}
\approx 2.4\times 10^{-35}
$$

This is effectively zero: you would need to wait longer than the age of the universe to observe this by random chance.

Strictly, the station aborts at the 16th collision, so only 15 backoff draws follow an initial collision. Dropping the last factor gives 1/2^105 ≈ 2.5 × 10^-32, which does not change the conclusion.

A probability of 2.4 × 10^-35 is negligible in practice. For comparison, the probability of a cosmic ray flipping a bit in your RAM is about 10^-15 per byte per month, roughly 20 orders of magnitude more likely than 16 consecutive random collisions.
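The two-station arithmetic can be checked with exact fractions. This is a minimal sketch that assumes exactly two contending stations drawing independently from the same window; the function names are illustrative. It evaluates both the 16-factor product used above and the 15-factor variant, which corresponds to the 15 backoff draws a station actually makes before the 16th collision aborts the frame.

```python
from fractions import Fraction

BACKOFF_LIMIT = 10  # window stops doubling after the 10th collision


def recollision_probability(collision_number):
    """P(two stations pick the same slot) after the given collision."""
    window = 2 ** min(collision_number, BACKOFF_LIMIT)
    return Fraction(1, window)


def consecutive_collision_probability(rounds):
    """Product of the re-collision probabilities for collisions 1..rounds."""
    p = Fraction(1)
    for n in range(1, rounds + 1):
        p *= recollision_probability(n)
    return p


p16 = consecutive_collision_probability(16)  # the product shown in the text
p15 = consecutive_collision_probability(15)  # one factor per actual backoff
print(p16 == Fraction(1, 2 ** 115), float(p16))
print(p15 == Fraction(1, 2 ** 105), float(p15))
```

Either way the result is far below anything observable by chance, which is the point of the argument.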
With More Stations:
As the number of contending stations increases, the probability of extended collision sequences rises:
| Stations | P(10 collisions) | P(16 collisions) | Interpretation |
|---|---|---|---|
| 2 | ~10^-15 | ~10^-35 | Essentially impossible |
| 10 | ~10^-8 | ~10^-20 | Once per trillion years |
| 100 | ~10^-3 | ~10^-10 | Rare but measurable |
| 1000 | ~10^-1 | ~10^-3 | May occur under heavy load |
Note: These are rough approximations. The exact calculations depend on collision patterns and window utilization.
Implication:
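If your network experiences excessive collision errors, random chance is not a plausible explanation. Treat them as evidence of a fault or misconfiguration and investigate.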
When a frame experiences 16 consecutive collisions and is discarded, the Ethernet MAC layer must report this failure to its client—typically the network layer (IP) or a higher-layer protocol. This reporting follows a well-defined interface.
MAC Service Primitive:
The IEEE 802.3 standard defines the interface between the MAC sublayer and its client through service primitives. When excessive collisions occur, the MAC reports:
MA_DATA.confirm (
destination_address,
status = EXCESSIVE_COLLISION_ERROR
)
This primitive indicates that the frame addressed to the specified destination could not be delivered due to excessive collisions.
| Status | Meaning | MAC Action | Typical Cause |
|---|---|---|---|
| SUCCESS | Frame transmitted successfully | Normal completion | Normal operation |
| EXCESSIVE_COLLISION_ERROR | Frame discarded after 16 collisions | Frame dropped | Network problem |
| CARRIER_SENSE_ERROR | Carrier never returned to idle | Frame dropped | Medium stuck busy |
| UNDERRUN_ERROR | Data not provided fast enough | Frame dropped | System too slow |
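The status values in the table can be modeled as an enumeration that a MAC client inspects when a transmission completes. This is a hypothetical sketch: the `MacClient` class and its `on_confirm` method are invented here for illustration and are not part of any real driver or of the standard's interface.

```python
from collections import Counter
from enum import Enum


class TransmitStatus(Enum):
    SUCCESS = "frame transmitted successfully"
    EXCESSIVE_COLLISION_ERROR = "frame discarded after 16 collisions"
    CARRIER_SENSE_ERROR = "carrier never returned to idle"
    UNDERRUN_ERROR = "data not provided fast enough"


class MacClient:
    """Hypothetical MAC client that only keeps per-status counters."""

    def __init__(self):
        self.counters = Counter()

    def on_confirm(self, destination_address, status):
        """Record a confirm-style report; return True if the frame was sent."""
        self.counters[status] += 1
        return status is TransmitStatus.SUCCESS
```

A client built this way never retries on its own; any recovery is left to higher layers, which matches the behavior described next.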
What Higher Layers See:
IP Layer: Receives notification that the datagram was not transmitted. IP itself is unreliable and typically does not retry—it simply discards the datagram. The sender is not notified at the IP level.
TCP Layer: TCP uses its own acknowledgment mechanism. If an Ethernet frame (carrying a TCP segment) is discarded, TCP will not receive an ACK and will eventually retransmit. TCP handles the loss transparently.
UDP Layer: Like IP, UDP provides no reliability. The datagram is simply lost. Applications using UDP must handle potential losses themselves.
Application Layer: Unless using reliable transport (TCP), the application may not know the frame was lost until the expected response times out.
Layer Behavior When Ethernet Reports Excessive Collisions

| Layer | Behavior |
|---|---|
| Ethernet MAC | Discard frame, increment excessiveCollisions counter, report EXCESSIVE_COLLISION_ERROR to network layer |
| Network (IP) | Log error (if configured), discard datagram; no notification to sender (IP is unreliable) |
| Transport: TCP | Sender doesn't receive ACK and retransmits after timeout (completely transparent to TCP applications) |
| Transport: UDP | Datagram lost, no retry (UDP is unreliable); application may time out waiting for response |
| Application over TCP | May notice slight delay (TCP retransmission time); usually transparent unless packet loss is extreme |
| Application over UDP | Request lost; app may time out and retry (if designed to). Example: DNS retry after a few seconds |
| Network Admin | SNMP counters increase for excessiveCollisions; monitoring systems may alert; investigation required to find root cause |

Unlike some protocols, Ethernet provides no acknowledgment mechanism. The sending station knows locally that its frame was discarded, but the application that originated the request has no direct notification. Only reliable protocols like TCP handle this through their own ACK mechanisms.
Excessive collisions don't occur randomly—they indicate specific network problems. Understanding potential causes helps with rapid diagnosis and resolution.
Hardware Faults:
Network Design Violations:
Software/Configuration Issues:
The single most common cause of excessive collisions in modern networks is duplex mismatch. When a switch port is set to full-duplex and the connected NIC is set to half-duplex (or vice versa), the half-duplex side will detect 'collisions' that the full-duplex side ignores. Always verify duplex settings match on both ends.
When excessive collisions occur—even occasionally—they have ripple effects throughout the network and application stack.
Direct Performance Impact:
Wasted Bandwidth: Each collision wastes the time spent transmitting partial frames, jam signals, and backoff periods. With 16 collisions:
Transmission Delay: The frame that is eventually discarded has occupied the station's transmitter for the entire collision-and-backoff sequence, which the table below breaks down. Other frames wait in the queue during that time.
Queue Backup: While a station struggles with one problematic frame, its transmit queue grows. After the frame is discarded, the queue must be drained, causing additional delay for subsequent frames.
| Component | Per Collision | 16 Collisions Total | Notes |
|---|---|---|---|
| Partial frame transmission | ~51.2 μs | ~820 μs | Up to 64 bytes each |
| Jam signal | ~3.2 μs | ~51 μs | 32 bits each |
| Average backoff (cumulative) | Variable | ~183 ms | 15 backoffs, about 3,576 slot times in total; dominates total time |
| IFG between attempts | 9.6 μs | ~154 μs | 96 bit-times each |
| TOTAL | | ~184 ms | Per discarded frame, at 10 Mb/s |
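The backoff row can be recomputed from first principles. The sketch below assumes 10 Mb/s timing (51.2 μs slot time, matching the per-collision figures in the table), a worst-case partial frame on every attempt, and a uniform draw over [0, 2^min(n,10) - 1] slots after collision n, whose mean is (2^min(n,10) - 1)/2. The constant and function names are illustrative.

```python
SLOT_TIME_US = 51.2   # 512 bit times at 10 Mb/s
JAM_US = 3.2          # 32-bit jam signal
IFG_US = 9.6          # 96-bit interframe gap
ATTEMPT_LIMIT = 16
BACKOFF_LIMIT = 10


def mean_backoff_slots(collision_number):
    """Mean of a uniform draw over [0, 2^k - 1], with k = min(n, 10)."""
    k = min(collision_number, BACKOFF_LIMIT)
    return (2 ** k - 1) / 2


def discarded_frame_time_us():
    """Expected time spent on a frame that collides on all 16 attempts."""
    # A backoff follows each of the first 15 collisions; the 16th aborts.
    backoff = sum(mean_backoff_slots(n) for n in range(1, ATTEMPT_LIMIT))
    per_attempt = SLOT_TIME_US + JAM_US + IFG_US  # worst-case partial frame
    return backoff * SLOT_TIME_US + ATTEMPT_LIMIT * per_attempt


print(f"{discarded_frame_time_us() / 1000:.1f} ms")
```

Under these assumptions the mean backoff sums to 3575.5 slot times, so the random waiting dwarfs the roughly 1 ms spent on partial frames, jam signals, and interframe gaps.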
Indirect Effects:
TCP Retransmission Delays: When a TCP segment is lost due to excessive collisions, TCP waits for its retransmission timeout (typically 200 ms to 1 s) before resending. This wait is added to the time the station already spent colliding and backing off at the Ethernet level.
Application Timeouts: Applications may time out waiting for responses that were lost to excessive collisions, causing user-visible failures.
Congestion Feedback Loops: TCP interprets lost segments as congestion and reduces its sending rate. With chronic excessive collisions, TCP throughput plummets.
Jitter: Variable collision resolution times introduce jitter (variation in delay), problematic for real-time applications like VoIP or video.
Even a 0.1% excessive collision rate can significantly degrade application performance. Each discarded frame triggers a TCP retransmission (200 ms or more of delay), possible application timeouts, and congestion-control slowdowns, all on top of the time the station already spent on the failed Ethernet transmission.
Proactive monitoring for excessive collisions can prevent performance problems from escalating. Various tools and techniques help identify and diagnose collision issues.
NIC Statistics:
Every Ethernet NIC maintains counters for various events. Relevant counters for collision monitoring include:
| Counter | Meaning | Normal Value | Problem Threshold |
|---|---|---|---|
| singleCollisionFrames | Frames succeeded after 1 collision | Some expected | 5% of frames |
| multipleCollisionFrames | Frames succeeded after 2-15 collisions | Rare | 1% of frames |
| excessiveCollisions | Frames discarded after 16 collisions | Zero | Any non-zero |
| lateCollisions | Collisions detected after slot time | Zero | Any non-zero |
| deferredTransmissions | Frames delayed due to busy medium | Normal under load | 30% indicates congestion |
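The thresholds in this table translate directly into a simple check. The sketch below is hypothetical: it assumes the counters arrive as a dict of per-interval deltas keyed by the counter names above, and `collision_alerts` is a name invented for this example rather than part of any monitoring tool.

```python
# Counters whose healthy value is exactly zero.
ZERO_TOLERANCE = ("excessiveCollisions", "lateCollisions")

# Counters judged as a fraction of frames sent (limits taken from the table).
RATIO_LIMITS = {
    "singleCollisionFrames": 0.05,
    "multipleCollisionFrames": 0.01,
    "deferredTransmissions": 0.30,
}


def collision_alerts(counters, frames_sent):
    """Return a list of alert strings for one polling interval."""
    alerts = []
    for name in ZERO_TOLERANCE:
        count = counters.get(name, 0)
        if count > 0:
            alerts.append(f"{name}: {count} (any non-zero value is a problem)")
    if frames_sent > 0:
        for name, limit in RATIO_LIMITS.items():
            ratio = counters.get(name, 0) / frames_sent
            if ratio > limit:
                alerts.append(f"{name}: {ratio:.1%} of frames exceeds {limit:.0%}")
    return alerts
```

For example, `collision_alerts({"excessiveCollisions": 1}, 1000)` returns a single alert, because one discarded frame is already above the zero threshold.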
SNMP Monitoring:
In enterprise environments, SNMP (Simple Network Management Protocol) can poll these counters remotely:
IF-MIB::ifOutErrors - Total output errors (includes excessive collisions)
EtherLike-MIB::dot3StatsExcessiveCollisions - Excessive collision count
EtherLike-MIB::dot3StatsLateCollisions - Late collision count
EtherLike-MIB::dot3StatsSingleCollisionFrames
EtherLike-MIB::dot3StatsMultipleCollisionFrames
Network management systems can alert when these counters exceed thresholds.
Diagnostic Checklist:
When excessive collisions are detected:
For excessive collisions and late collisions, the normal value is exactly zero. Any non-zero count warrants investigation. These aren't events that should ever happen in a healthy network—they indicate definite problems.
The 16-retry limit has remained unchanged since Ethernet's earliest days. Understanding this persistence reveals how well the original design decisions hold up.
Original DIX Specification (1980):
The Digital-Intel-Xerox (DIX) Ethernet specification set the retry limit at 16, choosing a power of 2 for implementation convenience. The designers recognized:
16 satisfied all three criteria with margin to spare.
Why Has It Never Changed?
It Works: Decades of deployment have proven 16 is the right number. No compelling reason to change emerged.
Interoperability: Changing the limit would create compatibility issues between old and new equipment.
Irrelevance in Modern Networks: With switches eliminating collisions, the exact retry limit matters less. Even if changed, most devices never experience any collisions.
Conservative Engineering: Network standards favor stability. Without clear evidence of a problem, the standard body won't change a working parameter.
Contrast with Other Parameters:
Unlike the retry limit, other Ethernet parameters have evolved:
The retry limit's stability amid these changes demonstrates it was correctly specified from the beginning.
The 16-retry limit exemplifies successful protocol design: choose a value with sufficient margin that it never needs adjustment. The original designers got it exactly right, and 40+ years of exponential network growth haven't required a change.
We've comprehensively examined the maximum retry limit in Binary Exponential Backoff. Let's consolidate the essential understanding:
What's Next:
The final page of this module examines fairness considerations—how BEB affects different stations' chances of successful transmission, potential fairness issues, and how the protocol balances efficiency with equitable access.
You now understand the maximum retry mechanism in Ethernet—why it exists, how rarely it should be triggered, what causes it, and how to diagnose excessive collision problems. This knowledge is essential for network troubleshooting and understanding protocol reliability.