Digital communication systems face a fundamental paradox: to correctly interpret a stream of bits, the receiver must know precisely when each bit begins and ends—yet this timing information must somehow be embedded within the data itself. Without a separate clock signal (impractical for most applications), the receiver must extract timing from the data stream's transitions.
This creates an immediate problem: what happens when the data contains long sequences of identical bits? Consider the binary sequence 00000000 00000000: an NRZ signal would remain at a constant voltage for 16 bit periods. The receiver's clock, with its inherent drift, could easily slip by one or more bit positions, causing catastrophic synchronization failure.
4B/5B encoding emerged as an elegant solution to this problem. By mapping every 4-bit data nibble to a carefully selected 5-bit code word, 4B/5B guarantees that no transmitted sequence contains more than three consecutive zeros—ensuring adequate transitions for reliable clock recovery while imposing only 25% overhead (far less than Manchester's 100%).
By completing this page, you will understand the complete theory and implementation of 4B/5B encoding: the mathematical constraints that govern code design, the complete code table, special control symbols, DC balance properties, error detection capabilities, and its critical role in Fast Ethernet and FDDI. You'll gain the depth expected of a networking professional who understands physical layer design.
Block coding transforms data by mapping fixed-size groups (blocks) of input bits to larger groups of output bits. The extra bits—the redundancy—enable the encoding to satisfy constraints that raw data cannot guarantee.
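In the general notation mB/nB, m is the number of data bits in each input block and n is the number of code bits in each output block (n > m).
For 4B/5B, m = 4 and n = 5: each of the 16 possible nibbles maps to one of the 32 possible 5-bit patterns.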
Design Goals:
The Mathematical Constraint:
To guarantee no more than 3 consecutive zeros across code word boundaries, each data code word must have at most 1 leading zero and at most 2 trailing zeros.
Why? Because if code word A ends with 2 zeros and code word B starts with 1 zero, the concatenation AB has 2 + 1 = 3 consecutive zeros—our maximum allowed.
When designing block codes, you must consider not just individual code words, but all possible concatenations. A code word that looks fine in isolation (like 00110) might create run-length violations when followed by certain other code words. 4B/5B's leading/trailing zero constraint elegantly handles all possible combinations.
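The concatenation argument can be checked by brute force. The sketch below takes the 16 data code words from the table on this page, joins every ordered pair, and measures the longest run of zeros. The helper name `longest_zero_run` is illustrative, not part of any standard. Checking pairs is enough here because a zero run could only span three code words if the middle one were all zeros, and 00000 is not a data code.

```python
from itertools import product

# The 16 data code words from the 4B/5B table, as 5-character bit strings.
DATA_CODES = [
    "11110", "01001", "10100", "10101",
    "01010", "01011", "01110", "01111",
    "10010", "10011", "10110", "10111",
    "11010", "11011", "11100", "11101",
]


def longest_zero_run(bits: str) -> int:
    """Return the length of the longest run of '0' characters in a bit string."""
    return max(len(run) for run in bits.split("1"))


# Concatenate every ordered pair (A, B) and keep the worst zero run seen.
worst_run, worst_pair = max(
    (longest_zero_run(a + b), (a, b)) for a, b in product(DATA_CODES, repeat=2)
)

print(f"Longest zero run over all {len(DATA_CODES) ** 2} pairs: {worst_run}")
print(f"One pair that reaches it: {worst_pair[0]} + {worst_pair[1]}")
```

The worst case arises when a word ending in two zeros is followed by a word starting with one zero, which matches the 2 + 1 = 3 reasoning above.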
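How many 5-bit patterns satisfy our constraints? Let's enumerate:
5-bit patterns = 32 total
Invalid patterns (more than 1 leading zero): 00xxx (8 patterns)
Invalid patterns (more than 2 trailing zeros): xx000 (4 patterns, of which 00000 is already excluded above)
Direct count of valid patterns: patterns with 0 leading zeros: 1xxxx (16 patterns); patterns with 1 leading zero: 01xxx (8 patterns). Total first-constraint compliant: 24 patterns
From these, eliminate trailing-zero violations: 01000, 10000, and 11000 (3 patterns)
Result: 21 patterns satisfy both constraints
Of these, 16 are assigned to data. The code uses 23 code words in total (16 data plus 7 control) because three control symbols (J = 11000, H = 00100, R = 00111) deliberately fall outside the data-symbol constraint.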
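The enumeration can be reproduced in a few lines. This is a minimal sketch that applies the two data-symbol rules (at most 1 leading zero, at most 2 trailing zeros) to all 32 patterns; the helper names are illustrative. It finds 21 compliant patterns, and it also lists which of the page's seven control symbols fall outside the rules, which is how the total of 23 assigned code words (16 data plus 7 control) comes about.

```python
def leading_zeros(bits: str) -> int:
    """Count '0' characters at the start of a bit string."""
    return len(bits) - len(bits.lstrip("0"))


def trailing_zeros(bits: str) -> int:
    """Count '0' characters at the end of a bit string."""
    return len(bits) - len(bits.rstrip("0"))


# All 32 five-bit patterns, as zero-padded strings.
all_patterns = [format(value, "05b") for value in range(32)]

# Patterns that obey both data-symbol rules.
compliant = [
    p for p in all_patterns
    if leading_zeros(p) <= 1 and trailing_zeros(p) <= 2
]

# Control symbols as listed in the control-symbol table on this page.
CONTROL_CODES = {
    "I": "11111", "J": "11000", "K": "10001", "T": "01101",
    "R": "00111", "S": "11001", "H": "00100",
}
outside = [name for name, code in CONTROL_CODES.items() if code not in compliant]

print(f"Patterns meeting both rules: {len(compliant)}")
print(f"Control symbols outside the rules: {outside}")
```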
| Data (Hex) | Data (Binary) | 4B/5B Code | # of 1s | Notes |
|---|---|---|---|---|
| 0 | 0000 | 11110 | 4 | High transition density |
| 1 | 0001 | 01001 | 2 | Valid: 1 leading zero |
| 2 | 0010 | 10100 | 2 | 2 trailing zeros (max) |
| 3 | 0011 | 10101 | 3 | No leading or trailing zeros |
| 4 | 0100 | 01010 | 2 | Valid: 1 leading zero |
| 5 | 0101 | 01011 | 3 | Valid: 1 leading zero |
| 6 | 0110 | 01110 | 3 | Valid: 1 leading zero |
| 7 | 0111 | 01111 | 4 | Valid: 1 leading zero |
| 8 | 1000 | 10010 | 2 | 1 trailing zero |
| 9 | 1001 | 10011 | 3 | No leading or trailing zeros |
| A | 1010 | 10110 | 3 | 1 trailing zero |
| B | 1011 | 10111 | 4 | No leading or trailing zeros |
| C | 1100 | 11010 | 3 | 1 trailing zero |
| D | 1101 | 11011 | 4 | No leading or trailing zeros |
| E | 1110 | 11100 | 3 | 2 trailing zeros (max) |
| F | 1111 | 11101 | 4 | No leading or trailing zeros |
Beyond the 16 data symbols, 4B/5B reserves additional code words for link control functions. These control symbols manage frame boundaries, link status, and error conditions—essential functions that would otherwise require out-of-band signaling.
| Symbol | 4B/5B Code | Function | Usage Context |
|---|---|---|---|
| I (Idle) | 11111 | Link Idle | Transmitted between frames to maintain synchronization |
| J (Start 1) | 11000 | Start of Frame (1st) | First symbol of preamble, used with K |
| K (Start 2) | 10001 | Start of Frame (2nd) | Second symbol of preamble, follows J |
| T (Terminate) | 01101 | End of Frame | Marks end of data in frame |
| R (Reserved) | 00111 | Reserved/Reset | Link reset or reserved for future use |
| S (Set) | 11001 | Reserved | Reserved for future use |
| H (Halt) | 00100 | Halt | Signals intentional halt (FDDI) |
The J-K Start Delimiter:
Frame transmission begins with the J-K symbol pair:
The J-K combination:
The T Terminator:
Frames end with the T symbol (01101):
Continuous Idle Transmission:
Between frames, the transmitter continuously sends Idle symbols (11111):
Frame 1 [T] I I I I I I I I I I I [J K] Frame 2
(The Idle fill between T and J-K maintains clock lock.)
Why not just keep the line quiet?
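Note on I Symbol: The Idle symbol (11111) has five consecutive 1s, seemingly violating our "no long runs" rule. However, the rule limits runs of zeros, not ones: with NRZI or MLT-3 line coding, every 1 produces a signal transition, so a stream of Idle symbols changes level on every bit and gives the receiver the maximum possible timing information.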
Of the 32 possible 5-bit patterns, 9 are invalid (not assigned to data or control). If a receiver detects an invalid code, it indicates a transmission error. This provides basic error detection capability at the physical layer—before CRC checking occurs at the data link layer.
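A receiver-side lookup makes this concrete. The sketch below classifies any 5-bit value as data, control, or invalid using the two tables on this page, then lists the nine unassigned patterns. The function name `classify` and its string return format are illustrative choices, not part of any standard.

```python
# Data code words indexed by nibble value (0x0 to 0xF), from the code table.
DATA_CODES = [
    0b11110, 0b01001, 0b10100, 0b10101, 0b01010, 0b01011, 0b01110, 0b01111,
    0b10010, 0b10011, 0b10110, 0b10111, 0b11010, 0b11011, 0b11100, 0b11101,
]

# Control symbols from the control-symbol table.
CONTROL_CODES = {
    "I": 0b11111, "J": 0b11000, "K": 0b10001, "T": 0b01101,
    "R": 0b00111, "S": 0b11001, "H": 0b00100,
}

# Reverse lookups used by the receiver.
CODE_TO_NIBBLE = {code: nibble for nibble, code in enumerate(DATA_CODES)}
CODE_TO_CONTROL = {code: name for name, code in CONTROL_CODES.items()}


def classify(code: int) -> str:
    """Describe a received 5-bit code as data, control, or invalid."""
    if code in CODE_TO_NIBBLE:
        return f"data {CODE_TO_NIBBLE[code]:X}"
    if code in CODE_TO_CONTROL:
        return f"control {CODE_TO_CONTROL[code]}"
    return "invalid"


# Every 5-bit pattern that is assigned to neither data nor control.
invalid = [format(c, "05b") for c in range(32) if classify(c) == "invalid"]
print(f"{len(invalid)} invalid patterns: {invalid}")

# Flip each bit of each data code and count how many results are invalid.
flips = [code ^ (1 << bit) for code in DATA_CODES for bit in range(5)]
detected = sum(classify(f) == "invalid" for f in flips)
print(f"{detected} of {len(flips)} single-bit errors yield an invalid code")
```

In this sketch only a minority of single-bit errors land on an unassigned pattern; the rest turn one valid symbol into another, which is why the check is only a basic first line of defence ahead of the CRC.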
One limitation of 4B/5B is that it does not intrinsically guarantee DC balance. Understanding this limitation—and how systems work around it—reveals important principles of physical layer design.
What is DC Balance?
A DC-balanced signal has equal energy in positive and negative excursions over time, resulting in a zero average voltage. DC balance is critical for:
4B/5B's DC Imbalance:
Examine the code table:
| Data | 4B/5B Code | # of 1s | Disparity |
|---|---|---|---|
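| 0 | 11110 | 4 | +3 |
| 1 | 01001 | 2 | -1 |
| ... | ... | ... | ... |
| F | 11101 | 4 | +3 |
Disparity here is the number of 1s minus the number of 0s in the 5-bit code. Twelve of the sixteen data codes have more 1s than 0s (disparity > 0), and together the sixteen codes contain 49 ones against 31 zeros. Uniformly random data therefore produces about 61% ones, creating a DC offset whenever 1 and 0 map directly to different signal levels.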
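The size of the imbalance can be worked out directly from the code table. This short sketch counts ones and zeros across the 16 data codes and derives the mean disparity per code word, assuming every nibble value is equally likely (an assumption about the data, not something the encoding guarantees).

```python
# The 16 data code words from the 4B/5B table.
DATA_CODES = [
    "11110", "01001", "10100", "10101",
    "01010", "01011", "01110", "01111",
    "10010", "10011", "10110", "10111",
    "11010", "11011", "11100", "11101",
]

ones = sum(code.count("1") for code in DATA_CODES)
total_bits = 5 * len(DATA_CODES)
zeros = total_bits - ones

# Codes whose disparity (ones minus zeros) is positive.
positive = sum(code.count("1") > code.count("0") for code in DATA_CODES)

# Mean disparity per code word for uniformly distributed nibbles.
mean_disparity = (ones - zeros) / len(DATA_CODES)

print(f"ones={ones}, zeros={zeros}, fraction of ones={ones / total_bits:.4f}")
print(f"codes with positive disparity: {positive} of {len(DATA_CODES)}")
print(f"mean disparity per code word: {mean_disparity:+.3f}")
```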
MLT-3 Encoding (Fast Ethernet):
4B/5B's DC imbalance is partially mitigated by MLT-3:
Scrambling:
Fast Ethernet (100BASE-TX) adds a scrambler after 4B/5B encoding:
```python
class Encoder4B5B:
    """
    4B/5B Block Encoder with DC Balance Analysis

    Implements the standard 4B/5B encoding used in Fast Ethernet and FDDI,
    with analysis tools for understanding DC characteristics.
    """

    # Standard 4B/5B code table
    DATA_CODES = {
        0x0: 0b11110,  # 0000 -> 11110
        0x1: 0b01001,  # 0001 -> 01001
        0x2: 0b10100,  # 0010 -> 10100
        0x3: 0b10101,  # 0011 -> 10101
        0x4: 0b01010,  # 0100 -> 01010
        0x5: 0b01011,  # 0101 -> 01011
        0x6: 0b01110,  # 0110 -> 01110
        0x7: 0b01111,  # 0111 -> 01111
        0x8: 0b10010,  # 1000 -> 10010
        0x9: 0b10011,  # 1001 -> 10011
        0xA: 0b10110,  # 1010 -> 10110
        0xB: 0b10111,  # 1011 -> 10111
        0xC: 0b11010,  # 1100 -> 11010
        0xD: 0b11011,  # 1101 -> 11011
        0xE: 0b11100,  # 1110 -> 11100
        0xF: 0b11101,  # 1111 -> 11101
    }

    # Control symbols
    CONTROL_CODES = {
        'I': 0b11111,  # Idle
        'J': 0b11000,  # Start-of-frame 1
        'K': 0b10001,  # Start-of-frame 2
        'T': 0b01101,  # End-of-frame
        'R': 0b00111,  # Reset
        'S': 0b11001,  # Set (reserved)
        'H': 0b00100,  # Halt
    }

    # Invalid codes: the 9 patterns assigned to neither data nor control
    # (can be used for error detection). 0b11111 is Idle, so it is not listed.
    INVALID_CODES = [
        0b00000, 0b00001, 0b00010, 0b00011, 0b00101,
        0b00110, 0b01000, 0b01100, 0b10000,
    ]

    def encode_nibble(self, nibble: int) -> int:
        """Encode a 4-bit nibble to 5-bit code."""
        if nibble not in self.DATA_CODES:
            raise ValueError(f"Invalid nibble: {nibble}")
        return self.DATA_CODES[nibble]

    def encode_byte(self, byte: int) -> tuple[int, int]:
        """Encode a byte as two 5-bit codes (high nibble first)."""
        high = (byte >> 4) & 0xF
        low = byte & 0xF
        return (self.encode_nibble(high), self.encode_nibble(low))

    def encode_data(self, data: bytes) -> list[int]:
        """Encode byte array to list of 5-bit codes."""
        codes = []
        for byte in data:
            high, low = self.encode_byte(byte)
            codes.extend([high, low])
        return codes

    def calculate_disparity(self, code: int, bits: int = 5) -> int:
        """
        Calculate running disparity contribution.

        Disparity = (number of 1s) - (number of 0s)
        Positive disparity means more 1s than 0s.
        """
        ones = bin(code).count('1')
        zeros = bits - ones
        return ones - zeros

    def analyze_dc_balance(self, codes: list[int]) -> dict:
        """
        Analyze DC balance of encoded stream.

        Returns statistics on running disparity and DC offset tendency.
        """
        running_disparity = 0
        max_disparity = 0
        min_disparity = 0
        disparity_history = []

        for code in codes:
            disp = self.calculate_disparity(code)
            running_disparity += disp
            max_disparity = max(max_disparity, running_disparity)
            min_disparity = min(min_disparity, running_disparity)
            disparity_history.append(running_disparity)

        total_bits = len(codes) * 5
        avg_disparity = running_disparity / len(codes) if codes else 0

        return {
            "total_codes": len(codes),
            "total_bits": total_bits,
            "final_disparity": running_disparity,
            "max_disparity": max_disparity,
            "min_disparity": min_disparity,
            "avg_disparity_per_code": avg_disparity,
            "dc_offset_estimate": running_disparity / total_bits if total_bits else 0,
            "disparity_history": disparity_history,
        }

    def check_run_length(self, codes: list[int]) -> dict:
        """
        Verify that encoded stream satisfies run-length constraints.

        4B/5B guarantees no more than 3 consecutive zeros.
        """
        # Convert to bit stream (most significant bit of each code first)
        bits = []
        for code in codes:
            for i in range(4, -1, -1):
                bits.append((code >> i) & 1)

        max_zeros = 0
        current_zeros = 0
        for bit in bits:
            if bit == 0:
                current_zeros += 1
                max_zeros = max(max_zeros, current_zeros)
            else:
                current_zeros = 0

        return {
            "max_consecutive_zeros": max_zeros,
            "constraint_satisfied": max_zeros <= 3,
            "total_bits": len(bits),
        }


# Demonstration
if __name__ == "__main__":
    encoder = Encoder4B5B()

    # Encode sample data
    sample_data = bytes([0x00, 0xFF, 0xA5, 0x5A, 0x12, 0x34])

    print("4B/5B Encoding Analysis")
    print("=" * 60)
    print(f"Input data: {sample_data.hex().upper()}")

    codes = encoder.encode_data(sample_data)
    print(f"Encoded: {[format(c, '05b') for c in codes]}")

    # DC balance analysis
    dc_analysis = encoder.analyze_dc_balance(codes)
    print("\nDC Balance Analysis:")
    for key, value in dc_analysis.items():
        if key != "disparity_history":
            print(f"  {key}: {value}")

    # Run length check
    run_analysis = encoder.check_run_length(codes)
    print("\nRun Length Analysis:")
    for key, value in run_analysis.items():
        print(f"  {key}: {value}")
```

8B/10B (Gigabit Ethernet, Fibre Channel):
8B/10B provides guaranteed DC balance through running disparity control:
| Property | 4B/5B | 8B/10B |
|---|---|---|
| Efficiency | 80% | 80% |
| DC Balance | Not guaranteed | Guaranteed |
| Max Run Length | 3 zeros | 5 identical |
| Control Symbols | 7 | 12 K-codes |
| Complexity | Low | Medium |
| Used In | Fast Ethernet, FDDI | Gigabit Ethernet, FC |
8B/10B Running Disparity:
4B/5B's lack of guaranteed DC balance is acceptable for Fast Ethernet because:
4B/5B was designed in the 1980s when chip complexity was expensive. Its simplicity—a single lookup table—made it practical for high-volume, low-cost implementations. 8B/10B's disparity tracking required more logic but delivered better signal quality, reflecting the different constraints of Gigabit-era design.
The combination of 4B/5B with MLT-3 in 100BASE-TX represents a masterful engineering compromise—achieving 100 Mbps over Category 5 cable with technology that was manufacturable at scale in the mid-1990s.
Transmission Path:
[MII] → [4B/5B Encoder] → [Scrambler] → [NRZ-I] → [MLT-3] → [Line Driver] → Cable
Output of each stage: MII: 4-bit nibbles; 4B/5B encoder: 5-bit codes; scrambler: randomized stream; NRZ-I: binary stream; MLT-3: 3-level analog signal.
Rate Expansion:
1. MII Interface: The MAC sends data to the PHY in 4-bit nibbles at 25 MHz via the standard MII bus. This interface abstracts the physical layer, allowing different PHY types (100BASE-TX, 100BASE-FX, etc.) to connect to the same MAC.
2. 4B/5B Encoding: Each 4-bit nibble is looked up in the code table, producing a 5-bit symbol. The encoder operates at 25 million nibbles/second, outputting 125 million bits/second.
3. Scrambling (stream cipher): The scrambler generates a pseudo-random key stream with an 11-bit LFSR using polynomial x¹¹ + x⁹ + 1 and XORs it with the 4B/5B output:
key_bit[n] = key_bit[n-9] XOR key_bit[n-11]
scrambled_bit[n] = data_bit[n] XOR key_bit[n]
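The scrambling step can be sketched as a side-stream (stream cipher) arrangement: an 11-bit LFSR with taps at positions 9 and 11 produces a key stream that is XORed with the data. The function names, the seed value, and the assumption that the receiver starts from the same LFSR state are all illustrative; a real PHY has to acquire that state from the received signal, which is not shown here.

```python
def keystream(seed: int, count: int) -> list[int]:
    """Generate `count` key bits from an 11-bit LFSR for x^11 + x^9 + 1.

    Bit i of `state` holds the key bit produced i + 1 steps ago, so
    bit 8 is key[n-9] and bit 10 is key[n-11].
    """
    if not 0 < seed < (1 << 11):
        raise ValueError("seed must be a non-zero 11-bit value")
    state = seed
    bits = []
    for _ in range(count):
        new_bit = ((state >> 8) ^ (state >> 10)) & 1  # key[n-9] XOR key[n-11]
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7FF      # keep 11 bits of history
    return bits


def scramble(data_bits: list[int], seed: int) -> list[int]:
    """XOR data bits with the key stream; applying it twice restores the data."""
    key = keystream(seed, len(data_bits))
    return [d ^ k for d, k in zip(data_bits, key)]


SEED = 0x7FF  # hypothetical starting state, chosen only for this example

idle_bits = [1] * 50                      # ten Idle symbols (11111 each)
scrambled = scramble(idle_bits, SEED)
recovered = scramble(scrambled, SEED)     # same operation descrambles

print("scrambled idle:", "".join(map(str, scrambled)))
print("round trip ok: ", recovered == idle_bits)
```

Because the key stream is independent of the data, scrambling and descrambling are the same XOR operation once both ends agree on the LFSR state.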
Scrambling provides:
4. NRZ-I Conversion: The scrambled stream is converted to NRZ-I (Non-Return to Zero Inverted): a 1 bit toggles the output level and a 0 bit leaves it unchanged.
This intermediate step simplifies the MLT-3 implementation.
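NRZ-I conversion is small enough to show in full. In this sketch a 1 bit toggles the output level and a 0 bit holds it; the function names and the choice of 0 as the starting level are assumptions made for the example.

```python
def nrzi_encode(bits: list[int], initial_level: int = 0) -> list[int]:
    """Convert data bits to NRZ-I levels: 1 toggles the level, 0 holds it."""
    level = initial_level
    levels = []
    for bit in bits:
        if bit:
            level ^= 1
        levels.append(level)
    return levels


def nrzi_decode(levels: list[int], initial_level: int = 0) -> list[int]:
    """Recover data bits: a level change means 1, no change means 0."""
    previous = initial_level
    bits = []
    for level in levels:
        bits.append(level ^ previous)
        previous = level
    return bits


data = [1, 1, 0, 1, 0, 0, 1]
line = nrzi_encode(data)
print("data:  ", data)
print("levels:", line)
print("decoded matches:", nrzi_decode(line) == data)
```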
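5. MLT-3 Encoding: The NRZ-I stream drives the MLT-3 state machine: each transition in the NRZ-I stream advances the output one step through the repeating level cycle 0, +V, 0, -V, and the output holds its level otherwise.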
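The MLT-3 state machine can be sketched as a four-step cycle. For simplicity this version takes the bit stream in which a 1 requests a transition (the input to the NRZ-I stage), which is equivalent to stepping on every level change of the NRZ-I stream. Levels are written as -1, 0, +1 rather than voltages, and the names are illustrative.

```python
# The repeating level sequence that MLT-3 steps through.
MLT3_CYCLE = (0, 1, 0, -1)


def mlt3_encode(bits: list[int], start_index: int = 0) -> list[int]:
    """Advance one step through the cycle for every 1 bit; hold on 0 bits."""
    index = start_index
    levels = []
    for bit in bits:
        if bit:
            index = (index + 1) % len(MLT3_CYCLE)
        levels.append(MLT3_CYCLE[index])
    return levels


# A run of 1s (for example, unscrambled Idle symbols) gives the fastest
# possible waveform: one full cycle every four bit periods.
print(mlt3_encode([1] * 8))

# Zeros hold the current level.
print(mlt3_encode([0, 0, 1, 0]))
```

Since a full cycle takes at least four bits, the highest fundamental frequency is one quarter of the bit rate, which is where the 125M / 4 = 31.25 MHz figure comes from.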
6. Line Driving: The analog output driver delivers ±1V differential into the 100Ω cable impedance, with pre-emphasis to compensate for cable losses.
Idle and Preamble:
...I I I I J K [SFD] [Dest MAC] [Src MAC] [Type/Len] [Data...] [FCS] T R...
J K is the start delimiter; the preamble and SFD are encoded as 4B/5B data symbols.
Inter-Packet Gap:
Start of Frame:
End of Frame:
| Parameter | Value | Derivation |
|---|---|---|
| Data Rate | 100 Mbps | User payload rate |
| 4B/5B Symbol Rate | 25 Msymbols/s | 100 Mbps / 4 bits |
| Line Rate | 125 Mbaud | 25M × 5 bits |
| Code-Bit Period | 8 ns | 1 / 125 MHz |
| Max Frequency (MLT-3) | 31.25 MHz | 125M / 4 |
| Inter-Packet Gap | 0.96 µs | 96 bits at 100 Mbps |
| Max Cable Length | 100 m | Signal integrity limit |
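The derived values in the table follow from the 100 Mbps data rate by simple arithmetic, reproduced below as a quick check. The variable names are illustrative.

```python
DATA_RATE_BPS = 100_000_000          # user data rate

nibble_rate = DATA_RATE_BPS // 4     # 4 data bits per 4B/5B symbol
line_rate = nibble_rate * 5          # 5 code bits per symbol
bit_period_ns = 1e9 / line_rate      # duration of one code bit on the line
mlt3_max_hz = line_rate / 4          # one MLT-3 cycle needs at least 4 bits
ipg_us = 96 / DATA_RATE_BPS * 1e6    # 96 bit times at the data rate

print(f"symbol rate:      {nibble_rate / 1e6:.0f} Msymbols/s")
print(f"line rate:        {line_rate / 1e6:.0f} Mbaud")
print(f"code-bit period:  {bit_period_ns:.0f} ns")
print(f"MLT-3 max freq:   {mlt3_max_hz / 1e6:.2f} MHz")
print(f"inter-packet gap: {ipg_us:.2f} us")
```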
The most common 4B/5B implementation errors involve incorrect handling of control symbols. The J-K start delimiter must be transmitted exactly at frame start—not earlier (which breaks inter-packet timing) or later (which corrupts preamble). Similarly, the T terminator must immediately follow the FCS—missing or duplicated terminators cause frame parsing failures.
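To illustrate the delimiter placement, here is a minimal sketch that wraps a payload in the symbol sequence shown in the frame layout above: Idle fill, J-K, data symbols, T-R, then Idle again. The function name `build_stream` and the `idle` parameter are hypothetical. Nibbles are emitted high nibble first to match the encoder class on this page; that ordering is an assumption of the sketch, so check the standard for the on-wire nibble order.

```python
# Data code words indexed by nibble value, from the code table.
DATA_CODES = [
    "11110", "01001", "10100", "10101", "01010", "01011", "01110", "01111",
    "10010", "10011", "10110", "10111", "11010", "11011", "11100", "11101",
]

# Control symbols used for framing, from the control-symbol table.
CONTROL = {"I": "11111", "J": "11000", "K": "10001", "T": "01101", "R": "00111"}


def build_stream(payload: bytes, idle: int = 2) -> list[tuple[str, str]]:
    """Return (symbol name, 5-bit code) pairs for one framed payload."""
    stream = [("I", CONTROL["I"])] * idle
    # Start delimiter: exactly one J followed by one K, right before the data.
    stream += [("J", CONTROL["J"]), ("K", CONTROL["K"])]
    for byte in payload:
        for nibble in (byte >> 4, byte & 0xF):   # high nibble first (assumed)
            stream.append((f"{nibble:X}", DATA_CODES[nibble]))
    # End delimiter immediately after the last data symbol.
    stream += [("T", CONTROL["T"]), ("R", CONTROL["R"])]
    stream += [("I", CONTROL["I"])] * idle
    return stream


stream = build_stream(b"\xA5")
names = [name for name, _ in stream]
bits = " ".join(code for _, code in stream)
print(names)
print(bits)
```

Building the delimiters in one place, rather than letting callers append them, is one way to avoid the missing or duplicated terminators described above.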
Before Fast Ethernet adopted 4B/5B, the encoding scheme was developed for FDDI (Fiber Distributed Data Interface)—a 100 Mbps token-passing ring network that pioneered many of the physical layer techniques later used in Ethernet.
FDDI Overview:
Why 4B/5B for FDDI?
FDDI required:
4B/5B delivered efficient encoding suitable for fiber optic transmission where DC balance was less critical (fiber uses intensity modulation).
FDDI vs 100BASE-TX:
| Aspect | FDDI | 100BASE-TX |
|---|---|---|
| Line Coding | NRZI | MLT-3 |
| Media | Fiber / STP | UTP |
| Distance | 2 km | 100 m |
| Wavelength | 1300 nm | N/A (electrical) |
| Topology | Ring | Star |
NRZI in FDDI: FDDI uses NRZI (Non-Return to Zero Inverted) directly on fiber: a 1 is signalled by a change in light level and a 0 by no change.
Unlike electrical transmission, fiber doesn't have DC balance issues (you're modulating light intensity, not voltage). This simplified the FDDI design compared to 100BASE-TX's MLT-3 requirement.
100BASE-FX (Fiber Fast Ethernet):
100VG-AnyLAN:
4B/5B established principles that influenced all subsequent high-speed encoding:
These concepts evolved into:
Despite FDDI's technical elegance and earlier introduction, Fast Ethernet won the market because it used existing Category 5 UTP cabling (already installed for 10BASE-T), cost less (no fiber connectors), and fit into the familiar Ethernet architecture. FDDI required new fiber infrastructure and expensive concentrators. Technology alone rarely wins—economics and compatibility matter too.
4B/5B encoding represents a fundamental technique in digital communications—the art of adding just enough redundancy to guarantee desired signal properties while minimizing overhead. Let's consolidate the essential knowledge:
| Parameter | Value | Notes |
|---|---|---|
| Code Rate | 4:5 | 4 data bits → 5 code bits |
| Efficiency | 80% | Higher than Manchester (50%) |
| Max Run | 3 zeros | Guaranteed by code design |
| Data Codes | 16 | Hex 0-F |
| Control Codes | 7 | I, J, K, T, R, S, H |
| Invalid Codes | 9 | Error detection capability |
| Primary Use | 100BASE-TX, FDDI | Combined with MLT-3 or NRZI |
Looking Ahead:
With 4B/5B mastered, you're ready to explore its successor: 8B/10B encoding. This more sophisticated scheme solves 4B/5B's DC balance limitation through running disparity control, enabling higher data rates and longer cable runs. 8B/10B became the foundation for Gigabit Ethernet, Fibre Channel, and numerous other high-speed interfaces.
You now possess a comprehensive understanding of 4B/5B block encoding—from its mathematical foundations through its implementation in Fast Ethernet and FDDI. This knowledge is essential for understanding all modern block coding schemes that built upon 4B/5B's pioneering design.