As networking speeds climbed from 100 Mbps to 1 Gbps and beyond, the limitations of 4B/5B encoding became increasingly problematic. While 4B/5B guaranteed sufficient transitions for clock recovery, it could not guarantee DC balance—and this became a critical constraint for the demanding links of Gigabit-class systems.
The Problem: AC-coupled interfaces (using capacitors or transformers to block DC) require signals with zero average voltage. Long runs of certain data patterns in 4B/5B could cause baseline wander—a slow drift in the receiver's reference voltage that would eventually cause bit errors. This limited both the distance and the data pattern tolerance of 4B/5B systems.
The Solution: In 1983, IBM engineers Al Widmer and Peter Franaszek invented 8B/10B encoding, a revolutionary scheme that guarantees DC balance through a technique called running disparity control. Each 8-bit byte is encoded as a 10-bit symbol, with two possible encodings for most bytes—one that adds positive disparity, one that adds negative. The encoder dynamically chooses between them to keep the cumulative disparity bounded.
8B/10B became the encoding of choice for an entire generation of high-speed interfaces: Gigabit Ethernet, Fibre Channel, USB 3.0, SATA, DisplayPort, PCI Express, and many others. (HDMI and DVI use TMDS, a different 8-to-10-bit code.)
By completing this page, you will understand the complete theory and implementation of 8B/10B encoding: the 3B/4B and 5B/6B sub-encoding structure, running disparity management, the complete K-code control symbol set, error detection through code space violations, and implementations in Gigabit Ethernet, Fibre Channel, and PCI Express. You'll gain the depth expected of a systems architect who must make informed physical layer decisions.
8B/10B encoding maps each 8-bit byte to a 10-bit symbol. Unlike 4B/5B's simple lookup, 8B/10B uses a sophisticated two-stage encoding with disparity tracking to guarantee DC balance.
8B/10B is not a single 8-to-10-bit mapping. Instead, the 8-bit input is split into two parts:
Input: HGFEDCBA (8 bits)
  Upper 3 bits (HGF) → 3B/4B encoder → fghj (4 bits)
  Lower 5 bits (EDCBA) → 5B/6B encoder → abcdei (6 bits)
Output: abcdei fghj (10 bits, bit a transmitted first)
Why this split?
Reduced Lookup Table Size: A direct 8-to-10 mapping would require 256 entries × 10 bits × 2 disparities = 5120 bits of ROM. The split approach uses two smaller tables.
Disparity Management: The 5B/6B and 3B/4B sub-codes are designed so that their disparities can be independently tracked and compensated.
Historical Optimization: The original IBM implementation used discrete logic where smaller tables meant fewer gates.
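The table-size argument can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a naive ROM storing both disparity variants of every entry; the variable names are illustrative only.

```python
# ROM bits needed if both disparity variants of every entry are stored.
direct_bits = 256 * 10 * 2            # one direct 8-to-10 table
split_bits = 32 * 6 * 2 + 8 * 4 * 2   # 5B/6B table plus 3B/4B table

print(direct_bits)  # 5120
print(split_bits)   # 448
```

Under these assumptions the split tables need less than a tenth of the storage of the direct table.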
Disparity Definition:
Disparity = (number of 1s) - (number of 0s)
For a 10-bit code word, the disparity is always 0 (five 1s and five 0s), +2 (six 1s), or -2 (four 1s).
Running Disparity (RD):
The encoder maintains a running disparity state: either RD- (negative) or RD+ (positive).
The encoder always selects the code version that drives RD toward balance: at RD- it sends the version with disparity 0 or +2, and at RD+ the version with disparity 0 or -2.
Most 8B/10B codes come in complementary pairs: one with positive disparity, one with negative. These pairs are bit-wise complements of each other! If code X has disparity +2, then ~X (all bits inverted) has disparity -2. This elegant property makes the encoder trivial: keep one version in the table and XOR with a mask when the other is needed.
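The disparity definition and the complement property can be sketched in a few lines. The helper names `disparity` and `complement` are hypothetical, chosen for this illustration; the D.0 bit pattern is the 6-bit code 100111.

```python
def disparity(code: int, width: int) -> int:
    """Number of 1s minus number of 0s in a code of the given bit width."""
    ones = bin(code).count("1")
    return ones - (width - ones)


def complement(code: int, width: int) -> int:
    """Invert every bit of a code of the given bit width (XOR with all-ones mask)."""
    return code ^ ((1 << width) - 1)


d0_rd_minus = 0b100111                     # D.0, version sent at RD-
d0_rd_plus = complement(d0_rd_minus, 6)    # version sent at RD+

print(f"{d0_rd_plus:06b}")                 # 011000
print(disparity(d0_rd_minus, 6))           # 2
print(disparity(d0_rd_plus, 6))            # -2
```

Inverting every bit swaps the counts of 1s and 0s, so the disparity simply changes sign.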
The lower 5 bits (EDCBA) encode to 6 bits (abcdei). The 5B/6B code must ensure:
Example 5B/6B Mappings:
| D.x (Data) | EDCBA | abcdei (RD-) | abcdei (RD+) | Disparity |
|---|---|---|---|---|
| D.0 | 00000 | 100111 | 011000 | ±2 |
| D.1 | 00001 | 011101 | 100010 | ±2 |
| D.7 | 00111 | 111000 | 000111 | 0 |
| D.11 | 01011 | 110100 | 110100 | 0 |
| D.31 | 11111 | 101011 | 010100 | ±2 |
The upper 3 bits (HGF) encode to 4 bits (fghj). The 3B/4B code follows similar principles:
| D.y (Data) | HGF | fghj (RD-) | fghj (RD+) | Disparity |
|---|---|---|---|---|
| .0 | 000 | 1011 | 0100 | ±2 |
| .1 | 001 | 1001 | - | 0 |
| .2 | 010 | 0101 | - | 0 |
| .3 | 011 | 1100 | 0011 | 0 |
| .7 | 111 | 1110 | 0001 | ±2 |
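Data codes are written as D.x.y, where x is the decimal value of the lower 5 bits (EDCBA, 0-31) and y is the decimal value of the upper 3 bits (HGF, 0-7).
For example, ASCII 'A' (0x41 = 0100 0001 = HGFEDCBA = 010 00001) is D.1.2: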
| Step | Operation | Bits | Notes |
|---|---|---|---|
| Input | ASCII 'A' | 01000001 | Hex 0x41 |
| Split | HGF + EDCBA | 010 + 00001 | Upper 3, lower 5 |
| Lookup x | D.1 at RD- | 011101 | 5B/6B code |
| Lookup y | .2 at RD+ (after the 6-bit block) | 0101 | 3B/4B code; neutral, so identical at either RD |
| Combine | abcdei + fghj | 0111010101 | 10-bit symbol |
| Disparity | Sum 1s and 0s | 6-4 = +2 | New RD = RD+ |
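The worked example can be reproduced with a short sketch. The helpers `split_byte` and `dxy_name` are hypothetical names used only here; the 10-bit symbol is taken from the table rather than computed from a full lookup.

```python
def split_byte(byte: int) -> tuple[int, int]:
    """Return (x, y): x = lower 5 bits (EDCBA), y = upper 3 bits (HGF)."""
    return byte & 0x1F, (byte >> 5) & 0x07


def dxy_name(byte: int) -> str:
    """Format a data byte in D.x.y notation."""
    x, y = split_byte(byte)
    return f"D.{x}.{y}"


name = dxy_name(ord("A"))          # 0x41
symbol = "011101" + "0101"         # abcdei + fghj from the table, starting at RD-
ones = symbol.count("1")
symbol_disparity = ones - (len(symbol) - ones)

print(name)               # D.1.2
print(symbol)             # 0111010101
print(symbol_disparity)   # 2
```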
Running disparity (RD) is the mechanism that guarantees 8B/10B's DC balance. Understanding RD is essential to understanding why 8B/10B succeeds where 4B/5B falls short.
Initial State: At system startup, RD is typically initialized to RD- (though the choice is arbitrary—the system will converge to correct operation within a few symbols).
Encoding Rule:
1. For the current byte, look up the 10-bit code
2. If the code has disparity 0, use it directly; RD unchanged
3. If the code has disparity ±2:
- If RD-, use the code version with disparity +2
- If RD+, use the code version with disparity -2
4. Update RD based on the disparity of the transmitted code:
- Positive disparity: RD becomes RD+
- Negative disparity: RD becomes RD-
- Zero disparity: RD unchanged
Key Insight: Because every non-neutral code flips the RD state, and the encoder always selects the disparity opposite to current RD, the running disparity oscillates between RD- and RD+ but never accumulates indefinitely.
Bounds:
```python
class Encoder8B10B:
    """
    8B/10B Encoder with Running Disparity Tracking

    Demonstrates the core 8B/10B algorithm with disparity management.
    This simplified implementation shows the encoding logic; production
    implementations use full lookup tables for all 256 data codes.
    """

    def __init__(self):
        self.rd_negative = True  # Running disparity: True = RD-, False = RD+

        # Simplified 5B/6B table (selected entries for demonstration)
        # Format: (RD- code, RD+ code, is_neutral); bits written as abcdei
        self.table_5b6b = {
            0b00000: (0b100111, 0b011000, False),  # D.0
            0b00001: (0b011101, 0b100010, False),  # D.1
            0b00010: (0b101101, 0b010010, False),  # D.2
            0b00111: (0b111000, 0b000111, True),   # D.7 (neutral)
            0b01011: (0b110100, 0b110100, True),   # D.11 (neutral)
            0b10111: (0b111010, 0b000101, False),  # D.23
            0b11011: (0b110110, 0b001001, False),  # D.27
            0b11100: (0b001110, 0b001110, True),   # D.28 (neutral)
            0b11111: (0b101011, 0b010100, False),  # D.31
        }

        # Simplified 3B/4B table; bits written as fghj
        self.table_3b4b = {
            0b000: (0b1011, 0b0100, False),  # .0
            0b001: (0b1001, 0b1001, True),   # .1 (neutral)
            0b010: (0b0101, 0b0101, True),   # .2 (neutral)
            0b011: (0b1100, 0b0011, True),   # .3 (neutral, two forms)
            0b100: (0b1101, 0b0010, False),  # .4
            0b101: (0b1010, 0b1010, True),   # .5 (neutral)
            0b110: (0b0110, 0b0110, True),   # .6 (neutral)
            # .7: the alternate encoding required for some x values
            # is not implemented in this demo
            0b111: (0b1110, 0b0001, False),
        }

    def encode_byte(self, byte: int) -> tuple[int, int]:
        """
        Encode an 8-bit byte to a 10-bit symbol.

        Returns:
            Tuple of (10-bit code, disparity change)
        """
        # Split into x (lower 5 bits) and y (upper 3 bits)
        x = byte & 0x1F         # EDCBA
        y = (byte >> 5) & 0x07  # HGF

        # Look up 5B/6B code for x. Demo only: values of x missing from
        # the simplified table fall back to D.7, so they are NOT encoded
        # correctly.
        rd_neg_code, rd_pos_code, x_neutral = self.table_5b6b.get(
            x, self.table_5b6b[0b00111]
        )
        abcdei = rd_neg_code if self.rd_negative else rd_pos_code

        # Disparity of the 6-bit block:
        # ones - zeros = ones - (6 - ones) = 2*ones - 6
        ones_6b = bin(abcdei).count('1')
        disp_6b = 2 * ones_6b - 6

        # Running disparity at the boundary between the two sub-blocks
        intermediate_rd_negative = self.rd_negative
        if not x_neutral:
            intermediate_rd_negative = disp_6b < 0

        # Look up 3B/4B code for y using the intermediate RD
        rd_neg_code, rd_pos_code, _ = self.table_3b4b[y]
        fghj = rd_neg_code if intermediate_rd_negative else rd_pos_code

        # Combine so the binary string reads "abcdei fghj" left to right
        code_10b = (abcdei << 4) | fghj

        # Calculate final disparity
        ones_4b = bin(fghj).count('1')
        disp_4b = 2 * ones_4b - 4
        total_disp = disp_6b + disp_4b

        # Update running disparity (unchanged when total_disp is zero)
        if total_disp > 0:
            self.rd_negative = False
        elif total_disp < 0:
            self.rd_negative = True

        return code_10b, total_disp

    def encode_stream(self, data: bytes) -> list[tuple[int, int]]:
        """Encode a byte stream, returning (code, disparity) pairs."""
        return [self.encode_byte(byte) for byte in data]

    def analyze_balance(self, codes: list[tuple[int, int]]) -> dict:
        """Analyze DC balance of encoded stream."""
        total_ones = 0
        total_bits = 0
        rd_trace = []

        self.rd_negative = True  # Reset for analysis

        for code, disp in codes:
            total_ones += bin(code).count('1')
            total_bits += 10
            rd_trace.append('RD-' if self.rd_negative else 'RD+')

            # Update RD
            if disp > 0:
                self.rd_negative = False
            elif disp < 0:
                self.rd_negative = True

        total_zeros = total_bits - total_ones
        return {
            "total_bits": total_bits,
            "total_ones": total_ones,
            "total_zeros": total_zeros,
            "dc_offset": (total_ones - total_zeros) / total_bits,
            "rd_trace": rd_trace[:10],  # First 10 RD states
            "is_dc_balanced": abs(total_ones - total_zeros) <= 4,
        }


# Demonstration
if __name__ == "__main__":
    encoder = Encoder8B10B()

    # Encode sample data
    test_data = bytes([0x00, 0xFF, 0xA5, 0x5A])

    print("8B/10B Encoding Demonstration")
    print("=" * 60)

    results = encoder.encode_stream(test_data)
    for byte, (code, disp) in zip(test_data, results):
        print(f"Byte 0x{byte:02X}: Code {code:010b} (disp {disp:+d})")

    # Analyze larger stream
    encoder = Encoder8B10B()  # Reset
    large_data = bytes(range(256))  # All possible bytes
    results = encoder.encode_stream(large_data)
    analysis = encoder.analyze_balance(results)

    print("\nDC Balance Analysis (256 bytes):")
    for k, v in analysis.items():
        if k != "rd_trace":
            print(f"  {k}: {v}")
```

Code Space Violations:
Not all 10-bit patterns are valid 8B/10B codes. There are 2¹⁰ = 1024 possible patterns, but only:
Actual unique valid patterns: ~512 (exact count depends on neutral codes)
Disparity Violations:
Even if a received pattern is a valid code, receiving it with the wrong disparity indicates an error:
Example:
- Expecting RD- context, receive code for RD+ context
- This is a "disparity error"
- Indicates bit errors corrupted a code into a different valid code
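A receiver-side disparity check can be sketched as follows. This is a simplified model, not a complete decoder: it checks only the disparity of the whole 10-bit symbol against the current RD, and ignores the per-sub-block checks and code-table validation a real receiver also performs. The function name `check_disparity` and the convention of representing RD as -1 or +1 are assumptions of this sketch.

```python
def check_disparity(symbol_bits: str, rd: int) -> tuple[bool, int]:
    """
    Check a received 10-bit symbol against the current running disparity.

    symbol_bits: string of '0'/'1' characters, length 10
    rd: current running disparity, -1 (RD-) or +1 (RD+)
    Returns (ok, new_rd).
    """
    ones = symbol_bits.count("1")
    disp = ones - (len(symbol_bits) - ones)

    if disp == 0:
        return True, rd              # neutral symbol: RD unchanged
    if disp not in (2, -2):
        return False, rd             # no valid symbol has this disparity

    # A +2 symbol is only legal at RD-, a -2 symbol only at RD+
    ok = (disp == 2 and rd == -1) or (disp == -2 and rd == 1)
    new_rd = 1 if disp > 0 else -1   # follow the received symbol either way
    return ok, new_rd


print(check_disparity("0111010101", -1))  # (True, 1)
print(check_disparity("0111010101", 1))   # (False, 1)
print(check_disparity("1100010101", 1))   # (True, 1)
```

The second call shows the disparity error described above: a +2 symbol arriving while the receiver is already at RD+.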
Error Detection Capability:
8B/10B can detect:
This provides strong error detection at the physical layer, reducing the burden on higher-layer CRC checks.
While 8B/10B has excellent error detection, some multi-bit errors can transform one valid code into another without violating disparity—these are undetectable at the 8B/10B layer. This is why higher-layer CRCs remain necessary. The probability of undetectable errors is very low (~0.1% of multi-bit error patterns) but non-zero.
8B/10B reserves 12 special codes called K-codes (control characters). K-codes are distinguished from data codes by patterns that never appear in the data code space.
K-codes are identified by specific x.y combinations that produce unique 10-bit patterns:
| K-Code | Notation | 10-bit (RD-) | 10-bit (RD+) | Primary Use |
|---|---|---|---|---|
| K.28.0 | K28.0 | 001111 0100 | 110000 1011 | Fibre Channel |
| K.28.1 | K28.1 | 001111 1001 | 110000 0110 | Contains a comma; e.g., PCIe FTS |
| K.28.2 | K28.2 | 001111 0101 | 110000 1010 | Alignment |
| K.28.3 | K28.3 | 001111 0011 | 110000 1100 | Alignment |
| K.28.4 | K28.4 | 001111 0010 | 110000 1101 | Alignment |
| K.28.5 | K28.5 | 001111 1010 | 110000 0101 | Comma Character |
| K.28.6 | K28.6 | 001111 0110 | 110000 1001 | Alignment |
| K.28.7 | K28.7 | 001111 1000 | 110000 0111 | Reserved |
| K.23.7 | K23.7 | 111010 1000 | 000101 0111 | Carrier Extend (/R/ in 1000BASE-X) |
| K.27.7 | K27.7 | 110110 1000 | 001001 0111 | Start of Packet |
| K.29.7 | K29.7 | 101110 1000 | 010001 0111 | End of Packet |
| K.30.7 | K30.7 | 011110 1000 | 100001 0111 | Error Propagation |
The most important K-code is K.28.5, known as the "comma" character. Its pattern contains a unique bit sequence that cannot appear within any sequence of data codes:
K.28.5 (RD-): 0011111010 (contains "0011111")
K.28.5 (RD+): 1100000101 (contains "1100000")
The Comma Property:
The 7-bit patterns 0011111 and 1100000 never appear across data code boundaries or within data codes. This makes K.28.5 unambiguously identifiable regardless of where in the bit stream the receiver starts looking.
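Comma-based alignment can be sketched as a plain substring search. This is a minimal illustration that assumes the received bits are held as a string of '0' and '1' characters; real hardware does this with a sliding shift register, and `find_comma` is a hypothetical helper name.

```python
COMMAS = ("0011111", "1100000")  # the two 7-bit comma patterns


def find_comma(bits: str) -> int:
    """Return the index of the first comma pattern in bits, or -1 if none."""
    hits = [i for i in (bits.find(c) for c in COMMAS) if i >= 0]
    return min(hits) if hits else -1


# Two stray bits, then K.28.5 (RD-), then D.21.5
stream = "10" + "0011111010" + "1010101010"
offset = find_comma(stream)

print(offset)                       # 2
print(stream[offset:offset + 10])   # 0011111010
```

Because the comma starts at the first bit of K.28.5, the returned index is also the symbol boundary, and every tenth bit after it starts a new symbol.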
Uses of Comma:
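The K.28 family is special because x=28 (binary 11100) produces the 6-bit pattern 001111 (RD-) or 110000 (RD+), the only 6-bit sub-block whose last four bits (c, d, e, i) are all identical. When the following 4-bit sub-block begins with that same bit value (y = 1, 5, or 7), the result is a comma pattern, so K.28.1, K.28.5, and K.28.7 all contain a comma. The other K-codes (K.23.7, K.27.7, K.29.7, K.30.7) pair an ordinary 6-bit data sub-block with the alternate .7 encoding (fghj = 1000 when the symbol starts at RD-, 0111 at RD+) and serve as markers such as frame delimiters.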
1000BASE-X Ordered Sets:
Gigabit Ethernet over fiber (1000BASE-X) uses K-codes in ordered sets:
| Ordered Set | K-Code Sequence | Purpose |
|---|---|---|
| /I1/ (Idle 1) | K.28.5 D.5.6 | Idle between frames |
| /I2/ (Idle 2) | K.28.5 D.16.2 | Alternate idle |
| /C1/ (Config 1) | K.28.5 D.21.5 D.x.y D.x.y | Auto-negotiation |
| /C2/ (Config 2) | K.28.5 D.2.2 D.x.y D.x.y | Auto-negotiation |
| /R/ (Carrier Extend) | K.23.7 | Extends carrier for half-duplex |
| /S/ (Start of Packet) | K.27.7 | Immediately precedes frame |
| /T/ (End of Packet) | K.29.7 | Immediately follows FCS |
| /V/ (Error Propagation) | K.30.7 | Propagates error indication |
Frame Delimiting:
...I I I I /S/ [Preamble] [SFD] [Frame...] [FCS] /T/ R R I I I...
(/S/ = K.27.7, /T/ = K.29.7)
Unlike 4B/5B's J-K delimiter, 8B/10B uses single K-code markers (/S/ and /T/), providing cleaner frame boundaries.
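Frame delimiting on the receive side can be sketched as a scan for /S/ and /T/ in a stream of already-decoded symbol names. This is an illustration only: the list-of-strings representation and the function name `extract_frames` are assumptions of this sketch, not part of any standard API.

```python
START_OF_PACKET = "K.27.7"  # /S/
END_OF_PACKET = "K.29.7"    # /T/


def extract_frames(symbols: list[str]) -> list[list[str]]:
    """Collect the symbols found between each /S/ and the following /T/."""
    frames = []
    current = None  # None means "not inside a frame"
    for sym in symbols:
        if sym == START_OF_PACKET:
            current = []
        elif sym == END_OF_PACKET and current is not None:
            frames.append(current)
            current = None
        elif current is not None:
            current.append(sym)
    return frames


received = [
    "K.28.5", "D.16.2",            # idle
    "K.27.7",                      # /S/
    "D.21.2", "D.21.2", "D.1.2",   # frame contents
    "K.29.7",                      # /T/
    "K.23.7",                      # /R/
    "K.28.5", "D.16.2",            # idle
]
frames = extract_frames(received)
print(frames)  # [['D.21.2', 'D.21.2', 'D.1.2']]
```

Idle and carrier-extend symbols outside the /S/ ... /T/ window are simply ignored by this scan.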
K-codes follow the same disparity rules as data codes, but they can only appear in specific contexts:
8B/10B encoding found its most prominent applications in Gigabit Ethernet and Fibre Channel—two technologies that defined high-speed networking in the late 1990s and continue operating in millions of installations today.
Standard Variants:
Signal Chain:
[GMII] → [PCS: 8B/10B] → [PMA: SerDes] → [PMD: Laser/LED] → Fiber
GMII: 1 Gbps (8-bit) · PCS output: 1.25 Gbaud (10-bit) · PMD: optical modulation
Key Parameters:
| Parameter | Value | Derivation |
|---|---|---|
| Data Rate | 1000 Mbps | User payload |
| Line Rate | 1250 Mbaud | 1000 × 10/8 |
| Code-Group Period | 8 ns | 10 bits × 0.8 ns |
| Bit Period | 0.8 ns | 1/1.25 GHz |
| Minimum IPG | 12 bytes = 96 data bits | 96 ns (12 code groups × 8 ns) |
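The timing figures for 1000BASE-X follow directly from the 10/8 expansion. The sketch below derives them from the 1 Gbps data rate; the variable names are illustrative.

```python
data_rate_bps = 1_000_000_000                # 1 Gbps user data

line_rate_baud = data_rate_bps * 10 // 8     # every 8 data bits become 10 line bits
bit_period_ns = 1e9 / line_rate_baud         # duration of one line bit
code_group_period_ns = 10 * bit_period_ns    # one 10-bit code group
ipg_ns = 12 * code_group_period_ns           # 12-byte minimum inter-packet gap

print(line_rate_baud)        # 1250000000
print(bit_period_ns)         # 0.8
print(code_group_period_ns)  # 8.0
print(ipg_ns)                # 96.0
```

Each byte occupies one 8 ns code group on the line, so a byte takes the same time on the wire as 8 data bits at 1 Gbps.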
Fibre Channel is the dominant storage area network (SAN) interconnect, using 8B/10B for all speeds up to 8 Gbps.
Speed Evolution:
| Generation | Line Rate | Payload Rate (after decoding) | 8B/10B |
|---|---|---|---|
| 1GFC | 1.0625 Gbaud | 0.85 Gbps | Yes |
| 2GFC | 2.125 Gbaud | 1.7 Gbps | Yes |
| 4GFC | 4.25 Gbaud | 3.4 Gbps | Yes |
| 8GFC | 8.5 Gbaud | 6.8 Gbps | Yes |
| 16GFC | 14.025 Gbaud | 13.6 Gbps | No (64B/66B) |
Note: Starting with 16GFC, Fibre Channel switched to 64B/66B encoding for improved efficiency (96.97% vs 80%).
[SOF] [Frame Header 24B] [Payload 0-2112B] [CRC 4B] [EOF]
(SOF = K.28.5 + ordered set, EOF = K-code ordered set)
Ordered Sets:
Fibre Channel continued using 8B/10B through 8GFC (2006) while Ethernet had moved to 64B/66B at 10 Gbps. This wasn't technical inertia—FC's deterministic latency requirements and simpler ordered sets made 8B/10B's 20% overhead acceptable. Only at 16GFC did the efficiency penalty become compelling enough to switch.
USB 3.0 SuperSpeed:
Serial ATA (SATA):
DisplayPort:
PCI Express Gen 1/2:
| Technology | Line Rate | Data Rate | Status |
|---|---|---|---|
| Gigabit Ethernet | 1.25 Gbaud | 1 Gbps | Active, mature |
| Fibre Channel 1-8G | 1.0625-8.5 Gbaud | 0.85-6.8 Gbps | Active, legacy |
| USB 3.0 | 5 Gbaud | 4 Gbps | Active |
| SATA I/II/III | 1.5-6 Gbaud | 1.2-4.8 Gbps | Active |
| PCIe Gen 1/2 | 2.5-5 GT/s | 2-4 Gbps | Legacy |
| DisplayPort 1.0-1.2 | 1.62-5.4 Gbaud | 1.3-4.32 Gbps | Legacy |
8B/10B dominated high-speed encoding for nearly two decades, but its 20% overhead eventually became prohibitive at 10 Gbps and beyond. Understanding why—and what replaced it—completes our understanding of block coding evolution.
| Property | 4B/5B | 8B/10B | 64B/66B | 128B/130B |
|---|---|---|---|---|
| Efficiency | 80% | 80% | 96.97% | 98.46% |
| DC Balance | Not guaranteed | Guaranteed | Scrambled | Scrambled |
| Run Length | ≤3 zeros | ≤5 same | Scrambled | Scrambled |
| Error Detect | Invalid codes | Disparity + code | Sync header | Sync header |
| Comma Align | J-K pattern | K.28.5 | 01/10 header | 01/10 header |
| Complexity | Low | Medium | Higher | Higher |
| Max Speed* | ~200 Mbps | ~10 Gbps | 100+ Gbps | 100+ Gbps |
*Maximum practical speed is limited by other factors; these are typical deployment ceilings.
The 10 Gigabit Ethernet Problem:
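At a 10 Gbps data rate, 8B/10B would require a 12.5 Gbaud line rate (10 × 10/8), whereas 64B/66B needs only 10.3125 Gbaud (10 × 66/64).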
Impact of extra bandwidth:
The 18% reduction (12.5 → 10.3 Gbaud) translates to meaningful cost savings at 10G and becomes even more significant at 40G, 100G, and beyond.
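The line-rate comparison can be verified with exact arithmetic. This sketch uses Python's `fractions.Fraction` to avoid rounding; `line_rate_gbaud` is a hypothetical helper name.

```python
from fractions import Fraction


def line_rate_gbaud(data_gbps: int, payload_bits: int, coded_bits: int) -> Fraction:
    """Line rate needed to carry data_gbps when payload_bits become coded_bits."""
    return Fraction(data_gbps) * coded_bits / payload_bits


rate_8b10b = line_rate_gbaud(10, 8, 10)     # 8B/10B at 10 Gbps
rate_64b66b = line_rate_gbaud(10, 64, 66)   # 64B/66B at 10 Gbps
reduction = 1 - rate_64b66b / rate_8b10b

print(float(rate_8b10b))    # 12.5
print(float(rate_64b66b))   # 10.3125
print(float(reduction))     # 0.175
```

The exact reduction is 17.5%, which rounds to the 18% quoted above.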
How 64B/66B Works:
Each 64-bit block is prefixed with a 2-bit sync header:
01 = Data block (all 8 bytes are data)
10 = Control block (mix of control and data)
Advantages over 8B/10B:
Disadvantages:
8B/10B remains preferred when deterministic timing and strong error detection are paramount (FC legacy, embedded systems). 64B/66B and beyond are preferred when bandwidth efficiency matters more (10G+ Ethernet, high-density data center links). The choice is fundamentally an engineering trade-off, not a matter of one being universally "better."
8B/10B encoding represents the pinnacle of deterministic block coding—a scheme that provides guaranteed DC balance, strong error detection, and reliable clock recovery through elegant mathematical design. Its influence extends far beyond its direct applications, shaping how engineers think about physical layer encoding.
Let's consolidate the essential knowledge:
| Parameter | Value | Notes |
|---|---|---|
| Code Rate | 8:10 | 8 data bits → 10 code bits |
| Efficiency | 80% | Same as 4B/5B, better properties |
| Max Run Length | 5 identical | Guaranteed by code design |
| Data Codes | 256 (D.x.y) | x: 0-31, y: 0-7 |
| Control Codes | 12 (K.x.y) | K.28.0-7, K.23/27/29/30.7 |
| Comma Pattern | K.28.5 | 0011111010 or 1100000101 |
| DC Balance | Guaranteed | Running disparity is -1 or +1 at every symbol boundary |
| Primary Uses | GbE, FC 1-8G, USB 3.0 | Pre-10G high-speed serial |
Looking Ahead:
With 8B/10B mastered, you now understand the complete evolution of block coding from 4B/5B through the modern era. The final page in this module covers High-Speed Applications—synthesizing everything you've learned to understand how modern 100G, 400G, and terabit links combine advanced modulation (PAM-4, PAM-16), forward error correction, and high-efficiency encoding to push the boundaries of what's possible over copper and fiber.
You now possess a comprehensive understanding of 8B/10B encoding—from its mathematical foundations through its deployment in Gigabit Ethernet, Fibre Channel, and numerous other standards. This knowledge is essential for anyone working with high-speed serial interfaces or designing systems where physical layer characteristics matter.