Loading learning content...
While B8ZS and HDB3 solve the zero-run problem through targeted substitution, modern high-speed systems take a more comprehensive approach: complete randomization of the data stream. Rather than waiting for problematic patterns and substituting them, randomization transforms the entire data stream to have desirable statistical properties.
At the heart of this approach lies a remarkable mathematical structure: the Linear Feedback Shift Register (LFSR). LFSRs generate sequences that appear random but are completely deterministic, enabling transmitters and receivers to synchronize their transformations perfectly.
This page explores the mathematics, implementation, and applications of randomization in digital transmission—knowledge essential for understanding SONET/SDH, high-speed Ethernet, and virtually all modern optical communication systems.
By the end of this page, you will understand how LFSRs work, why their sequences make ideal scrambling codes, how additive and multiplicative scrambling differ, and how randomization is applied in real telecommunication standards.
Substitution techniques like B8ZS and HDB3 address one specific problem: runs of zeros. But high-speed systems face additional challenges that require more comprehensive solutions:
Limitations of Substitution Alone:
10101010 repeating creates spectral lines at specific frequenciesWhat Randomization Provides:
Randomization transforms the entire data stream by XORing it with a pseudo-random sequence. The result has statistical properties close to truly random data:
The Key Insight:
For any deterministic sequence S, if we have a truly random sequence R, then S ⊕ R (XOR) is also random. Since XOR is reversible (S ⊕ R ⊕ R = S), we can recover the original data if the receiver knows R.
The challenge: how do we generate R deterministically so that both transmitter and receiver can compute it independently? The answer is the Linear Feedback Shift Register.
Pseudo-random sequences are deterministic—given the same starting state, they produce the same sequence every time. This is essential for communications: the receiver must be able to reproduce the exact sequence to descramble the data. True randomness would make communication impossible!
A Linear Feedback Shift Register (LFSR) is a shift register whose input bit is a linear function (XOR) of its previous state. Despite this simple definition, LFSRs produce sequences with remarkable properties.
Basic LFSR Structure:
┌───┐ ┌───┐ ┌───┐ ┌───┐
│ D3│←─│ D2│←─│ D1│←─│ D0│←── (feedback)
└───┘ └───┘ └───┘ └───┘
│ │ │ │
└───────┴───────┴───────┴───→ (XOR) ─┐
│
(loops back) ←┘
In this 4-bit LFSR:
The Feedback Polynomial:
LFSR behavior is completely determined by which positions are XORed for feedback. This is described by a characteristic polynomial:
x⁴ + x³ + 1x⁷ + x⁶ + 1The polynomial notation follows these rules:
| Clock | D3 | D2 | D1 | D0 | Feedback | Output |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 1⊕0=1 | 0 |
| 1 | 1 | 1 | 0 | 0 | 1⊕0=1 | 0 |
| 2 | 1 | 1 | 1 | 0 | 1⊕0=1 | 0 |
| 3 | 1 | 1 | 1 | 1 | 1⊕1=0 | 1 |
| 4 | 0 | 1 | 1 | 1 | 0⊕1=1 | 1 |
| 5 | 1 | 0 | 1 | 1 | 1⊕1=0 | 1 |
| 6 | 0 | 1 | 0 | 1 | 0⊕1=1 | 1 |
| 7 | 1 | 0 | 1 | 0 | 1⊕0=1 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 15 | 1 | 0 | 0 | 0 | — | Returns to initial state |
Maximum-Length Sequences:
An n-bit LFSR can be in 2ⁿ - 1 non-zero states (the all-zeros state is forbidden—it would stay there forever). If the polynomial is chosen correctly, the LFSR will visit all of these states before repeating. Such a polynomial is called primitive, and the resulting sequence is a maximum-length sequence (m-sequence).
Properties of Maximum-Length Sequences:
These properties make m-sequences ideal for scrambling: they're nearly balanced, have short runs, and produce flat spectra.
Not all polynomials produce maximum-length sequences. Finding primitive polynomials is a well-studied mathematical problem, and tables of primitive polynomials for various lengths are published in standards. Using a non-primitive polynomial results in shorter cycles, potentially missing some states.
LFSRs can be used for scrambling in two fundamentally different ways, each with distinct advantages and challenges:
Additive (Synchronous) Scrambling:
Transmitter: Receiver:
Data ──┬───→ ⊕ ───→ Line Line ───→ ⊕ ───→ Data
│ ↑ ↑
│ LFSR (free-running) LFSR (synchronized)
In additive scrambling:
Advantages:
Disadvantages:
Multiplicative (Self-Synchronizing) Scrambling:
Transmitter: Receiver:
Data ───→ ⊕ ───┬──→ Line Line ──┬──→ ⊕ ───→ Data
↑ │ │ ↑
└──LFSR LFSR←─┘
(fed by output) (fed by input)
In multiplicative scrambling:
n bits (where n = LFSR length), the receiver automatically synchronizesAdvantages:
Disadvantages:
n+1 bit errors in output| Property | Additive | Multiplicative |
|---|---|---|
| Synchronization | External required | Self-synchronizing |
| Error multiplication | None (1:1) | Up to n+1 errors per input error |
| Startup time | Instant (if pre-synced) | n bits |
| Implementation | Two independent LFSRs | Feedback from line |
| Reset requirement | At frame boundaries | Not required |
| Used in | SONET/SDH, 10GE | V.34 modems, some DSL |
| Complexity | Higher (sync needed) | Lower |
Modern high-performance systems generally prefer additive scrambling despite its synchronization requirement. Error multiplication in multiplicative scrambling becomes unacceptable at high error rates, and frame-based protocols provide natural synchronization points for additive scraamblers.
To truly understand why LFSRs make effective scramblers, we must examine their mathematical properties in detail.
The Run Length Distribution:
For a maximum-length sequence from an n-bit LFSR:
| Run Length | Number of Runs | Probability |
|---|---|---|
| 1 | 2ⁿ⁻² | 50% |
| 2 | 2ⁿ⁻³ | 25% |
| 3 | 2ⁿ⁻⁴ | 12.5% |
| k (k < n) | 2ⁿ⁻ᵏ⁻¹ | 2⁻ᵏ |
| n-1 | 1 (zeros) | — |
| n | 1 (ones) | — |
This geometric distribution closely matches what we'd expect from truly random data. The maximum run length is n bits, determined by the LFSR length.
Autocorrelation Properties:
The autocorrelation of an m-sequence measures how similar the sequence is to a shifted version of itself. For m-sequences:
This near-zero correlation for all non-zero shifts is remarkable—it means the sequence has no internal pattern that repeats before the full period. For scrambling, this ensures that regardless of when we start, the LFSR output has consistent statistical properties.
Spectral Properties:
The power spectral density of an m-sequence is nearly flat across all frequencies except:
For long LFSRs (n > 20), these imperfections are negligible. A 23-bit LFSR produces a sequence that repeats only every 8.4 million bits—spectral lines are far apart and easily filtered.
XOR Preservation:
A crucial property for scrambling is how XOR affects statistical properties:
If R is a maximum-length sequence and D is any deterministic sequence:
R ⊕ D has nearly the same statistical properties as RD become fragmentedD become decorrelatedD are spreadThis "whitening" effect is why LFSR-based scrambling works: any input data is transformed to have m-sequence-like properties.
LFSR length involves tradeoffs: longer LFSRs have better statistical properties and lower repetition rates, but require more hardware and (for multiplicative type) cause more error multiplication. Standards specify exact polynomials to ensure interoperability.
Implementing LFSRs for high-speed scrambling requires careful engineering to meet timing requirements while maintaining mathematical correctness.
Basic Hardware Implementation:
For polynomial x⁷ + x⁶ + 1:
┌────────────────────────────────────────────┐
│ │
│ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐
└─→⊕─│D6 │←│D5 │←│D4 │←│D3 │←│D2 │←│D1 │←│D0 │
↑ └───┘ └───┘ └───┘ └───┘ └───┘ └───┘ └───┘
│ │
└────┘
(D6 XOR D0 feeds back to D6)
Each box is a D flip-flop clocked at the bit rate. The XOR gate computes feedback from taps.
| Approach | Speed | Area | Best For |
|---|---|---|---|
| Fibonacci (external XOR) | Moderate | Minimal | Low-speed, simple designs |
| Galois (internal XOR) | Higher | Minimal | Medium-speed designs |
| Parallel unrolling | Maximum | Large | High-speed (10G+) designs |
| Table-based | Software | Memory | CPU implementation |
Fibonacci vs. Galois Form:
The same polynomial can be implemented in two mathematically equivalent ways:
Fibonacci (LFSR-F): XOR gate is in the feedback path before entering the register
Galois (LFSR-G): XOR gates are distributed inside the register chain
Parallel Operation:
For 10 Gbps and beyond, serial LFSR operation is too slow. Instead, the LFSR is "unrolled" to compute multiple bits per clock:
This parallel approach requires more logic but enables operation at practical clock frequencies (e.g., 156.25 MHz for 10G at 64-bit width).
Synchronization Mechanisms:
For additive scrambling, synchronization options include:
PRBS (Pseudo-Random Bit Sequence) patterns generated by standard LFSRs are used extensively for testing. PRBS-7, PRBS-15, PRBS-23, and PRBS-31 are defined in ITU-T O.150 and used for bit error ratio testing. The receiver compares received data against locally-generated PRBS to count errors.
Let's examine how randomization is applied in major communication standards:
SONET/SDH Scrambling:
SONET (Synchronous Optical Network) and SDH (Synchronous Digital Hierarchy) use frame-synchronous scrambling:
The short period (127 bits) is intentional—it ensures the scrambler resets frequently, limiting damage from lost synchronization.
| Standard | Polynomial | Type | Period | Special Notes |
|---|---|---|---|---|
| SONET/SDH | x⁷ + x⁶ + 1 | Additive | 127 | Frame sync reset, row 1 not scrambled |
| 10GBASE-R | x⁵⁸ + x³⁹ + 1 | Additive | ~2.9×10¹⁷ | 64b/66b, lane-specific seeds |
| 100GBASE-R | x⁵⁸ + x³⁹ + 1 | Additive | ~2.9×10¹⁷ | 4 lanes with different seeds |
| OTN | x¹⁶ + x¹² + x³ + x + 1 | Additive | 65535 | Similar to SONET approach |
| DVB-S/S2 | x¹⁵ + x¹⁴ + 1 | Additive | 32767 | Sync byte not scrambled |
| SATA | x¹⁶ + x¹⁵ + x¹³ + x⁴ + 1 | Self-sync | 65535 | Multiplicative in 8b/10b |
64b/66b Encoding with Scrambling:
10 Gigabit Ethernet introduced 64b/66b encoding with integrated scrambling:
The 2-bit sync header is carefully chosen:
Multi-Lane Systems:
100 Gigabit Ethernet uses 4 lanes of 25 Gbps each. To prevent crosstalk between lanes from causing correlated errors, each lane uses a different scrambler seed:
Lane 0: Seed = (x⁵⁸ + x³⁹ + 1) initialized to all ones
Lane 1: Seed after 8191 clocks from lane 0
Lane 2: Seed after 16382 clocks from lane 0
Lane 3: Seed after 24573 clocks from lane 0
This offset ensures that even if the same data is transmitted on all lanes, the scrambled patterns are uncorrelated.
Most systems provide a mode to bypass scrambling for testing purposes. This allows verification of the scrambler/descrambler independently of other system functions. However, this mode should never be used in production—without scrambling, the system may fail for certain data patterns.
While scrambling transforms data to appear random, it's important to understand what scrambling does not provide:
Scrambling is NOT Encryption:
Why Scrambling Provides Zero Security:
Consider an attacker who can observe the scrambled line signal:
This is by design. Scrambling's purpose is signal conditioning, not security. For actual security, use IPsec, TLS, or MACsec (which operates at Layer 2 with proper encryption).
Other Limitations:
| Limitation | Impact | Mitigation |
|---|---|---|
| Pathological patterns | Certain input patterns produce bad output | Longer LFSRs, multiple polynomials |
| LFSR lockup | All-zeros state stays in zeros | Avoid all-zeros initialization |
| Error multiplication | One error → multiple (multiplicative) | Use additive scrambling |
| Sync loss | Garbage output until resync | Frame-synchronous reset |
| Complexity | Adds latency and logic | Usually negligible at line rates |
Pathological Patterns:
Certain input patterns can produce poor output even with scrambling. For example, if the input happens to match the LFSR sequence, the output is all zeros!
The probability of this is extremely low for long LFSRs (1 in 2ⁿ⁻¹ for each bit), but it's theoretically possible. Some systems use additional measures:
In practice, the combination of scrambling with appropriate line coding (like 64b/66b) provides robust performance across all data patterns.
Never rely on scrambling for data confidentiality. The polynomial is public, and descrambling requires no secret key. For security, implement proper encryption at the appropriate layer (MACsec for Layer 2, IPsec for Layer 3, TLS for Layer 4+).
We have explored the mathematical foundations and practical applications of randomization in digital transmission. Let's consolidate the essential knowledge:
Looking Ahead:
With randomization understood, you're ready for the final piece: Synchronization Benefits. The next page examines in detail how scrambling enables reliable clock recovery, exploring the relationship between transition density, PLL dynamics, and system reliability.
Understanding these synchronization benefits ties together everything we've learned about scrambling—from the fundamental problem through the specific techniques to their ultimate purpose: maintaining reliable communication.
You now understand the mathematics of randomization, LFSR operation, scrambling architectures, and practical applications in major standards. This knowledge is fundamental for working with any modern high-speed communication system.