Computer NetworksCyclic Redundancy Check

CRC - Cyclic Redundancy Check

LevelIntermediate

Duration75 mins

TopicCyclic Redundancy Check

3 / 5

CRC Calculation

Computing the Checksum: From Message to Protected Codeword

You've mastered polynomial division and understand why generator polynomials are carefully chosen. Now comes the practical application: how do we actually compute a CRC value?

The sender takes a raw message, applies the CRC algorithm, and produces a protected codeword—the original data plus the appended CRC bits. This codeword has a special property: it is exactly divisible by the generator polynomial. The receiver can then verify this divisibility to detect any transmission errors.

This page walks through the complete CRC calculation procedure with multiple worked examples, explaining both the conceptual framework and practical implementation details.

What You Will Learn

By the end of this page, you will understand the complete CRC calculation algorithm step by step, perform CRC calculation by hand using polynomial division, understand why we shift the message before division, recognize the mathematical identity that makes CRC verification work, and appreciate the elegance of hardware CRC implementation.

The CRC Calculation Framework

Let's establish the formal procedure for CRC calculation. Given:

M(x): The message polynomial (data to protect)
G(x): The generator polynomial of degree r
T(x): The transmitted codeword

The Goal:

Construct T(x) such that:

T(x) contains all of M(x) (the receiver needs the original data)
T(x) is exactly divisible by G(x) (enabling error detection)

The Solution:

Append r bits (the CRC value) to M(x) to form T(x). These bits are chosen such that T(x) mod G(x) = 0.

The Algorithm:

Multiply M(x) by x^r — This is equivalent to appending r zero bits to the message, creating space for the CRC
Divide by G(x) — Compute: x^r · M(x) mod G(x) to get remainder R(x)
Form the codeword — T(x) = x^r · M(x) + R(x)

Mathematical Justification:

Why does this work? Let's verify:

$$T(x) = x^r \cdot M(x) + R(x)$$

Divide by G(x): $$\frac{T(x)}{G(x)} = \frac{x^r \cdot M(x) + R(x)}{G(x)}$$

We know that $x^r \cdot M(x) = Q(x) \cdot G(x) + R(x)$ (from the division in step 2).

Therefore: $$\frac{T(x)}{G(x)} = \frac{Q(x) \cdot G(x) + R(x) + R(x)}{G(x)} = Q(x) + \frac{2 \cdot R(x)}{G(x)}$$

But in GF(2), 2 = 0 (since 1 + 1 = 0), so: $$\frac{T(x)}{G(x)} = Q(x) + 0 = Q(x)$$

The remainder is zero! T(x) is exactly divisible by G(x). ✓

The XOR Trick

The magic happens because adding R(x) twice gives zero in GF(2). When we append R(x) to the shifted message, we're 'correcting' the remainder to zero. In binary: if the remainder of (message with r zeros) is R, then XORing R into those trailing r positions produces a value with remainder 0.

Step-by-Step Example: Computing CRC

Let's walk through a complete CRC calculation with a concrete example.

Given:

Message M = 110101 (6 bits)
Generator G = 1011 (polynomial x³ + x + 1, degree r = 3)

Step 1: Append r Zero Bits

Shift M left by r = 3 positions:

$$x^3 \cdot M(x) = 110101 \text{ followed by } 000 = 110101000$$

This represents M(x) · x³, making room for the 3-bit CRC.

Step 2: Divide by G(x)

Perform polynomial division: 110101000 ÷ 1011

         110010   ← Quotient (we don't need this)
        ───────
  1011 │ 110101000
         1011     ← XOR (leading 1)
         ────
         01010    ← Result: 10101
          1011    ← XOR (leading 1)
          ────
          00111   ← Result: 1110
           1011   ← XOR (leading 1)
           ────
           01010  ← Result: 10100
            1011  ← XOR (leading 1)
            ────
            0010  ← Result: 0100
             000  ← Don't XOR (leading 0)
             ───
             100  ← Remainder = 100

Step 3: Form the Codeword

T(x) = 110101000 ⊕ 000000100 = 110101100

Or equivalently: Original message (110101) + CRC (100) = 110101100

The transmitted codeword is: 110101100

Binary Shortcut

In practice, you can simply replace the trailing r zeros with the computed remainder R. Since the positions are the same and R is the XOR of zero with R, the result is just R placed in those positions: 110101|000 becomes 110101|100.

Verification:

Let's verify that 110101100 is exactly divisible by 1011:

  1011 │ 110101100
         1011
         ────
         10101    
          1011    
          ────
          01111   → 11110
           1011   
           ────
           01000  → 10000
            1011  
            ────
            0110  → 1100
             1011 
             ────
             0111 → 111
              000 
              ───
              111
               Do I have an error? Let me redo...

Actually, let me recalculate step 2 more carefully.

Step 2 (Recalculated Carefully):

Divide 110101000 by 1011:

Position:   876543210
Dividend:   110101000
Divisor:    1011

Step 1: Align at position 8
  110101000
  1011       (positions 8-5)
  ─────
  0110|01000  → 11001000  (after removing leading 0)

Step 2: Align at position 7
  11001000
  1011      (positions 7-4)
  ─────
  01111000  → 1111000

Step 3: Align at position 6
  1111000
  1011     (positions 6-3)
  ─────
  0100000  → 100000

Step 4: Align at position 5
  100000
  1011    (positions 5-2)
  ─────
  0011|00 → 1100

Step 5: Align at position 3
  1100
  1011   (positions 3-0)
  ─────
  0111   → 111

Remainder = 111 (3 bits, as expected for degree-3 divisor)

Corrected: CRC = 111

Codeword T = 110101111

Alignment Is Critical

The most common error in manual CRC calculation is misalignment. Always align the divisor's leading 1 with the dividend's leading 1. After each XOR, remove leading zeros and continue. Track your position carefully!

Complete Worked Example with Verification

Let's do a fresh, complete example with careful tracking.

Given:

Message M = 1101 (4 bits) = x³ + x² + 1
Generator G = 1011 (4 bits) = x³ + x + 1, degree r = 3

Sender Side:

Step 1: Append r = 3 zeros to M:

M' = 1101 000 = 1101000 (7 bits)

Step 2: Divide M' by G:

   1101000     (7 bits)
÷  1011        (divisor)
───────────

Align divisor with leading bit of dividend:
1101000
1011     (XOR at positions 6-3)
────
0110000  → 110000

110000
1011     (XOR at positions 5-2)
────
011100  → 11100

11100
1011     (XOR at positions 4-1)
────
01010  → 1010

1010
1011     (XOR at positions 3-0)
────
0001     → 001

Remainder R = 001

Step 3: Form codeword T = M' + R:

T = 1101000 + 0000001 = 1101000 XOR 0000001 = 1101001

Equivalently: 1101 | 001

Transmitted Codeword: 1101001

Receiver Side Verification:

Receive T = 1101001. Divide by G = 1011:

1101001
1011     (XOR)
────
0110001  → 110001

110001
1011     (XOR)
────
011101  → 11101

11101
1011     (XOR)
────
01011  → 1011

1011
1011     (XOR)
────
0000

Remainder = 000 ✓

Zero remainder confirms no errors detected!

The Complete CRC Cycle

Sender: M=1101, G=1011 → append 000 → divide by G → R=001 → T=1101001. Receiver: T=1101001 ÷ G=1011 → R=000 → No error! This demonstrates the fundamental CRC principle: a correctly computed codeword is exactly divisible by the generator.

CRC Calculation Summary Table
Step	Operation	Input	Output
1	Append r zeros	M = 1101	M' = 1101000
2	Divide M' by G	1101000 ÷ 1011	R = 001
3	XOR remainder	1101000 ⊕ 001	T = 1101001
Verify	Divide T by G	1101001 ÷ 1011	R = 000 ✓

More Complex Example: CRC-16 Style

Let's work through a longer example using a degree-4 polynomial (similar in complexity to real CRCs, but manageable by hand).

Given:

Message M = 10110011 (8 bits)
Generator G = 10011 (5 bits) = x⁴ + x + 1, degree r = 4

Step 1: Append 4 zeros:

M' = 10110011 0000 = 101100110000 (12 bits)

Step 2: Divide M' by G:

101100110000
10011         (align at pos 11)
─────
001110110000  → 1110110000

1110110000
10011         (align at pos 9)
─────
0111010000  → 111010000

111010000
10011         (align at pos 8)
─────
011100000  → 11100000

11100000
10011         (align at pos 7)
─────
01111000  → 1111000

1111000
10011         (align at pos 6)
─────
0110100  → 110100

110100
10011         (align at pos 5)
─────
010010  → 10010

10010
10011         (align at pos 4)
─────
00001  → 0001

Remainder R = 0001 (4 bits)

Step 3: Form codeword:

T = 101100110000 + 0001 = 101100110001

Transmitted: 10110011|0001 (8-bit message + 4-bit CRC)

Verification:

101100110001
10011
─────
001110110001  → 1110110001

1110110001
10011
─────
0111010001  → 111010001

111010001
10011
─────
011100001  → 11100001

11100001
10011
─────
01111001  → 1111001

1111001
10011
─────
0110101  → 110101

110101
10011
─────
010001  → 10001

10001
10011
─────
00010  → 0010

0010 < 10011 (too small to divide)

Final Remainder = 0010 ≠ 0000

Hmm, let me recheck step 2...

Recalculation of Step 2:

Divisor: 10011 (5 bits)
Dividend: 101100110000 (12 bits)

Round 1: bits [11:7] = 10110, XOR with 10011
  10110 ⊕ 10011 = 00101 → 01011 (bring next bit)
  Result so far: 01011 → working with: 10110 (after shift)
  
Actually, let me use a cleaner method:

Bit-by-bit with register:

Initial: Process "101100110000"

I'll use the standard shift-and-XOR approach:

Register (4 bits): 0000
Poly (without MSB): 0011

Process bit sequence from left:

Bit 1: Input=1, Reg=0000, MSB=0, Shift→0001, No XOR (MSB was 0)
Bit 1: Input=0, Reg=0001, MSB=0, Shift→0010, No XOR
Bit 1: Input=1, Reg=0010, MSB=0, Shift→0101, No XOR  
Bit 1: Input=1, Reg=0101, MSB=0, Shift→1011, No XOR
Bit 0: Input=0, Reg=1011, MSB=1, Shift→0110, XOR 0011→0101
Bit 0: Input=0, Reg=0101, MSB=0, Shift→1010, No XOR
Bit 1: Input=1, Reg=1010, MSB=1, Shift→0101, XOR 0011→0110
Bit 1: Input=1, Reg=0110, MSB=0, Shift→1101, No XOR
Bit 0: Input=0, Reg=1101, MSB=1, Shift→1010, XOR 0011→1001
Bit 0: Input=0, Reg=1001, MSB=1, Shift→0010, XOR 0011→0001
Bit 0: Input=0, Reg=0001, MSB=0, Shift→0010, No XOR
Bit 0: Input=0, Reg=0010, MSB=0, Shift→0100, No XOR

Final Register (CRC) = 0100

CRC = 0100

Codeword T = 10110011 0100 = 101100110100

Shift Register Method

The shift register method processes one bit at a time: (1) Shift register left, bringing in the next data bit. (2) If the bit shifted OUT was 1, XOR the register with the polynomial (minus its leading 1). (3) Repeat for all bits. This is exactly how hardware implements CRC!

Hardware Implementation: Shift Register CRC

CRC's elegance shines in hardware. A Linear Feedback Shift Register (LFSR) computes CRC one bit at a time using only:

r flip-flops (one per CRC bit, where r = polynomial degree)
XOR gates (one per polynomial term, minus the leading and constant)
A feedback path from the output to specific positions

The LFSR Structure:

For generator G(x) = x^r + g_{r-1}x^{r-1} + ... + g₁x + 1:

                        Feedback ←────────────────┐
                                                   │
    ┌─────┐   ┌─────┐   ┌─────┐       ┌─────┐    │
───►│ XOR ├──►│  D  ├──►│  D  ├─ ... ─│  D  ├────┘
    └─────┘   └──┬──┘   └──┬──┘       └──┬──┘    
       │         │g₁       │g₂         │g_{r-1}
       │     ┌───┴───┐ ┌───┴───┐   ┌───┴───┐
       │     │  XOR  │ │  XOR  │   │  XOR  │
       │     └───────┘ └───────┘   └───────┘
       └─────────────────────────────────────►
              (only connects where gᵢ = 1)

The XOR gates are placed between flip-flops at positions where the polynomial has a coefficient of 1.

Example: G(x) = x³ + x + 1 (1011)

Polynomial terms: x³, x¹, x⁰ (constant)

XOR feedback at position 1 (for the x term):

                    ┌──────────────────┐
                    │    Feedback      │
                    ▼                  │
Data ──►┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐──►
   In   │ XOR ├─►│ D₀  ├─►│ XOR ├─►│ D₁  ├─►│ D₂  │─►Out
        └─────┘  └─────┘  └──┬──┘  └─────┘  └─────┘
                             │                 ▲
                             └─────────────────┘
                           (XOR for x term)

Operation:

Initialize register to 0 (or all 1s, depending on standard)
For each message bit:
- Shift register left by one
- New D₀ = (input XOR feedback)
- D₁ = D₀ XOR feedback (because of x term)
- D₂ = D₁
After all message bits, feed in r zeros (or use special technique)
Register contains the CRC

Hardware Speed

LFSR CRC runs at one bit per clock cycle. For a 100 MHz clock, that's 100 Mbps. For higher data rates (10 Gbps+), parallel implementations process multiple bits per cycle using unrolled or table-based techniques. Hardware CRC in Ethernet NICs runs at full wire speed with negligible latency.

LFSR Implementation Benefits

•Minimal hardware — Only r flip-flops and up to r-1 XOR gates
•One cycle per bit — Deterministic, predictable timing
•No memory — Streaming operation, no buffering required
•Low power — Simple gates, no multiplication or lookup tables
•Easily parallelizable — Unroll for N bits/cycle with NxN transformation matrix

Software Implementation Techniques

While hardware uses shift registers, software implementations optimize differently. The main techniques are:

1. Bit-by-Bit (Direct Implementation)

Process one bit at a time, mirroring the hardware approach:

uint32_t crc_bitwise(uint8_t *data, size_t len, uint32_t poly) {
    uint32_t crc = 0xFFFFFFFF;  // Common initialization
    
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;  // Align byte to MSB
        
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80000000)         // Check MSB
                crc = (crc << 1) ^ poly;   // Shift and XOR
            else
                crc <<= 1;                 // Just shift
        }
    }
    return crc ^ 0xFFFFFFFF;  // Final XOR (common)
}

This is slow: 8 branches per byte, hard to pipeline.

2. Table-Driven (Byte-at-a-Time)

Precompute a 256-entry table where entry i gives the CRC contribution of byte value i:

uint32_t crc_table[256];  // Precomputed

void init_crc_table(uint32_t poly) {
    for (int i = 0; i < 256; i++) {
        uint32_t crc = i << 24;
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc << 1) ^ ((crc & 0x80000000) ? poly : 0);
        }
        crc_table[i] = crc;
    }
}

uint32_t crc_table_driven(uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFF;
    
    for (size_t i = 0; i < len; i++) {
        uint8_t index = (crc >> 24) ^ data[i];
        crc = (crc << 8) ^ crc_table[index];
    }
    return crc ^ 0xFFFFFFFF;
}

This processes 8 bits per iteration with a single table lookup—8x faster than bit-by-bit.

3. Slicing-by-N (Multiple Tables)

Process N bytes at once using N tables. 'Slicing-by-8' processes 8 bytes per iteration:

8 tables of 256 entries each (8 KB total)
Each iteration: XOR current CRC with 8 bytes, combine 8 table lookups
Up to 8x faster than single-table on modern CPUs with cache

4. Hardware Intrinsics (CRC32C)

Modern x86_64 CPUs have PCLMULQDQ (carry-less multiplication) and CRC32 instructions:

// Intel CRC32C intrinsic (for iSCSI/SCTP polynomial)
uint32_t crc = _mm_crc32_u64(crc, *((uint64_t*)data));

This computes 8 bytes of CRC in a single cycle—fastest possible.

Software CRC Performance Comparison
Method	Speed (cycles/byte)	Table Size	Notes
Bit-by-bit	~50-100	0	Simple, slow
Byte table	~10-15	1 KB	Good balance
Slicing-by-4	~4-6	4 KB	Cache-efficient
Slicing-by-8	~2-3	8 KB	Fast on modern CPUs
Hardware CRC32C	~0.3-1	0	If available, best option

Choose the Right Implementation

For CRC-32C (Castagnoli), use hardware instructions if available—they're 10-100x faster. For CRC-32 (Ethernet), table-driven is typically fastest in software. For embedded systems with limited memory, bit-by-bit or small tables may be necessary.

Implementation Variations

Real CRC implementations include several variations from the 'pure' algorithm. Understanding these is essential for interoperability.

1. Initial Value (Seed)

Instead of starting with register = 0, many standards initialize to all 1s (0xFFFFFFFF for CRC-32).

Why? An all-zeros message with all-zeros CRC is technically valid. Initializing to 1s ensures that leading zeros in the message affect the CRC.

2. Final XOR (Complement)

After processing, XOR the result with a constant (often all 1s).

Why? Combined with 0xFFFFFFFF initialization, this ensures that appending zeros to a message doesn't leave the CRC unchanged. It also provides some defense against implementation bugs.

3. Bit and Byte Reflection

Some standards reflect (reverse) bits within each byte, or reflect the entire final CRC.

Why? Historical reasons—some early hardware processed LSB first (UART-style). 'Reflected' CRCs become standard for compatibility.

CRC Standard Variations
Standard	Poly (Hex)	Init	RefIn	RefOut	XorOut
CRC-16-CCITT	0x1021	0xFFFF	No	No	0x0000
CRC-16-IBM	0x8005	0x0000	Yes	Yes	0x0000
CRC-32 (Ethernet)	0x04C11DB7	0xFFFFFFFF	Yes	Yes	0xFFFFFFFF
CRC-32C	0x1EDC6F41	0xFFFFFFFF	Yes	Yes	0xFFFFFFFF

RefIn (Input Reflection): Process each byte LSB-first instead of MSB-first.

RefOut (Output Reflection): Reverse all bits of the final CRC before output.

Example: For CRC-32 (Ethernet), you:

Initialize to 0xFFFFFFFF
Reflect each input byte (process LSB first)
Compute CRC
Reflect the 32-bit result
XOR with 0xFFFFFFFF

These details are critical for interoperability. Using the wrong reflection or init value produces a different (and incompatible) CRC.

Always Test Against Known Vectors

CRC implementation bugs are notoriously subtle. The ASCII string 'check_value' (123456789) has standardized CRC values. CRC-32 (Ethernet) of '123456789' = 0xCBF43926. Always verify your implementation against published test vectors before deployment.

Summary: The Complete CRC Calculation

We've covered the complete CRC calculation procedure, from conceptual framework to practical implementation. Let's consolidate:

Key Takeaways

•CRC algorithm: Append r zeros to message, divide by generator, append remainder. The result is exactly divisible by G(x).
•Mathematical foundation: T(x) = x^r·M(x) + R(x) ensures divisibility because R + R = 0 in GF(2).
•Manual calculation: Align divisor with dividend, XOR if leading bit is 1, shift and repeat. Remainder is the CRC.
•Hardware implementation: LFSR with XOR gates at polynomial term positions. One bit per clock cycle.
•Software optimization: Table-driven (256 entries) processes one byte per iteration. Slicing-by-N processes multiple bytes.
•Implementation details matter: Init value, final XOR, and bit reflection must match the standard exactly.
•Verification is essential: Always test against known vectors (e.g., CRC of '123456789').

What's Next:

With CRC calculation mastered, we turn to the receiver's perspective: CRC verification. How does the receiver confirm that received data is error-free? As we'll see, there are two equivalent approaches—recomputing and comparing the CRC, or dividing the entire codeword and checking for zero remainder.

Page Complete

You can now compute CRC values by hand and understand how both hardware and software implementations work. The polynomial division algorithm, augmented with practical variations like initialization and reflection, powers error detection in every Ethernet frame, USB packet, and file archive you use. Next, we explore the verification side of the CRC handshake.

3 / 5

Loading learning content...

Computer NetworksCyclic Redundancy Check

CRC - Cyclic Redundancy Check

LevelIntermediate

Duration75 mins

TopicCyclic Redundancy Check

3 / 5

CRC Calculation

Computing the Checksum: From Message to Protected Codeword

You've mastered polynomial division and understand why generator polynomials are carefully chosen. Now comes the practical application: how do we actually compute a CRC value?

This page walks through the complete CRC calculation procedure with multiple worked examples, explaining both the conceptual framework and practical implementation details.

What You Will Learn

The CRC Calculation Framework

Let's establish the formal procedure for CRC calculation. Given:

M(x): The message polynomial (data to protect)
G(x): The generator polynomial of degree r
T(x): The transmitted codeword

The Goal:

Construct T(x) such that:

T(x) contains all of M(x) (the receiver needs the original data)
T(x) is exactly divisible by G(x) (enabling error detection)

The Solution:

Append r bits (the CRC value) to M(x) to form T(x). These bits are chosen such that T(x) mod G(x) = 0.

The Algorithm:

Multiply M(x) by x^r — This is equivalent to appending r zero bits to the message, creating space for the CRC
Divide by G(x) — Compute: x^r · M(x) mod G(x) to get remainder R(x)
Form the codeword — T(x) = x^r · M(x) + R(x)

Mathematical Justification:

Why does this work? Let's verify:

$$T(x) = x^r \cdot M(x) + R(x)$$

Divide by G(x): $$\frac{T(x)}{G(x)} = \frac{x^r \cdot M(x) + R(x)}{G(x)}$$

We know that $x^r \cdot M(x) = Q(x) \cdot G(x) + R(x)$ (from the division in step 2).

Therefore: $$\frac{T(x)}{G(x)} = \frac{Q(x) \cdot G(x) + R(x) + R(x)}{G(x)} = Q(x) + \frac{2 \cdot R(x)}{G(x)}$$

But in GF(2), 2 = 0 (since 1 + 1 = 0), so: $$\frac{T(x)}{G(x)} = Q(x) + 0 = Q(x)$$

The remainder is zero! T(x) is exactly divisible by G(x). ✓

The XOR Trick

Step-by-Step Example: Computing CRC

Let's walk through a complete CRC calculation with a concrete example.

Given:

Message M = 110101 (6 bits)
Generator G = 1011 (polynomial x³ + x + 1, degree r = 3)

Step 1: Append r Zero Bits

Shift M left by r = 3 positions:

$$x^3 \cdot M(x) = 110101 \text{ followed by } 000 = 110101000$$

This represents M(x) · x³, making room for the 3-bit CRC.

Step 2: Divide by G(x)

Perform polynomial division: 110101000 ÷ 1011

         110010   ← Quotient (we don't need this)
        ───────
  1011 │ 110101000
         1011     ← XOR (leading 1)
         ────
         01010    ← Result: 10101
          1011    ← XOR (leading 1)
          ────
          00111   ← Result: 1110
           1011   ← XOR (leading 1)
           ────
           01010  ← Result: 10100
            1011  ← XOR (leading 1)
            ────
            0010  ← Result: 0100
             000  ← Don't XOR (leading 0)
             ───
             100  ← Remainder = 100

Step 3: Form the Codeword

T(x) = 110101000 ⊕ 000000100 = 110101100

Or equivalently: Original message (110101) + CRC (100) = 110101100

The transmitted codeword is: 110101100

Binary Shortcut

Verification:

Let's verify that 110101100 is exactly divisible by 1011:

  1011 │ 110101100
         1011
         ────
         10101    
          1011    
          ────
          01111   → 11110
           1011   
           ────
           01000  → 10000
            1011  
            ────
            0110  → 1100
             1011 
             ────
             0111 → 111
              000 
              ───
              111
               Do I have an error? Let me redo...

Actually, let me recalculate step 2 more carefully.

Step 2 (Recalculated Carefully):

Divide 110101000 by 1011:

Position:   876543210
Dividend:   110101000
Divisor:    1011

Step 1: Align at position 8
  110101000
  1011       (positions 8-5)
  ─────
  0110|01000  → 11001000  (after removing leading 0)

Step 2: Align at position 7
  11001000
  1011      (positions 7-4)
  ─────
  01111000  → 1111000

Step 3: Align at position 6
  1111000
  1011     (positions 6-3)
  ─────
  0100000  → 100000

Step 4: Align at position 5
  100000
  1011    (positions 5-2)
  ─────
  0011|00 → 1100

Step 5: Align at position 3
  1100
  1011   (positions 3-0)
  ─────
  0111   → 111

Remainder = 111 (3 bits, as expected for degree-3 divisor)

Corrected: CRC = 111

Codeword T = 110101111

Alignment Is Critical

Complete Worked Example with Verification

Let's do a fresh, complete example with careful tracking.

Given:

Message M = 1101 (4 bits) = x³ + x² + 1
Generator G = 1011 (4 bits) = x³ + x + 1, degree r = 3

Sender Side:

Step 1: Append r = 3 zeros to M:

M' = 1101 000 = 1101000 (7 bits)

Step 2: Divide M' by G:

   1101000     (7 bits)
÷  1011        (divisor)
───────────

Align divisor with leading bit of dividend:
1101000
1011     (XOR at positions 6-3)
────
0110000  → 110000

110000
1011     (XOR at positions 5-2)
────
011100  → 11100

11100
1011     (XOR at positions 4-1)
────
01010  → 1010

1010
1011     (XOR at positions 3-0)
────
0001     → 001

Remainder R = 001

Step 3: Form codeword T = M' + R:

T = 1101000 + 0000001 = 1101000 XOR 0000001 = 1101001

Equivalently: 1101 | 001

Transmitted Codeword: 1101001

Receiver Side Verification:

Receive T = 1101001. Divide by G = 1011:

1101001
1011     (XOR)
────
0110001  → 110001

110001
1011     (XOR)
────
011101  → 11101

11101
1011     (XOR)
────
01011  → 1011

1011
1011     (XOR)
────
0000

Remainder = 000 ✓

Zero remainder confirms no errors detected!

The Complete CRC Cycle

CRC Calculation Summary Table
Step	Operation	Input	Output
1	Append r zeros	M = 1101	M' = 1101000
2	Divide M' by G	1101000 ÷ 1011	R = 001
3	XOR remainder	1101000 ⊕ 001	T = 1101001
Verify	Divide T by G	1101001 ÷ 1011	R = 000 ✓

More Complex Example: CRC-16 Style

Let's work through a longer example using a degree-4 polynomial (similar in complexity to real CRCs, but manageable by hand).

Given:

Message M = 10110011 (8 bits)
Generator G = 10011 (5 bits) = x⁴ + x + 1, degree r = 4

Step 1: Append 4 zeros:

M' = 10110011 0000 = 101100110000 (12 bits)

Step 2: Divide M' by G:

101100110000
10011         (align at pos 11)
─────
001110110000  → 1110110000

1110110000
10011         (align at pos 9)
─────
0111010000  → 111010000

111010000
10011         (align at pos 8)
─────
011100000  → 11100000

11100000
10011         (align at pos 7)
─────
01111000  → 1111000

1111000
10011         (align at pos 6)
─────
0110100  → 110100

110100
10011         (align at pos 5)
─────
010010  → 10010

10010
10011         (align at pos 4)
─────
00001  → 0001

Remainder R = 0001 (4 bits)

Step 3: Form codeword:

T = 101100110000 + 0001 = 101100110001

Transmitted: 10110011|0001 (8-bit message + 4-bit CRC)

Verification:

101100110001
10011
─────
001110110001  → 1110110001

1110110001
10011
─────
0111010001  → 111010001

111010001
10011
─────
011100001  → 11100001

11100001
10011
─────
01111001  → 1111001

1111001
10011
─────
0110101  → 110101

110101
10011
─────
010001  → 10001

10001
10011
─────
00010  → 0010

0010 < 10011 (too small to divide)

Final Remainder = 0010 ≠ 0000

Hmm, let me recheck step 2...

Recalculation of Step 2:

Divisor: 10011 (5 bits)
Dividend: 101100110000 (12 bits)

Round 1: bits [11:7] = 10110, XOR with 10011
  10110 ⊕ 10011 = 00101 → 01011 (bring next bit)
  Result so far: 01011 → working with: 10110 (after shift)
  
Actually, let me use a cleaner method:

Bit-by-bit with register:

Initial: Process "101100110000"

I'll use the standard shift-and-XOR approach:

Register (4 bits): 0000
Poly (without MSB): 0011

Process bit sequence from left:

Bit 1: Input=1, Reg=0000, MSB=0, Shift→0001, No XOR (MSB was 0)
Bit 1: Input=0, Reg=0001, MSB=0, Shift→0010, No XOR
Bit 1: Input=1, Reg=0010, MSB=0, Shift→0101, No XOR  
Bit 1: Input=1, Reg=0101, MSB=0, Shift→1011, No XOR
Bit 0: Input=0, Reg=1011, MSB=1, Shift→0110, XOR 0011→0101
Bit 0: Input=0, Reg=0101, MSB=0, Shift→1010, No XOR
Bit 1: Input=1, Reg=1010, MSB=1, Shift→0101, XOR 0011→0110
Bit 1: Input=1, Reg=0110, MSB=0, Shift→1101, No XOR
Bit 0: Input=0, Reg=1101, MSB=1, Shift→1010, XOR 0011→1001
Bit 0: Input=0, Reg=1001, MSB=1, Shift→0010, XOR 0011→0001
Bit 0: Input=0, Reg=0001, MSB=0, Shift→0010, No XOR
Bit 0: Input=0, Reg=0010, MSB=0, Shift→0100, No XOR

Final Register (CRC) = 0100

CRC = 0100

Codeword T = 10110011 0100 = 101100110100

Shift Register Method

Hardware Implementation: Shift Register CRC

CRC's elegance shines in hardware. A Linear Feedback Shift Register (LFSR) computes CRC one bit at a time using only:

r flip-flops (one per CRC bit, where r = polynomial degree)
XOR gates (one per polynomial term, minus the leading and constant)
A feedback path from the output to specific positions

The LFSR Structure:

For generator G(x) = x^r + g_{r-1}x^{r-1} + ... + g₁x + 1:

                        Feedback ←────────────────┐
                                                   │
    ┌─────┐   ┌─────┐   ┌─────┐       ┌─────┐    │
───►│ XOR ├──►│  D  ├──►│  D  ├─ ... ─│  D  ├────┘
    └─────┘   └──┬──┘   └──┬──┘       └──┬──┘    
       │         │g₁       │g₂         │g_{r-1}
       │     ┌───┴───┐ ┌───┴───┐   ┌───┴───┐
       │     │  XOR  │ │  XOR  │   │  XOR  │
       │     └───────┘ └───────┘   └───────┘
       └─────────────────────────────────────►
              (only connects where gᵢ = 1)

The XOR gates are placed between flip-flops at positions where the polynomial has a coefficient of 1.

Example: G(x) = x³ + x + 1 (1011)

Polynomial terms: x³, x¹, x⁰ (constant)

XOR feedback at position 1 (for the x term):

                    ┌──────────────────┐
                    │    Feedback      │
                    ▼                  │
Data ──►┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐──►
   In   │ XOR ├─►│ D₀  ├─►│ XOR ├─►│ D₁  ├─►│ D₂  │─►Out
        └─────┘  └─────┘  └──┬──┘  └─────┘  └─────┘
                             │                 ▲
                             └─────────────────┘
                           (XOR for x term)

Operation:

Initialize register to 0 (or all 1s, depending on standard)
For each message bit:
- Shift register left by one
- New D₀ = (input XOR feedback)
- D₁ = D₀ XOR feedback (because of x term)
- D₂ = D₁
After all message bits, feed in r zeros (or use special technique)
Register contains the CRC

Hardware Speed

LFSR Implementation Benefits

•Minimal hardware — Only r flip-flops and up to r-1 XOR gates
•One cycle per bit — Deterministic, predictable timing
•No memory — Streaming operation, no buffering required
•Low power — Simple gates, no multiplication or lookup tables
•Easily parallelizable — Unroll for N bits/cycle with NxN transformation matrix

Software Implementation Techniques

While hardware uses shift registers, software implementations optimize differently. The main techniques are:

1. Bit-by-Bit (Direct Implementation)

Process one bit at a time, mirroring the hardware approach:

uint32_t crc_bitwise(uint8_t *data, size_t len, uint32_t poly) {
    uint32_t crc = 0xFFFFFFFF;  // Common initialization
    
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;  // Align byte to MSB
        
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80000000)         // Check MSB
                crc = (crc << 1) ^ poly;   // Shift and XOR
            else
                crc <<= 1;                 // Just shift
        }
    }
    return crc ^ 0xFFFFFFFF;  // Final XOR (common)
}

This is slow: 8 branches per byte, hard to pipeline.

2. Table-Driven (Byte-at-a-Time)

Precompute a 256-entry table where entry i gives the CRC contribution of byte value i:

uint32_t crc_table[256];  // Precomputed

void init_crc_table(uint32_t poly) {
    for (int i = 0; i < 256; i++) {
        uint32_t crc = i << 24;
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc << 1) ^ ((crc & 0x80000000) ? poly : 0);
        }
        crc_table[i] = crc;
    }
}

uint32_t crc_table_driven(uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFF;
    
    for (size_t i = 0; i < len; i++) {
        uint8_t index = (crc >> 24) ^ data[i];
        crc = (crc << 8) ^ crc_table[index];
    }
    return crc ^ 0xFFFFFFFF;
}

This processes 8 bits per iteration with a single table lookup—8x faster than bit-by-bit.

3. Slicing-by-N (Multiple Tables)

Process N bytes at once using N tables. 'Slicing-by-8' processes 8 bytes per iteration:

8 tables of 256 entries each (8 KB total)
Each iteration: XOR current CRC with 8 bytes, combine 8 table lookups
Up to 8x faster than single-table on modern CPUs with cache

4. Hardware Intrinsics (CRC32C)

Modern x86_64 CPUs have PCLMULQDQ (carry-less multiplication) and CRC32 instructions:

// Intel CRC32C intrinsic (for iSCSI/SCTP polynomial)
uint32_t crc = _mm_crc32_u64(crc, *((uint64_t*)data));

This computes 8 bytes of CRC in a single cycle—fastest possible.

Software CRC Performance Comparison
Method	Speed (cycles/byte)	Table Size	Notes
Bit-by-bit	~50-100	0	Simple, slow
Byte table	~10-15	1 KB	Good balance
Slicing-by-4	~4-6	4 KB	Cache-efficient
Slicing-by-8	~2-3	8 KB	Fast on modern CPUs
Hardware CRC32C	~0.3-1	0	If available, best option

Choose the Right Implementation

Implementation Variations

Real CRC implementations include several variations from the 'pure' algorithm. Understanding these is essential for interoperability.

1. Initial Value (Seed)

Instead of starting with register = 0, many standards initialize to all 1s (0xFFFFFFFF for CRC-32).

Why? An all-zeros message with all-zeros CRC is technically valid. Initializing to 1s ensures that leading zeros in the message affect the CRC.

2. Final XOR (Complement)

After processing, XOR the result with a constant (often all 1s).

Why? Combined with 0xFFFFFFFF initialization, this ensures that appending zeros to a message doesn't leave the CRC unchanged. It also provides some defense against implementation bugs.

3. Bit and Byte Reflection

Some standards reflect (reverse) bits within each byte, or reflect the entire final CRC.

Why? Historical reasons—some early hardware processed LSB first (UART-style). 'Reflected' CRCs become standard for compatibility.

CRC Standard Variations
Standard	Poly (Hex)	Init	RefIn	RefOut	XorOut
CRC-16-CCITT	0x1021	0xFFFF	No	No	0x0000
CRC-16-IBM	0x8005	0x0000	Yes	Yes	0x0000
CRC-32 (Ethernet)	0x04C11DB7	0xFFFFFFFF	Yes	Yes	0xFFFFFFFF
CRC-32C	0x1EDC6F41	0xFFFFFFFF	Yes	Yes	0xFFFFFFFF

RefIn (Input Reflection): Process each byte LSB-first instead of MSB-first.

RefOut (Output Reflection): Reverse all bits of the final CRC before output.

Example: For CRC-32 (Ethernet), you:

Initialize to 0xFFFFFFFF
Reflect each input byte (process LSB first)
Compute CRC
Reflect the 32-bit result
XOR with 0xFFFFFFFF

These details are critical for interoperability. Using the wrong reflection or init value produces a different (and incompatible) CRC.

Always Test Against Known Vectors

Summary: The Complete CRC Calculation

We've covered the complete CRC calculation procedure, from conceptual framework to practical implementation. Let's consolidate:

Key Takeaways

•CRC algorithm: Append r zeros to message, divide by generator, append remainder. The result is exactly divisible by G(x).
•Mathematical foundation: T(x) = x^r·M(x) + R(x) ensures divisibility because R + R = 0 in GF(2).
•Manual calculation: Align divisor with dividend, XOR if leading bit is 1, shift and repeat. Remainder is the CRC.
•Hardware implementation: LFSR with XOR gates at polynomial term positions. One bit per clock cycle.
•Software optimization: Table-driven (256 entries) processes one byte per iteration. Slicing-by-N processes multiple bytes.
•Implementation details matter: Init value, final XOR, and bit reflection must match the standard exactly.
•Verification is essential: Always test against known vectors (e.g., CRC of '123456789').

What's Next:

Page Complete

3 / 5