We've established that Hamming distance determines error detection and correction capabilities, and that minimum distance is the master parameter of any code. But knowing what we want isn't the same as knowing how to build it.
This page bridges theory and practice. We'll learn how to construct error-correcting codes with specific minimum distance properties using generator matrices, parity-check matrices, and systematic encoding. These techniques form the foundation for designing everything from simple parity codes to sophisticated Reed-Solomon and LDPC codes.
By the end of this page, you will understand generator and parity-check matrices and their duality, systematic encoding (preserving data bits in the codeword), how matrix structure determines minimum distance, step-by-step construction of codes with target properties, and practical design considerations for real-world applications.
Most practical error-correcting codes are linear codes, which can be completely described by matrices. This algebraic structure enables efficient encoding, decoding, and analysis.
The Generator Matrix (G):
A k × n matrix G defines a linear code. To encode a k-bit message m, we compute:
$$\textbf{c} = \textbf{m} \cdot G$$
The resulting codeword c is an n-bit vector. All linear combinations of G's rows are valid codewords.
The Parity-Check Matrix (H):
An (n-k) × n matrix H defines the same code from a different perspective. A word c is a valid codeword if and only if:
$$H \cdot \textbf{c}^T = \textbf{0}$$
This is the "parity check"—H encodes the constraints that codewords must satisfy.
The Duality Relationship:
G and H are related by orthogonality. If G = [I_k | P] (systematic form), then:
$$H = [-P^T | I_{n-k}] = [P^T | I_{n-k}]$$
(In binary/mod-2 arithmetic, -1 = 1, so we can drop the minus sign.)
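The duality is easy to check numerically. Below is a minimal sketch using the Hamming (7,4) parity submatrix (defined inline): building G = [I_k | P] and H = [P^T | I_{n-k}] and confirming that every row of G passes every parity check.

```python
import numpy as np

# P is the 4x3 parity submatrix of the Hamming (7,4) code.
P = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]], dtype=np.uint8)

G = np.hstack([np.eye(4, dtype=np.uint8), P])    # G = [I_4 | P]
H = np.hstack([P.T, np.eye(3, dtype=np.uint8)])  # H = [P^T | I_3]

# Every basis codeword must satisfy every check: G @ H^T = 0 (mod 2)
print((G @ H.T) % 2)  # 4x3 matrix of all zeros
```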
Example: Hamming (7,4) Code
Generator matrix (systematic form):
```
        d₁ d₂ d₃ d₄ | p₁ p₂ p₃
G  =  [ 1  0  0  0  | 1  1  0 ]
      [ 0  1  0  0  | 1  0  1 ]
      [ 0  0  1  0  | 0  1  1 ]
      [ 0  0  0  1  | 1  1  1 ]
```
Parity-check matrix:
```
        d₁ d₂ d₃ d₄ | p₁ p₂ p₃
H  =  [ 1  1  0  1  | 1  0  0 ]
      [ 1  0  1  1  | 0  1  0 ]
      [ 0  1  1  1  | 0  0  1 ]
```
G tells us how to encode: each row is a "basis codeword." H tells us how to check: each row is a constraint that codewords must satisfy (a parity equation). The syndrome s = H·r^T identifies which constraints a received word violates.
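A quick numeric sketch of the syndrome at work, using the systematic (7,4) matrices above (the codeword for message 1011 is precomputed): a single flipped bit makes the syndrome equal the corresponding column of H.

```python
import numpy as np

# Hamming (7,4) parity-check matrix in systematic form
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]], dtype=np.uint8)

c = np.array([1, 0, 1, 1, 0, 1, 0], dtype=np.uint8)  # valid codeword for 1011
r = c.copy()
r[2] ^= 1                                            # flip bit 2 in transit

print((H @ c) % 2)  # [0 0 0] -- a valid codeword passes every check
print((H @ r) % 2)  # [0 1 1] -- syndrome equals column 2 of H
```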
A code is systematic if the original data bits appear unchanged within the codeword. The encoded word has the form:
$$\textbf{c} = [\textbf{m} | \textbf{p}]$$
where m is the original message and p is the parity/check bits.
Advantages of Systematic Encoding: the data bits can be read directly from the codeword without any decoding, the encoder only has to compute the parity bits, and when no errors are detected the receiver simply strips the check bits.
```python
import numpy as np

def systematic_encode(message: np.ndarray, P: np.ndarray) -> np.ndarray:
    """
    Systematic encoding using parity submatrix P.

    Generator matrix is G = [I_k | P] (systematic form).
    Codeword is c = [message | parity] where parity = message @ P (mod 2).

    Args:
        message: k-bit message vector
        P: k x (n-k) parity submatrix

    Returns:
        n-bit codeword [message | parity]
    """
    parity = message @ P % 2
    return np.concatenate([message, parity])

# Hamming (7,4) parity submatrix
P_74 = np.array([
    [1, 1, 0],  # d₁ contributes to p₁, p₂
    [1, 0, 1],  # d₂ contributes to p₁, p₃
    [0, 1, 1],  # d₃ contributes to p₂, p₃
    [1, 1, 1],  # d₄ contributes to p₁, p₂, p₃
], dtype=np.uint8)

# Encode some messages
messages = [
    np.array([0, 0, 0, 0], dtype=np.uint8),
    np.array([1, 0, 0, 0], dtype=np.uint8),
    np.array([1, 0, 1, 1], dtype=np.uint8),
    np.array([1, 1, 1, 1], dtype=np.uint8),
]

print("Hamming (7,4) Systematic Encoding")
print("=" * 45)
print("Message → Codeword [data | parity]")
print("-" * 45)

for m in messages:
    c = systematic_encode(m, P_74)
    m_str = ''.join(map(str, m))
    c_str = ''.join(map(str, c))
    print(f"{m_str} → {c_str} [{c_str[:4]} | {c_str[4:]}]")

# Output:
# Message → Codeword [data | parity]
# ---------------------------------------------
# 0000 → 0000000 [0000 | 000]
# 1000 → 1000110 [1000 | 110]
# 1011 → 1011010 [1011 | 010]
# 1111 → 1111111 [1111 | 111]
```

Converting to Systematic Form:
Not all generator matrices arrive in systematic form, but any full-rank G can be converted by Gaussian elimination over GF(2): row-reduce until the first k columns form an identity matrix, permuting columns if necessary (which yields an equivalent code with the same minimum distance).
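The conversion can be sketched as mod-2 row reduction. The helper name `to_systematic` and the example matrix `G_ns` below are ours, and the sketch assumes a pivot can always be found in the leading k columns (otherwise a column swap would be needed):

```python
import numpy as np

def to_systematic(G: np.ndarray) -> np.ndarray:
    """Row-reduce a full-rank generator matrix to [I_k | P] form (mod 2).

    Assumes the first k columns can be turned into an identity by row
    operations alone; in general a column permutation may also be needed.
    """
    G = G.copy() % 2
    k, n = G.shape
    for col in range(k):
        # Find a row with a 1 in this column and swap it into pivot position
        pivot = next(r for r in range(col, k) if G[r, col] == 1)
        G[[col, pivot]] = G[[pivot, col]]
        # Eliminate 1s above and below the pivot (XOR = addition mod 2)
        for r in range(k):
            if r != col and G[r, col] == 1:
                G[r] ^= G[col]
    return G

# A non-systematic generator for the same (7,4) code
# (its first row is the sum of two systematic basis codewords)
G_ns = np.array([[1, 1, 0, 0, 0, 1, 1],
                 [0, 1, 0, 0, 1, 0, 1],
                 [0, 0, 1, 0, 0, 1, 1],
                 [0, 0, 0, 1, 1, 1, 1]], dtype=np.uint8)

print(to_systematic(G_ns))  # leading 4x4 block becomes the identity
```

Row operations only re-mix the basis codewords, so the row space (the code itself) is unchanged.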
Some codes are naturally non-systematic (e.g., convolutional codes, turbo codes). The receiver must then run a full decode to recover the data, but such codes can offer better performance on certain channels.
The minimum distance of a linear code is determined by the structure of its parity-check matrix H. This connection is the key to designing codes with specific distance properties.
Key Theorem:
The minimum distance d of a linear code equals the minimum number of columns of H that are linearly dependent.
Equivalently, every set of d-1 or fewer columns of H is linearly independent (no such set sums to zero mod 2), while some set of d columns is dependent.
Implications for Design:
| Target d | H Requirement |
|---|---|
| d ≥ 2 | No column is all zeros |
| d ≥ 3 | No two columns are identical |
| d ≥ 4 | No three columns sum to zero |
| d ≥ 5 | No four columns sum to zero |
| ... | ... |
Hamming Code Design via H:
Hamming codes achieve d = 3 by a clever construction: the columns of H are all nonzero r-bit patterns (for length n = 2^r - 1).
For r = 3: columns are 001, 010, 011, 100, 101, 110, 111 (binary 1-7).
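This construction is easy to automate. The helper `hamming_H` below is an illustrative sketch that builds H for any r by writing each column as the binary expansion of its 1-based position:

```python
import numpy as np

def hamming_H(r: int) -> np.ndarray:
    """Build the r x (2^r - 1) parity-check matrix whose columns are
    all nonzero r-bit patterns: binary 1, 2, ..., 2^r - 1."""
    n = 2**r - 1
    # Column j holds the binary representation of j, MSB in row 0
    return np.array([[(j >> (r - 1 - i)) & 1 for j in range(1, n + 1)]
                     for i in range(r)], dtype=np.uint8)

print(hamming_H(3))
# Columns read top-to-bottom: 001, 010, 011, 100, 101, 110, 111
```

Because all columns are distinct and nonzero, no single column is zero and no two columns sum to zero, so d = 3 by the theorem above.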
Extended Hamming (d = 4):
To get d = 4, add an overall parity bit:
```python
import numpy as np
from itertools import combinations

def verify_minimum_distance(H: np.ndarray, claimed_d: int) -> bool:
    """
    Verify that no (claimed_d - 1) or fewer columns of H sum to zero mod 2.
    This confirms the minimum distance is at least claimed_d.
    """
    n = H.shape[1]  # number of columns
    for subset_size in range(1, claimed_d):
        for cols in combinations(range(n), subset_size):
            col_sum = np.sum(H[:, cols], axis=1) % 2
            if np.all(col_sum == 0):
                print(f"Found dependent columns: {cols}")
                return False
    return True

def find_minimum_distance(H: np.ndarray) -> int:
    """Find the actual minimum distance from H: the smallest number
    of columns that sum to zero mod 2."""
    n = H.shape[1]
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            col_sum = np.sum(H[:, cols], axis=1) % 2
            if np.all(col_sum == 0):
                return size
    return n + 1  # no dependent columns found (degenerate case)

# Hamming (7,4) parity-check matrix
H_74 = np.array([
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
], dtype=np.uint8)

print("Hamming (7,4) Analysis:")
d = find_minimum_distance(H_74)
print(f"Minimum distance: {d}")
print(f"Verification for d ≥ 3: {verify_minimum_distance(H_74, 3)}")
print()

# Extended Hamming (8,4): add an overall parity row and column
H_84 = np.array([
    [1, 1, 0, 1, 1, 0, 0, 0],  # original rows
    [1, 0, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],  # overall parity row
], dtype=np.uint8)

print("Extended Hamming (8,4) Analysis:")
d = find_minimum_distance(H_84)
print(f"Minimum distance: {d}")
print(f"Verification for d ≥ 4: {verify_minimum_distance(H_84, 4)}")
```

Designing a code is largely about choosing H's columns wisely. Want d ≥ 4? Ensure no three columns XOR to zero. This constraint limits how many columns (and thus codewords) you can have—the fundamental rate-distance tradeoff.
Let's construct a SEC-DED (Single Error Correction, Double Error Detection) code from scratch. We want to protect 4 data bits, correct any single-bit error, and detect any double-bit error without miscorrecting it.
Step 1: Choose Parameters
For 4 data bits with d = 4: single-error correction plus double-error detection requires minimum distance 4, and the extended Hamming construction achieves this with 4 check bits, giving an (8,4) code with rate 1/2.
Step 2: Construct H for d = 4
Start with the Hamming H (3 rows, 7 columns), whose columns are all the nonzero 3-bit patterns. Extend for d = 4 by appending an all-ones row (an overall parity check) and one new column for the added parity bit.
Step 3: Derive G from H
With H in form [A | I_{n-k}], the generator matrix is G = [I_k | A^T].
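This step can be sketched generically (the helper name `generator_from_H` is ours); it works for any H already in [A | I] form, since in binary arithmetic -A^T = A^T:

```python
import numpy as np

def generator_from_H(H: np.ndarray) -> np.ndarray:
    """Given H = [A | I_{n-k}], return the systematic G = [I_k | A^T] (mod 2)."""
    r, n = H.shape   # r = n - k parity checks
    k = n - r
    A = H[:, :k]     # the non-identity part of H
    return np.hstack([np.eye(k, dtype=np.uint8), A.T % 2])

# Hamming (7,4) parity-check matrix in [A | I_3] form
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]], dtype=np.uint8)

G = generator_from_H(H)
print(G)
print((G @ H.T) % 2)  # all zeros: every row of G passes every check
```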
Step 4: Verify
```python
import numpy as np

def construct_extended_hamming():
    """
    Construct the extended Hamming (8,4) SEC-DED code.
    Returns generator matrix G and parity-check matrix H.
    """
    # Step 1: Hamming (7,4) parity-check matrix (systematic form)
    H_74 = np.array([
        [1, 1, 0, 1, 1, 0, 0],  # check bit 1
        [1, 0, 1, 1, 0, 1, 0],  # check bit 2
        [0, 1, 1, 1, 0, 0, 1],  # check bit 3
    ], dtype=np.uint8)

    # Step 2: extend for d = 4 by adding an overall parity row and column
    H_84 = np.zeros((4, 8), dtype=np.uint8)
    H_84[:3, :7] = H_74
    H_84[3, :] = 1  # overall parity covers all 8 bits, including itself

    # Step 3: build the matching systematic generator.
    # Parity equations (read off from H_74):
    #   p1 = d1 + d2 + d4
    #   p2 = d1 + d3 + d4
    #   p3 = d2 + d3 + d4
    #   p4 = parity of the other 7 bits, so every codeword has even weight
    G_84 = np.array([
        [1, 0, 0, 0, 1, 1, 0, 1],  # d1 → p1, p2, p4
        [0, 1, 0, 0, 1, 0, 1, 1],  # d2 → p1, p3, p4
        [0, 0, 1, 0, 0, 1, 1, 1],  # d3 → p2, p3, p4
        [0, 0, 0, 1, 1, 1, 1, 0],  # d4 → p1, p2, p3 (weight already even)
    ], dtype=np.uint8)

    return G_84, H_84

def verify_code(G, H):
    """Verify G and H define a valid code."""
    print("Verifying code construction...")

    # Check GH^T = 0
    product = (G @ H.T) % 2
    if np.all(product == 0):
        print("✓ GH^T = 0 (all codewords satisfy parity checks)")
    else:
        print("✗ GH^T ≠ 0 (error in construction)")
        return False

    # Generate all nonzero codewords; minimum weight = minimum distance
    k = G.shape[0]
    min_weight = float('inf')
    for data in range(1, 2**k):  # skip the all-zeros message
        data_vec = np.array([(data >> i) & 1 for i in range(k)], dtype=np.uint8)
        codeword = data_vec @ G % 2
        min_weight = min(min_weight, int(np.sum(codeword)))

    print(f"✓ Minimum weight (= minimum distance): {min_weight}")
    return True

# Construct and verify
G, H = construct_extended_hamming()
print("Generator Matrix G (8,4):")
print(G)
print()
print("Parity-Check Matrix H (8,4):")
print(H)
print()
verify_code(G, H)

# Test encoding
test_message = np.array([1, 0, 1, 1], dtype=np.uint8)
codeword = test_message @ G % 2
print(f"\nEncoding test: {test_message} → {codeword}")
```

Always verify: (1) GH^T = 0, (2) minimum weight matches expected d, (3) the encode/decode cycle works correctly. Bugs in code construction are subtle and catastrophic.
For well-designed codes, the syndrome not only detects errors but locates them, enabling correction.
The Syndrome-Column Relationship:
For a single-bit error in position i, the syndrome equals column i of H:
$$s = H \cdot e^T = H \cdot [0 \ldots 1_i \ldots 0]^T = H_{:,i}$$
If all columns of H are distinct, the syndrome uniquely identifies the error position!
Hamming Code Design Principle:
Hamming codes are constructed so that column i of H is the binary representation of position i (counting positions from 1). The syndrome is then literally the binary address of the corrupted bit: it directly tells us which bit to flip!
```python
import numpy as np

def syndrome_decode(received: np.ndarray, H: np.ndarray) -> tuple:
    """
    Decode a received word using its syndrome.

    Returns: (syndrome, error_position, corrected)
    error_position is -1 if no error is detected,
    None if the syndrome matches no single-bit error.
    """
    syndrome = H @ received % 2

    if np.all(syndrome == 0):
        return (syndrome, -1, received)  # no error detected

    # Find which column of H matches the syndrome
    n = H.shape[1]
    for i in range(n):
        if np.array_equal(H[:, i], syndrome):
            corrected = received.copy()
            corrected[i] ^= 1  # flip the suspect bit
            return (syndrome, i, corrected)

    # No matching column: uncorrectable error pattern
    return (syndrome, None, None)

# Hamming (7,4) parity-check matrix in systematic form, matching G below.
# All columns are distinct and nonzero, so every single-bit error
# produces a unique syndrome.
H_74 = np.array([
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
], dtype=np.uint8)

G_74 = np.array([
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
], dtype=np.uint8)

message = np.array([1, 0, 1, 1], dtype=np.uint8)
codeword = message @ G_74 % 2

print("Syndrome-Based Error Location Demo")
print("=" * 50)
print(f"Transmitted: {codeword}")
print()

# Test various error scenarios
test_cases = [
    ("No error",         codeword.copy()),
    ("Error in bit 0",   codeword ^ np.array([1, 0, 0, 0, 0, 0, 0])),
    ("Error in bit 3",   codeword ^ np.array([0, 0, 0, 1, 0, 0, 0])),
    ("Error in bit 6",   codeword ^ np.array([0, 0, 0, 0, 0, 0, 1])),
    ("Two errors (0,1)", codeword ^ np.array([1, 1, 0, 0, 0, 0, 0])),
]

for description, received in test_cases:
    received = received.astype(np.uint8)
    syn, pos, corrected = syndrome_decode(received, H_74)
    syn_str = ''.join(map(str, syn))
    print(f"{description}:")
    print(f"  Received:  {received}")
    print(f"  Syndrome:  {syn_str}", end="")
    if pos == -1:
        print(" (no error)")
    elif pos is None:
        print(" (uncorrectable)")
    else:
        print(f" → error at position {pos}")
    print(f"  Corrected: {corrected}")
    print()
```

If two or more errors occur in a single-error-correcting code, the syndrome will match some column of H (different from any actual error position). The decoder will "correct" the wrong bit, introducing a third error! This is why SEC-DED adds detection for double errors.
Different code families use different design principles to achieve desired distances. Understanding these principles helps select the right code for each application.
| Family | Distance Control | Encoding | Decoding | Best For |
|---|---|---|---|---|
| Hamming | Fixed (d=3,4) | Simple matrix | Simple syndrome | RAM ECC |
| BCH | Flexible (design t) | Polynomial | Berlekamp-Massey | Flash, disk |
| Reed-Solomon | MDS (maximum) | FFT-based | Algebraic | CDs, QR codes |
| LDPC | Graph structure | Sparse matrix | Iterative belief prop. | 5G, WiFi 6 |
| Polar | Frozen-bit selection | Recursive | Successive cancellation | 5G control |
| Turbo | High (emergent) | Parallel concat. | Iterative MAP | Deep space |
Random errors → BCH, LDPC excel. Burst errors → Reed-Solomon, interleaved codes. Very noisy channels → Turbo, LDPC, Polar (approach Shannon limit). Memory protection → Hamming, SEC-DED (simple, fast).
Theoretical optimality isn't everything. Real-world code selection involves many practical factors beyond just minimum distance.
Case Study: ECC DRAM Selection
For server RAM ECC: decoding must complete within the memory access path (nanoseconds), single-bit upsets dominate the error model, and the per-word hardware overhead must stay small.
Choice: SEC-DED extended Hamming
Not Reed-Solomon (too slow) or LDPC (overkill, complex).
The "best" code depends entirely on the application. A code optimal for deep-space probes (Turbo, high latency OK) is terrible for RAM (needs nanosecond decoding). Code selection is engineering, not just mathematics.
Code design translates distance requirements into concrete matrices and algorithms. Understanding the connection between H's structure and minimum distance is the key insight enabling systematic code construction.
Module Complete:
You've now mastered Hamming distance from concept to code design. You understand what Hamming distance measures, how it determines detection and correction capabilities, what minimum distance means for code power, and how to design codes with target distance properties.
This knowledge forms the foundation for understanding all error-control coding—from the simple parity bits in serial communication to the sophisticated LDPC codes in 5G networks and the Reed-Solomon codes on every Blu-ray disc.
Congratulations! You've completed the Hamming Distance module. You now have a deep understanding of how Hamming distance underlies all error detection and correction. This fundamental concept—counting bit differences—powers the reliable digital communication you use every day. Next, explore Hamming Codes to see these principles in a complete, practical implementation.