We've established that Hamming distance determines error detection and correction capabilities, and that minimum distance is the master parameter of any code. But knowing what we want isn't the same as knowing how to build it.
This page bridges theory and practice. We'll learn how to construct error-correcting codes with specific minimum distance properties using generator matrices, parity-check matrices, and systematic encoding. These techniques form the foundation for designing everything from simple parity codes to sophisticated Reed-Solomon and LDPC codes.
By the end of this page, you will understand generator and parity-check matrices and their duality, systematic encoding (preserving data bits in the codeword), how matrix structure determines minimum distance, step-by-step construction of codes with target properties, and practical design considerations for real-world applications.
Most practical error-correcting codes are linear codes, which can be completely described by matrices. This algebraic structure enables efficient encoding, decoding, and analysis.
The Generator Matrix (G):
A k × n matrix G defines a linear code. To encode a k-bit message m, we compute:
$$\textbf{c} = \textbf{m} \cdot G$$
The resulting codeword c is an n-bit vector. All linear combinations of G's rows are valid codewords.
The Parity-Check Matrix (H):
An (n-k) × n matrix H defines the same code from a different perspective. A word c is a valid codeword if and only if:
$$H \cdot \textbf{c}^T = \textbf{0}$$
This is the "parity check"—H encodes the constraints that codewords must satisfy.
The Duality Relationship:
G and H are related by orthogonality. If G = [I_k | P] (systematic form), then:
$$H = [-P^T | I_{n-k}] = [P^T | I_{n-k}]$$
(In binary/mod-2 arithmetic, -1 = 1, so we can drop the minus sign.)
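The duality is easy to check numerically. Below is a minimal sketch using the Hamming (7,4) parity submatrix (defined inline): building G = [I_k | P] and H = [P^T | I_{n-k}] and confirming that every row of G passes every parity check.

```python
import numpy as np

# P is the 4x3 parity submatrix of the Hamming (7,4) code.
P = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]], dtype=np.uint8)

G = np.hstack([np.eye(4, dtype=np.uint8), P])    # G = [I_4 | P]
H = np.hstack([P.T, np.eye(3, dtype=np.uint8)])  # H = [P^T | I_3]

# Every basis codeword must satisfy every check: G @ H^T = 0 (mod 2)
print((G @ H.T) % 2)  # 4x3 matrix of all zeros
```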
Example: Hamming (7,4) Code
Generator matrix (systematic form):
```
        d₁ d₂ d₃ d₄ | p₁ p₂ p₃
G  =  [ 1  0  0  0  | 1  1  0 ]
      [ 0  1  0  0  | 1  0  1 ]
      [ 0  0  1  0  | 0  1  1 ]
      [ 0  0  0  1  | 1  1  1 ]
```
Parity-check matrix:
```
        d₁ d₂ d₃ d₄ | p₁ p₂ p₃
H  =  [ 1  1  0  1  | 1  0  0 ]
      [ 1  0  1  1  | 0  1  0 ]
      [ 0  1  1  1  | 0  0  1 ]
```
G tells us how to encode: each row is a "basis codeword." H tells us how to check: each row is a constraint that codewords must satisfy (a parity equation). The syndrome s = H·r^T identifies which constraints a received word violates.
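A quick numeric sketch of the syndrome at work, using the systematic (7,4) matrices above (the codeword for message 1011 is precomputed): a single flipped bit makes the syndrome equal the corresponding column of H.

```python
import numpy as np

# Hamming (7,4) parity-check matrix in systematic form
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]], dtype=np.uint8)

c = np.array([1, 0, 1, 1, 0, 1, 0], dtype=np.uint8)  # valid codeword for 1011
r = c.copy()
r[2] ^= 1                                            # flip bit 2 in transit

print((H @ c) % 2)  # [0 0 0] -- a valid codeword passes every check
print((H @ r) % 2)  # [0 1 1] -- syndrome equals column 2 of H
```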
A code is systematic if the original data bits appear unchanged within the codeword. The encoded word has the form:
$$\textbf{c} = [\textbf{m} | \textbf{p}]$$
where m is the original message and p is the parity/check bits.
Advantages of Systematic Encoding: the data bits can be read directly from the codeword without any decoding, the encoder only has to compute the parity bits, and when no errors are detected the receiver simply strips the check bits.
```python
import numpy as np

def systematic_encode(message: np.ndarray, P: np.ndarray) -> np.ndarray:
    """
    Systematic encoding using parity submatrix P.

    Generator matrix is G = [I_k | P] (systematic form).
    Codeword is c = [message | parity] where parity = message @ P (mod 2).

    Args:
        message: k-bit message vector
        P: k x (n-k) parity submatrix

    Returns:
        n-bit codeword [message | parity]
    """
    parity = message @ P % 2
    return np.concatenate([message, parity])

# Hamming (7,4) parity submatrix
P_74 = np.array([
    [1, 1, 0],  # d₁ contributes to p₁, p₂
    [1, 0, 1],  # d₂ contributes to p₁, p₃
    [0, 1, 1],  # d₃ contributes to p₂, p₃
    [1, 1, 1],  # d₄ contributes to p₁, p₂, p₃
], dtype=np.uint8)

# Encode some messages
messages = [
    np.array([0, 0, 0, 0], dtype=np.uint8),
    np.array([1, 0, 0, 0], dtype=np.uint8),
    np.array([1, 0, 1, 1], dtype=np.uint8),
    np.array([1, 1, 1, 1], dtype=np.uint8),
]

print("Hamming (7,4) Systematic Encoding")
print("=" * 45)
print("Message → Codeword [data | parity]")
print("-" * 45)

for m in messages:
    c = systematic_encode(m, P_74)
    m_str = ''.join(map(str, m))
    c_str = ''.join(map(str, c))
    print(f"{m_str} → {c_str} [{c_str[:4]} | {c_str[4:]}]")

# Output:
# Message → Codeword [data | parity]
# ---------------------------------------------
# 0000 → 0000000 [0000 | 000]
# 1000 → 1000110 [1000 | 110]
# 1011 → 1011010 [1011 | 010]
# 1111 → 1111111 [1111 | 111]
```

Converting to Systematic Form:
Not all generator matrices arrive in systematic form, but any full-rank G can be converted by Gaussian elimination over GF(2): row-reduce until the first k columns form an identity matrix, permuting columns if necessary (which yields an equivalent code with the same minimum distance).
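The conversion can be sketched as mod-2 row reduction. The helper name `to_systematic` and the example matrix `G_ns` below are ours, and the sketch assumes a pivot can always be found in the leading k columns (otherwise a column swap would be needed):

```python
import numpy as np

def to_systematic(G: np.ndarray) -> np.ndarray:
    """Row-reduce a full-rank generator matrix to [I_k | P] form (mod 2).

    Assumes the first k columns can be turned into an identity by row
    operations alone; in general a column permutation may also be needed.
    """
    G = G.copy() % 2
    k, n = G.shape
    for col in range(k):
        # Find a row with a 1 in this column and swap it into pivot position
        pivot = next(r for r in range(col, k) if G[r, col] == 1)
        G[[col, pivot]] = G[[pivot, col]]
        # Eliminate 1s above and below the pivot (XOR = addition mod 2)
        for r in range(k):
            if r != col and G[r, col] == 1:
                G[r] ^= G[col]
    return G

# A non-systematic generator for the same (7,4) code
# (its first row is the sum of two systematic basis codewords)
G_ns = np.array([[1, 1, 0, 0, 0, 1, 1],
                 [0, 1, 0, 0, 1, 0, 1],
                 [0, 0, 1, 0, 0, 1, 1],
                 [0, 0, 0, 1, 1, 1, 1]], dtype=np.uint8)

print(to_systematic(G_ns))  # leading 4x4 block becomes the identity
```

Row operations only re-mix the basis codewords, so the row space (the code itself) is unchanged.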
Some codes are naturally non-systematic (e.g., convolutional codes, turbo codes). The receiver must then run a full decode to recover the data, but such codes can offer better performance on certain channels.
The minimum distance of a linear code is determined by the structure of its parity-check matrix H. This connection is the key to designing codes with specific distance properties.
Key Theorem:
The minimum distance d of a linear code equals the minimum number of columns of H that are linearly dependent.
Equivalently, every set of d-1 or fewer columns of H is linearly independent (no such set sums to zero mod 2), while some set of d columns is dependent.
Implications for Design:
| Target d | H Requirement |
|---|---|
| d ≥ 2 | No column is all zeros |
| d ≥ 3 | No two columns are identical |
| d ≥ 4 | No three columns sum to zero |
| d ≥ 5 | No four columns sum to zero |
| ... | ... |
Hamming Code Design via H:
Hamming codes achieve d = 3 by a clever construction: the columns of H are all nonzero r-bit patterns (for length n = 2^r - 1).
For r = 3: columns are 001, 010, 011, 100, 101, 110, 111 (binary 1-7).
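This construction is easy to automate. The helper `hamming_H` below is an illustrative sketch that builds H for any r by writing each column as the binary expansion of its 1-based position:

```python
import numpy as np

def hamming_H(r: int) -> np.ndarray:
    """Build the r x (2^r - 1) parity-check matrix whose columns are
    all nonzero r-bit patterns: binary 1, 2, ..., 2^r - 1."""
    n = 2**r - 1
    # Column j holds the binary representation of j, MSB in row 0
    return np.array([[(j >> (r - 1 - i)) & 1 for j in range(1, n + 1)]
                     for i in range(r)], dtype=np.uint8)

print(hamming_H(3))
# Columns read top-to-bottom: 001, 010, 011, 100, 101, 110, 111
```

Because all columns are distinct and nonzero, no single column is zero and no two columns sum to zero, so d = 3 by the theorem above.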
Extended Hamming (d = 4):
To get d = 4, add an overall parity bit:
```python
import numpy as np
from itertools import combinations

def verify_minimum_distance(H: np.ndarray, claimed_d: int) -> bool:
    """
    Verify that no (claimed_d - 1) or fewer columns of H sum to zero mod 2.
    This confirms the minimum distance is at least claimed_d.
    """
    n = H.shape[1]  # number of columns
    for subset_size in range(1, claimed_d):
        for cols in combinations(range(n), subset_size):
            col_sum = np.sum(H[:, cols], axis=1) % 2
            if np.all(col_sum == 0):
                print(f"Found dependent columns: {cols}")
                return False
    return True

def find_minimum_distance(H: np.ndarray) -> int:
    """Find the actual minimum distance from H: the smallest number
    of columns that sum to zero mod 2."""
    n = H.shape[1]
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            col_sum = np.sum(H[:, cols], axis=1) % 2
            if np.all(col_sum == 0):
                return size
    return n + 1  # no dependent columns found (degenerate case)

# Hamming (7,4) parity-check matrix
H_74 = np.array([
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
], dtype=np.uint8)

print("Hamming (7,4) Analysis:")
d = find_minimum_distance(H_74)
print(f"Minimum distance: {d}")
print(f"Verification for d ≥ 3: {verify_minimum_distance(H_74, 3)}")
print()

# Extended Hamming (8,4): add an overall parity row and column
H_84 = np.array([
    [1, 1, 0, 1, 1, 0, 0, 0],  # original rows
    [1, 0, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],  # overall parity row
], dtype=np.uint8)

print("Extended Hamming (8,4) Analysis:")
d = find_minimum_distance(H_84)
print(f"Minimum distance: {d}")
print(f"Verification for d ≥ 4: {verify_minimum_distance(H_84, 4)}")
```

Designing a code is largely about choosing H's columns wisely. Want d ≥ 4? Ensure no three columns XOR to zero. This constraint limits how many columns (and thus codewords) you can have—the fundamental rate-distance tradeoff.
Let's construct a SEC-DED (Single Error Correction, Double Error Detection) code from scratch. We want to protect 4 data bits, correct any single-bit error, and detect any double-bit error without miscorrecting it.
Step 1: Choose Parameters
For 4 data bits with d = 4: single-error correction plus double-error detection requires minimum distance 4, and the extended Hamming construction achieves this with 4 check bits, giving an (8,4) code with rate 1/2.
Step 2: Construct H for d = 4
Start with the Hamming H (3 rows, 7 columns), whose columns are all the nonzero 3-bit patterns. Extend for d = 4 by appending an all-ones row (an overall parity check) and one new column for the added parity bit.
Step 3: Derive G from H
With H in form [A | I_{n-k}], the generator matrix is G = [I_k | A^T].
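This step can be sketched generically (the helper name `generator_from_H` is ours); it works for any H already in [A | I] form, since in binary arithmetic -A^T = A^T:

```python
import numpy as np

def generator_from_H(H: np.ndarray) -> np.ndarray:
    """Given H = [A | I_{n-k}], return the systematic G = [I_k | A^T] (mod 2)."""
    r, n = H.shape   # r = n - k parity checks
    k = n - r
    A = H[:, :k]     # the non-identity part of H
    return np.hstack([np.eye(k, dtype=np.uint8), A.T % 2])

# Hamming (7,4) parity-check matrix in [A | I_3] form
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]], dtype=np.uint8)

G = generator_from_H(H)
print(G)
print((G @ H.T) % 2)  # all zeros: every row of G passes every check
```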
Step 4: Verify
```python
import numpy as np

def construct_extended_hamming():
    """
    Construct the extended Hamming (8,4) SEC-DED code.
    Returns generator matrix G and parity-check matrix H.
    """
    # Step 1: Hamming (7,4) parity-check matrix (systematic form)
    H_74 = np.array([
        [1, 1, 0, 1, 1, 0, 0],  # check bit 1
        [1, 0, 1, 1, 0, 1, 0],  # check bit 2
        [0, 1, 1, 1, 0, 0, 1],  # check bit 3
    ], dtype=np.uint8)

    # Step 2: extend for d = 4 by adding an overall parity row and column
    H_84 = np.zeros((4, 8), dtype=np.uint8)
    H_84[:3, :7] = H_74
    H_84[3, :] = 1  # overall parity covers all 8 bits, including itself

    # Step 3: build the matching systematic generator.
    # Parity equations (read off from H_74):
    #   p1 = d1 + d2 + d4
    #   p2 = d1 + d3 + d4
    #   p3 = d2 + d3 + d4
    #   p4 = parity of the other 7 bits, so every codeword has even weight
    G_84 = np.array([
        [1, 0, 0, 0, 1, 1, 0, 1],  # d1 → p1, p2, p4
        [0, 1, 0, 0, 1, 0, 1, 1],  # d2 → p1, p3, p4
        [0, 0, 1, 0, 0, 1, 1, 1],  # d3 → p2, p3, p4
        [0, 0, 0, 1, 1, 1, 1, 0],  # d4 → p1, p2, p3 (weight already even)
    ], dtype=np.uint8)

    return G_84, H_84

def verify_code(G, H):
    """Verify G and H define a valid code."""
    print("Verifying code construction...")

    # Check GH^T = 0
    product = (G @ H.T) % 2
    if np.all(product == 0):
        print("✓ GH^T = 0 (all codewords satisfy parity checks)")
    else:
        print("✗ GH^T ≠ 0 (error in construction)")
        return False

    # Generate all nonzero codewords; minimum weight = minimum distance
    k = G.shape[0]
    min_weight = float('inf')
    for data in range(1, 2**k):  # skip the all-zeros message
        data_vec = np.array([(data >> i) & 1 for i in range(k)], dtype=np.uint8)
        codeword = data_vec @ G % 2
        min_weight = min(min_weight, int(np.sum(codeword)))

    print(f"✓ Minimum weight (= minimum distance): {min_weight}")
    return True

# Construct and verify
G, H = construct_extended_hamming()
print("Generator Matrix G (8,4):")
print(G)
print()
print("Parity-Check Matrix H (8,4):")
print(H)
print()
verify_code(G, H)

# Test encoding
test_message = np.array([1, 0, 1, 1], dtype=np.uint8)
codeword = test_message @ G % 2
print(f"\nEncoding test: {test_message} → {codeword}")
```

Always verify: (1) GH^T = 0, (2) minimum weight matches expected d, (3) the encode/decode cycle works correctly. Bugs in code construction are subtle and catastrophic.
For well-designed codes, the syndrome not only detects errors but locates them, enabling correction.
The Syndrome-Column Relationship:
For a single-bit error in position i, the syndrome equals column i of H:
$$s = H \cdot e^T = H \cdot [0 \ldots 1_i \ldots 0]^T = H_{:,i}$$
If all columns of H are distinct, the syndrome uniquely identifies the error position!
Hamming Code Design Principle:
Hamming codes are constructed so that column i of H is the binary representation of position i (counting positions from 1). The syndrome is then literally the binary address of the corrupted bit: it directly tells us which bit to flip!
```python
import numpy as np

def syndrome_decode(received: np.ndarray, H: np.ndarray) -> tuple:
    """
    Decode a received word using its syndrome.

    Returns: (syndrome, error_position, corrected)
    error_position is -1 if no error is detected,
    None if the syndrome matches no single-bit error.
    """
    syndrome = H @ received % 2

    if np.all(syndrome == 0):
        return (syndrome, -1, received)  # no error detected

    # Find which column of H matches the syndrome
    n = H.shape[1]
    for i in range(n):
        if np.array_equal(H[:, i], syndrome):
            corrected = received.copy()
            corrected[i] ^= 1  # flip the suspect bit
            return (syndrome, i, corrected)

    # No matching column: uncorrectable error pattern
    return (syndrome, None, None)

# Hamming (7,4) parity-check matrix in systematic form, matching G below.
# All columns are distinct and nonzero, so every single-bit error
# produces a unique syndrome.
H_74 = np.array([
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
], dtype=np.uint8)

G_74 = np.array([
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
], dtype=np.uint8)

message = np.array([1, 0, 1, 1], dtype=np.uint8)
codeword = message @ G_74 % 2

print("Syndrome-Based Error Location Demo")
print("=" * 50)
print(f"Transmitted: {codeword}")
print()

# Test various error scenarios
test_cases = [
    ("No error",         codeword.copy()),
    ("Error in bit 0",   codeword ^ np.array([1, 0, 0, 0, 0, 0, 0])),
    ("Error in bit 3",   codeword ^ np.array([0, 0, 0, 1, 0, 0, 0])),
    ("Error in bit 6",   codeword ^ np.array([0, 0, 0, 0, 0, 0, 1])),
    ("Two errors (0,1)", codeword ^ np.array([1, 1, 0, 0, 0, 0, 0])),
]

for description, received in test_cases:
    received = received.astype(np.uint8)
    syn, pos, corrected = syndrome_decode(received, H_74)
    syn_str = ''.join(map(str, syn))
    print(f"{description}:")
    print(f"  Received:  {received}")
    print(f"  Syndrome:  {syn_str}", end="")
    if pos == -1:
        print(" (no error)")
    elif pos is None:
        print(" (uncorrectable)")
    else:
        print(f" → error at position {pos}")
    print(f"  Corrected: {corrected}")
    print()
```

If two or more errors occur in a single-error-correcting code, the syndrome will match some column of H (different from any actual error position). The decoder will "correct" the wrong bit, introducing a third error! This is why SEC-DED adds detection for double errors.
Different code families use different design principles to achieve desired distances. Understanding these principles helps select the right code for each application.
| Family | Distance Control | Encoding | Decoding | Best For |
|---|---|---|---|---|
| Hamming | Fixed (d=3,4) | Simple matrix | Simple syndrome | RAM ECC |
| BCH | Flexible (design t) | Polynomial | Berlekamp-Massey | Flash, disk |
| Reed-Solomon | MDS (maximum) | FFT-based | Algebraic | CDs, QR codes |
| LDPC | Graph structure | Sparse matrix | Iterative belief prop. | 5G, WiFi 6 |
| Polar | Frozen-bit selection | Recursive | Successive cancellation | 5G control |
| Turbo | High (emergent) | Parallel concat. | Iterative MAP | Deep space |
Random errors → BCH, LDPC excel. Burst errors → Reed-Solomon, interleaved codes. Very noisy channels → Turbo, LDPC, Polar (approach Shannon limit). Memory protection → Hamming, SEC-DED (simple, fast).
Theoretical optimality isn't everything. Real-world code selection involves many practical factors beyond just minimum distance.
Case Study: ECC DRAM Selection
For server RAM ECC: decoding must complete within the memory access path (nanoseconds), single-bit upsets dominate the error model, and the per-word hardware overhead must stay small.
Choice: SEC-DED extended Hamming
Not Reed-Solomon (too slow) or LDPC (overkill, complex).
The "best" code depends entirely on the application. A code optimal for deep-space probes (Turbo, high latency OK) is terrible for RAM (needs nanosecond decoding). Code selection is engineering, not just mathematics.
Code design translates distance requirements into concrete matrices and algorithms. Understanding the connection between H's structure and minimum distance is the key insight enabling systematic code construction.
Module Complete:
You've now mastered Hamming distance from concept to code design. You understand what Hamming distance measures, how it determines detection and correction capabilities, what minimum distance means for code power, and how to design codes with target distance properties.
This knowledge forms the foundation for understanding all error-control coding—from the simple parity bits in serial communication to the sophisticated LDPC codes in 5G networks and the Reed-Solomon codes on every Blu-ray disc.
Congratulations! You've completed the Hamming Distance module. You now have a deep understanding of how Hamming distance underlies all error detection and correction. This fundamental concept—counting bit differences—powers the reliable digital communication you use every day. Next, explore Hamming Codes to see these principles in a complete, practical implementation.