Detecting an error is useful—it tells us something went wrong. But in many situations, knowing about the error isn't enough. We need to fix it.
Consider deep-space probes like Voyager: when data takes hours to reach Earth, asking for retransmission is impractical. Or DVD players: a scratch on the disc shouldn't ruin your movie. Or computer memory: cosmic rays cause bit flips, but your computer should keep running correctly.
These scenarios require error correction—the ability to automatically recover the original data without retransmission. This capability, like detection, flows from Hamming distance, but with a crucial difference: correction requires more distance between codewords than detection alone.
By the end of this page, you will understand how error correction works geometrically in Hamming space, the precise formula relating minimum distance to correction capability, why correction requires more distance than detection, the tradeoff between detection and correction, and the principle of nearest-neighbor decoding.
Error correction works by a surprisingly simple principle: decode to the nearest valid codeword. When we receive a word that isn't a valid codeword, we find the valid codeword closest to it (in Hamming distance) and assume that's what was sent.
Why This Works:
If codewords are sufficiently spread out in Hamming space, and only a small number of errors occurred, the received word will be closer to the original codeword than to any other valid codeword. The errors "pushed" the codeword away from its original position, but not far enough to be closer to a different codeword.
The Correction Formula:
A code with minimum distance dmin can correct all error patterns affecting up to t bits, where:
$$t = \left\lfloor \frac{d_{min} - 1}{2} \right\rfloor$$
Equivalently, correction requires:
$$d_{min} \geq 2t + 1$$
The floor function ⌊x⌋ means "round down to the nearest integer." So ⌊2.5⌋ = 2, and ⌊3⌋ = 3. This means dmin = 3 gives t = ⌊1⌋ = 1, dmin = 4 gives t = ⌊1.5⌋ = 1, and dmin = 5 gives t = ⌊2⌋ = 2.
| dmin | Error Detection (dmin - 1) | Error Correction ⌊(dmin-1)/2⌋ | Note |
|---|---|---|---|
| 1 | 0 bits | 0 bits | No redundancy |
| 2 | 1 bit | 0 bits | Detection only (parity) |
| 3 | 2 bits | 1 bit | SEC - Single Error Correction |
| 4 | 3 bits | 1 bit | SEC-DED capability possible |
| 5 | 4 bits | 2 bits | DEC - Double Error Correction |
| 6 | 5 bits | 2 bits | DEC-TED capability possible |
| 7 | 6 bits | 3 bits | TEC - Triple Error Correction |
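As a quick sanity check on the table, the following minimal sketch (function names are illustrative, not from any library) reproduces its detection and correction columns directly from the two formulas:

```python
def detection_capability(dmin: int) -> int:
    """Errors guaranteed detectable: dmin - 1."""
    return dmin - 1


def correction_capability(dmin: int) -> int:
    """Errors guaranteed correctable: t = floor((dmin - 1) / 2)."""
    return (dmin - 1) // 2


for dmin in range(1, 8):
    print(f"dmin={dmin}: detect {detection_capability(dmin)}, "
          f"correct {correction_capability(dmin)}")

# dmin=1: detect 0, correct 0
# dmin=2: detect 1, correct 0
# dmin=3: detect 2, correct 1
# ...
# dmin=7: detect 6, correct 3
```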
Notice that correcting t errors requires dmin ≥ 2t + 1, while detecting t errors only requires dmin ≥ t + 1. Why does correction need roughly twice the distance?
The Geometric Answer:
Imagine two codewords A and B with distance d between them. Each codeword has a "sphere" around it—the set of all words within a certain distance.
For detection: a sphere of radius t around a codeword must not contain any other codeword (dmin ≥ t + 1). An error of up to t bits moves the received word somewhere inside the sphere, so it can never land exactly on a different valid codeword—it is always recognizably invalid.
For correction: the spheres of radius t around different codewords must not overlap at all (dmin ≥ 2t + 1). If they overlapped, a word in the overlapping region would be within distance t of two different codewords—which one was sent?
Mathematical Derivation:
Consider two codewords c₁ and c₂ with d(c₁, c₂) = dmin.
For t-error correction to work:
The "spheres" of radius t around c₁ and c₂ must be disjoint. Since any word in between is at most t from one and at least (dmin - t) from the other:
$$t + t < d_{min}$$ $$2t < d_{min}$$ $$d_{min} \geq 2t + 1$$
Example:
With dmin = 5, t = ⌊(5 − 1)/2⌋ = 2: any pattern of one or two bit errors leaves the received word strictly closer to the original codeword than to any other, so it is corrected. Three or more errors, however, may push the word closer to a different codeword.
If more than t errors occur, the received word might be closer to a different codeword than the original. The decoder will "correct" to the wrong codeword—introducing more errors rather than fixing them! This is called miscorrection.
The principle of correcting to the nearest codeword is formalized as nearest neighbor decoding (NND), also called minimum distance decoding.
Algorithm:
1. Receive the word r.
2. Compute the Hamming distance from r to every valid codeword.
3. Output the codeword with the smallest distance.
4. If two or more codewords tie for the minimum, declare a decoding failure (the result is ambiguous).
Optimality:
Nearest neighbor decoding is optimal in the sense that it minimizes the probability of decoding error when all error patterns of a given weight are equally likely. This follows from a maximum likelihood argument: if errors are independent with probability p < 0.5 per bit, fewer errors are more likely than more errors.
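To make the maximum likelihood argument concrete: over a binary symmetric channel with bit-error probability p, a specific error pattern that flips w bits out of n occurs with probability

$$P(\text{pattern of weight } w) = p^{w}(1-p)^{n-w}$$

Each additional flipped bit multiplies this probability by p/(1 − p), which is less than 1 whenever p < 0.5. So the explanation requiring the fewest errors—the nearest codeword—is always the most probable one.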
```python
from typing import List, Optional, Tuple


def hamming_distance(a: str, b: str) -> int:
    """Calculate Hamming distance between two binary strings."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))


def nearest_neighbor_decode(
        received: str, codewords: List[str]) -> Tuple[Optional[str], int]:
    """
    Decode using nearest neighbor (minimum distance) decoding.

    Returns:
        Tuple of (decoded codeword, distance to that codeword)
        Returns (None, -1) if a tie exists (ambiguous)
    """
    min_distance = float('inf')
    nearest = None
    tie_exists = False

    for codeword in codewords:
        d = hamming_distance(received, codeword)
        if d < min_distance:
            min_distance = d
            nearest = codeword
            tie_exists = False
        elif d == min_distance:
            tie_exists = True

    if tie_exists:
        return (None, -1)  # Ambiguous - report decoding failure
    return (nearest, min_distance)


# Example: Repetition code (codewords: 000, 111)
repetition_code = ["000", "111"]
test_cases = ["000", "001", "010", "011", "100", "101", "110", "111"]

print("Repetition (3,1) code decoding:")
print("-" * 40)
for received in test_cases:
    decoded, dist = nearest_neighbor_decode(received, repetition_code)
    errors_corrected = dist
    print(f"{received} → {decoded} (corrected {errors_corrected} error(s))")

# Output:
# 000 → 000 (corrected 0 error(s))
# 001 → 000 (corrected 1 error(s))
# 010 → 000 (corrected 1 error(s))
# 011 → 111 (corrected 1 error(s))  ← closest to 111
# 100 → 000 (corrected 1 error(s))
# 101 → 111 (corrected 1 error(s))
# 110 → 111 (corrected 1 error(s))
# 111 → 111 (corrected 0 error(s))
```

Observation from the Example:
The (3,1) repetition code has dmin = 3, so t = ⌊(3 − 1)/2⌋ = 1.
Every single-bit error is pulled back to the codeword that was sent. But consider 011: if 000 was sent and two bits flipped, the decoder confidently "corrects" to 111—a miscorrection. This demonstrates both the power and the limit of correction.
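As a short follow-on sketch, reusing the nearest_neighbor_decode function from the code above, here is what happens when two bits flip—one more than this code can handle:

```python
# Reuses hamming_distance and nearest_neighbor_decode from the block above.
sent = "000"
received = "011"   # two bit flips: beyond t = 1 for the (3,1) repetition code

decoded, dist = nearest_neighbor_decode(received, ["000", "111"])
print(f"sent {sent}, received {received}, decoded {decoded}")
# sent 000, received 011, decoded 111  <- miscorrection: 111 is only 1 flip away
```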
For larger codes, nearest neighbor decoding by brute-force distance computation is expensive. Practical decoders use clever algorithms like syndrome decoding (for linear codes) or algebraic decoding (for BCH/RS codes) that achieve the same result much faster.
A code's minimum distance is a fixed resource. How we "spend" this resource—on detection, correction, or a combination—involves tradeoffs.
The Fundamental Relationship:
For a code with minimum distance dmin, we can correct up to t errors while simultaneously detecting up to t + s errors in total, provided dmin ≥ 2t + s + 1. The same distance can be "spent" in different ways; the table below shows the options for dmin = 5, and the sketch after the table enumerates them programmatically.
| Strategy (dmin = 5) | Corrections (t) | Additional Detections (s) | Total Errors Handled |
|---|---|---|---|
| Pure detection | 0 | 4 | 4 detected |
| Pure correction | 2 | 0 | 2 corrected |
| Balanced (SEC-DED style; common in practice) | 1 | 2 | 1 corrected, 3 detected in total |
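Here is that minimal sketch (illustrative names, not from any library), driven by the requirement dmin ≥ 2t + s + 1:

```python
from typing import List, Tuple


def allocations(dmin: int) -> List[Tuple[int, int]]:
    """All (errors corrected, total errors detected) splits allowed by
    dmin >= 2*t + s + 1, taking the largest detection margin s for each t."""
    result = []
    t = 0
    while 2 * t + 1 <= dmin:
        s = dmin - 1 - 2 * t   # additional detection beyond the t corrected
        result.append((t, t + s))
        t += 1
    return result


print(allocations(5))
# [(0, 4), (1, 3), (2, 2)]
# i.e. detect 4, or correct 1 and detect 3, or correct 2
```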
SEC-DED: A Practical Example
The Single Error Correction, Double Error Detection (SEC-DED) strategy is widely used in computer memory. It requires dmin = 4: a single-bit error lands at distance 1 from the sent codeword and at least distance 3 from every other codeword, so it is corrected; a double-bit error lands at distance 2 from the sent codeword and at least distance 2 from every other codeword, so it falls outside all correction spheres and is detected rather than miscorrected.
Why SEC-DED Matters:
In memory, single-bit upsets (from cosmic rays or electrical noise) are by far the most common faults, and SEC-DED fixes them transparently. Rarer double-bit errors cannot be corrected with dmin = 4, but they are reliably flagged instead of being silently turned into wrong data. This is why computer ECC memory uses Hamming codes augmented with an additional overall parity bit (extended Hamming codes), raising dmin from 3 to 4.
Miscorrection ("correcting" to the wrong codeword) is often worse than detection (knowing something's wrong). SEC-DED spends the extra distance to ensure double errors are caught rather than silently corrupted. This is why ECC memory is so reliable.
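The decision logic behind SEC-DED is simple enough to sketch in a few lines. The following is a minimal illustration (not production ECC code); it assumes some earlier step has already computed the Hamming syndrome and checked the extra overall parity bit:

```python
def secded_decision(syndrome_nonzero: bool, overall_parity_ok: bool) -> str:
    """Decision rule for an extended Hamming (SEC-DED) code.

    A single-bit error flips the overall parity and produces a syndrome
    pointing at the bad bit; a double-bit error leaves the overall parity
    intact but still produces a nonzero syndrome.
    """
    if not syndrome_nonzero and overall_parity_ok:
        return "no error"
    if syndrome_nonzero and not overall_parity_ok:
        return "single error: correct the bit the syndrome points to"
    if syndrome_nonzero and overall_parity_ok:
        return "double error: detected but uncorrectable, report it"
    return "error in the overall parity bit itself: correct that bit"
```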
The sphere packing interpretation of error correction provides powerful intuition and connects to deep mathematical results.
Hamming Bound (Sphere-Packing Bound):
In n-dimensional Hamming space with M codewords, each codeword has a correction sphere of radius t around it. These spheres must fit without overlapping. This constrains how many codewords can exist:
$$M \times V(n, t) \leq 2^n$$
Where V(n, t) is the volume of a Hamming sphere—the number of words within distance t:
$$V(n, t) = \sum_{i=0}^{t} \binom{n}{i}$$
This counts all words with 0, 1, 2, ..., t bits different from the center.
```python
from math import comb, log2


def hamming_sphere_volume(n: int, t: int) -> int:
    """
    Calculate the volume of a Hamming sphere of radius t in n dimensions.
    This is the number of words within distance t of any center word.
    """
    return sum(comb(n, i) for i in range(t + 1))


def hamming_bound(n: int, t: int) -> int:
    """
    Calculate the maximum number of codewords for an (n, M, 2t+1) code.
    This is the sphere-packing (Hamming) bound.
    """
    total_space = 2 ** n
    sphere_volume = hamming_sphere_volume(n, t)
    return total_space // sphere_volume


# Examples for various codes
print("Hamming Bound Analysis")
print("=" * 60)

examples = [
    (7, 1),   # Hamming (7,4) code
    (15, 1),  # Hamming (15,11) code
    (7, 2),   # Stronger correction
    (23, 3),  # Golay code
]

for n, t in examples:
    sphere_vol = hamming_sphere_volume(n, t)
    max_M = hamming_bound(n, t)
    k = int(log2(max_M)) if max_M > 0 else 0
    dmin = 2 * t + 1
    print(f"n={n}, t={t} (dmin ≥ {dmin}):")
    print(f"  Sphere volume: V({n},{t}) = {sphere_vol}")
    print(f"  Max codewords M ≤ {max_M} (k ≤ {k} data bits)")
    print()

# Output:
# n=7, t=1 (dmin ≥ 3):
#   Sphere volume: V(7,1) = 8
#   Max codewords M ≤ 16 (k ≤ 4 data bits)    → Hamming (7,4) is perfect!
#
# n=15, t=1 (dmin ≥ 3):
#   Sphere volume: V(15,1) = 16
#   Max codewords M ≤ 2048 (k ≤ 11 data bits) → Hamming (15,11) is perfect!
```

Perfect Codes:
When the Hamming bound holds with equality—when the spheres exactly fill the space with no gaps—the code is called perfect. Perfect codes are rare and special: apart from trivial cases such as odd-length repetition codes, the only binary perfect codes are the Hamming codes (t = 1) and the (23, 12) binary Golay code (t = 3).
For most parameters, perfect codes don't exist—the spheres can't exactly tile the space.
Hamming codes are perfect: every n-bit pattern is either a codeword or within distance 1 of exactly one codeword. This means every single-bit error maps uniquely to the correction needed—no wasted redundancy, no ambiguity. This elegant efficiency is why Hamming codes remain important decades after their invention.
Computing distances to all codewords is expensive for large codes. Syndrome decoding provides an efficient alternative for linear codes.
The Syndrome:
For a linear code defined by a parity-check matrix H, the syndrome of a received word r is:
$$s = H \cdot r^T$$
The syndrome has a remarkable property: it depends only on the error pattern, not on which codeword was sent. Writing the received word as r = c + e (codeword plus error pattern), we get H·rᵀ = H·cᵀ + H·eᵀ = H·eᵀ, because every valid codeword satisfies H·cᵀ = 0.
This means we can precompute a lookup table: syndrome → most likely error pattern.
```python
import numpy as np
from typing import Tuple


def syndrome_decode_hamming_7_4(received: str) -> Tuple[str, int]:
    """
    Decode a Hamming (7,4) code using syndrome decoding.

    Parity-check matrix H for Hamming (7,4):
    columns are the binary representations of the positions 1-7.

    Returns:
        (corrected codeword, position of error or -1 if none)
    """
    # Parity-check matrix for Hamming (7,4)
    # Each column is the binary representation of its position (1-7)
    H = np.array([
        [1, 0, 1, 0, 1, 0, 1],  # bit 0 of position
        [0, 1, 1, 0, 0, 1, 1],  # bit 1 of position
        [0, 0, 0, 1, 1, 1, 1],  # bit 2 of position
    ], dtype=np.uint8)

    # Convert received string to a vector
    r = np.array([int(b) for b in received], dtype=np.uint8)

    # Calculate syndrome (mod 2 arithmetic)
    syndrome = (H @ r) % 2

    # Convert syndrome to error position (binary to decimal)
    error_position = syndrome[0] + 2 * syndrome[1] + 4 * syndrome[2]

    # Correct the error (if any)
    corrected = r.copy()
    if error_position > 0:  # Syndrome 0 means no error
        corrected[error_position - 1] ^= 1  # Flip the erroneous bit

    corrected_str = ''.join(map(str, corrected))
    return (corrected_str, error_position - 1 if error_position > 0 else -1)


# Test cases
print("Syndrome Decoding for Hamming (7,4)")
print("=" * 50)

# Valid codeword (0000 encodes to 0000000)
test_cases = [
    ("0000000", "No error"),
    ("1000000", "Error in position 0"),
    ("0100000", "Error in position 1"),
    ("0010000", "Error in position 2"),
    ("1110000", "Valid codeword 1110000"),
    ("0110000", "1110000 with an error in position 0"),
]

for received, description in test_cases:
    corrected, error_pos = syndrome_decode_hamming_7_4(received)
    if error_pos == -1:
        result = "No error detected"
    else:
        result = f"Error at position {error_pos}, corrected to {corrected}"
    print(f"{received}: {result}")
    print(f"  ({description})")
    print()
```

Why Syndrome Decoding is Efficient:
Instead of comparing the received word against every codeword, the decoder performs one small matrix–vector multiplication and one table lookup. For Hamming (7,4) there are only 8 possible syndromes (3 check bits), each mapping to a unique single-bit error or to "no error"—the entire lookup table fits in 8 bytes!
The syndrome is like a fingerprint of the error pattern. No matter which codeword was sent, the same error pattern produces the same syndrome. This lets us correct without knowing which codeword was sent—we just fix the error pattern the syndrome identifies.
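To make the lookup-table idea concrete, here is a small sketch (reusing the same parity-check matrix H as the code above) that precomputes the syndrome → error-pattern map for the Hamming (7,4) code; single-bit patterns plus "no error" cover all 8 syndromes:

```python
import numpy as np

H = np.array([
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
], dtype=np.uint8)


def syndrome_of(word: np.ndarray) -> tuple:
    """Syndrome of a 7-bit word as a plain tuple (usable as a dict key)."""
    return tuple(int(x) for x in (H @ word) % 2)


# Precompute: syndrome -> most likely error pattern
syndrome_table = {(0, 0, 0): np.zeros(7, dtype=np.uint8)}  # zero syndrome: no error
for pos in range(7):
    e = np.zeros(7, dtype=np.uint8)
    e[pos] = 1                      # single-bit error at this position
    syndrome_table[syndrome_of(e)] = e

# Decoding is now one matrix multiply and one table lookup:
r = np.array([1, 1, 1, 0, 0, 1, 0], dtype=np.uint8)  # 1110000 with bit 5 flipped
corrected = (r + syndrome_table[syndrome_of(r)]) % 2
print(''.join(map(str, corrected)))  # 1110000
```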
Error correction is everywhere, silently fixing errors before they cause problems. Here's how the principles translate to real systems:
| Application | Code Type | Typical dmin | Correction Capability |
|---|---|---|---|
| RAM (ECC) | Hamming + parity | 4 | 1-bit correction, 2-bit detection |
| CDs | CIRC (Reed-Solomon) | 5 | ~4000 consecutive lost bits |
| DVDs | Reed-Solomon | Variable | ~6000 consecutive lost bits |
| Blu-ray | LDPC + RS | High | Very high correction capability |
| Deep Space | Turbo/LDPC | Very high | Extremely low error rates (10⁻⁶) |
You rarely notice error correction because it works. The scratched DVD plays fine. The cosmic ray that hit your RAM didn't crash your computer. The WiFi packet got through despite interference. Error correction is the invisible infrastructure of the digital world.
Error correction transforms Hamming distance from a measure of difference into a guarantee of recovery. The formula t = ⌊(dmin-1)/2⌋ captures the relationship: more distance means more correctable errors.
What's Next:
We've seen how minimum distance determines both detection (dmin - 1 errors) and correction (⌊(dmin-1)/2⌋ errors) capabilities. The next page formalizes this notion of minimum distance as a design parameter, exploring how codes are characterized and compared using the notation (n, k, dmin).
You now understand how minimum distance enables error correction. The formula t = ⌊(dmin-1)/2⌋ tells exactly how many errors can be fixed. Combined with detection, the full power of a code's minimum distance can be strategically deployed for the application's needs. Next, we deepen our understanding of minimum distance as a code design parameter.