Hamming Distance - Learning Module

Loading content...

0/228

Error Detection Capability: How Many Errors Can We Catch?

From Distance to Detection

In the previous page, we established Hamming distance as the fundamental measure of difference between binary strings. We saw that the minimum distance of a code—the smallest distance between any two valid codewords—is the critical parameter. But exactly how does this minimum distance translate into error detection capability?

The answer is elegant: a code with minimum distance dmin can detect all error patterns affecting up to dmin - 1 bits. This simple formula encapsulates the entire theory of error detection, and understanding why it works provides deep insight into the nature of error control.

What You Will Learn

By the end of this page, you will understand the precise relationship between minimum distance and error detection capability, why the formula is dmin - 1 (not dmin), how to analyze detection guarantees for any code, and the important distinction between guaranteed detection and probabilistic detection.

The Fundamental Detection Principle

Error detection works by a simple principle: if the received word is not a valid codeword, errors must have occurred. The receiver checks whether the received data matches one of the allowed codewords. If not, something went wrong during transmission.

The Key Question:

Given a code with minimum distance dmin, what error patterns will always produce an invalid codeword (guaranteeing detection)?

The Answer:

Any error pattern affecting fewer than dmin bits will be detected. Mathematically:

$$\text{Guaranteed detection of up to } (d_{min} - 1) \text{ bit errors}$$

Why This Works:

Consider a valid codeword c that gets corrupted into received word r by t bit errors. The Hamming distance d(c, r) = t (since exactly t bits flipped).

For r to be mistaken as valid, it must equal some valid codeword. But all valid codewords are at least dmin apart from each other. If t < dmin, then r is within distance t < dmin of c, which means r cannot be a valid codeword—it's too close to c to be anything else.

Converting Mermaid diagram...

The Sphere Interpretation

Think of each codeword as surrounded by a "sphere" of radius (dmin - 1). Everything inside this sphere is detectable as erroneous because it's close to the codeword but not the codeword itself. The minimum distance ensures these spheres don't touch other valid codewords.

Understanding the "-1": The Critical Edge Case

Students often wonder: if codewords are dmin apart, why can we only detect dmin - 1 errors, not dmin errors?

The answer lies in what happens at exactly dmin errors.

The Boundary Case:

When exactly dmin bit errors occur, the received word might be transformed into another valid codeword. This is precisely what minimum distance means: there exists at least one pair of codewords with exactly dmin bits different between them.

Concrete Example:

Consider a code with dmin = 3 containing codewords 000 and 111.

Errors	Received	Status
0	000	Valid (correct)
1	001, 010, or 100	Invalid → Detected
2	011, 101, or 110	Invalid → Detected
3	111	Valid → UNDETECTED!

With exactly 3 errors (= dmin), the codeword 000 can transform into 111—another valid codeword. The receiver sees a valid codeword and has no way to know an error occurred.

The Danger Zone

At dmin errors, detection becomes unreliable. Some dmin-error patterns are detectable (if they don't land on another codeword), but at least one pattern is not. For guaranteed detection, we must stay below this threshold: at most dmin - 1 errors.

Formal Statement:

For a code with minimum distance dmin, any error pattern of weight t (where t < dmin) is guaranteed detectable. Error patterns of weight t ≥ dmin may or may not be detected—there is no guarantee.

The Contrapositive View:

Another way to understand this: for an error to go undetected, it must transform one valid codeword into another. The minimum number of bit flips to do this is dmin (by definition). Therefore, anything less than dmin flips cannot accomplish this transformation—the error is exposed.

Detection Regions in Hamming Space

Visualizing codewords and their "detection regions" in Hamming space makes the detection principle intuitive. Each codeword is surrounded by a region of non-codewords that, if received, signal an error.

For dmin = 3:

Detection Regions for dmin = 3 Code
Distance from Nearest Codeword	What It Represents	Detection Status
0	The codeword itself	Accepted as correct
1	1-bit error from some codeword	Detected (not a codeword)
2	2-bit errors from some codeword	Detected (not a codeword)
3+	3+ bit errors from nearest codeword	May be another codeword → May be undetected

The Geography of Hamming Space:

Imagine Hamming space as a vast landscape where valid codewords are islands. The "sea" between islands consists of all non-codewords. Error detection works because:

Islands are separated by at least dmin units of sea
Any received word in the sea indicates an error
Errors of < dmin bits always land in the sea
Errors of ≥ dmin bits might reach another island

The goal of code design is to place islands (codewords) strategically so that common error patterns always land in the sea.

Beyond Guaranteed Detection

While dmin - 1 is the guaranteed detection capacity, many codes can detect some errors beyond this limit. The guarantee applies to ALL error patterns of that weight; larger error patterns have probabilistic detection based on code structure.

Applying the Detection Formula

Let's apply the error detection formula e = dmin - 1 to various coding schemes we've encountered:

Example 1: Simple Parity Bit

With a single parity bit, valid codewords have even parity. Consider:

000 (data) → 0000 (with even parity)
001 (data) → 0011 (with even parity)
010 (data) → 0101 (with even parity)
...

Any two codewords differ in at least 2 positions (you can't change just the data without changing parity). Therefore dmin = 2.

Detection capability: dmin - 1 = 2 - 1 = 1 bit error

This confirms what we know: single parity detects single-bit errors but not double-bit errors.

Common Codes and Their Detection Capabilities
Code	dmin	Error Detection Capability	Notes
Single parity bit	2	1-bit errors	Simplest error detection
Two-dimensional parity	4	3-bit errors	Can also correct 1-bit errors
Hamming (7,4)	3	2-bit errors	Also corrects 1-bit errors
Repetition code (3x)	3	2-bit errors	High redundancy, low rate
Repetition code (5x)	5	4-bit errors	Very high redundancy
CRC-32	4+	3+ bit errors	Plus burst error detection

Example 2: Repetition Code

The simplest way to increase minimum distance is repetition. If we send each bit multiple times:

3-repetition: 0 → 000, 1 → 111. Distance between codewords = 3
5-repetition: 0 → 00000, 1 → 11111. Distance = 5
n-repetition: Distance = n

For 5-repetition: Detection capability = 5 - 1 = 4 bit errors

However, repetition codes are inefficient—the rate (useful bits per transmitted bit) is only 1/n.

The Rate-Distance Tradeoff

Increasing minimum distance always costs something: either more redundancy (lower rate) or more complex encoding/decoding. Good codes achieve high distance with reasonable overhead—this is the central challenge of coding theory.

Analyzing Detectable vs. Undetectable Errors

Understanding exactly which error patterns are detectable requires careful analysis. Let's work through a complete example.

Example Code:

Consider a (6, 3) code with 3 data bits encoded as 6-bit codewords:

Data	Codeword
000	000000
001	001110
010	010101
011	011011
100	100110
101	101001
110	110011
111	111100

First, let's find the minimum distance by examining pairwise distances:

minimum_distance_analysis.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
def hamming_distance(a: str, b: str) -> int:
    """Calculate Hamming distance between two binary strings."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))
 
# Define our code
codewords = [
    "000000", "001110", "010101", "011011",
    "100110", "101001", "110011", "111100"
]
 
# Find minimum distance
min_distance = float('inf')
min_pair = None
 
for i, c1 in enumerate(codewords):
    for j, c2 in enumerate(codewords):
        if i < j:  # Only check each pair once
            d = hamming_distance(c1, c2)
            print(f"d({c1}, {c2}) = {d}")
            if d < min_distance:
                min_distance = d
                min_pair = (c1, c2)
 
print(f"\nMinimum distance: {min_distance}")
print(f"Achieved by pair: {min_pair}")
print(f"Error detection capability: {min_distance - 1} bit errors")
 
# Output:
# d(000000, 001110) = 3
# d(000000, 010101) = 3
# ... (all pairs have distance 3 or 4)
# Minimum distance: 3
# Error detection capability: 2 bit errors

Analysis Results:

This code has dmin = 3, so it detects all 1-bit and 2-bit errors.

What About 3-bit Errors?

Some 3-bit errors are detected, some aren't:

Undetectable 3-bit error:

Transmit: 000000
Flip bits 2, 3, 4: 001110
001110 is a valid codeword → Error undetected!

Detectable 3-bit error:

Transmit: 000000
Flip bits 1, 2, 3: 111000
111000 is NOT a valid codeword → Error detected!

Error Patterns Matter

Not all error patterns of the same weight behave identically. Some 3-bit error patterns in our dmin=3 code are detected, others aren't. The guarantee is only for errors strictly less than dmin. Beyond that, detection depends on which specific bits flip.

Special Case: Burst Error Detection

So far, we've focused on detecting errors described by Hamming weight (number of flipped bits). But real transmission errors often occur in bursts—consecutive bit positions affected by a single disturbance like electrical interference or a scratch on a disk.

Burst Error Definition:

A burst error of length b is an error pattern where all erroneous bits are confined within b consecutive positions. The first and last bits of the burst must be in error, but intermediate bits may or may not be.

Example:

Original:  1 0 1 1 0 0 1 1 0 1
Received:  1 0 0 1 1 0 1 1 0 1
              ─────
            Burst of length 3

Why Burst Detection Differs:

Burst detection capability doesn't follow from Hamming distance directly. Special codes (like CRC) can detect all bursts up to a certain length, regardless of how many bits within the burst are affected.

CRC Burst Detection Properties

•CRC with degree r polynomial: Detects ALL bursts of length ≤ r
•Bursts of length r+1: Detected with probability 1 - 2^(-r+1)
•Bursts of length > r+1: Detected with probability 1 - 2^(-r)
•Random errors: CRC also provides good detection due to polynomial structure

Connection to Hamming Distance:

While burst detection is related to, but distinct from, Hamming distance analysis, there's an important relationship:

A burst of length b has Hamming weight between 2 and b (at least first and last bits affected)
If b < dmin, the burst is guaranteed detectable (it's a subset of random error detection)
For b ≥ dmin, burst detection depends on code structure beyond just distance

Practical Implication:

In real systems, codes are often chosen based on both their minimum distance (for random errors) and their burst detection capability. CRC codes excel at burst detection, while Hamming codes and BCH codes excel at random error detection.

Interleaving for Burst Errors

A powerful technique combines random-error codes with interleaving: spread consecutive bits across multiple codewords. A burst that affects one codeword's consecutive bits becomes single-bit errors in many codewords—which random-error codes handle well.

Probabilistic Detection: Beyond Guarantees

While dmin - 1 gives the guaranteed detection capacity, practical systems care about probabilistic detection—what fraction of larger error patterns actually get detected?

For Random Errors:

If errors occur independently with probability p per bit, and the code has n bits with M codewords:

Probability of undetected error ≈ (M - 1) × (fraction of space that maps to other codewords)
For a good code, this is approximately 2^(-r) where r = n - log₂(M) is the number of check bits

Example Calculation:

Consider a (7,4) Hamming code with 16 codewords in a space of 128 possible 7-bit strings:

16 valid codewords → 112 invalid patterns
Any single-bit error lands on an invalid pattern → 100% detected
Any 2-bit error lands on an invalid pattern → 100% detected
3-bit errors: some hit valid codewords, some don't

detection_probability.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
from itertools import combinations
from typing import List, Set
 
def hamming_distance(a: int, b: int) -> int:
    """Count differing bits between two integers."""
    return bin(a ^ b).count('1')
 
def analyze_detection_probability(codewords: List[int], n_bits: int):
    """
    Analyze what fraction of each error weight is detectable.
    """
    codeword_set: Set[int] = set(codewords)
    
    for weight in range(1, n_bits + 1):
        detected = 0
        undetected = 0
        
        for codeword in codewords:
            # Generate all error patterns of given weight
            for error_positions in combinations(range(n_bits), weight):
                error_mask = sum(1 << pos for pos in error_positions)
                received = codeword ^ error_mask
                
                if received in codeword_set:
                    undetected += 1
                else:
                    detected += 1
        
        total = detected + undetected
        detection_rate = detected / total if total > 0 else 1.0
        print(f"Weight {weight}: {detected}/{total} detected ({detection_rate:.1%})")
 
# Hamming (7,4) code - all 16 codewords
# For brevity, using generator matrix to construct codewords
def generate_hamming_74():
    G = [
        0b1101000,  # Row 1 of generator matrix
        0b1010100,  # Row 2
        0b0110010,  # Row 3
        0b1110001,  # Row 4
    ]
    codewords = []
    for data in range(16):  # All 4-bit data words
        codeword = 0
        for i in range(4):
            if data & (1 << i):
                codeword ^= G[i]
        codewords.append(codeword)
    return codewords
 
codewords_74 = generate_hamming_74()
print("Hamming (7,4) detection analysis:")
analyze_detection_probability(codewords_74, 7)
 
# Output:
# Weight 1: 112/112 detected (100.0%) ← All detected
# Weight 2: 336/336 detected (100.0%) ← All detected  
# Weight 3: 512/560 detected (91.4%)   ← Most detected
# Weight 4: 520/560 detected (92.9%)
# ...

The 91.4% Insight

Even beyond the guaranteed detection limit, most errors are still detected. For Hamming (7,4), over 91% of 3-bit errors are caught. In practice, the residual undetected error rate after coding is often acceptable for the application's reliability requirements.

Summary: The Detection Formula

Error detection capability flows directly from minimum Hamming distance. This simple relationship—detect up to dmin - 1 bit errors—is the foundation for understanding and comparing all error detection schemes.

Key Takeaways

•The detection formula: A code with minimum distance dmin can detect all error patterns of weight ≤ dmin - 1.
•Why minus one: At exactly dmin errors, some patterns transform one codeword into another, making detection impossible.
•Geometric interpretation: Codewords are "islands" in Hamming space; errors of < dmin bits always land in the "sea" of non-codewords.
•Beyond guarantees: Many error patterns of weight ≥ dmin are still detected; the formula gives the worst-case guarantee.
•Burst errors: Detected differently than random errors; codes like CRC excel at detecting consecutive-bit errors.
•Practical codes: Parity (dmin=2) detects 1 error; Hamming (dmin=3) detects 2 errors; repetition codes detect more but at high cost.

What's Next:

Detection tells us when errors occurred, but not how to fix them. The next page explores the even more powerful capability: error correction. We'll see that the same minimum distance that enables detection also enables correction—with a different formula that trades detection range for the ability to automatically recover the original data.

Page Complete

You now understand how minimum Hamming distance determines error detection capability. The formula e = dmin - 1 captures the guarantee: any error pattern flipping fewer than dmin bits is always detected. Next, we'll see how to go beyond detection to actually correct errors.