Loading content...
In the previous page, we established Hamming distance as the fundamental measure of difference between binary strings. We saw that the minimum distance of a code—the smallest distance between any two valid codewords—is the critical parameter. But exactly how does this minimum distance translate into error detection capability?
The answer is elegant: a code with minimum distance dmin can detect all error patterns affecting up to dmin - 1 bits. This simple formula encapsulates the entire theory of error detection, and understanding why it works provides deep insight into the nature of error control.
By the end of this page, you will understand the precise relationship between minimum distance and error detection capability, why the formula is dmin - 1 (not dmin), how to analyze detection guarantees for any code, and the important distinction between guaranteed detection and probabilistic detection.
Error detection works by a simple principle: if the received word is not a valid codeword, errors must have occurred. The receiver checks whether the received data matches one of the allowed codewords. If not, something went wrong during transmission.
The Key Question:
Given a code with minimum distance dmin, what error patterns will always produce an invalid codeword (guaranteeing detection)?
The Answer:
Any error pattern affecting fewer than dmin bits will be detected. Mathematically:
$$\text{Guaranteed detection of up to } (d_{min} - 1) \text{ bit errors}$$
Why This Works:
Consider a valid codeword c that gets corrupted into received word r by t bit errors. The Hamming distance d(c, r) = t (since exactly t bits flipped).
For r to be mistaken as valid, it must equal some valid codeword. But all valid codewords are at least dmin apart from each other. If t < dmin, then r is within distance t < dmin of c, which means r cannot be a valid codeword—it's too close to c to be anything else.
Think of each codeword as surrounded by a "sphere" of radius (dmin - 1). Everything inside this sphere is detectable as erroneous because it's close to the codeword but not the codeword itself. The minimum distance ensures these spheres don't touch other valid codewords.
Students often wonder: if codewords are dmin apart, why can we only detect dmin - 1 errors, not dmin errors?
The answer lies in what happens at exactly dmin errors.
The Boundary Case:
When exactly dmin bit errors occur, the received word might be transformed into another valid codeword. This is precisely what minimum distance means: there exists at least one pair of codewords with exactly dmin bits different between them.
Concrete Example:
Consider a code with dmin = 3 containing codewords 000 and 111.
| Errors | Transmitted: 000 | Received | Status |
|---|---|---|---|
| 0 | 000 | 000 | Valid (correct) |
| 1 | 000 | 001, 010, or 100 | Invalid → Detected |
| 2 | 000 | 011, 101, or 110 | Invalid → Detected |
| 3 | 000 | 111 | Valid → UNDETECTED! |
With exactly 3 errors (= dmin), the codeword 000 can transform into 111—another valid codeword. The receiver sees a valid codeword and has no way to know an error occurred.
At dmin errors, detection becomes unreliable. Some dmin-error patterns are detectable (if they don't land on another codeword), but at least one pattern is not. For guaranteed detection, we must stay below this threshold: at most dmin - 1 errors.
Formal Statement:
For a code with minimum distance dmin, any error pattern of weight t (where t < dmin) is guaranteed detectable. Error patterns of weight t ≥ dmin may or may not be detected—there is no guarantee.
The Contrapositive View:
Another way to understand this: for an error to go undetected, it must transform one valid codeword into another. The minimum number of bit flips to do this is dmin (by definition). Therefore, anything less than dmin flips cannot accomplish this transformation—the error is exposed.
Visualizing codewords and their "detection regions" in Hamming space makes the detection principle intuitive. Each codeword is surrounded by a region of non-codewords that, if received, signal an error.
For dmin = 3:
| Distance from Nearest Codeword | What It Represents | Detection Status |
|---|---|---|
| 0 | The codeword itself | Accepted as correct |
| 1 | 1-bit error from some codeword | Detected (not a codeword) |
| 2 | 2-bit errors from some codeword | Detected (not a codeword) |
| 3+ | 3+ bit errors from nearest codeword | May be another codeword → May be undetected |
The Geography of Hamming Space:
Imagine Hamming space as a vast landscape where valid codewords are islands. The "sea" between islands consists of all non-codewords. Error detection works because:
The goal of code design is to place islands (codewords) strategically so that common error patterns always land in the sea.
While dmin - 1 is the guaranteed detection capacity, many codes can detect some errors beyond this limit. The guarantee applies to ALL error patterns of that weight; larger error patterns have probabilistic detection based on code structure.
Let's apply the error detection formula e = dmin - 1 to various coding schemes we've encountered:
Example 1: Simple Parity Bit
With a single parity bit, valid codewords have even parity. Consider:
Any two codewords differ in at least 2 positions (you can't change just the data without changing parity). Therefore dmin = 2.
Detection capability: dmin - 1 = 2 - 1 = 1 bit error
This confirms what we know: single parity detects single-bit errors but not double-bit errors.
| Code | dmin | Error Detection Capability | Notes |
|---|---|---|---|
| Single parity bit | 2 | 1-bit errors | Simplest error detection |
| Two-dimensional parity | 4 | 3-bit errors | Can also correct 1-bit errors |
| Hamming (7,4) | 3 | 2-bit errors | Also corrects 1-bit errors |
| Repetition code (3x) | 3 | 2-bit errors | High redundancy, low rate |
| Repetition code (5x) | 5 | 4-bit errors | Very high redundancy |
| CRC-32 | 4+ | 3+ bit errors | Plus burst error detection |
Example 2: Repetition Code
The simplest way to increase minimum distance is repetition. If we send each bit multiple times:
For 5-repetition: Detection capability = 5 - 1 = 4 bit errors
However, repetition codes are inefficient—the rate (useful bits per transmitted bit) is only 1/n.
Increasing minimum distance always costs something: either more redundancy (lower rate) or more complex encoding/decoding. Good codes achieve high distance with reasonable overhead—this is the central challenge of coding theory.
Understanding exactly which error patterns are detectable requires careful analysis. Let's work through a complete example.
Example Code:
Consider a (6, 3) code with 3 data bits encoded as 6-bit codewords:
| Data | Codeword |
|---|---|
| 000 | 000000 |
| 001 | 001110 |
| 010 | 010101 |
| 011 | 011011 |
| 100 | 100110 |
| 101 | 101001 |
| 110 | 110011 |
| 111 | 111100 |
First, let's find the minimum distance by examining pairwise distances:
123456789101112131415161718192021222324252627282930313233
def hamming_distance(a: str, b: str) -> int: """Calculate Hamming distance between two binary strings.""" return sum(c1 != c2 for c1, c2 in zip(a, b)) # Define our codecodewords = [ "000000", "001110", "010101", "011011", "100110", "101001", "110011", "111100"] # Find minimum distancemin_distance = float('inf')min_pair = None for i, c1 in enumerate(codewords): for j, c2 in enumerate(codewords): if i < j: # Only check each pair once d = hamming_distance(c1, c2) print(f"d({c1}, {c2}) = {d}") if d < min_distance: min_distance = d min_pair = (c1, c2) print(f"\nMinimum distance: {min_distance}")print(f"Achieved by pair: {min_pair}")print(f"Error detection capability: {min_distance - 1} bit errors") # Output:# d(000000, 001110) = 3# d(000000, 010101) = 3# ... (all pairs have distance 3 or 4)# Minimum distance: 3# Error detection capability: 2 bit errorsAnalysis Results:
This code has dmin = 3, so it detects all 1-bit and 2-bit errors.
What About 3-bit Errors?
Some 3-bit errors are detected, some aren't:
Undetectable 3-bit error:
Detectable 3-bit error:
Not all error patterns of the same weight behave identically. Some 3-bit error patterns in our dmin=3 code are detected, others aren't. The guarantee is only for errors strictly less than dmin. Beyond that, detection depends on which specific bits flip.
So far, we've focused on detecting errors described by Hamming weight (number of flipped bits). But real transmission errors often occur in bursts—consecutive bit positions affected by a single disturbance like electrical interference or a scratch on a disk.
Burst Error Definition:
A burst error of length b is an error pattern where all erroneous bits are confined within b consecutive positions. The first and last bits of the burst must be in error, but intermediate bits may or may not be.
Example:
Original: 1 0 1 1 0 0 1 1 0 1
Received: 1 0 0 1 1 0 1 1 0 1
─────
Burst of length 3
Why Burst Detection Differs:
Burst detection capability doesn't follow from Hamming distance directly. Special codes (like CRC) can detect all bursts up to a certain length, regardless of how many bits within the burst are affected.
Connection to Hamming Distance:
While burst detection is related to, but distinct from, Hamming distance analysis, there's an important relationship:
Practical Implication:
In real systems, codes are often chosen based on both their minimum distance (for random errors) and their burst detection capability. CRC codes excel at burst detection, while Hamming codes and BCH codes excel at random error detection.
A powerful technique combines random-error codes with interleaving: spread consecutive bits across multiple codewords. A burst that affects one codeword's consecutive bits becomes single-bit errors in many codewords—which random-error codes handle well.
While dmin - 1 gives the guaranteed detection capacity, practical systems care about probabilistic detection—what fraction of larger error patterns actually get detected?
For Random Errors:
If errors occur independently with probability p per bit, and the code has n bits with M codewords:
Example Calculation:
Consider a (7,4) Hamming code with 16 codewords in a space of 128 possible 7-bit strings:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960
from itertools import combinationsfrom typing import List, Set def hamming_distance(a: int, b: int) -> int: """Count differing bits between two integers.""" return bin(a ^ b).count('1') def analyze_detection_probability(codewords: List[int], n_bits: int): """ Analyze what fraction of each error weight is detectable. """ codeword_set: Set[int] = set(codewords) for weight in range(1, n_bits + 1): detected = 0 undetected = 0 for codeword in codewords: # Generate all error patterns of given weight for error_positions in combinations(range(n_bits), weight): error_mask = sum(1 << pos for pos in error_positions) received = codeword ^ error_mask if received in codeword_set: undetected += 1 else: detected += 1 total = detected + undetected detection_rate = detected / total if total > 0 else 1.0 print(f"Weight {weight}: {detected}/{total} detected ({detection_rate:.1%})") # Hamming (7,4) code - all 16 codewords# For brevity, using generator matrix to construct codewordsdef generate_hamming_74(): G = [ 0b1101000, # Row 1 of generator matrix 0b1010100, # Row 2 0b0110010, # Row 3 0b1110001, # Row 4 ] codewords = [] for data in range(16): # All 4-bit data words codeword = 0 for i in range(4): if data & (1 << i): codeword ^= G[i] codewords.append(codeword) return codewords codewords_74 = generate_hamming_74()print("Hamming (7,4) detection analysis:")analyze_detection_probability(codewords_74, 7) # Output:# Weight 1: 112/112 detected (100.0%) ← All detected# Weight 2: 336/336 detected (100.0%) ← All detected # Weight 3: 512/560 detected (91.4%) ← Most detected# Weight 4: 520/560 detected (92.9%)# ...Even beyond the guaranteed detection limit, most errors are still detected. For Hamming (7,4), over 91% of 3-bit errors are caught. In practice, the residual undetected error rate after coding is often acceptable for the application's reliability requirements.
Error detection capability flows directly from minimum Hamming distance. This simple relationship—detect up to dmin - 1 bit errors—is the foundation for understanding and comparing all error detection schemes.
What's Next:
Detection tells us when errors occurred, but not how to fix them. The next page explores the even more powerful capability: error correction. We'll see that the same minimum distance that enables detection also enables correction—with a different formula that trades detection range for the ability to automatically recover the original data.
You now understand how minimum Hamming distance determines error detection capability. The formula e = dmin - 1 captures the guarantee: any error pattern flipping fewer than dmin bits is always detected. Next, we'll see how to go beyond detection to actually correct errors.