In 1950, Richard Hamming was working at Bell Labs when he became frustrated with the unreliable machines of his era. When weekend operators encountered errors in his programs—errors his code had no way to recover from—the machines would simply abort his calculations. Out of this frustration came one of the most elegant concepts in information theory: a way to mathematically measure how "far apart" two pieces of data are.
This measure, now called Hamming distance, is deceptively simple: it counts the number of positions where two binary strings differ. Yet this simple counting unlocks a profound understanding of error detection and correction. Every error control code—from simple parity to sophisticated Reed-Solomon—derives its power from the Hamming distances between its codewords.
By the end of this page, you will understand the mathematical definition of Hamming distance, how to calculate it between any two binary strings, why it provides a unified framework for understanding all error control codes, and how the distances between codewords determine what errors can be detected and corrected.
The Hamming distance between two binary strings of equal length is the number of positions at which the corresponding bits differ. Named after Richard Hamming, this metric provides a rigorous mathematical framework for understanding how transmission errors affect data.
Formal Definition:
Given two binary strings a and b of length n, the Hamming distance d(a, b) is:
$$d(a, b) = \sum_{i=1}^{n} (a_i \oplus b_i)$$
Where ⊕ denotes the XOR (exclusive OR) operation. In simpler terms: XOR the two strings bit-by-bit, then count the number of 1s in the result.
XOR produces a 1 only where the input bits differ. This means XOR'ing two strings highlights exactly the positions where they disagree. Counting the 1s in the XOR result gives the Hamming distance. This is the most efficient way to compute Hamming distance in practice.
Worked Example:
Let's calculate the Hamming distance between 10110101 and 10011001:
```
String a:  1 0 1 1 0 1 0 1
String b:  1 0 0 1 1 0 0 1
           ───────────────
a XOR b:   0 0 1 0 1 1 0 0
               ↑   ↑ ↑
```
Counting the 1s: positions 3, 5, and 6 differ.
d(10110101, 10011001) = 3
This means converting 10110101 into 10011001 requires exactly 3 bit flips. Equivalently stated: if transmission errors flip exactly these 3 bits, the first string becomes the second.
| String A | String B | XOR Result | Hamming Distance |
|---|---|---|---|
| 0000 | 0000 | 0000 | 0 (identical) |
| 0000 | 1111 | 1111 | 4 (opposite) |
| 1010 | 1001 | 0011 | 2 |
| 11100 | 00111 | 11011 | 4 |
| 1101011 | 1001001 | 0100010 | 2 |
Hamming distance isn't just a counting measure; it's a true metric in the mathematical sense, satisfying the three axioms required of any distance function on the space of binary strings:

- **Identity:** d(a, b) = 0 if and only if a = b
- **Symmetry:** d(a, b) = d(b, a)
- **Triangle inequality:** d(a, c) ≤ d(a, b) + d(b, c) for any string b

Understanding these properties reveals why Hamming distance works so well for error analysis.
Why the Triangle Inequality Matters:
The triangle inequality has profound implications for error correction. Suppose two codewords A and B are far apart, and errors corrupt A into a received word r. The triangle inequality gives d(A, B) ≤ d(A, r) + d(r, B), which rearranges to d(r, B) ≥ d(A, B) − d(A, r): if only a few errors occurred, r must still be far from B. This guarantee is what allows code designers to ensure that a lightly corrupted codeword is never mistaken for a different one.
Bounds on Hamming Distance:
For two binary strings of length n:

$$0 \le d(a, b) \le n$$

The distance is 0 exactly when the strings are identical, and n exactly when they are bitwise complements. This means Hamming distance falls within [0, n] for n-bit strings.
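These properties are small enough to verify by brute force. The following sketch (the helper name `d` and the choice n = 4 are mine, purely for illustration) exhaustively checks the axioms and the bounds over every pair of 4-bit strings:

```python
from itertools import product

def d(a: str, b: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

# All 16 binary strings of length 4.
strings = ["".join(bits) for bits in product("01", repeat=4)]

for a in strings:
    for b in strings:
        assert (d(a, b) == 0) == (a == b)  # identity: d = 0 iff equal
        assert d(a, b) == d(b, a)          # symmetry
        assert 0 <= d(a, b) <= 4           # bounds: [0, n]
        for c in strings:
            assert d(a, c) <= d(a, b) + d(b, c)  # triangle inequality

print("All metric axioms hold for every 4-bit string.")
```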
The set of all n-bit binary strings, equipped with Hamming distance as the metric, forms a mathematical structure called Hamming space (or Hamming cube). In this space, each string is a point, and the distance between any two points is their Hamming distance. Error control coding is essentially the art of choosing codewords that are far apart in this space.
Understanding Hamming distance becomes much more intuitive when we visualize it geometrically. For small values of n, we can represent all possible n-bit strings as vertices of an n-dimensional hypercube, where edges connect strings that differ by exactly 1 bit.
The 3-bit Hamming Cube:
Consider all 3-bit strings: 000, 001, 010, 011, 100, 101, 110, 111. These form the vertices of a cube. Two vertices are connected by an edge if and only if their Hamming distance is 1.
Interpreting the Cube:
The shortest path between any two vertices (counting edges) equals their Hamming distance. To get from 000 to 111, you must flip 3 bits, so you must traverse at least 3 edges.
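This shortest-path claim can be checked directly by treating the cube as a graph and running breadth-first search. A minimal sketch, with function names of my own choosing:

```python
from collections import deque
from itertools import product

# Vertices of the 3-bit Hamming cube.
vertices = ["".join(bits) for bits in product("01", repeat=3)]

def neighbors(v: str):
    """Strings reachable from v by flipping exactly one bit."""
    for i in range(len(v)):
        yield v[:i] + ("1" if v[i] == "0" else "0") + v[i + 1:]

def shortest_path_length(start: str, goal: str) -> int:
    """Number of edges on a shortest path (breadth-first search)."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        v, dist = queue.popleft()
        if v == goal:
            return dist
        for w in neighbors(v):
            if w not in seen:
                seen.add(w)
                queue.append((w, dist + 1))

# Path length on the cube agrees with Hamming distance for every pair.
for a in vertices:
    for b in vertices:
        assert shortest_path_length(a, b) == sum(x != y for x, y in zip(a, b))

print(shortest_path_length("000", "111"))  # 3: flip all three bits
```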
Error as Movement in Hamming Space:
When transmission errors occur, they move the received string from one point in Hamming space to another. A single-bit error moves exactly one edge. A 2-bit error moves two edges. This geometric perspective reveals the core insight of error control:
If valid codewords are sufficiently far apart in Hamming space, any small number of errors cannot turn one valid codeword into another—the error is detectable.
A "sphere" of radius r around a codeword in Hamming space contains all strings within Hamming distance ≤ r of that codeword. If these spheres don't overlap, any received string can be uniquely decoded to the nearest codeword—this is the basis of error correction.
In practice, Hamming distance computation needs to be fast. Network equipment, storage controllers, and CPUs implement these calculations in hardware. Understanding the underlying algorithms helps appreciate both the theory and its real-world implementation.
```python
def hamming_distance(a: int, b: int) -> int:
    """
    Calculate Hamming distance between two integers.
    Uses XOR to find differing bits, then counts the 1s.
    This is the most efficient software approach.
    """
    xor_result = a ^ b  # Bits that differ are 1

    # Count the number of 1 bits (population count)
    distance = 0
    while xor_result:
        distance += xor_result & 1  # Check lowest bit
        xor_result >>= 1            # Shift right
    return distance


def hamming_distance_strings(s1: str, s2: str) -> int:
    """
    Calculate Hamming distance between two binary strings.
    Strings must be equal length.
    """
    if len(s1) != len(s2):
        raise ValueError("Strings must have equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


# Examples
print(hamming_distance(0b10110101, 0b10011001))          # Output: 3
print(hamming_distance_strings("10110101", "10011001"))  # Output: 3


# Using Python's built-in popcount (Python 3.10+)
def hamming_distance_modern(a: int, b: int) -> int:
    return (a ^ b).bit_count()
```

Hardware Acceleration:
Modern processors include dedicated instructions for population count (counting 1-bits):
| Architecture | Instruction | Introduced |
|---|---|---|
| x86/x64 | POPCNT | SSE4.2 (2008) |
| ARM | CNT (NEON), VCNT | ARMv8 |
| RISC-V | cpop | Bitmanip extension |
This means Hamming distance between two 64-bit values can be computed in 2-3 clock cycles on modern hardware: one XOR, one population count.
The Hamming weight (or population count) of a binary string is the number of 1s it contains. Hamming distance is the Hamming weight of the XOR of two strings. This relationship is why efficient popcount operations directly enable fast Hamming distance calculations.
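In code, this relationship is a one-liner. A tiny sketch using the int.bit_count() method shown earlier:

```python
a, b = 0b10110101, 0b10011001

# Hamming distance = Hamming weight (popcount) of the XOR.
assert (a ^ b).bit_count() == 3
```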
The power of Hamming distance lies in how it connects to error detection and correction. This connection is both beautiful and practical—it transforms the abstract question "Can we detect/correct errors?" into the concrete question "How far apart are our codewords?"
The Fundamental Insight:
Consider a transmission where the original codeword c is corrupted into received word r. The number of bit errors is exactly d(c, r)—the Hamming distance between what was sent and what was received.
Now, if we have a set of valid codewords {c₁, c₂, c₃, ...}, and we receive a word r, there are three possibilities:

- r equals the codeword that was sent: d(c, r) = 0, so the transmission was error-free.
- r is not any valid codeword: at least one error occurred, and the receiver can detect it.
- r equals a different valid codeword: errors transformed one legal codeword into another, and the corruption is undetectable.
The Critical Question:
How many bit errors turn one valid codeword into another? The answer is the minimum Hamming distance between any pair of valid codewords in the code. This single number—denoted dmin—determines the entire error-handling capability of the code.
Example:
Suppose our code has only two codewords: 00000 and 11111. Their Hamming distance is 5.

- Any error flipping 1 to 4 bits of 00000 produces a word containing both 0s and 1s, which is not a valid codeword, so the error is detected.
- Only an error flipping all 5 bits lands exactly on 11111, silently turning one valid codeword into the other.

This simple example shows that a minimum distance of 5 allows detection of up to 4-bit errors.
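The claim is easy to confirm exhaustively. A sketch (the `flip` helper is mine) that tries every possible error pattern on 00000:

```python
from itertools import combinations

codewords = {"00000", "11111"}

def flip(word: str, positions) -> str:
    """Return word with the bits at the given positions inverted."""
    bits = list(word)
    for i in positions:
        bits[i] = "1" if bits[i] == "0" else "0"
    return "".join(bits)

# Every 1- to 4-bit error is detected; only the 5-bit error slips through.
for k in range(1, 6):
    for positions in combinations(range(5), k):
        received = flip("00000", positions)
        detected = received not in codewords
        assert detected == (k < 5)
```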
When errors transform one valid codeword into another valid codeword, detection is impossible; the receiver has no way to know anything went wrong. This is why large distance between codewords is crucial: it determines how many errors must occur before this undetectable transformation becomes possible.
While individual Hamming distances matter, what ultimately determines a code's power is its minimum Hamming distance—the smallest distance between any pair of valid codewords.
Formal Definition:
For a code C containing codewords {c₁, c₂, ..., cₖ}, the minimum distance is:
$$d_{min}(C) = \min_{i \neq j} d(c_i, c_j)$$
This is the "weakest link" in the code—the pair of codewords most easily confused by errors.
| Codeword | 000000 | 001110 | 010101 | 011011 | 110010 | 101001 |
|---|---|---|---|---|---|---|
| 000000 | — | 3 | 3 | 4 | 3 | 3 |
| 001110 | 3 | — | 4 | 3 | 4 | 4 |
| 010101 | 3 | 4 | — | 3 | 4 | 4 |
| 011011 | 4 | 3 | 3 | — | 3 | 3 |
| 110010 | 3 | 4 | 4 | 3 | — | 4 |
| 101001 | 3 | 4 | 4 | 3 | 4 | — |
In this example code, all pairwise distances are either 3 or 4. The minimum distance dmin = 3.
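The distance matrix above can be reproduced in a few lines. A brute-force sketch over all M(M − 1)/2 = 15 pairs:

```python
from itertools import combinations

def d(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

code = ["000000", "001110", "010101", "011011", "110010", "101001"]

distances = [d(a, b) for a, b in combinations(code, 2)]
print(sorted(set(distances)))  # [3, 4]: every pair is at distance 3 or 4
print(min(distances))          # 3: the minimum distance dmin
```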
Why Minimum Distance Dominates:
Why focus on the minimum rather than average distance? Because error control must work for the worst case. If any two codewords are separated by only 2 bits, then a 2-bit error pattern can turn one into the other—making that error undetectable. It doesn't matter that other codeword pairs are far apart; the code fails at its weakest point.
This leads to the central principle of error control coding:
A code's reliability is determined by its minimum distance. All error detection and correction capabilities derive from this single number.
A code is often characterized by the notation (n, M, dmin) or sometimes [n, k, dmin], where n is the codeword length, M (or 2^k) is the number of codewords, and dmin is the minimum distance. For example, a (7, 16, 3) code has 7-bit codewords, 16 valid codewords, and minimum distance 3.
Analyzing a code by computing distances between all codeword pairs is computationally expensive. For a code with M codewords, there are M(M-1)/2 pairs to check. However, a special class of codes called linear codes provides a powerful shortcut.
What is a Linear Code?
A binary linear code is a set of codewords that forms a vector space over GF(2) (the field with two elements: 0 and 1). In practical terms, this means:

- The all-zeros string 00...0 is always a codeword.
- The XOR of any two codewords is also a codeword (closure under addition over GF(2)).
The Simplification:
For linear codes, there's a remarkable property:
$$d_{min} = \min_{c \neq 0} w(c)$$
Where w(c) is the Hamming weight (number of 1s) of codeword c.
This means we can find the minimum distance by simply finding the non-zero codeword with the smallest Hamming weight—no need to compute distances between all pairs!
Since 0...0 is a codeword, the distance from any codeword c to the all-zeros codeword equals the Hamming weight of c. Since XOR of any two codewords c₁ ⊕ c₂ is also a codeword, and d(c₁, c₂) = w(c₁ ⊕ c₂), the minimum distance between any two codewords equals the minimum weight of any non-zero codeword.
Example: A Simple Linear Code
Consider the code with codewords: {0000, 0110, 1001, 1111}
Let's verify it's linear:

- 0000 is in the code ✓
- 0110 ⊕ 1001 = 1111 ✓
- 0110 ⊕ 1111 = 1001 ✓
- 1001 ⊕ 1111 = 0110 ✓
- Any codeword XOR'd with itself gives 0000 ✓

Hamming weights of the non-zero codewords:

- w(0110) = 2
- w(1001) = 2
- w(1111) = 4

Minimum non-zero weight = 2, so dmin = 2.
This is much faster than computing all 6 pairwise distances!
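Both the linearity check and the weight shortcut are mechanical to verify; a short sketch with helper names of my own:

```python
from itertools import combinations

code = ["0000", "0110", "1001", "1111"]

def d(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def weight(c: str) -> int:
    return c.count("1")

def xor(a: str, b: str) -> str:
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

# Linearity: the XOR of any two codewords is again a codeword.
assert all(xor(a, b) in code for a in code for b in code)

# Weight shortcut: minimum non-zero weight equals minimum pairwise distance.
min_weight = min(weight(c) for c in code if c != "0000")
min_distance = min(d(a, b) for a, b in combinations(code, 2))
assert min_weight == min_distance == 2
```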
Hamming distance provides the mathematical vocabulary for discussing error control. Before Hamming's work, engineers knew empirically that adding redundancy helped catch errors—but they lacked a rigorous framework to analyze and optimize their codes. Hamming distance changed everything.
What's Next:
Now that we understand what Hamming distance is and why minimum distance matters, we'll explore exactly how minimum distance translates into error detection capability. The next page establishes the precise mathematical relationship: given a minimum distance dmin, how many errors can we reliably detect?
You now understand Hamming distance as the fundamental metric of error control coding. It's a simple counting measure—bits that differ—yet it provides the entire analytical framework for understanding what errors can be detected and corrected. Next, we'll see exactly how to use minimum distance to determine error detection power.