At the heart of every Hamming code lies a carefully orchestrated arrangement of check bits—parity bits positioned at specific locations that enable the magical property of error location, not just detection.
Understanding where check bits are placed and why they are placed there transforms Hamming codes from a black-box algorithm into an intuitive system you can design, implement, and debug from first principles.
This page dissects the anatomy of check bit positioning with surgical precision.
By the end of this page, you will understand exactly which positions hold check bits and which hold data bits, the binary arithmetic that determines coverage, step-by-step algorithms for computing check bit values, and how to implement Hamming encoding from scratch in any programming language.
In standard Hamming code notation, the codeword consists of n positions numbered 1 through n. Check bits (also called parity bits) occupy positions that are powers of two: 1, 2, 4, 8, 16, 32, and so on.
All other positions—3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 17, ...—hold data bits.
The Fundamental Rule:
Position p contains a check bit if and only if p = 2^k for some non-negative integer k.
This can be tested algorithmically: a position is a power of two if and only if it has exactly one bit set in its binary representation, which can be verified with (p & (p - 1)) == 0 && p != 0.
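As a minimal sketch, the test above translates directly into Python (the helper name `is_check_position` is ours, not standard):

```python
def is_check_position(p: int) -> bool:
    """True if 1-indexed position p holds a check bit (p is a power of two)."""
    return p > 0 and (p & (p - 1)) == 0

# Positions 1-15: check bits sit at 1, 2, 4, 8
check_positions = [p for p in range(1, 16) if is_check_position(p)]
print(check_positions)  # [1, 2, 4, 8]
```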
| Position | Binary | Type | Power of 2? | Check Bit Index |
|---|---|---|---|---|
| 1 | 0001 | Check (P₁) | Yes (2⁰) | 0 |
| 2 | 0010 | Check (P₂) | Yes (2¹) | 1 |
| 3 | 0011 | Data (D₁) | No | — |
| 4 | 0100 | Check (P₄) | Yes (2²) | 2 |
| 5 | 0101 | Data (D₂) | No | — |
| 6 | 0110 | Data (D₃) | No | — |
| 7 | 0111 | Data (D₄) | No | — |
| 8 | 1000 | Check (P₈) | Yes (2³) | 3 |
| 9 | 1001 | Data (D₅) | No | — |
| 10 | 1010 | Data (D₆) | No | — |
| 11 | 1011 | Data (D₇) | No | — |
| 12 | 1100 | Data (D₈) | No | — |
| 13 | 1101 | Data (D₉) | No | — |
| 14 | 1110 | Data (D₁₀) | No | — |
| 15 | 1111 | Data (D₁₁) | No | — |
Counting Data and Check Positions:
For an n-position codeword, the number of check bits is r = ⌊log₂(n)⌋ + 1 (positions 1, 2, 4, ..., up to the largest power of two ≤ n).
For Hamming(15,11): 4 check bits (positions 1, 2, 4, 8) and 11 data bits. For Hamming(31,26): 5 check bits (positions 1, 2, 4, 8, 16) and 26 data bits.
Think of the check bits as 'sparse sentinels' stationed at exponentially spaced intervals: every 1st, 2nd, 4th, 8th position. Data bits fill all the gaps between them. As the codeword grows, data bits increasingly outnumber check bits, improving efficiency.
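The counting formula above can be sketched in a few lines; note that ⌊log₂(n)⌋ + 1 is exactly `n.bit_length()` in Python (the helper name `code_parameters` is ours):

```python
def code_parameters(n: int) -> tuple[int, int]:
    """For an n-position codeword, return (check_bits, data_bits).

    r = floor(log2(n)) + 1, which equals n.bit_length() for n >= 1.
    """
    r = n.bit_length()
    return r, n - r

print(code_parameters(7))   # (3, 4)  -> Hamming(7,4)
print(code_parameters(15))  # (4, 11) -> Hamming(15,11)
print(code_parameters(31))  # (5, 26) -> Hamming(31,26)
```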
Each check bit is responsible for monitoring a specific subset of positions. The coverage assignment follows a precise pattern based on binary arithmetic.
The Coverage Rule:
Check bit P_2^k covers all positions whose binary representation has a 1 in bit position k.
Equivalently: Position j is covered by check bit P_2^k if and only if (j & 2^k) ≠ 0.
This rule ensures that every position (except position 0, which doesn't exist in standard numbering) is covered by a unique subset of check bits—the subset corresponding to the 1-bits in the position's binary representation.
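A short sketch of the coverage rule as code (the helper name `coverage_set` is ours); it reproduces the rows of the matrix below:

```python
def coverage_set(k: int, n: int) -> list[int]:
    """Positions 1..n covered by check bit P_{2^k}: those with bit k set."""
    return [j for j in range(1, n + 1) if j & (1 << k)]

for k in range(4):
    print(f"P{1 << k}: {coverage_set(k, 15)}")
# P1: [1, 3, 5, 7, 9, 11, 13, 15]
# P2: [2, 3, 6, 7, 10, 11, 14, 15]
# P4: [4, 5, 6, 7, 12, 13, 14, 15]
# P8: [8, 9, 10, 11, 12, 13, 14, 15]
```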
| Position | Binary | Covered by P₁? | Covered by P₂? | Covered by P₄? | Covered by P₈? |
|---|---|---|---|---|---|
| 1 | 0001 | ✓ | — | — | — |
| 2 | 0010 | — | ✓ | — | — |
| 3 | 0011 | ✓ | ✓ | — | — |
| 4 | 0100 | — | — | ✓ | — |
| 5 | 0101 | ✓ | — | ✓ | — |
| 6 | 0110 | — | ✓ | ✓ | — |
| 7 | 0111 | ✓ | ✓ | ✓ | — |
| 8 | 1000 | — | — | — | ✓ |
| 9 | 1001 | ✓ | — | — | ✓ |
| 10 | 1010 | — | ✓ | — | ✓ |
| 11 | 1011 | ✓ | ✓ | — | ✓ |
| 12 | 1100 | — | — | ✓ | ✓ |
| 13 | 1101 | ✓ | — | ✓ | ✓ |
| 14 | 1110 | — | ✓ | ✓ | ✓ |
| 15 | 1111 | ✓ | ✓ | ✓ | ✓ |
Key Observations from the Coverage Matrix:
Each check bit covers itself — P₁ is at position 1 and covers position 1; P₂ is at position 2 and covers position 2; etc.
Coverage follows the position's binary representation — the set of check bits covering position j corresponds exactly to the 1-bits of j.
Every data position is covered by at least two check bits — This is because data positions have at least two 1-bits in their binary representation (they're not powers of two).
Position 15 is covered by all check bits — Its binary representation (1111) has all bits set.
When any single bit at position j flips, exactly those check bits whose index k satisfies (j & 2^k) ≠ 0 will detect a parity violation. The pattern of failures directly encodes j in binary. This is the core insight that makes syndrome-based error location work.
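To see this insight concretely, here is a small sketch using a valid Hamming(7,4) codeword (the function name `syndrome_of` is ours): flipping any single bit makes exactly the covering checks fail, and the failure pattern spells out the flipped position in binary.

```python
def syndrome_of(received: list[int]) -> int:
    """Recompute the three Hamming(7,4) parity checks; failures spell a position."""
    syndrome = 0
    for k in range(3):
        parity = 0
        for pos in range(1, 8):
            if pos & (1 << k):
                parity ^= received[pos - 1]
        if parity:  # this check failed -> set bit k of the syndrome
            syndrome |= 1 << k
    return syndrome

codeword = [0, 1, 1, 0, 0, 1, 1]  # valid Hamming(7,4) codeword for data 1011

for flip in range(1, 8):
    received = codeword.copy()
    received[flip - 1] ^= 1  # inject a single-bit error at position `flip`
    print(f"flip position {flip} -> syndrome {syndrome_of(received)}")
```

Every line printed shows the syndrome equal to the flipped position — the pattern of failed checks is the error address.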
During encoding, we must compute the value of each check bit based on the data bits in its coverage set. The goal is to establish a known parity (typically even parity) across each coverage set.
The Even Parity Convention:
For even parity, each check bit is set such that the XOR of all positions in its coverage set equals 0.
P_2^k = ⊕ {bit[j] : (j & 2^k) ≠ 0 and j ≠ 2^k}
Equivalently: XOR all data bits covered by this check bit; the result becomes the check bit value.
Algorithm for Encoding:
```python
def encode_hamming(data_bits: list[int]) -> list[int]:
    """
    Encode data bits into a Hamming codeword.

    Args:
        data_bits: List of 0s and 1s representing the data to encode

    Returns:
        Complete Hamming codeword including check bits

    Example:
        encode_hamming([1, 0, 1, 1]) returns [0, 1, 1, 0, 0, 1, 1]
        (Hamming(7,4) encoding of 1011)
    """
    m = len(data_bits)

    # Calculate number of check bits needed: 2^r >= m + r + 1
    r = 0
    while (1 << r) < m + r + 1:
        r += 1

    n = m + r  # Total codeword length

    # Initialize codeword with placeholders (0 for check bits initially)
    codeword = [0] * (n + 1)  # 1-indexed, position 0 unused

    # Place data bits in non-power-of-two positions
    data_index = 0
    for pos in range(1, n + 1):
        if not is_power_of_two(pos):
            codeword[pos] = data_bits[data_index]
            data_index += 1

    # Calculate each check bit
    for k in range(r):
        check_pos = 1 << k  # 2^k (positions 1, 2, 4, 8, ...)
        parity = 0
        # XOR all positions covered by this check bit
        for pos in range(1, n + 1):
            if pos & check_pos:  # Position is in this check bit's coverage
                parity ^= codeword[pos]
        codeword[check_pos] = parity

    return codeword[1:]  # Return 0-indexed result


def is_power_of_two(n: int) -> bool:
    """Check if n is a power of two."""
    return n > 0 and (n & (n - 1)) == 0


# Demonstration
print("Hamming Encoding Demonstration")
print("=" * 50)

# Encode 1011 using Hamming(7,4)
data = [1, 0, 1, 1]
encoded = encode_hamming(data)

print(f"Data bits: {data}")
print(f"Encoded:   {encoded}")
print()

# Show position breakdown
print("Position breakdown (1-indexed):")
print("Pos:   1  2  3  4  5  6  7")
print("Type:  P1 P2 D1 P4 D2 D3 D4")
print(f"Value: {encoded[0]}  {encoded[1]}  {encoded[2]}  {encoded[3]}  {encoded[4]}  {encoded[5]}  {encoded[6]}")
print()

# Verify parity checks
print("Parity verification:")
for k in range(3):
    check_pos = 1 << k
    positions_checked = [i + 1 for i in range(7) if (i + 1) & check_pos]
    values = [encoded[i] for i in range(7) if (i + 1) & check_pos]
    xor_result = 0
    for v in values:
        xor_result ^= v
    print(f"P{check_pos}: checks positions {positions_checked}, XOR = {xor_result} {'✓' if xor_result == 0 else '✗'}")
```

Step-by-Step Encoding Example:
Let's encode the 4-bit data 1011 into Hamming(7,4) format:
Steps 1–2: Determine the codeword structure and place the data bits
| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Type | P₁ | P₂ | D₁ | P₄ | D₂ | D₃ | D₄ |
| Value | ? | ? | 1 | ? | 0 | 1 | 1 |
Step 3: Calculate P₁ (covers positions 1, 3, 5, 7): P₁ = D₁ ⊕ D₂ ⊕ D₄ = 1 ⊕ 0 ⊕ 1 = 0
Step 4: Calculate P₂ (covers positions 2, 3, 6, 7): P₂ = D₁ ⊕ D₃ ⊕ D₄ = 1 ⊕ 1 ⊕ 1 = 1
Step 5: Calculate P₄ (covers positions 4, 5, 6, 7): P₄ = D₂ ⊕ D₃ ⊕ D₄ = 0 ⊕ 1 ⊕ 1 = 0
| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Type | P₁ | P₂ | D₁ | P₄ | D₂ | D₃ | D₄ |
| Value | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
Data 1011 encodes to codeword 0110011. The three check bits (0, 1, 0) are computed to ensure even parity across each coverage set. This codeword is now protected against any single-bit error.
Different textbooks and implementations use varying conventions for bit positioning. Understanding these variations prevents confusion when studying different sources.
Convention 1: Systematic Form (Power-of-Two Placement)
This is the convention we've been using:
Convention 2: Non-Systematic Form (Data First)
Some practical implementations place data bits first, followed by check bits:
Convention 3: Zero-Indexed Positions
Some sources number positions starting from 0:
| Convention | Position 1 | Position 2 | Position 3 | Position 4 | Position 5 | Position 6 | Position 7 |
|---|---|---|---|---|---|---|---|
| Systematic (ours) | P₁ | P₂ | D₁ | P₄ | D₂ | D₃ | D₄ |
| Data-first | D₁ | D₂ | D₃ | D₄ | P₁ | P₂ | P₄ |
| Zero-indexed | P₁ | P₂ | D₁ | P₄ | D₂ | D₃ | D₄ |
(The zero-indexed layout is the same; only the numbering shifts down by one, so the check bits sit at indices 0, 1, and 3 rather than positions 1, 2, and 4.)
When consulting multiple sources, always verify which convention is being used. The syndrome calculation and error correction procedures differ between conventions. Mixing conventions leads to incorrect results.
Converting Between Conventions:
When implementing systems that interface with external protocols, you may need to convert between conventions:
```python
def systematic_to_data_first(systematic: list) -> list:
    """Convert from power-of-two convention to data-first convention."""
    data_bits = []
    check_bits = []
    for i, bit in enumerate(systematic, 1):  # 1-indexed positions
        if is_power_of_two(i):
            check_bits.append(bit)
        else:
            data_bits.append(bit)
    return data_bits + check_bits
```
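The inverse direction can be sketched similarly; `data_first_to_systematic` is a helper name of ours, and it assumes the (m, r) parameters of the code are known:

```python
def data_first_to_systematic(bits: list, m: int, r: int) -> list:
    """Convert data-first layout [D1..Dm, P1, P2, ...] back to power-of-two layout."""
    data_bits, check_bits = bits[:m], bits[m:]
    out = []
    di = ci = 0
    for pos in range(1, m + r + 1):  # 1-indexed positions
        if pos & (pos - 1) == 0:     # power of two: check bit slot
            out.append(check_bits[ci]); ci += 1
        else:                        # data bit slot
            out.append(data_bits[di]); di += 1
    return out

# Data-first [D1..D4, P1, P2, P4] back to the systematic Hamming(7,4) layout
print(data_first_to_systematic([1, 0, 1, 1, 0, 1, 0], 4, 3))  # [0, 1, 1, 0, 0, 1, 1]
```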
The systematic (power-of-two) convention is generally preferred for teaching because it makes the relationship between syndrome values and error positions immediately apparent.
A visual representation helps solidify understanding of which check bits cover which positions. We can represent coverage as overlapping sets or as a geometric pattern.
Venn Diagram Interpretation:
For Hamming(7,4), imagine three overlapping circles labeled P₁, P₂, and P₄:
┌─────────────────────────┐
│ P₄ │
│ ┌─────────────┐ │
│ │ P₂ │ │
┌────────┼───┼──────┐ │ │
│ P₁ │ │ │ │ │
│ │ │ 7 │ 6 │ 4 │
│ 1 │ └──────┴──────┤ │
│ │ 3 │ │
└────────┼─────────────────┤ 5 │
│ 2 │ │
└─────────────────┴────────┘
The Skip-Check Pattern:
Another way to visualize coverage is the skip-check pattern:
| Check Bit | Pattern | Positions Covered |
|---|---|---|
| P₁ | Check 1, skip 1, repeat | 1, 3, 5, 7, 9, 11, 13, 15, ... |
| P₂ | Check 2, skip 2, repeat | 2-3, 6-7, 10-11, 14-15, ... |
| P₄ | Check 4, skip 4, repeat | 4-7, 12-15, 20-23, ... |
| P₈ | Check 8, skip 8, repeat | 8-15, 24-31, 40-47, ... |
This pattern emerges directly from how binary counting toggles each bit:
For P₁: start at 1, check 1, skip 1. For P₂: start at 2, check 2, skip 2. For P₄: start at 4, check 4, skip 4. The check bit position determines both the starting point and the check/skip length. This pattern extends indefinitely for larger codes.
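One way to sanity-check that the skip-check pattern and the bitwise rule agree is a small sketch (the function name `skip_check_positions` is ours):

```python
def skip_check_positions(k: int, n: int) -> list[int]:
    """Generate P_{2^k}'s coverage via the check-2^k / skip-2^k pattern."""
    step = 1 << k
    out, start = [], step
    while start <= n:
        out.extend(range(start, min(start + step, n + 1)))  # check `step` positions
        start += 2 * step                                   # then skip `step` positions
    return out

# The pattern agrees with the bitwise rule (j & 2^k) != 0 for every check bit:
for k in range(4):
    assert skip_check_positions(k, 31) == [j for j in range(1, 32) if j & (1 << k)]
print("patterns match")
```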
The regular structure of check bit coverage enables highly efficient hardware implementations. Understanding these implementations reveals why Hamming codes remain popular in memory systems despite their age.
XOR Tree Architecture:
Each check bit is computed as the XOR of all data positions in its coverage set. This can be implemented as a tree of two-input XOR gates:
P₁
│
┌─┴─┐
XOR XOR
┌─┴─┐ ┌─┴─┐
XOR XOR XOR XOR
│ │ │ │
D₁ D₂ D₃ D₄ (for covered positions)
Parallel Computation:
All check bits can be computed simultaneously since they operate on independent (though overlapping) subsets of data bits. The encoding latency is therefore set by the depth of the deepest XOR tree, which grows only logarithmically with codeword length:
| Code | XOR Gates for Encoding | XOR Gates for Decoding | Total Latency (Gate Delays) |
|---|---|---|---|
| Hamming(7,4) | 9 | 12 | 3 |
| Hamming(15,11) | 26 | 34 | 4 |
| Hamming(31,26) | 60 | 78 | 5 |
| Hamming(63,57) | 133 | 172 | 6 |
| Hamming(127,120) | 287 | 371 | 7 |
Memory ECC Implementation:
In ECC memory systems, Hamming encoding and decoding happen on every memory access:
Write Path: compute the check bits from the incoming data word, then store data bits and check bits together as a single codeword.
Read Path: recompute the check bits from the stored data, compare them with the stored check bits, and use any mismatch pattern (the syndrome) to correct a single-bit error before returning the data.
The entire encode/decode cycle adds only a few nanoseconds to memory latency—negligible compared to DRAM access times of 50-100 nanoseconds.
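A minimal software sketch of that read-path correction, using the Hamming(7,4) codeword from this page (the helper name `parities` is ours):

```python
def parities(word: list[int], r: int) -> int:
    """Recompute the r parity checks over a stored codeword; 0 means clean."""
    syndrome = 0
    for k in range(r):
        p = 0
        for pos in range(1, len(word) + 1):
            if pos & (1 << k):
                p ^= word[pos - 1]
        if p:
            syndrome |= 1 << k
    return syndrome

# Write path: the controller stores an already-encoded Hamming(7,4) codeword
stored = [0, 1, 1, 0, 0, 1, 1]

# A bit flips while the data sits in memory (position 3, i.e. D1)
stored[2] ^= 1

# Read path: recompute the checks; a nonzero syndrome names the bad position
syndrome = parities(stored, 3)
if syndrome:
    stored[syndrome - 1] ^= 1  # correct in place before returning the data

print(parities(stored, 3))  # 0 again: the codeword is restored
```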
DDR5 memory includes on-die ECC, meaning error correction happens inside the DRAM chip itself using Hamming-like codes. This provides an additional layer of protection beyond system-level ECC, addressing the higher error rates of smaller memory cells.
Let's examine multiple algorithmic approaches for computing check bits, ranging from straightforward implementations to optimized versions.
Algorithm 1: Position-by-Position Iteration
The most intuitive approach: for each check position, iterate through all positions and XOR those in the coverage set.
```python
def compute_check_bits_v1(codeword: list[int], n: int) -> list[int]:
    """
    Algorithm 1: Position-by-position iteration.

    For each check bit position 2^k, scan all positions and XOR
    those whose index has the k-th bit set.

    Time complexity: O(n log n) - for each of log(n) check bits, scan n positions
    """
    result = codeword.copy()
    r = (n - 1).bit_length()  # Number of check bit positions

    for k in range(r):
        check_pos = 1 << k  # 2^k
        parity = 0
        for pos in range(1, n + 1):
            if pos & check_pos:  # Position covered by this check bit
                parity ^= result[pos - 1]  # 0-indexed access
        result[check_pos - 1] = parity  # Set check bit (already included in parity)

    return result


def compute_check_bits_v2(data: list[int]) -> list[int]:
    """
    Algorithm 2: Direct formula for each check bit.

    Pre-compute which data positions contribute to each check bit.
    More efficient for repeated encoding with same parameters.
    """
    m = len(data)
    r = 0
    while (1 << r) < m + r + 1:
        r += 1
    n = m + r

    # Build mapping: which data index goes to which codeword position
    codeword = [0] * n
    data_positions = []
    data_idx = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # Not a power of two
            codeword[pos - 1] = data[data_idx]
            data_positions.append((pos, data_idx))
            data_idx += 1

    # Compute each check bit directly
    for k in range(r):
        check_pos = 1 << k
        parity = 0
        for pos, _ in data_positions:
            if pos & check_pos:
                parity ^= codeword[pos - 1]
        codeword[check_pos - 1] = parity

    return codeword


def compute_check_bits_v3_matrix(data: list[int]) -> list[int]:
    """
    Algorithm 3: Matrix multiplication approach.

    Use the generator matrix for a more algebraic implementation.
    This is how hardware implementations often work.
    """
    m = len(data)
    r = 0
    while (1 << r) < m + r + 1:
        r += 1
    n = m + r

    # Build generator matrix G (m x n)
    # Each row corresponds to a data bit
    # The row shows where that bit contributes in the codeword
    G = []
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # Data position
            row = [0] * n
            row[pos - 1] = 1  # Data bit in its position
            # Also contributes to check bits
            for k in range(r):
                check_pos = 1 << k
                if pos & check_pos:
                    row[check_pos - 1] ^= 1
            G.append(row)

    # Codeword = data * G (over GF(2))
    codeword = [0] * n
    for i, d in enumerate(data):
        if d:
            for j in range(n):
                codeword[j] ^= G[i][j]

    return codeword


# Demonstration and verification
print("Check Bit Calculation Algorithms Comparison")
print("=" * 60)

test_data = [1, 0, 1, 1]  # 4-bit data

# Algorithm 1: Needs codeword with data pre-placed
m, r = 4, 3
n = m + r
codeword_v1 = [0] * n
data_idx = 0
for pos in range(1, n + 1):
    if pos & (pos - 1):
        codeword_v1[pos - 1] = test_data[data_idx]
        data_idx += 1
result_v1 = compute_check_bits_v1(codeword_v1, n)

result_v2 = compute_check_bits_v2(test_data)
result_v3 = compute_check_bits_v3_matrix(test_data)

print(f"Input data:  {test_data}")
print(f"Algorithm 1: {result_v1}")
print(f"Algorithm 2: {result_v2}")
print(f"Algorithm 3: {result_v3}")
print()
print("All algorithms produce identical results:", result_v1 == result_v2 == result_v3)
```

For software implementations, Algorithm 2 (direct formula) is typically fastest. For hardware, the matrix approach translates directly to XOR gate networks. Algorithm 1 is clearest for learning and debugging.
We have thoroughly examined how check bits are positioned in Hamming codes and the algorithms for computing their values. Let's consolidate the key takeaways:
What's Next:
With check bit positions and encoding thoroughly understood, the next page tackles syndrome calculation—the decoding side of Hamming codes. We'll see how the pattern of parity check failures (the syndrome) directly encodes the error position, enabling correction without knowing the original data.
You now have a complete understanding of check bit placement and calculation in Hamming codes. The power-of-two positioning strategy creates the overlapping coverage that enables error location. Next, we'll decode received codewords and locate errors using syndrome calculations.