At the heart of every Hamming code lies a carefully orchestrated arrangement of check bits—parity bits positioned at specific locations that enable the magical property of error location, not just detection.
Understanding where check bits are placed and why they are placed there transforms Hamming codes from a black-box algorithm into an intuitive system you can design, implement, and debug from first principles.
This page dissects the anatomy of check bit positioning with surgical precision.
By the end of this page, you will understand exactly which positions hold check bits and which hold data bits, the binary arithmetic that determines coverage, step-by-step algorithms for computing check bit values, and how to implement Hamming encoding from scratch in any programming language.
In standard Hamming code notation, the codeword consists of n positions numbered 1 through n. Check bits (also called parity bits) occupy positions that are powers of two: 1, 2, 4, 8, 16, 32, and so on.
All other positions—3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 17, ...—hold data bits.
The Fundamental Rule:
Position p contains a check bit if and only if p = 2^k for some non-negative integer k.
This can be tested algorithmically: a position is a power of two if and only if it has exactly one bit set in its binary representation, which can be verified with (p & (p - 1)) == 0 && p != 0.
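As a minimal sketch, the test above translates directly into Python (the helper name `is_check_position` is ours, not standard):

```python
def is_check_position(p: int) -> bool:
    """True if 1-indexed position p holds a check bit (p is a power of two)."""
    return p > 0 and (p & (p - 1)) == 0

# Positions 1-15: check bits sit at 1, 2, 4, 8
check_positions = [p for p in range(1, 16) if is_check_position(p)]
print(check_positions)  # [1, 2, 4, 8]
```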
| Position | Binary | Type | Power of 2? | Check Bit Index |
|---|---|---|---|---|
| 1 | 0001 | Check (P₁) | Yes (2⁰) | 0 |
| 2 | 0010 | Check (P₂) | Yes (2¹) | 1 |
| 3 | 0011 | Data (D₁) | No | — |
| 4 | 0100 | Check (P₄) | Yes (2²) | 2 |
| 5 | 0101 | Data (D₂) | No | — |
| 6 | 0110 | Data (D₃) | No | — |
| 7 | 0111 | Data (D₄) | No | — |
| 8 | 1000 | Check (P₈) | Yes (2³) | 3 |
| 9 | 1001 | Data (D₅) | No | — |
| 10 | 1010 | Data (D₆) | No | — |
| 11 | 1011 | Data (D₇) | No | — |
| 12 | 1100 | Data (D₈) | No | — |
| 13 | 1101 | Data (D₉) | No | — |
| 14 | 1110 | Data (D₁₀) | No | — |
| 15 | 1111 | Data (D₁₁) | No | — |
Counting Data and Check Positions:
For an n-position codeword, the number of check bits is r = ⌊log₂(n)⌋ + 1 (positions 1, 2, 4, ..., up to the largest power of two ≤ n).
For Hamming(15,11): 4 check bits (positions 1, 2, 4, 8) and 11 data bits. For Hamming(31,26): 5 check bits (positions 1, 2, 4, 8, 16) and 26 data bits.
Think of the check bits as 'sparse sentinels' stationed at exponentially spaced intervals: every 1st, 2nd, 4th, 8th position. Data bits fill all the gaps between them. As the codeword grows, data bits increasingly outnumber check bits, improving efficiency.
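The counting formula above can be sketched in a few lines; note that ⌊log₂(n)⌋ + 1 is exactly `n.bit_length()` in Python (the helper name `code_parameters` is ours):

```python
def code_parameters(n: int) -> tuple[int, int]:
    """For an n-position codeword, return (check_bits, data_bits).

    r = floor(log2(n)) + 1, which equals n.bit_length() for n >= 1.
    """
    r = n.bit_length()
    return r, n - r

print(code_parameters(7))   # (3, 4)  -> Hamming(7,4)
print(code_parameters(15))  # (4, 11) -> Hamming(15,11)
print(code_parameters(31))  # (5, 26) -> Hamming(31,26)
```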
Each check bit is responsible for monitoring a specific subset of positions. The coverage assignment follows a precise pattern based on binary arithmetic.
The Coverage Rule:
Check bit P_2^k covers all positions whose binary representation has a 1 in bit position k.
Equivalently: Position j is covered by check bit P_2^k if and only if (j & 2^k) ≠ 0.
This rule ensures that every position (except position 0, which doesn't exist in standard numbering) is covered by a unique subset of check bits—the subset corresponding to the 1-bits in the position's binary representation.
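A short sketch of the coverage rule as code (the helper name `coverage_set` is ours); it reproduces the rows of the matrix below:

```python
def coverage_set(k: int, n: int) -> list[int]:
    """Positions 1..n covered by check bit P_{2^k}: those with bit k set."""
    return [j for j in range(1, n + 1) if j & (1 << k)]

for k in range(4):
    print(f"P{1 << k}: {coverage_set(k, 15)}")
# P1: [1, 3, 5, 7, 9, 11, 13, 15]
# P2: [2, 3, 6, 7, 10, 11, 14, 15]
# P4: [4, 5, 6, 7, 12, 13, 14, 15]
# P8: [8, 9, 10, 11, 12, 13, 14, 15]
```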
| Position | Binary | Covered by P₁? | Covered by P₂? | Covered by P₄? | Covered by P₈? |
|---|---|---|---|---|---|
| 1 | 0001 | ✓ | — | — | — |
| 2 | 0010 | — | ✓ | — | — |
| 3 | 0011 | ✓ | ✓ | — | — |
| 4 | 0100 | — | — | ✓ | — |
| 5 | 0101 | ✓ | — | ✓ | — |
| 6 | 0110 | — | ✓ | ✓ | — |
| 7 | 0111 | ✓ | ✓ | ✓ | — |
| 8 | 1000 | — | — | — | ✓ |
| 9 | 1001 | ✓ | — | — | ✓ |
| 10 | 1010 | — | ✓ | — | ✓ |
| 11 | 1011 | ✓ | ✓ | — | ✓ |
| 12 | 1100 | — | — | ✓ | ✓ |
| 13 | 1101 | ✓ | — | ✓ | ✓ |
| 14 | 1110 | — | ✓ | ✓ | ✓ |
| 15 | 1111 | ✓ | ✓ | ✓ | ✓ |
Key Observations from the Coverage Matrix:
Each check bit covers itself — P₁ is at position 1 and covers position 1; P₂ is at position 2 and covers position 2; etc.
Coverage follows the position's binary representation — the set of check bits covering position j corresponds exactly to the 1-bits of j.
Every data position is covered by at least two check bits — This is because data positions have at least two 1-bits in their binary representation (they're not powers of two).
Position 15 is covered by all check bits — Its binary representation (1111) has all bits set.
When any single bit at position j flips, exactly those check bits whose index k satisfies (j & 2^k) ≠ 0 will detect a parity violation. The pattern of failures directly encodes j in binary. This is the core insight that makes syndrome-based error location work.
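To see this insight concretely, here is a small sketch using a valid Hamming(7,4) codeword (the function name `syndrome_of` is ours): flipping any single bit makes exactly the covering checks fail, and the failure pattern spells out the flipped position in binary.

```python
def syndrome_of(received: list[int]) -> int:
    """Recompute the three Hamming(7,4) parity checks; failures spell a position."""
    syndrome = 0
    for k in range(3):
        parity = 0
        for pos in range(1, 8):
            if pos & (1 << k):
                parity ^= received[pos - 1]
        if parity:  # this check failed -> set bit k of the syndrome
            syndrome |= 1 << k
    return syndrome

codeword = [0, 1, 1, 0, 0, 1, 1]  # valid Hamming(7,4) codeword for data 1011

for flip in range(1, 8):
    received = codeword.copy()
    received[flip - 1] ^= 1  # inject a single-bit error at position `flip`
    print(f"flip position {flip} -> syndrome {syndrome_of(received)}")
```

Every line printed shows the syndrome equal to the flipped position — the pattern of failed checks is the error address.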
During encoding, we must compute the value of each check bit based on the data bits in its coverage set. The goal is to establish a known parity (typically even parity) across each coverage set.
The Even Parity Convention:
For even parity, each check bit is set such that the XOR of all positions in its coverage set equals 0.
P_2^k = ⊕ {bit[j] : (j & 2^k) ≠ 0 and j ≠ 2^k}
Equivalently: XOR all data bits covered by this check bit; the result becomes the check bit value.
Algorithm for Encoding:
```python
def encode_hamming(data_bits: list[int]) -> list[int]:
    """
    Encode data bits into a Hamming codeword.

    Args:
        data_bits: List of 0s and 1s representing the data to encode

    Returns:
        Complete Hamming codeword including check bits

    Example:
        encode_hamming([1, 0, 1, 1]) returns [0, 1, 1, 0, 0, 1, 1]
        (Hamming(7,4) encoding of 1011)
    """
    m = len(data_bits)

    # Calculate number of check bits needed: 2^r >= m + r + 1
    r = 0
    while (1 << r) < m + r + 1:
        r += 1

    n = m + r  # Total codeword length

    # Initialize codeword with placeholders (0 for check bits initially)
    codeword = [0] * (n + 1)  # 1-indexed, position 0 unused

    # Place data bits in non-power-of-two positions
    data_index = 0
    for pos in range(1, n + 1):
        if not is_power_of_two(pos):
            codeword[pos] = data_bits[data_index]
            data_index += 1

    # Calculate each check bit
    for k in range(r):
        check_pos = 1 << k  # 2^k (positions 1, 2, 4, 8, ...)
        parity = 0
        # XOR all positions covered by this check bit
        for pos in range(1, n + 1):
            if pos & check_pos:  # Position is in this check bit's coverage
                parity ^= codeword[pos]
        codeword[check_pos] = parity

    return codeword[1:]  # Return 0-indexed result


def is_power_of_two(n: int) -> bool:
    """Check if n is a power of two."""
    return n > 0 and (n & (n - 1)) == 0


# Demonstration
print("Hamming Encoding Demonstration")
print("=" * 50)

# Encode 1011 using Hamming(7,4)
data = [1, 0, 1, 1]
encoded = encode_hamming(data)

print(f"Data bits: {data}")
print(f"Encoded:   {encoded}")
print()

# Show position breakdown
print("Position breakdown (1-indexed):")
print("Pos:   1  2  3  4  5  6  7")
print("Type:  P1 P2 D1 P4 D2 D3 D4")
print(f"Value: {encoded[0]}  {encoded[1]}  {encoded[2]}  {encoded[3]}  {encoded[4]}  {encoded[5]}  {encoded[6]}")
print()

# Verify parity checks
print("Parity verification:")
for k in range(3):
    check_pos = 1 << k
    positions_checked = [i + 1 for i in range(7) if (i + 1) & check_pos]
    values = [encoded[i] for i in range(7) if (i + 1) & check_pos]
    xor_result = 0
    for v in values:
        xor_result ^= v
    print(f"P{check_pos}: checks positions {positions_checked}, XOR = {xor_result} {'✓' if xor_result == 0 else '✗'}")
```

Step-by-Step Encoding Example:
Let's encode the 4-bit data 1011 into Hamming(7,4) format:
Steps 1–2: Determine the codeword structure and place the data bits
| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Type | P₁ | P₂ | D₁ | P₄ | D₂ | D₃ | D₄ |
| Value | ? | ? | 1 | ? | 0 | 1 | 1 |
Step 3: Calculate P₁ (covers positions 1, 3, 5, 7): P₁ = D₁ ⊕ D₂ ⊕ D₄ = 1 ⊕ 0 ⊕ 1 = 0
Step 4: Calculate P₂ (covers positions 2, 3, 6, 7): P₂ = D₁ ⊕ D₃ ⊕ D₄ = 1 ⊕ 1 ⊕ 1 = 1
Step 5: Calculate P₄ (covers positions 4, 5, 6, 7): P₄ = D₂ ⊕ D₃ ⊕ D₄ = 0 ⊕ 1 ⊕ 1 = 0
| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Type | P₁ | P₂ | D₁ | P₄ | D₂ | D₃ | D₄ |
| Value | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
Data 1011 encodes to codeword 0110011. The three check bits (0, 1, 0) are computed to ensure even parity across each coverage set. This codeword is now protected against any single-bit error.
Different textbooks and implementations use varying conventions for bit positioning. Understanding these variations prevents confusion when studying different sources.
Convention 1: Systematic Form (Power-of-Two Placement)
This is the convention we've been using:
Convention 2: Non-Systematic Form (Data First)
Some practical implementations place data bits first, followed by check bits:
Convention 3: Zero-Indexed Positions
Some sources number positions starting from 0:
| Convention | Position 1 | Position 2 | Position 3 | Position 4 | Position 5 | Position 6 | Position 7 |
|---|---|---|---|---|---|---|---|
| Systematic (ours) | P₁ | P₂ | D₁ | P₄ | D₂ | D₃ | D₄ |
| Data-first | D₁ | D₂ | D₃ | D₄ | P₁ | P₂ | P₄ |
| Zero-indexed | P₁ | P₂ | D₁ | P₄ | D₂ | D₃ | D₄ |
(The zero-indexed layout is the same; only the numbering shifts down by one, so the check bits sit at indices 0, 1, and 3 rather than positions 1, 2, and 4.)
When consulting multiple sources, always verify which convention is being used. The syndrome calculation and error correction procedures differ between conventions. Mixing conventions leads to incorrect results.
Converting Between Conventions:
When implementing systems that interface with external protocols, you may need to convert between conventions:
```python
def systematic_to_data_first(systematic: list) -> list:
    """Convert from power-of-two convention to data-first convention."""
    data_bits = []
    check_bits = []
    for i, bit in enumerate(systematic, 1):  # 1-indexed positions
        if is_power_of_two(i):
            check_bits.append(bit)
        else:
            data_bits.append(bit)
    return data_bits + check_bits
```
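The inverse direction can be sketched similarly; `data_first_to_systematic` is a helper name of ours, and it assumes the (m, r) parameters of the code are known:

```python
def data_first_to_systematic(bits: list, m: int, r: int) -> list:
    """Convert data-first layout [D1..Dm, P1, P2, ...] back to power-of-two layout."""
    data_bits, check_bits = bits[:m], bits[m:]
    out = []
    di = ci = 0
    for pos in range(1, m + r + 1):  # 1-indexed positions
        if pos & (pos - 1) == 0:     # power of two: check bit slot
            out.append(check_bits[ci]); ci += 1
        else:                        # data bit slot
            out.append(data_bits[di]); di += 1
    return out

# Data-first [D1..D4, P1, P2, P4] back to the systematic Hamming(7,4) layout
print(data_first_to_systematic([1, 0, 1, 1, 0, 1, 0], 4, 3))  # [0, 1, 1, 0, 0, 1, 1]
```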
The systematic (power-of-two) convention is generally preferred for teaching because it makes the relationship between syndrome values and error positions immediately apparent.
A visual representation helps solidify understanding of which check bits cover which positions. We can represent coverage as overlapping sets or as a geometric pattern.
Venn Diagram Interpretation:
For Hamming(7,4), imagine three overlapping circles labeled P₁, P₂, and P₄:
┌─────────────────────────┐
│ P₄ │
│ ┌─────────────┐ │
│ │ P₂ │ │
┌────────┼───┼──────┐ │ │
│ P₁ │ │ │ │ │
│ │ │ 7 │ 6 │ 4 │
│ 1 │ └──────┴──────┤ │
│ │ 3 │ │
└────────┼─────────────────┤ 5 │
│ 2 │ │
└─────────────────┴────────┘
The Skip-Check Pattern:
Another way to visualize coverage is the skip-check pattern:
| Check Bit | Pattern | Positions Covered |
|---|---|---|
| P₁ | Check 1, skip 1, repeat | 1, 3, 5, 7, 9, 11, 13, 15, ... |
| P₂ | Check 2, skip 2, repeat | 2-3, 6-7, 10-11, 14-15, ... |
| P₄ | Check 4, skip 4, repeat | 4-7, 12-15, 20-23, ... |
| P₈ | Check 8, skip 8, repeat | 8-15, 24-31, 40-47, ... |
This pattern emerges directly from how binary counting toggles each bit:
For P₁: start at 1, check 1, skip 1. For P₂: start at 2, check 2, skip 2. For P₄: start at 4, check 4, skip 4. The check bit position determines both the starting point and the check/skip length. This pattern extends indefinitely for larger codes.
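One way to sanity-check that the skip-check pattern and the bitwise rule agree is a small sketch (the function name `skip_check_positions` is ours):

```python
def skip_check_positions(k: int, n: int) -> list[int]:
    """Generate P_{2^k}'s coverage via the check-2^k / skip-2^k pattern."""
    step = 1 << k
    out, start = [], step
    while start <= n:
        out.extend(range(start, min(start + step, n + 1)))  # check `step` positions
        start += 2 * step                                   # then skip `step` positions
    return out

# The pattern agrees with the bitwise rule (j & 2^k) != 0 for every check bit:
for k in range(4):
    assert skip_check_positions(k, 31) == [j for j in range(1, 32) if j & (1 << k)]
print("patterns match")
```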
The regular structure of check bit coverage enables highly efficient hardware implementations. Understanding these implementations reveals why Hamming codes remain popular in memory systems despite their age.
XOR Tree Architecture:
Each check bit is computed as the XOR of all data positions in its coverage set. This can be implemented as a tree of two-input XOR gates:
P₁
│
┌─┴─┐
XOR XOR
┌─┴─┐ ┌─┴─┐
XOR XOR XOR XOR
│ │ │ │
D₁ D₂ D₃ D₄ (for covered positions)
Parallel Computation:
All check bits can be computed simultaneously since they operate on independent (though overlapping) subsets of data bits. The encoding latency is therefore set by the depth of the deepest XOR tree, which grows only logarithmically with codeword length:
| Code | XOR Gates for Encoding | XOR Gates for Decoding | Total Latency (Gate Delays) |
|---|---|---|---|
| Hamming(7,4) | 9 | 12 | 3 |
| Hamming(15,11) | 26 | 34 | 4 |
| Hamming(31,26) | 60 | 78 | 5 |
| Hamming(63,57) | 133 | 172 | 6 |
| Hamming(127,120) | 287 | 371 | 7 |
Memory ECC Implementation:
In ECC memory systems, Hamming encoding and decoding happen on every memory access:
Write Path: compute the check bits from the incoming data word, then store data bits and check bits together as a single codeword.
Read Path: recompute the check bits from the stored data, compare them with the stored check bits, and use any mismatch pattern (the syndrome) to correct a single-bit error before returning the data.
The entire encode/decode cycle adds only a few nanoseconds to memory latency—negligible compared to DRAM access times of 50-100 nanoseconds.
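A minimal software sketch of that read-path correction, using the Hamming(7,4) codeword from this page (the helper name `parities` is ours):

```python
def parities(word: list[int], r: int) -> int:
    """Recompute the r parity checks over a stored codeword; 0 means clean."""
    syndrome = 0
    for k in range(r):
        p = 0
        for pos in range(1, len(word) + 1):
            if pos & (1 << k):
                p ^= word[pos - 1]
        if p:
            syndrome |= 1 << k
    return syndrome

# Write path: the controller stores an already-encoded Hamming(7,4) codeword
stored = [0, 1, 1, 0, 0, 1, 1]

# A bit flips while the data sits in memory (position 3, i.e. D1)
stored[2] ^= 1

# Read path: recompute the checks; a nonzero syndrome names the bad position
syndrome = parities(stored, 3)
if syndrome:
    stored[syndrome - 1] ^= 1  # correct in place before returning the data

print(parities(stored, 3))  # 0 again: the codeword is restored
```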
DDR5 memory includes on-die ECC, meaning error correction happens inside the DRAM chip itself using Hamming-like codes. This provides an additional layer of protection beyond system-level ECC, addressing the higher error rates of smaller memory cells.
Let's examine multiple algorithmic approaches for computing check bits, ranging from straightforward implementations to optimized versions.
Algorithm 1: Position-by-Position Iteration
The most intuitive approach: for each check position, iterate through all positions and XOR those in the coverage set.
```python
def compute_check_bits_v1(codeword: list[int], n: int) -> list[int]:
    """
    Algorithm 1: Position-by-position iteration.

    For each check bit position 2^k, scan all positions and XOR
    those whose index has the k-th bit set.

    Time complexity: O(n log n) - for each of log(n) check bits, scan n positions
    """
    result = codeword.copy()
    r = (n - 1).bit_length()  # Number of check bit positions

    for k in range(r):
        check_pos = 1 << k  # 2^k
        parity = 0
        for pos in range(1, n + 1):
            if pos & check_pos:  # Position covered by this check bit
                parity ^= result[pos - 1]  # 0-indexed access
        result[check_pos - 1] = parity  # Set check bit (already included in parity)

    return result


def compute_check_bits_v2(data: list[int]) -> list[int]:
    """
    Algorithm 2: Direct formula for each check bit.

    Pre-compute which data positions contribute to each check bit.
    More efficient for repeated encoding with same parameters.
    """
    m = len(data)
    r = 0
    while (1 << r) < m + r + 1:
        r += 1
    n = m + r

    # Build mapping: which data index goes to which codeword position
    codeword = [0] * n
    data_positions = []
    data_idx = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # Not a power of two
            codeword[pos - 1] = data[data_idx]
            data_positions.append((pos, data_idx))
            data_idx += 1

    # Compute each check bit directly
    for k in range(r):
        check_pos = 1 << k
        parity = 0
        for pos, _ in data_positions:
            if pos & check_pos:
                parity ^= codeword[pos - 1]
        codeword[check_pos - 1] = parity

    return codeword


def compute_check_bits_v3_matrix(data: list[int]) -> list[int]:
    """
    Algorithm 3: Matrix multiplication approach.

    Use the generator matrix for a more algebraic implementation.
    This is how hardware implementations often work.
    """
    m = len(data)
    r = 0
    while (1 << r) < m + r + 1:
        r += 1
    n = m + r

    # Build generator matrix G (m x n)
    # Each row corresponds to a data bit
    # The row shows where that bit contributes in the codeword
    G = []
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # Data position
            row = [0] * n
            row[pos - 1] = 1  # Data bit in its position
            # Also contributes to check bits
            for k in range(r):
                check_pos = 1 << k
                if pos & check_pos:
                    row[check_pos - 1] ^= 1
            G.append(row)

    # Codeword = data * G (over GF(2))
    codeword = [0] * n
    for i, d in enumerate(data):
        if d:
            for j in range(n):
                codeword[j] ^= G[i][j]

    return codeword


# Demonstration and verification
print("Check Bit Calculation Algorithms Comparison")
print("=" * 60)

test_data = [1, 0, 1, 1]  # 4-bit data

# Algorithm 1: Needs codeword with data pre-placed
m, r = 4, 3
n = m + r
codeword_v1 = [0] * n
data_idx = 0
for pos in range(1, n + 1):
    if pos & (pos - 1):
        codeword_v1[pos - 1] = test_data[data_idx]
        data_idx += 1
result_v1 = compute_check_bits_v1(codeword_v1, n)

result_v2 = compute_check_bits_v2(test_data)
result_v3 = compute_check_bits_v3_matrix(test_data)

print(f"Input data:  {test_data}")
print(f"Algorithm 1: {result_v1}")
print(f"Algorithm 2: {result_v2}")
print(f"Algorithm 3: {result_v3}")
print()
print("All algorithms produce identical results:", result_v1 == result_v2 == result_v3)
```

For software implementations, Algorithm 2 (direct formula) is typically fastest. For hardware, the matrix approach translates directly to XOR gate networks. Algorithm 1 is clearest for learning and debugging.
We have thoroughly examined how check bits are positioned in Hamming codes and the algorithms for computing their values. Let's consolidate the key takeaways:
What's Next:
With check bit positions and encoding thoroughly understood, the next page tackles syndrome calculation—the decoding side of Hamming codes. We'll see how the pattern of parity check failures (the syndrome) directly encodes the error position, enabling correction without knowing the original data.
You now have a complete understanding of check bit placement and calculation in Hamming codes. The power-of-two positioning strategy creates the overlapping coverage that enables error location. Next, we'll decode received codewords and locate errors using syndrome calculations.