We've learned the essential properties of hash functions: determinism for correctness, uniformity for performance, and efficiency for speed. But when you sit down to write code, which hash function should you actually use?
The answer depends on your context. Over decades of computer science research and practical engineering, dozens of hash functions have been developed—each with different tradeoffs. Some prioritize raw speed, others security, others simplicity. Some are decades old and battle-tested; others represent the cutting edge.
This page is your practical field guide to hash functions you'll actually encounter in the wild.
By the end of this page, you will understand: (1) Classic hash functions (djb2, FNV) and their enduring popularity, (2) Modern high-performance hash functions (MurmurHash, xxHash), (3) Cryptographic hash functions (SHA, Blake) and when to use them, (4) How to implement each from first principles, and (5) Practical guidelines for choosing the right one.
djb2 was created by Daniel J. Bernstein and first appeared in comp.lang.c around 1991. Despite its simplicity, it has remained popular for over 30 years due to its excellent balance of simplicity, speed, and quality.
| Property | Value |
|---|---|
| Creator | Daniel J. Bernstein |
| Year | ~1991 |
| Output size | 32 bits (native word size) |
| Speed | Very fast (~3-5 GB/s) |
| Quality | Good for general purpose |
| Security | NOT cryptographic |
| Best for | Simple hash tables, string lookup |
```python
def djb2(s: str) -> int:
    """
    The classic djb2 hash function by Daniel J. Bernstein.

    The magic number 33 (implemented as shift-add for speed) was
    chosen empirically - Bernstein tested many multipliers and found
    33 produced excellent distribution for common string inputs.
    Starting value 5381 is also empirically chosen.
    """
    h = 5381  # Magic starting value
    for c in s:
        # h * 33 + c (using shift-add for efficiency)
        h = ((h << 5) + h + ord(c)) & 0xFFFFFFFF
    return h


# Alternative form using explicit multiplication
def djb2_multiply(s: str) -> int:
    """Same algorithm, more readable form."""
    h = 5381
    for c in s:
        h = ((h * 33) + ord(c)) & 0xFFFFFFFF
    return h


# Demonstration
test_strings = ["hello", "Hello", "HELLO", "world", "foo", "bar", ""]

print("djb2 Hash Examples")
print("=" * 50)
for s in test_strings:
    h = djb2(s)
    print(f"djb2('{s}') = {h} (0x{h:08X})")

# Verify both implementations are equivalent
for s in test_strings:
    assert djb2(s) == djb2_multiply(s), f"Mismatch for '{s}'"
print("✓ Both implementations produce identical results")
```

**Why 33 and 5381?**
Bernstein tested many multipliers and starting values. The number 33 has useful properties: it is odd, so multiplication modulo 2^32 does not discard low-order bits, and it equals 32 + 1, so `h * 33` can be computed with a cheap shift and add, `(h << 5) + h`. Empirically, it also produces excellent distribution for common string inputs.
The starting value 5381 is a prime number chosen because it produces better distribution than starting with 0. The specific choice was empirical—Bernstein tested many primes.
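The effect of choosing an odd multiplier is easy to see empirically. The sketch below uses a hypothetical `hash_mul` helper (a djb2 variant with a configurable multiplier, not part of the original algorithm) to compare 33 against the even multiplier 32 when reducing hashes to 16 buckets:

```python
def hash_mul(s: str, mult: int) -> int:
    """djb2-style hash with a configurable multiplier (illustrative helper)."""
    h = 5381
    for c in s:
        h = (h * mult + ord(c)) & 0xFFFFFFFF
    return h


def buckets_used(mult: int, num_buckets: int = 16) -> int:
    """How many of num_buckets buckets do 1000 simple keys land in?"""
    keys = [f"key{i}" for i in range(1000)]
    return len({hash_mul(k, mult) % num_buckets for k in keys})


print(f"multiplier 33 fills {buckets_used(33)} of 16 buckets")
print(f"multiplier 32 fills {buckets_used(32)} of 16 buckets")
```

With multiplier 32, every step computes `h * 32`, which is 0 modulo 16, so the bucket depends only on the final character: the thousand keys above collapse into at most 10 buckets. The odd multiplier 33 lets every character influence the low bits.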
djb2 is ideal when you need: (1) A simple, easy-to-implement hash function, (2) Fast hashing for string keys, (3) Reasonable distribution for general-purpose data. Avoid it for: security-sensitive applications, very large data, or when you suspect adversarial input.
FNV (Fowler-Noll-Vo) is a family of hash functions created by Glenn Fowler, Landon Curt Noll, and Phong Vo. FNV-1a is the most commonly used variant, offering good speed and excellent distribution.
| Property | Value |
|---|---|
| Creators | Fowler, Noll, Vo |
| Year | 1991 |
| Variants | FNV-0, FNV-1, FNV-1a (32, 64, 128, 256+ bits) |
| Speed | Fast (~3-5 GB/s) |
| Quality | Very good distribution |
| Security | NOT cryptographic |
| Best for | Hash tables, checksums, file identification |
```python
# FNV-1a constants for different bit sizes
FNV_PARAMS = {
    32: {
        'offset_basis': 2166136261,
        'prime': 16777619,
        'mask': 0xFFFFFFFF,
    },
    64: {
        'offset_basis': 14695981039346656037,
        'prime': 1099511628211,
        'mask': 0xFFFFFFFFFFFFFFFF,
    },
    128: {
        'offset_basis': 144066263297769815596495629667062367629,
        'prime': 309485009821345068724781371,
        'mask': (1 << 128) - 1,
    },
}


def fnv1a(data: bytes, bits: int = 32) -> int:
    """
    FNV-1a hash function.

    The 'a' variant XORs first, then multiplies - this provides
    better avalanche characteristics than FNV-1 (multiply then XOR).
    """
    params = FNV_PARAMS[bits]
    h = params['offset_basis']
    for byte in data:
        h ^= byte                                   # XOR first (1a variant)
        h = (h * params['prime']) & params['mask']  # Then multiply
    return h


def fnv1a_str(s: str, bits: int = 32) -> int:
    """Convenience wrapper for string input."""
    return fnv1a(s.encode('utf-8'), bits)


# The original FNV-1 (multiply first, then XOR) - less commonly used
def fnv1(data: bytes, bits: int = 32) -> int:
    """
    Original FNV-1 (not recommended - use FNV-1a instead).
    Multiplies first, then XORs.
    """
    params = FNV_PARAMS[bits]
    h = params['offset_basis']
    for byte in data:
        h = (h * params['prime']) & params['mask']  # Multiply first
        h ^= byte                                   # Then XOR
    return h


# Demonstration
print("FNV-1a Hash Examples")
print("=" * 60)

test_strings = ["hello", "Hello", "world", "foo", "bar"]

for s in test_strings:
    h32 = fnv1a_str(s, 32)
    h64 = fnv1a_str(s, 64)
    print(f"fnv1a('{s}')")
    print(f"  32-bit: {h32:>12} (0x{h32:08X})")
    print(f"  64-bit: {h64:>20} (0x{h64:016X})")

# Comparison: FNV-1 vs FNV-1a (showing why 1a is better)
print("\n" + "=" * 60)
print("FNV-1 vs FNV-1a Comparison")
print("=" * 60)

similar_strings = ["aa", "aA", "Aa", "AA", "ab", "ba"]
for s in similar_strings:
    data = s.encode()
    h1 = fnv1(data, 32)
    h1a = fnv1a(data, 32)
    bits_diff = bin(h1 ^ h1a).count('1')
    print(f"'{s}': FNV-1={h1:08X}, FNV-1a={h1a:08X}, diff={bits_diff} bits")
```

**FNV-1 vs FNV-1a: Why the 'a' Matters**
The only difference between FNV-1 and FNV-1a is the order of operations: FNV-1 multiplies by the prime first and then XORs in the next byte, while FNV-1a XORs first and then multiplies.
FNV-1a provides better avalanche characteristics—small input changes create larger output changes. This makes FNV-1a more resistant to patterns in input data and is why it's the recommended variant.
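The difference is starkest on the shortest possible inputs. In the sketch below (minimal 32-bit reimplementations of both variants), FNV-1's final operation is the XOR, so for one-byte inputs the last byte only ever touches the low 8 bits; FNV-1a multiplies after the XOR, spreading the difference across the whole word:

```python
def fnv1_32(data: bytes) -> int:
    """FNV-1: multiply, then XOR."""
    h = 2166136261
    for b in data:
        h = (h * 16777619) & 0xFFFFFFFF
        h ^= b
    return h


def fnv1a_32(data: bytes) -> int:
    """FNV-1a: XOR, then multiply."""
    h = 2166136261
    for b in data:
        h ^= b
        h = (h * 16777619) & 0xFFFFFFFF
    return h


# Hash every possible 1-byte input and look only at the top 24 bits.
fnv1_tops = {fnv1_32(bytes([b])) >> 8 for b in range(256)}
fnv1a_tops = {fnv1a_32(bytes([b])) >> 8 for b in range(256)}

print(f"FNV-1:  distinct top-24-bit patterns for 256 inputs: {len(fnv1_tops)}")
print(f"FNV-1a: distinct top-24-bit patterns for 256 inputs: {len(fnv1a_tops)}")
```

For FNV-1 all 256 one-byte hashes share identical top 24 bits, while FNV-1a produces 256 distinct patterns. Any hash table that derives buckets from the high bits would collapse badly under FNV-1 here.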
FNV is used in: DNS name hashing, Python's PyPy dictionary implementation, file fingerprinting systems, network protocol implementations, and many hash table libraries. Its reliable quality and simple implementation make it a safe default choice.
MurmurHash was created by Austin Appleby in 2008. It quickly became a de facto standard for high-performance hash tables due to its excellent combination of speed, quality, and collision resistance. MurmurHash3 (2011) is the current version and is widely deployed in production systems.
| Property | Value |
|---|---|
| Creator | Austin Appleby |
| Versions | MurmurHash1, 2, 2A, 3 |
| Output sizes | 32-bit, 128-bit |
| Speed | Very fast (~5-8 GB/s) |
| Quality | Excellent (passes SMHasher) |
| Security | NOT cryptographic |
| Best for | Hash tables, distributed systems, bloom filters |
```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """
    MurmurHash3 32-bit implementation.

    Key design principles:
    1. Process data in 4-byte chunks for efficiency
    2. Use rotation (circular shift) for better mixing than regular shift
    3. Finalization step ensures all bits are well-mixed
    """
    c1 = 0xcc9e2d51
    c2 = 0x1b873593

    h = seed
    length = len(data)

    # Process 4-byte chunks
    nblocks = length // 4
    for block_start in range(0, nblocks * 4, 4):
        # Get next 4-byte block (little-endian)
        k = (data[block_start] |
             (data[block_start + 1] << 8) |
             (data[block_start + 2] << 16) |
             (data[block_start + 3] << 24))

        # Mix block into hash
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF  # ROTL32(k, 15)
        k = (k * c2) & 0xFFFFFFFF

        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF  # ROTL32(h, 13)
        h = ((h * 5) + 0xe6546b64) & 0xFFFFFFFF

    # Process remaining bytes (tail)
    tail_start = nblocks * 4
    k = 0
    tail_size = length & 3

    if tail_size >= 3:
        k ^= data[tail_start + 2] << 16
    if tail_size >= 2:
        k ^= data[tail_start + 1] << 8
    if tail_size >= 1:
        k ^= data[tail_start]

    k = (k * c1) & 0xFFFFFFFF
    k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
    k = (k * c2) & 0xFFFFFFFF
    h ^= k

    # Finalization - ensure avalanche
    h ^= length

    # fmix32
    h ^= h >> 16
    h = (h * 0x85ebca6b) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xc2b2ae35) & 0xFFFFFFFF
    h ^= h >> 16

    return h


def murmur3_str(s: str, seed: int = 0) -> int:
    """Convenience wrapper for strings."""
    return murmur3_32(s.encode('utf-8'), seed)


# Demonstration
print("MurmurHash3 Examples")
print("=" * 60)

test_strings = ["hello", "Hello", "world", "foo", "bar"]

for s in test_strings:
    h = murmur3_str(s)
    print(f"murmur3('{s}') = {h:>12} (0x{h:08X})")

# Demonstrate seed feature (useful for keyed hashing)
print("\n" + "=" * 60)
print("Seeded Hashing (different seeds give different results)")
print("=" * 60)

for seed in [0, 42, 12345]:
    h = murmur3_str("hello", seed)
    print(f"murmur3('hello', seed={seed:>5}) = 0x{h:08X}")
```

**Why MurmurHash Dominates Modern Systems**
SMHasher-tested: Austin Appleby created SMHasher, a comprehensive test suite for hash functions. MurmurHash3 passes all tests with flying colors.
Seed support: The seed parameter allows keyed hashing, which helps protect against hash collision attacks.
Cache-friendly: Processes data in 4-byte blocks aligned with typical CPU cache lines.
Finalization: The fmix32 step ensures all bits depend on all input, providing excellent avalanche.
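The avalanche claim is easy to check in isolation. Below is a small sketch of just the fmix32 finalizer (constants as in the implementation above): flip each of the 32 input bits of one value and count how many output bits change. Good avalanche means roughly half (about 16) on average.

```python
def fmix32(h: int) -> int:
    """MurmurHash3's 32-bit finalization mix."""
    h ^= h >> 16
    h = (h * 0x85ebca6b) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xc2b2ae35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


x = 0x12345678  # arbitrary probe value
base = fmix32(x)

# For each single-bit input flip, count differing output bits.
flips = [bin(base ^ fmix32(x ^ (1 << bit))).count("1") for bit in range(32)]
avg = sum(flips) / len(flips)
print(f"average output bits changed per single-bit input flip: {avg:.1f} / 32")
```

The average lands close to 16 of 32 bits, which is exactly the behavior SMHasher's avalanche tests look for.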
MurmurHash3 is used in: Apache Hadoop/HBase, Redis, Cassandra, Elasticsearch, libcuckoo, and countless internal systems at major tech companies; it also inspired Google's CityHash. If you need a modern, well-tested, non-cryptographic hash, MurmurHash3 is an excellent default choice.
xxHash was created by Yann Collet (who also created LZ4 and Zstandard compression) in 2012 specifically to be the fastest hash function while maintaining excellent quality. It's currently the speed champion among well-distributed hash functions.
| Property | Value |
|---|---|
| Creator | Yann Collet |
| Versions | xxHash32, xxHash64, xxH3 (latest) |
| Speed | ~15-30 GB/s (fastest non-crypto) |
| Quality | Excellent (passes SMHasher) |
| Security | NOT cryptographic |
| Best for | High-throughput hashing, data pipelines, file hashing |
```python
# xxHash64 constants (primes)
PRIME64_1 = 0x9E3779B185EBCA87
PRIME64_2 = 0xC2B2AE3D27D4EB4F
PRIME64_3 = 0x165667B19E3779F9
PRIME64_4 = 0x85EBCA77C2B2AE63
PRIME64_5 = 0x27D4EB2F165667C5

MASK64 = 0xFFFFFFFFFFFFFFFF


def rotl64(x: int, r: int) -> int:
    """64-bit rotate left."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def xxh64_round(acc: int, val: int) -> int:
    """Process one lane."""
    acc = (acc + (val * PRIME64_2)) & MASK64
    acc = rotl64(acc, 31)
    acc = (acc * PRIME64_1) & MASK64
    return acc


def xxh64_avalanche(h: int) -> int:
    """Finalization avalanche."""
    h ^= h >> 33
    h = (h * PRIME64_2) & MASK64
    h ^= h >> 29
    h = (h * PRIME64_3) & MASK64
    h ^= h >> 32
    return h


def xxhash64(data: bytes, seed: int = 0) -> int:
    """
    xxHash64 implementation.

    For large inputs, uses 4 parallel accumulators for speed.
    For small inputs, uses a simpler single-accumulator approach.
    """
    length = len(data)
    h = 0

    if length >= 32:
        # Initialize 4 accumulators
        v1 = (seed + PRIME64_1 + PRIME64_2) & MASK64
        v2 = (seed + PRIME64_2) & MASK64
        v3 = seed
        v4 = (seed - PRIME64_1) & MASK64

        # Process 32-byte blocks
        i = 0
        while i + 32 <= length:
            v1 = xxh64_round(v1, int.from_bytes(data[i:i+8], 'little'))
            v2 = xxh64_round(v2, int.from_bytes(data[i+8:i+16], 'little'))
            v3 = xxh64_round(v3, int.from_bytes(data[i+16:i+24], 'little'))
            v4 = xxh64_round(v4, int.from_bytes(data[i+24:i+32], 'little'))
            i += 32

        # Merge accumulators
        h = rotl64(v1, 1) + rotl64(v2, 7) + rotl64(v3, 12) + rotl64(v4, 18)
        h = (h ^ xxh64_round(0, v1)) & MASK64
        h = (h * PRIME64_1 + PRIME64_4) & MASK64
        h = (h ^ xxh64_round(0, v2)) & MASK64
        h = (h * PRIME64_1 + PRIME64_4) & MASK64
        h = (h ^ xxh64_round(0, v3)) & MASK64
        h = (h * PRIME64_1 + PRIME64_4) & MASK64
        h = (h ^ xxh64_round(0, v4)) & MASK64
        h = (h * PRIME64_1 + PRIME64_4) & MASK64
    else:
        h = (seed + PRIME64_5) & MASK64
        i = 0

    h = (h + length) & MASK64

    # Process remaining 8-byte chunks
    while i + 8 <= length:
        k = int.from_bytes(data[i:i+8], 'little')
        k = (k * PRIME64_2) & MASK64
        k = rotl64(k, 31)
        k = (k * PRIME64_1) & MASK64
        h ^= k
        h = ((rotl64(h, 27) * PRIME64_1) + PRIME64_4) & MASK64
        i += 8

    # Process remaining 4-byte chunk
    if i + 4 <= length:
        k = int.from_bytes(data[i:i+4], 'little')
        h ^= (k * PRIME64_1) & MASK64
        h = ((rotl64(h, 23) * PRIME64_2) + PRIME64_3) & MASK64
        i += 4

    # Process remaining bytes
    while i < length:
        h ^= (data[i] * PRIME64_5) & MASK64
        h = (rotl64(h, 11) * PRIME64_1) & MASK64
        i += 1

    return xxh64_avalanche(h)


# Demonstration
print("xxHash64 Examples")
print("=" * 60)

test_strings = ["hello", "Hello", "world", "a", ""]

for s in test_strings:
    h = xxhash64(s.encode())
    print(f"xxhash64('{s}') = {h} (0x{h:016X})")

# Speed is xxHash's main feature - show with larger data
import time

large_data = b"x" * 1000000  # 1 MB

start = time.perf_counter()
for _ in range(10):
    xxhash64(large_data)
elapsed = time.perf_counter() - start

mb_hashed = 10 * 1  # 10 iterations × 1 MB
throughput = mb_hashed / elapsed

print(f"Performance (1 MB data × 10 iterations):")
print(f"  Time: {elapsed*1000:.2f} ms")
print(f"  Throughput: {throughput:.2f} MB/s (Python implementation)")
print(f"  (Native C implementation would be 50-100× faster)")
```

**xxHash3: The Next Generation**
xxH3 (2019) is the latest version, designed specifically for modern CPUs with AVX2/NEON SIMD instructions. It delivers substantially higher throughput than xxHash64 on large inputs, where it is limited mainly by memory bandwidth, and it is also markedly faster on small inputs, where earlier versions carried more per-call overhead.
If you need maximum speed for large data hashing and have a modern CPU, xxH3 is the current champion.
xxHash is used in: LZ4 compression, Zstandard compression, FreeBSD kernel, Linux kernel (SquashFS), many game engines, and high-frequency trading systems. Wherever maximum hashing throughput matters, xxHash is often the choice.
All the hash functions we've discussed so far are non-cryptographic. They're fast and well-distributed, but they're NOT suitable for security-critical applications. For those, you need cryptographic hash functions with additional security properties.
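To make the distinction concrete: for a 32-bit non-cryptographic hash, the birthday bound says a collision among random inputs is expected after roughly √(2³²) ≈ 80,000 samples. The sketch below brute-forces a djb2 collision in well under a second of work, something that is computationally infeasible for SHA-256's 256-bit output (the cap and seed are arbitrary choices for a deterministic demo):

```python
import random
import string


def djb2(s: str) -> int:
    h = 5381
    for c in s:
        h = ((h << 5) + h + ord(c)) & 0xFFFFFFFF
    return h


random.seed(1)  # fixed seed so the demo is repeatable
seen = {}
collision = None
for _ in range(2_000_000):  # generous cap; expected success after ~80k tries
    s = "".join(random.choices(string.ascii_letters + string.digits, k=8))
    h = djb2(s)
    if h in seen and seen[h] != s:
        collision = (seen[h], s)
        break
    seen[h] = s

if collision:
    a, b = collision
    print(f"found after {len(seen)} samples:")
    print(f"djb2('{a}') == djb2('{b}') == 0x{djb2(a):08X}")
```

A cryptographic hash makes finding any such pair astronomically expensive by design; a fast table hash does not even try.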
| Hash | Output Size | Speed | Status | Use Case |
|---|---|---|---|---|
| MD5 | 128 bits | ~500 MB/s | ⚠️ BROKEN | Legacy only, not for security |
| SHA-1 | 160 bits | ~400 MB/s | ⚠️ BROKEN | Legacy code, Git (moving away) |
| SHA-256 | 256 bits | ~200 MB/s | ✓ Secure | General purpose, Bitcoin |
| SHA-512 | 512 bits | ~300 MB/s | ✓ Secure | High security, better on 64-bit |
| SHA-3 | Configurable | ~100 MB/s | ✓ Secure | NIST standard, different design |
| Blake2 | Up to 512 bits | ~800 MB/s | ✓ Secure | Fast + secure, modern choice |
| Blake3 | 256 bits (extendable) | ~3 GB/s | ✓ Secure | Fastest secure hash (with SIMD) |
```python
import hashlib


def demonstrate_crypto_hashes():
    """Show various cryptographic hash functions."""
    test_data = b"The quick brown fox jumps over the lazy dog"

    print("Cryptographic Hash Function Comparison")
    print("=" * 70)
    print(f"Input: '{test_data.decode()}'")
    print()

    # MD5 - BROKEN, shown for history/comparison
    md5 = hashlib.md5(test_data).hexdigest()
    print(f"MD5 (BROKEN - don't use for security):")
    print(f"  {md5}")

    # SHA-1 - BROKEN
    sha1 = hashlib.sha1(test_data).hexdigest()
    print(f"SHA-1 (BROKEN - being phased out):")
    print(f"  {sha1}")

    # SHA-256 - Current standard
    sha256 = hashlib.sha256(test_data).hexdigest()
    print(f"SHA-256 (Secure - recommended):")
    print(f"  {sha256}")

    # SHA-512 - Longer output
    sha512 = hashlib.sha512(test_data).hexdigest()
    print(f"SHA-512 (Secure - high security):")
    print(f"  {sha512}")

    # SHA3-256 - Different construction
    sha3_256 = hashlib.sha3_256(test_data).hexdigest()
    print(f"SHA3-256 (Secure - different design):")
    print(f"  {sha3_256}")

    # Blake2b - Fast and secure
    blake2 = hashlib.blake2b(test_data).hexdigest()
    print(f"Blake2b (Secure + faster than SHA):")
    print(f"  {blake2}")


# Demonstrate security properties
def security_properties_demo():
    print("\n" + "=" * 70)
    print("SECURITY PROPERTIES DEMONSTRATION")
    print("=" * 70)

    # Avalanche effect
    msg1 = b"hello"
    msg2 = b"hellp"  # Last character changed ('o' -> 'p')

    h1 = hashlib.sha256(msg1).digest()
    h2 = hashlib.sha256(msg2).digest()

    # Count differing bits
    diff_bits = sum(bin(b1 ^ b2).count('1') for b1, b2 in zip(h1, h2))
    total_bits = len(h1) * 8

    print(f"Avalanche Effect (SHA-256):")
    print(f"  Message 1: '{msg1.decode()}'")
    print(f"  Message 2: '{msg2.decode()}'")
    print(f"  Hash 1: {h1.hex()}")
    print(f"  Hash 2: {h2.hex()}")
    print(f"  Different bits: {diff_bits}/{total_bits} ({100*diff_bits/total_bits:.1f}%)")
    print(f"  (Ideal: ~50% = ~128 bits)")

    # Pre-image resistance
    print(f"\nPre-image Resistance:")
    target_hash = "5e884898da28047d9f1c6c5f..."  # SHA-256 of "password"
    print(f"  Given hash: {target_hash}")
    print(f"  Finding input that produces this hash would take ~2^256 attempts")
    print(f"  At 1 billion attempts/second: ~10^68 years")


demonstrate_crypto_hashes()
security_properties_demo()
```

MD5 has been broken since 2004—collisions can be found in seconds. SHA-1 was broken in 2017 (SHAttered attack). For any security purpose (signatures, password hashing, integrity verification), use SHA-256 or newer. MD5/SHA-1 are only acceptable for non-security purposes like checksums or hash tables.
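As an everyday application of the integrity property, here is a minimal sketch of digest-based verification: recompute SHA-256 over the bytes you received and compare against a published digest. The artifact names and contents below are purely illustrative.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


artifact = b"release-1.0.tar.gz contents"    # stand-in for real file bytes
published_digest = sha256_hex(artifact)      # what a vendor would publish

# Verification on the receiving end: any tampering changes the digest.
tampered = b"release-1.0.tar.gz content!"
print(sha256_hex(artifact) == published_digest)   # untampered copy matches
print(sha256_hex(tampered) == published_digest)   # modified copy does not
```

This is exactly why download pages publish SHA-256 checksums: a single flipped byte anywhere in the file yields a completely different digest.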
With so many options, how do you choose? Here's a practical decision framework based on your requirements:
```
START: What is your use case?
│
├─► Security-critical (passwords, signatures, integrity)?
│   │
│   ├─► Password storage?
│   │   └─► Use: Argon2, bcrypt, or scrypt (NOT SHA-256!)
│   │
│   ├─► Digital signatures or content addressing?
│   │   └─► Use: SHA-256, SHA-3, or Blake2b
│   │
│   └─► General cryptographic use?
│       └─► Use: SHA-256 (standard) or Blake2 (faster)
│
└─► Non-security (hash tables, checksums)?
    │
    ├─► External/untrusted input?
    │   │
    │   ├─► Need to prevent collision attacks?
    │   │   └─► Use: SipHash (with secret seed) or MurmurHash3
    │   │
    │   └─► Moderate threat model?
    │       └─► Use: MurmurHash3 or FNV-1a
    │
    ├─► Maximum speed is critical?
    │   │
    │   ├─► Large data (>1KB)?
    │   │   └─► Use: xxHash3 or xxHash64
    │   │
    │   └─► Small data (<100 bytes)?
    │       └─► Use: MurmurHash3 or FNV-1a
    │
    ├─► Need determinism across runs/machines?
    │   └─► Use: FNV-1a, MurmurHash3 (avoid Python's hash())
    │
    ├─► Simplicity is priority (learning/embedded)?
    │   └─► Use: djb2 or FNV-1a
    │
    └─► Just need something reasonable?
        └─► Use: Your language's built-in hash or MurmurHash3
```

| Hash | Speed | Quality | Security | Best For |
|---|---|---|---|---|
| djb2 | ★★★★☆ | ★★★☆☆ | None | Simple string hashing, learning |
| FNV-1a | ★★★★☆ | ★★★★☆ | None | General purpose, deterministic |
| MurmurHash3 | ★★★★★ | ★★★★★ | None | Hash tables, distributed systems |
| xxHash64 | ★★★★★+ | ★★★★★ | None | High throughput, large data |
| SipHash | ★★★☆☆ | ★★★★☆ | Keyed | Protection against collision attacks |
| SHA-256 | ★★☆☆☆ | ★★★★★ | Cryptographic | Signatures, integrity, blockchain |
| Blake2b | ★★★★☆ | ★★★★★ | Cryptographic | Fast + secure applications |
If you're unsure: For hash tables, use your language's built-in dictionary/map (they choose a good hash for you). For explicit hashing needs, MurmurHash3 is a safe, well-tested choice. For security, use SHA-256 or Blake2. Never write your own cryptographic hash function.
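The "never write your own" advice pairs well with what Python's standard library already provides: `hashlib.blake2b` accepts a `key` parameter, giving MAC-style keyed hashing with no extra dependencies. A minimal sketch (the key and messages are illustrative; use a securely generated random key in practice):

```python
import hashlib
import hmac  # for constant-time comparison


secret = b"shared-secret-key"  # illustrative; generate randomly in real use


def tag(message: bytes) -> str:
    # Keyed Blake2b behaves like a MAC (a similar role to HMAC-SHA256).
    return hashlib.blake2b(message, key=secret, digest_size=32).hexdigest()


t = tag(b"transfer $100 to alice")

# Receiver recomputes the tag with the shared key; compare in constant time
# to avoid timing side channels.
ok = hmac.compare_digest(t, tag(b"transfer $100 to alice"))
forged = hmac.compare_digest(t, tag(b"transfer $9999 to mallory"))
print(ok, forged)
```

Without the secret key, an attacker cannot produce a valid tag for a modified message, which is exactly the property a plain (unkeyed) hash cannot give you.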
We've surveyed the landscape of hash functions, from classic designs to modern speed demons to cryptographic standards. You now have a practical toolkit for choosing the right hash function for any situation.
Module Complete:
Congratulations! You've completed Module 3: Hash Functions — Properties & Design. You now understand: the core properties of hash functions (determinism, uniformity, efficiency), the classic workhorses (djb2, FNV-1a), the modern high-performance options (MurmurHash3, xxHash), when cryptographic hashes (SHA-2, SHA-3, Blake2/3) are required, and how to choose among them for a given workload.
This knowledge forms the foundation for understanding hash tables, which we'll explore in the next module.
You now have comprehensive knowledge of hash function properties and practical options. You can evaluate hash functions for your specific needs, understand their tradeoffs, and make informed decisions. This foundation prepares you for understanding hash table implementations in the next module.