We've learned the essential properties of hash functions: determinism for correctness, uniformity for performance, and efficiency for speed. But when you sit down to write code, which hash function should you actually use?
The answer depends on your context. Over decades of computer science research and practical engineering, dozens of hash functions have been developed—each with different tradeoffs. Some prioritize raw speed, others security, others simplicity. Some are decades old and battle-tested; others represent the cutting edge.
This page is your practical field guide to hash functions you'll actually encounter in the wild.
By the end of this page, you will understand: (1) Classic hash functions (djb2, FNV) and their enduring popularity, (2) Modern high-performance hash functions (MurmurHash, xxHash), (3) Cryptographic hash functions (SHA, Blake) and when to use them, (4) How to implement each from first principles, and (5) Practical guidelines for choosing the right one.
djb2 was created by Daniel J. Bernstein and first appeared in comp.lang.c around 1991. Despite its simplicity, it has remained popular for over 30 years due to its excellent balance of simplicity, speed, and quality.
| Property | Value |
|---|---|
| Creator | Daniel J. Bernstein |
| Year | ~1991 |
| Output size | 32 bits (native word size) |
| Speed | Very fast (~3-5 GB/s) |
| Quality | Good for general purpose |
| Security | NOT cryptographic |
| Best for | Simple hash tables, string lookup |
```python
def djb2(s: str) -> int:
    """
    The classic djb2 hash function by Daniel J. Bernstein.

    The magic number 33 (implemented as shift-add for speed) was
    chosen empirically - Bernstein tested many multipliers and found
    33 produced excellent distribution for common string inputs.
    Starting value 5381 is also empirically chosen.
    """
    h = 5381  # Magic starting value
    for c in s:
        # h * 33 + c (using shift-add for efficiency)
        h = ((h << 5) + h + ord(c)) & 0xFFFFFFFF
    return h


# Alternative form using explicit multiplication
def djb2_multiply(s: str) -> int:
    """Same algorithm, more readable form."""
    h = 5381
    for c in s:
        h = ((h * 33) + ord(c)) & 0xFFFFFFFF
    return h


# Demonstration
test_strings = ["hello", "Hello", "HELLO", "world", "foo", "bar", ""]

print("djb2 Hash Examples")
print("=" * 50)
for s in test_strings:
    h = djb2(s)
    print(f"djb2('{s}') = {h} (0x{h:08X})")

# Verify both implementations are equivalent
for s in test_strings:
    assert djb2(s) == djb2_multiply(s), f"Mismatch for '{s}'"
print("✓ Both implementations produce identical results")
```

**Why 33 and 5381?**
Bernstein tested many multipliers and starting values. The number 33 has useful properties: it is odd, so multiplication modulo 2^32 does not discard low-order bits, and it equals 32 + 1, so `h * 33` can be computed with a cheap shift and add, `(h << 5) + h`. Empirically, it also produces excellent distribution for common string inputs.
The starting value 5381 is a prime number chosen because it produces better distribution than starting with 0. The specific choice was empirical—Bernstein tested many primes.
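The effect of choosing an odd multiplier is easy to see empirically. The sketch below uses a hypothetical `hash_mul` helper (a djb2 variant with a configurable multiplier, not part of the original algorithm) to compare 33 against the even multiplier 32 when reducing hashes to 16 buckets:

```python
def hash_mul(s: str, mult: int) -> int:
    """djb2-style hash with a configurable multiplier (illustrative helper)."""
    h = 5381
    for c in s:
        h = (h * mult + ord(c)) & 0xFFFFFFFF
    return h


def buckets_used(mult: int, num_buckets: int = 16) -> int:
    """How many of num_buckets buckets do 1000 simple keys land in?"""
    keys = [f"key{i}" for i in range(1000)]
    return len({hash_mul(k, mult) % num_buckets for k in keys})


print(f"multiplier 33 fills {buckets_used(33)} of 16 buckets")
print(f"multiplier 32 fills {buckets_used(32)} of 16 buckets")
```

With multiplier 32, every step computes `h * 32`, which is 0 modulo 16, so the bucket depends only on the final character: the thousand keys above collapse into at most 10 buckets. The odd multiplier 33 lets every character influence the low bits.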
djb2 is ideal when you need: (1) A simple, easy-to-implement hash function, (2) Fast hashing for string keys, (3) Reasonable distribution for general-purpose data. Avoid it for: security-sensitive applications, very large data, or when you suspect adversarial input.
FNV (Fowler-Noll-Vo) is a family of hash functions created by Glenn Fowler, Landon Curt Noll, and Phong Vo. FNV-1a is the most commonly used variant, offering good speed and excellent distribution.
| Property | Value |
|---|---|
| Creators | Fowler, Noll, Vo |
| Year | 1991 |
| Variants | FNV-0, FNV-1, FNV-1a (32, 64, 128, 256+ bits) |
| Speed | Fast (~3-5 GB/s) |
| Quality | Very good distribution |
| Security | NOT cryptographic |
| Best for | Hash tables, checksums, file identification |
```python
# FNV-1a constants for different bit sizes
FNV_PARAMS = {
    32: {
        'offset_basis': 2166136261,
        'prime': 16777619,
        'mask': 0xFFFFFFFF,
    },
    64: {
        'offset_basis': 14695981039346656037,
        'prime': 1099511628211,
        'mask': 0xFFFFFFFFFFFFFFFF,
    },
    128: {
        'offset_basis': 144066263297769815596495629667062367629,
        'prime': 309485009821345068724781371,
        'mask': (1 << 128) - 1,
    },
}


def fnv1a(data: bytes, bits: int = 32) -> int:
    """
    FNV-1a hash function.

    The 'a' variant XORs first, then multiplies - this provides
    better avalanche characteristics than FNV-1 (multiply then XOR).
    """
    params = FNV_PARAMS[bits]
    h = params['offset_basis']
    for byte in data:
        h ^= byte                                   # XOR first (1a variant)
        h = (h * params['prime']) & params['mask']  # Then multiply
    return h


def fnv1a_str(s: str, bits: int = 32) -> int:
    """Convenience wrapper for string input."""
    return fnv1a(s.encode('utf-8'), bits)


# The original FNV-1 (multiply first, then XOR) - less commonly used
def fnv1(data: bytes, bits: int = 32) -> int:
    """
    Original FNV-1 (not recommended - use FNV-1a instead).
    Multiplies first, then XORs.
    """
    params = FNV_PARAMS[bits]
    h = params['offset_basis']
    for byte in data:
        h = (h * params['prime']) & params['mask']  # Multiply first
        h ^= byte                                   # Then XOR
    return h


# Demonstration
print("FNV-1a Hash Examples")
print("=" * 60)

test_strings = ["hello", "Hello", "world", "foo", "bar"]

for s in test_strings:
    h32 = fnv1a_str(s, 32)
    h64 = fnv1a_str(s, 64)
    print(f"fnv1a('{s}')")
    print(f"  32-bit: {h32:>12} (0x{h32:08X})")
    print(f"  64-bit: {h64:>20} (0x{h64:016X})")

# Comparison: FNV-1 vs FNV-1a (showing why 1a is better)
print("\n" + "=" * 60)
print("FNV-1 vs FNV-1a Comparison")
print("=" * 60)

similar_strings = ["aa", "aA", "Aa", "AA", "ab", "ba"]
for s in similar_strings:
    data = s.encode()
    h1 = fnv1(data, 32)
    h1a = fnv1a(data, 32)
    bits_diff = bin(h1 ^ h1a).count('1')
    print(f"'{s}': FNV-1={h1:08X}, FNV-1a={h1a:08X}, diff={bits_diff} bits")
```

**FNV-1 vs FNV-1a: Why the 'a' Matters**
The only difference between FNV-1 and FNV-1a is the order of operations: FNV-1 multiplies by the prime first and then XORs in the next byte, while FNV-1a XORs first and then multiplies.
FNV-1a provides better avalanche characteristics—small input changes create larger output changes. This makes FNV-1a more resistant to patterns in input data and is why it's the recommended variant.
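The difference is starkest on the shortest possible inputs. In the sketch below (minimal 32-bit reimplementations of both variants), FNV-1's final operation is the XOR, so for one-byte inputs the last byte only ever touches the low 8 bits; FNV-1a multiplies after the XOR, spreading the difference across the whole word:

```python
def fnv1_32(data: bytes) -> int:
    """FNV-1: multiply, then XOR."""
    h = 2166136261
    for b in data:
        h = (h * 16777619) & 0xFFFFFFFF
        h ^= b
    return h


def fnv1a_32(data: bytes) -> int:
    """FNV-1a: XOR, then multiply."""
    h = 2166136261
    for b in data:
        h ^= b
        h = (h * 16777619) & 0xFFFFFFFF
    return h


# Hash every possible 1-byte input and look only at the top 24 bits.
fnv1_tops = {fnv1_32(bytes([b])) >> 8 for b in range(256)}
fnv1a_tops = {fnv1a_32(bytes([b])) >> 8 for b in range(256)}

print(f"FNV-1:  distinct top-24-bit patterns for 256 inputs: {len(fnv1_tops)}")
print(f"FNV-1a: distinct top-24-bit patterns for 256 inputs: {len(fnv1a_tops)}")
```

For FNV-1 all 256 one-byte hashes share identical top 24 bits, while FNV-1a produces 256 distinct patterns. Any hash table that derives buckets from the high bits would collapse badly under FNV-1 here.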
FNV is used in: DNS name hashing, Python's PyPy dictionary implementation, file fingerprinting systems, network protocol implementations, and many hash table libraries. Its reliable quality and simple implementation make it a safe default choice.
MurmurHash was created by Austin Appleby in 2008. It quickly became a de facto standard for high-performance hash tables due to its excellent combination of speed, quality, and collision resistance. MurmurHash3 (2011) is the current version and is widely deployed in production systems.
| Property | Value |
|---|---|
| Creator | Austin Appleby |
| Versions | MurmurHash1, 2, 2A, 3 |
| Output sizes | 32-bit, 128-bit |
| Speed | Very fast (~5-8 GB/s) |
| Quality | Excellent (passes SMHasher) |
| Security | NOT cryptographic |
| Best for | Hash tables, distributed systems, bloom filters |
```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """
    MurmurHash3 32-bit implementation.

    Key design principles:
    1. Process data in 4-byte chunks for efficiency
    2. Use rotation (circular shift) for better mixing than regular shift
    3. Finalization step ensures all bits are well-mixed
    """
    c1 = 0xcc9e2d51
    c2 = 0x1b873593

    h = seed
    length = len(data)

    # Process 4-byte chunks
    nblocks = length // 4
    for block_start in range(0, nblocks * 4, 4):
        # Get next 4-byte block (little-endian)
        k = (data[block_start] |
             (data[block_start + 1] << 8) |
             (data[block_start + 2] << 16) |
             (data[block_start + 3] << 24))

        # Mix block into hash
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF  # ROTL32(k, 15)
        k = (k * c2) & 0xFFFFFFFF

        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF  # ROTL32(h, 13)
        h = ((h * 5) + 0xe6546b64) & 0xFFFFFFFF

    # Process remaining bytes (tail)
    tail_start = nblocks * 4
    k = 0
    tail_size = length & 3

    if tail_size >= 3:
        k ^= data[tail_start + 2] << 16
    if tail_size >= 2:
        k ^= data[tail_start + 1] << 8
    if tail_size >= 1:
        k ^= data[tail_start]

    k = (k * c1) & 0xFFFFFFFF
    k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
    k = (k * c2) & 0xFFFFFFFF
    h ^= k

    # Finalization - ensure avalanche
    h ^= length

    # fmix32
    h ^= h >> 16
    h = (h * 0x85ebca6b) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xc2b2ae35) & 0xFFFFFFFF
    h ^= h >> 16

    return h


def murmur3_str(s: str, seed: int = 0) -> int:
    """Convenience wrapper for strings."""
    return murmur3_32(s.encode('utf-8'), seed)


# Demonstration
print("MurmurHash3 Examples")
print("=" * 60)

test_strings = ["hello", "Hello", "world", "foo", "bar"]

for s in test_strings:
    h = murmur3_str(s)
    print(f"murmur3('{s}') = {h:>12} (0x{h:08X})")

# Demonstrate seed feature (useful for keyed hashing)
print("\n" + "=" * 60)
print("Seeded Hashing (different seeds give different results)")
print("=" * 60)

for seed in [0, 42, 12345]:
    h = murmur3_str("hello", seed)
    print(f"murmur3('hello', seed={seed:>5}) = 0x{h:08X}")
```

**Why MurmurHash Dominates Modern Systems**
SMHasher-tested: Austin Appleby created SMHasher, a comprehensive test suite for hash functions. MurmurHash3 passes all tests with flying colors.
Seed support: The seed parameter allows keyed hashing, which helps protect against hash collision attacks.
Cache-friendly: Processes data in 4-byte blocks aligned with typical CPU cache lines.
Finalization: The fmix32 step ensures all bits depend on all input, providing excellent avalanche.
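The avalanche claim is easy to check in isolation. Below is a small sketch of just the fmix32 finalizer (constants as in the implementation above): flip each of the 32 input bits of one value and count how many output bits change. Good avalanche means roughly half (about 16) on average.

```python
def fmix32(h: int) -> int:
    """MurmurHash3's 32-bit finalization mix."""
    h ^= h >> 16
    h = (h * 0x85ebca6b) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xc2b2ae35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


x = 0x12345678  # arbitrary probe value
base = fmix32(x)

# For each single-bit input flip, count differing output bits.
flips = [bin(base ^ fmix32(x ^ (1 << bit))).count("1") for bit in range(32)]
avg = sum(flips) / len(flips)
print(f"average output bits changed per single-bit input flip: {avg:.1f} / 32")
```

The average lands close to 16 of 32 bits, which is exactly the behavior SMHasher's avalanche tests look for.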
MurmurHash3 is used in: Apache Hadoop/HBase, Redis, Cassandra, Elasticsearch, libcuckoo, and countless internal systems at major tech companies; it also inspired Google's CityHash. If you need a modern, well-tested, non-cryptographic hash, MurmurHash3 is an excellent default choice.
xxHash was created by Yann Collet (who also created LZ4 and Zstandard compression) in 2012 specifically to be the fastest hash function while maintaining excellent quality. It's currently the speed champion among well-distributed hash functions.
| Property | Value |
|---|---|
| Creator | Yann Collet |
| Versions | xxHash32, xxHash64, xxH3 (latest) |
| Speed | ~15-30 GB/s (fastest non-crypto) |
| Quality | Excellent (passes SMHasher) |
| Security | NOT cryptographic |
| Best for | High-throughput hashing, data pipelines, file hashing |
```python
# xxHash64 constants (primes)
PRIME64_1 = 0x9E3779B185EBCA87
PRIME64_2 = 0xC2B2AE3D27D4EB4F
PRIME64_3 = 0x165667B19E3779F9
PRIME64_4 = 0x85EBCA77C2B2AE63
PRIME64_5 = 0x27D4EB2F165667C5

MASK64 = 0xFFFFFFFFFFFFFFFF


def rotl64(x: int, r: int) -> int:
    """64-bit rotate left."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def xxh64_round(acc: int, val: int) -> int:
    """Process one lane."""
    acc = (acc + (val * PRIME64_2)) & MASK64
    acc = rotl64(acc, 31)
    acc = (acc * PRIME64_1) & MASK64
    return acc


def xxh64_avalanche(h: int) -> int:
    """Finalization avalanche."""
    h ^= h >> 33
    h = (h * PRIME64_2) & MASK64
    h ^= h >> 29
    h = (h * PRIME64_3) & MASK64
    h ^= h >> 32
    return h


def xxhash64(data: bytes, seed: int = 0) -> int:
    """
    xxHash64 implementation.

    For large inputs, uses 4 parallel accumulators for speed.
    For small inputs, uses a simpler single-accumulator approach.
    """
    length = len(data)
    h = 0

    if length >= 32:
        # Initialize 4 accumulators
        v1 = (seed + PRIME64_1 + PRIME64_2) & MASK64
        v2 = (seed + PRIME64_2) & MASK64
        v3 = seed
        v4 = (seed - PRIME64_1) & MASK64

        # Process 32-byte blocks
        i = 0
        while i + 32 <= length:
            v1 = xxh64_round(v1, int.from_bytes(data[i:i+8], 'little'))
            v2 = xxh64_round(v2, int.from_bytes(data[i+8:i+16], 'little'))
            v3 = xxh64_round(v3, int.from_bytes(data[i+16:i+24], 'little'))
            v4 = xxh64_round(v4, int.from_bytes(data[i+24:i+32], 'little'))
            i += 32

        # Merge accumulators
        h = rotl64(v1, 1) + rotl64(v2, 7) + rotl64(v3, 12) + rotl64(v4, 18)
        h = (h ^ xxh64_round(0, v1)) & MASK64
        h = (h * PRIME64_1 + PRIME64_4) & MASK64
        h = (h ^ xxh64_round(0, v2)) & MASK64
        h = (h * PRIME64_1 + PRIME64_4) & MASK64
        h = (h ^ xxh64_round(0, v3)) & MASK64
        h = (h * PRIME64_1 + PRIME64_4) & MASK64
        h = (h ^ xxh64_round(0, v4)) & MASK64
        h = (h * PRIME64_1 + PRIME64_4) & MASK64
    else:
        h = (seed + PRIME64_5) & MASK64
        i = 0

    h = (h + length) & MASK64

    # Process remaining 8-byte chunks
    while i + 8 <= length:
        k = int.from_bytes(data[i:i+8], 'little')
        k = (k * PRIME64_2) & MASK64
        k = rotl64(k, 31)
        k = (k * PRIME64_1) & MASK64
        h ^= k
        h = ((rotl64(h, 27) * PRIME64_1) + PRIME64_4) & MASK64
        i += 8

    # Process remaining 4-byte chunk
    if i + 4 <= length:
        k = int.from_bytes(data[i:i+4], 'little')
        h ^= (k * PRIME64_1) & MASK64
        h = ((rotl64(h, 23) * PRIME64_2) + PRIME64_3) & MASK64
        i += 4

    # Process remaining bytes
    while i < length:
        h ^= (data[i] * PRIME64_5) & MASK64
        h = (rotl64(h, 11) * PRIME64_1) & MASK64
        i += 1

    return xxh64_avalanche(h)


# Demonstration
print("xxHash64 Examples")
print("=" * 60)

test_strings = ["hello", "Hello", "world", "a", ""]

for s in test_strings:
    h = xxhash64(s.encode())
    print(f"xxhash64('{s}') = {h} (0x{h:016X})")

# Speed is xxHash's main feature - show with larger data
import time

large_data = b"x" * 1000000  # 1 MB

start = time.perf_counter()
for _ in range(10):
    xxhash64(large_data)
elapsed = time.perf_counter() - start

mb_hashed = 10 * 1  # 10 iterations × 1 MB
throughput = mb_hashed / elapsed

print(f"Performance (1 MB data × 10 iterations):")
print(f"  Time: {elapsed*1000:.2f} ms")
print(f"  Throughput: {throughput:.2f} MB/s (Python implementation)")
print(f"  (Native C implementation would be 50-100× faster)")
```

**xxHash3: The Next Generation**
xxH3 (2019) is the latest version, designed specifically for modern CPUs with AVX2/NEON SIMD instructions. It delivers substantially higher throughput than xxHash64 on large inputs, where it is limited mainly by memory bandwidth, and it is also markedly faster on small inputs, where earlier versions carried more per-call overhead.
If you need maximum speed for large data hashing and have a modern CPU, xxH3 is the current champion.
xxHash is used in: LZ4 compression, Zstandard compression, FreeBSD kernel, Linux kernel (SquashFS), many game engines, and high-frequency trading systems. Wherever maximum hashing throughput matters, xxHash is often the choice.
All the hash functions we've discussed so far are non-cryptographic. They're fast and well-distributed, but they're NOT suitable for security-critical applications. For those, you need cryptographic hash functions with additional security properties.
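To make the distinction concrete: for a 32-bit non-cryptographic hash, the birthday bound says a collision among random inputs is expected after roughly √(2³²) ≈ 80,000 samples. The sketch below brute-forces a djb2 collision in well under a second of work, something that is computationally infeasible for SHA-256's 256-bit output (the cap and seed are arbitrary choices for a deterministic demo):

```python
import random
import string


def djb2(s: str) -> int:
    h = 5381
    for c in s:
        h = ((h << 5) + h + ord(c)) & 0xFFFFFFFF
    return h


random.seed(1)  # fixed seed so the demo is repeatable
seen = {}
collision = None
for _ in range(2_000_000):  # generous cap; expected success after ~80k tries
    s = "".join(random.choices(string.ascii_letters + string.digits, k=8))
    h = djb2(s)
    if h in seen and seen[h] != s:
        collision = (seen[h], s)
        break
    seen[h] = s

if collision:
    a, b = collision
    print(f"found after {len(seen)} samples:")
    print(f"djb2('{a}') == djb2('{b}') == 0x{djb2(a):08X}")
```

A cryptographic hash makes finding any such pair astronomically expensive by design; a fast table hash does not even try.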
| Hash | Output Size | Speed | Status | Use Case |
|---|---|---|---|---|
| MD5 | 128 bits | ~500 MB/s | ⚠️ BROKEN | Legacy only, not for security |
| SHA-1 | 160 bits | ~400 MB/s | ⚠️ BROKEN | Legacy code, Git (moving away) |
| SHA-256 | 256 bits | ~200 MB/s | ✓ Secure | General purpose, Bitcoin |
| SHA-512 | 512 bits | ~300 MB/s | ✓ Secure | High security, better on 64-bit |
| SHA-3 | Configurable | ~100 MB/s | ✓ Secure | NIST standard, different design |
| Blake2 | Up to 512 bits | ~800 MB/s | ✓ Secure | Fast + secure, modern choice |
| Blake3 | 256 bits (extendable) | ~3 GB/s | ✓ Secure | Fastest secure hash (with SIMD) |
```python
import hashlib


def demonstrate_crypto_hashes():
    """Show various cryptographic hash functions."""
    test_data = b"The quick brown fox jumps over the lazy dog"

    print("Cryptographic Hash Function Comparison")
    print("=" * 70)
    print(f"Input: '{test_data.decode()}'")
    print()

    # MD5 - BROKEN, shown for history/comparison
    md5 = hashlib.md5(test_data).hexdigest()
    print(f"MD5 (BROKEN - don't use for security):")
    print(f"  {md5}")

    # SHA-1 - BROKEN
    sha1 = hashlib.sha1(test_data).hexdigest()
    print(f"SHA-1 (BROKEN - being phased out):")
    print(f"  {sha1}")

    # SHA-256 - Current standard
    sha256 = hashlib.sha256(test_data).hexdigest()
    print(f"SHA-256 (Secure - recommended):")
    print(f"  {sha256}")

    # SHA-512 - Longer output
    sha512 = hashlib.sha512(test_data).hexdigest()
    print(f"SHA-512 (Secure - high security):")
    print(f"  {sha512}")

    # SHA3-256 - Different construction
    sha3_256 = hashlib.sha3_256(test_data).hexdigest()
    print(f"SHA3-256 (Secure - different design):")
    print(f"  {sha3_256}")

    # Blake2b - Fast and secure
    blake2 = hashlib.blake2b(test_data).hexdigest()
    print(f"Blake2b (Secure + faster than SHA):")
    print(f"  {blake2}")


# Demonstrate security properties
def security_properties_demo():
    print("\n" + "=" * 70)
    print("SECURITY PROPERTIES DEMONSTRATION")
    print("=" * 70)

    # Avalanche effect
    msg1 = b"hello"
    msg2 = b"hellp"  # Last character changed ('o' -> 'p')

    h1 = hashlib.sha256(msg1).digest()
    h2 = hashlib.sha256(msg2).digest()

    # Count differing bits
    diff_bits = sum(bin(b1 ^ b2).count('1') for b1, b2 in zip(h1, h2))
    total_bits = len(h1) * 8

    print(f"Avalanche Effect (SHA-256):")
    print(f"  Message 1: '{msg1.decode()}'")
    print(f"  Message 2: '{msg2.decode()}'")
    print(f"  Hash 1: {h1.hex()}")
    print(f"  Hash 2: {h2.hex()}")
    print(f"  Different bits: {diff_bits}/{total_bits} ({100*diff_bits/total_bits:.1f}%)")
    print(f"  (Ideal: ~50% = ~128 bits)")

    # Pre-image resistance
    print(f"\nPre-image Resistance:")
    target_hash = "5e884898da28047d9f1c6c5f..."  # SHA-256 of "password"
    print(f"  Given hash: {target_hash}")
    print(f"  Finding input that produces this hash would take ~2^256 attempts")
    print(f"  At 1 billion attempts/second: ~10^68 years")


demonstrate_crypto_hashes()
security_properties_demo()
```

MD5 has been broken since 2004—collisions can be found in seconds. SHA-1 was broken in 2017 (SHAttered attack). For any security purpose (signatures, password hashing, integrity verification), use SHA-256 or newer. MD5/SHA-1 are only acceptable for non-security purposes like checksums or hash tables.
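As an everyday application of the integrity property, here is a minimal sketch of digest-based verification: recompute SHA-256 over the bytes you received and compare against a published digest. The artifact names and contents below are purely illustrative.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


artifact = b"release-1.0.tar.gz contents"    # stand-in for real file bytes
published_digest = sha256_hex(artifact)      # what a vendor would publish

# Verification on the receiving end: any tampering changes the digest.
tampered = b"release-1.0.tar.gz content!"
print(sha256_hex(artifact) == published_digest)   # untampered copy matches
print(sha256_hex(tampered) == published_digest)   # modified copy does not
```

This is exactly why download pages publish SHA-256 checksums: a single flipped byte anywhere in the file yields a completely different digest.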
With so many options, how do you choose? Here's a practical decision framework based on your requirements:
```
START: What is your use case?
│
├─► Security-critical (passwords, signatures, integrity)?
│   │
│   ├─► Password storage?
│   │   └─► Use: Argon2, bcrypt, or scrypt (NOT SHA-256!)
│   │
│   ├─► Digital signatures or content addressing?
│   │   └─► Use: SHA-256, SHA-3, or Blake2b
│   │
│   └─► General cryptographic use?
│       └─► Use: SHA-256 (standard) or Blake2 (faster)
│
└─► Non-security (hash tables, checksums)?
    │
    ├─► External/untrusted input?
    │   │
    │   ├─► Need to prevent collision attacks?
    │   │   └─► Use: SipHash (with secret seed) or MurmurHash3
    │   │
    │   └─► Moderate threat model?
    │       └─► Use: MurmurHash3 or FNV-1a
    │
    ├─► Maximum speed is critical?
    │   │
    │   ├─► Large data (>1KB)?
    │   │   └─► Use: xxHash3 or xxHash64
    │   │
    │   └─► Small data (<100 bytes)?
    │       └─► Use: MurmurHash3 or FNV-1a
    │
    ├─► Need determinism across runs/machines?
    │   └─► Use: FNV-1a, MurmurHash3 (avoid Python's hash())
    │
    ├─► Simplicity is priority (learning/embedded)?
    │   └─► Use: djb2 or FNV-1a
    │
    └─► Just need something reasonable?
        └─► Use: Your language's built-in hash or MurmurHash3
```

| Hash | Speed | Quality | Security | Best For |
|---|---|---|---|---|
| djb2 | ★★★★☆ | ★★★☆☆ | None | Simple string hashing, learning |
| FNV-1a | ★★★★☆ | ★★★★☆ | None | General purpose, deterministic |
| MurmurHash3 | ★★★★★ | ★★★★★ | None | Hash tables, distributed systems |
| xxHash64 | ★★★★★+ | ★★★★★ | None | High throughput, large data |
| SipHash | ★★★☆☆ | ★★★★☆ | Keyed | Protection against collision attacks |
| SHA-256 | ★★☆☆☆ | ★★★★★ | Cryptographic | Signatures, integrity, blockchain |
| Blake2b | ★★★★☆ | ★★★★★ | Cryptographic | Fast + secure applications |
If you're unsure: For hash tables, use your language's built-in dictionary/map (they choose a good hash for you). For explicit hashing needs, MurmurHash3 is a safe, well-tested choice. For security, use SHA-256 or Blake2. Never write your own cryptographic hash function.
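The "never write your own" advice pairs well with what Python's standard library already provides: `hashlib.blake2b` accepts a `key` parameter, giving MAC-style keyed hashing with no extra dependencies. A minimal sketch (the key and messages are illustrative; use a securely generated random key in practice):

```python
import hashlib
import hmac  # for constant-time comparison


secret = b"shared-secret-key"  # illustrative; generate randomly in real use


def tag(message: bytes) -> str:
    # Keyed Blake2b behaves like a MAC (a similar role to HMAC-SHA256).
    return hashlib.blake2b(message, key=secret, digest_size=32).hexdigest()


t = tag(b"transfer $100 to alice")

# Receiver recomputes the tag with the shared key; compare in constant time
# to avoid timing side channels.
ok = hmac.compare_digest(t, tag(b"transfer $100 to alice"))
forged = hmac.compare_digest(t, tag(b"transfer $9999 to mallory"))
print(ok, forged)
```

Without the secret key, an attacker cannot produce a valid tag for a modified message, which is exactly the property a plain (unkeyed) hash cannot give you.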
We've surveyed the landscape of hash functions, from classic designs to modern speed demons to cryptographic standards. You now have a practical toolkit for choosing the right hash function for any situation.
Module Complete:
Congratulations! You've completed Module 3: Hash Functions — Properties & Design. You now understand: the core properties of hash functions (determinism, uniformity, efficiency), the classic workhorses (djb2, FNV-1a), the modern high-performance options (MurmurHash3, xxHash), when cryptographic hashes (SHA-2, SHA-3, Blake2/3) are required, and how to choose among them for a given workload.
This knowledge forms the foundation for understanding hash tables, which we'll explore in the next module.
You now have comprehensive knowledge of hash function properties and practical options. You can evaluate hash functions for your specific needs, understand their tradeoffs, and make informed decisions. This foundation prepares you for understanding hash table implementations in the next module.