We've explored hash functions conceptually and learned how to map hash values to indices. Now it's time to see the complete picture—the end-to-end transformation that takes any piece of data and converts it into a usable array position.
This transformation is the heart of hash table functionality. Understanding it completely allows you to debug collisions and clustering, design hash support for custom key types, and reason about real-world performance.
In this page, we'll trace the complete journey from arbitrary data to array positions, examining each stage in detail and understanding how they work together as a unified system.
By the end of this page, you will understand:
• The complete multi-stage hashing pipeline
• How each data type flows through the transformation
• Memory layout considerations for hash tables
• Cache efficiency and modern hardware considerations
• Complete worked examples tracing data from input to storage location
The transformation from arbitrary data to array position follows a three-stage pipeline:
Stage 1 (Normalization): Convert the key to a canonical byte representation.
Stage 2 (Hashing): Transform the bytes into a fixed-size integer hash code.
Stage 3 (Index Computation): Map the hash code to a valid array index.
Let's visualize this pipeline:
┌─────────────┐ ┌──────────────┐ ┌───────────────┐ ┌─────────┐
│ Key │───▶│ Normalization │───▶│ Hash Function │───▶│ Index │
│ (any type) │ │ (bytes) │ │ (integer) │ │ (0..m-1)│
└─────────────┘ └──────────────┘ └───────────────┘ └─────────┘
"hello" ───▶ [104,101,108, ───▶ 99162322 ───▶ 7
108,111]
Why Three Stages?
This separation provides several advantages: each stage can be tested and optimized independently, the same hash function serves every key type, and the table size can change without touching the normalization or hashing logic.
Stage Dependencies:
The stages form a directed pipeline—each depends only on the previous:
Normalization ─depends on─▶ Data type
Hash Function ─depends on─▶ Byte sequence
Index Computation ─depends on─▶ Hash value and table size
This separation of concerns is why you can use the same hash table implementation with strings, integers, objects, or any other type—only the normalization stage changes.
In optimized implementations, these stages are often fused for efficiency. For example, string hash functions typically process characters directly rather than first converting to a byte array. But conceptually, the three stages always exist and understanding them aids debugging and optimization.
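To make the three stages concrete before examining each one, here is a minimal sketch in Python; the function names and the choice of FNV-1a for stage 2 are illustrative, not from any particular library:

def normalize(key):
    """Stage 1: convert a key to a canonical byte sequence."""
    if isinstance(key, str):
        return key.encode('utf-8')
    if isinstance(key, int):
        return key.to_bytes(8, 'little', signed=True)
    raise TypeError(f"no normalization rule for {type(key).__name__}")

def hash_bytes(data):
    """Stage 2: FNV-1a over the bytes -> 32-bit integer."""
    h = 2166136261
    for b in data:
        h = ((h ^ b) * 16777619) & 0xFFFFFFFF
    return h

def index_for(hash_code, table_size):
    """Stage 3: map the hash code to a slot in [0, table_size)."""
    return hash_code % table_size

print(index_for(hash_bytes(normalize("hello")), 16))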
Normalization converts varied data types into a form the hash function can process. Let's examine strategies for different data types:
Primitive Types:
| Type | Normalization Strategy | Example |
|---|---|---|
| int32 | Direct use | 42 → 42 |
| int64 | Fold high/low bits | 0x12345678ABCDEF → XOR/combine |
| float | IEEE bit representation | 3.14 → 0x4048F5C3 |
| double | IEEE bits, fold if needed | 2.718 → bits → fold |
| bool | 0 or 1 | true → 1 |
| char | Unicode code point | 'A' → 65 |
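A sketch of these strategies in Python, using struct to read IEEE bit patterns; the helper names are illustrative:

import struct

def normalize_float(f):
    # Reinterpret the 32-bit IEEE-754 representation as an unsigned int
    return struct.unpack('<I', struct.pack('<f', f))[0]

def normalize_double(d):
    # 64-bit IEEE bits, folded to 32 bits by XORing the halves
    bits = struct.unpack('<Q', struct.pack('<d', d))[0]
    return (bits >> 32) ^ (bits & 0xFFFFFFFF)

def normalize_int64(n):
    # Fold the high and low 32-bit halves so both influence the result
    u = n & 0xFFFFFFFFFFFFFFFF   # view as unsigned 64-bit
    return (u >> 32) ^ (u & 0xFFFFFFFF)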
Strings:
Strings are sequences of characters. The normalization must ensure that equal strings always produce identical byte sequences and that character order is preserved, so "ab" and "ba" normalize differently:
# Direct character processing (most common)
def normalize_string(s):
return [ord(c) for c in s]
# With case-insensitive handling
def normalize_string_ci(s):
return [ord(c.lower()) for c in s]
# With Unicode normalization
import unicodedata
def normalize_string_unicode(s):
normalized = unicodedata.normalize('NFC', s)
return normalized.encode('utf-8')
Important: The same logical string can have different byte representations depending on encoding (UTF-8 vs UTF-16 vs ASCII). Your normalization must be consistent.
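A quick Python demonstration of why this matters; the byte sequences differ even though the logical string is identical:

s = "héllo"
print(s.encode('utf-8'))      # b'h\xc3\xa9llo' (6 bytes)
print(s.encode('utf-16-le'))  # b'h\x00\xe9\x00l\x00l\x00o\x00' (10 bytes)
# Hashing the UTF-8 bytes and the UTF-16 bytes of the same string
# will almost certainly produce different hash codes.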
Compound Objects:
Objects with multiple fields require combining field values:
import struct

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def normalized(self):
        """Convert to a hashable representation."""
        # Option 1: Tuple (relies on tuple hashing)
        return (self.x, self.y)
        # Option 2: Explicit bytes
        # return struct.pack('ii', self.x, self.y)
        # Option 3: Field combination
        # return self.x * 31 + self.y
Critical Consideration: Field Order
The order fields are combined matters:
# Point(1, 2) should differ from Point(2, 1)
class BadHash:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def normalized(self):
        return self.x + self.y       # BAD: (1,2) == (2,1) !!

class GoodHash:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def normalized(self):
        return self.x * 31 + self.y  # GOOD: (1,2) ≠ (2,1)
Collections:
For ordered collections (lists, arrays), preserve order:
def normalize_list(lst):
result = 1
for item in lst:
result = result * 31 + hash(item)
return result
# [1, 2, 3] ≠ [3, 2, 1]
For unordered collections (sets), use order-independent operations:
def normalize_set(s):
return sum(hash(item) for item in s) # Order doesn't matter
# or: XOR all hashes together
# {1, 2, 3} == {3, 1, 2} (same set)
Null/None Values:
Special care is needed for null values:
// Java approach
public int hashCode() {
int result = 17;
result = 31 * result + (name != null ? name.hashCode() : 0);
result = 31 * result + age;
return result;
}
Using 0 for null is conventional but means nullable fields and fields with value 0 might collide more often.
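A small Python illustration of that collision risk; it relies on the fact that CPython defines hash(0) as 0, so a zero-valued field and a missing field contribute identically:

def person_hash(name, age):
    # Mirrors the Java pattern above: None contributes 0
    h = 17
    h = 31 * h + (hash(name) if name is not None else 0)
    h = 31 * h + age
    return h

print(person_hash(None, 30) == person_hash(0, 30))  # True: the hash
# cannot tell a missing field from a field whose hash is 0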
Once data is normalized to bytes or integers, we apply the hash function. Let's examine popular hash function designs:
The djb2 Hash (Dan Bernstein)
One of the most widely used string hash functions:
unsigned long djb2(const char *str) {
unsigned long hash = 5381;
int c;
while ((c = *str++)) {
hash = ((hash << 5) + hash) + c; // hash * 33 + c
}
return hash;
}
Why it works:
• The multiplier 33 reduces to a shift and an add ((hash << 5) + hash), which is cheap on any hardware
• Each multiply-and-add step folds the new character into all the accumulated state, so small input differences spread through the hash
• The constants 5381 and 33 have no deep theory behind them; they were found empirically to distribute real-world strings well
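A direct Python port is handy for experimenting; masking to 32 bits at each step matches the worked example at the end of this page:

def djb2(s):
    h = 5381
    for c in s:
        h = (h * 33 + ord(c)) & 0xFFFFFFFF  # hash * 33 + c, mod 2^32
    return h

print(djb2("hello"))  # 261238937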
The FNV-1a Hash (Fowler-Noll-Vo)
Another excellent general-purpose hash:
uint32_t fnv1a_32(const char *data, size_t len) {
uint32_t hash = 2166136261; // FNV offset basis
for (size_t i = 0; i < len; i++) {
hash ^= (uint8_t)data[i];
hash *= 16777619; // FNV prime
}
return hash;
}
Key properties:
• XOR-then-multiply (the "1a" order) mixes each byte into the state before the multiply diffuses it, giving better avalanche behavior than the original multiply-then-XOR FNV-1
• 2166136261 and 16777619 are the published 32-bit FNV offset basis and FNV prime
• One byte per step, so it works on any byte sequence with no alignment or length requirements
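The same function ports directly to Python; the mask keeps the multiplication within 32 bits:

def fnv1a_32(data: bytes) -> int:
    h = 2166136261                        # FNV offset basis
    for b in data:
        h ^= b                            # XOR the byte in first...
        h = (h * 16777619) & 0xFFFFFFFF   # ...then multiply by the FNV prime
    return h

print(fnv1a_32("hello".encode('utf-8')))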
The MurmurHash Family
Designed for high performance with excellent distribution:
uint32_t murmur3_32(const uint8_t* key, size_t len, uint32_t seed) {
uint32_t h = seed;
const uint32_t c1 = 0xcc9e2d51;
const uint32_t c2 = 0x1b873593;
// Body: process 4-byte chunks
const uint32_t* blocks = (const uint32_t*)key;
for (size_t i = 0; i < len / 4; i++) {
uint32_t k = blocks[i];
k *= c1;
k = (k << 15) | (k >> 17); // ROTL
k *= c2;
h ^= k;
h = (h << 13) | (h >> 19); // ROTL
h = h * 5 + 0xe6546b64;
}
// Tail: process remaining bytes
// ... (handle 1-3 remaining bytes)
// Finalization: mix final bits
h ^= len;
h ^= h >> 16; h *= 0x85ebca6b;
h ^= h >> 13; h *= 0xc2b2ae35;
h ^= h >> 16;
return h;
}
Why MurmurHash is popular:
• Processes four bytes per iteration instead of one, making it much faster on long keys
• The finalization steps give strong avalanche behavior: flipping one input bit flips roughly half the output bits
• The seed parameter lets applications randomize hashing per table or per process
| Hash Function | Speed | Distribution | Complexity | Best Use Case |
|---|---|---|---|---|
| djb2 | Fast | Good | Very Simple | General strings |
| FNV-1a | Fast | Good | Simple | General purpose |
| MurmurHash3 | Very Fast | Excellent | Moderate | High-performance apps |
| xxHash | Extremely Fast | Excellent | Moderate | Big data, streaming |
| SipHash | Moderate | Excellent | Moderate | Security-critical |
We've covered index computation methods (division, multiplication, universal). Now let's see how production implementations handle this:
Power-of-2 Table Sizes with Good Hash Functions:
Modern hash tables often use power-of-2 sizes despite earlier warnings about patterned low bits. Why? Division is one of the slowest integer instructions, and a power-of-2 size replaces it with a single AND: hash & (size - 1) equals hash % size exactly when size is a power of 2. The trade-off is that only the low bits of the hash matter, so the hash function (or an extra mixing step, as below) must distribute them well.

// Power-of-2 optimization
size_t table_size = 1024; // 2^10
size_t mask = table_size - 1; // 1023 = 0b1111111111
size_t index = hash & mask; // Faster than hash % table_size
Java's HashMap Approach:
Java combines the user's hashCode() with an additional "spreading" step:
// Java 8+ HashMap internal hashing
static final int hash(Object key) {
int h;
// XOR high bits into low bits
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
// Index computation (table.length is always power of 2)
int index = hash(key) & (table.length - 1);
Why h ^ (h >>> 16)?
This mixes the high 16 bits into the low 16 bits. When table size is small (e.g., 16 slots), only the lowest bits are used. This mixing ensures high-order bits influence the index.
Original: 1010 0011 0101 1100 | 0001 1010 0111 1000
Shifted: 0000 0000 0000 0000 | 1010 0011 0101 1100
XOR'd: 1010 0011 0101 1100 | 1011 1001 0010 0100
└── These bits now contain
info from high bits!
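A quick Python check of this behavior, using keys that differ only in their upper bits, exactly the pattern that defeats plain masking:

keys = [i << 16 for i in range(8)]  # differ only above bit 15
size = 16
print([k & (size - 1) for k in keys])                # [0, 0, 0, ...]: high bits ignored
print([(k ^ (k >> 16)) & (size - 1) for k in keys])  # [0, 1, ..., 7]: high bits now count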
Python's dict Approach:
Python uses a more sophisticated probing scheme with perturbation:
# Simplified Python dict indexing logic
def find_slot(table, hash_value):
    mask = len(table) - 1
index = hash_value & mask
perturb = hash_value
while table[index] is not None:
# Probe sequence uses perturbation
perturb >>= 5
index = (5 * index + perturb + 1) & mask
return index
Why perturbation?
Linear probing (index + 1, index + 2, ...) causes clustering. Perturbation creates a pseudo-random probe sequence that depends on the full hash value, not just the index, breaking up clusters.
Rust's HashBrown:
Rust's standard HashMap uses SwissTable/HashBrown with SIMD:
// Simplified concept
fn find_slot(hash: u64, table: &Table) -> usize {
let h1 = (hash >> 7) as usize; // Primary hash
let h2 = (hash & 0x7F) as u8; // Control byte
// Use SIMD to check 16 slots at once
// h2 is stored separately for fast comparison
}
This design allows checking 16 potential slots in a single CPU instruction using SIMD.
Understanding how hash tables interact with memory is crucial for performance. Modern CPUs don't access memory byte-by-byte—they load cache lines (typically 64 bytes).
The Memory Hierarchy:
┌─────────────┐ Access Time Size
│ CPU Regs │ ~0.5 ns ~1 KB
├─────────────┤
│ L1 Cache │ ~1 ns 32-64 KB
├─────────────┤
│ L2 Cache │ ~4 ns 256 KB - 1 MB
├─────────────┤
│ L3 Cache │ ~16 ns 2 - 32 MB
├─────────────┤
│ Main RAM │ ~100 ns 4 - 128 GB
└─────────────┘
A cache miss to main memory is ~100x slower than an L1 cache hit!
Chaining vs. Open Addressing: Cache Perspective
Chaining with Linked Lists:
Table: [ptr] [ptr] [ptr] [ptr] ... ← Contiguous
│ │
▼ ▼
[node] [node] ← Scattered in memory!
│
▼
[node] ← Cache miss each hop
Each pointer dereference is likely a cache miss because nodes are allocated separately and may be anywhere in memory.
Open Addressing:
Table: [entry] [entry] [entry] [entry] ... ← All contiguous!
Entries are stored directly in the array. Sequential probing accesses consecutive memory, likely hitting the same cache line.
Cache Line Utilization:
With 64-byte cache lines and 8-byte entries, a single cache line holds 8 consecutive entries, so after the first access a linear probe can examine up to 7 neighboring slots without another trip to memory.
On modern hardware, memory access patterns often dominate algorithm complexity. An O(1) operation with a cache miss can be slower than O(log n) operations that hit cache. This is why open addressing hash tables often outperform chaining in practice, despite similar theoretical complexity.
Swiss Table Optimization:
Google's Swiss Table (used in Abseil) achieves remarkable cache efficiency:
Control bytes: [h2|h2|h2|h2|h2|h2|h2|h2|...] ← Dense, SIMD-friendly
│ │ │
▼ ▼ ▼
Slots: [entry | entry | entry |...] ← Only accessed on match
By checking metadata first with SIMD, most non-matching slots are eliminated without loading their full entries.
Let's trace complete examples through all three stages:
Example 1: String Key "hello"
┌─── Stage 1: Normalization ───┐
│ │
│ "hello" → [104, 101, 108, 108, 111] (ASCII values)
│ │
└──────────────────────────────┘
│
▼
┌─── Stage 2: Hash Function (djb2) ───┐
│                                      │
│ hash = 5381                          │
│ hash = 5381*33 + 104 = 177677        │
│ hash = 177677*33 + 101 = 5863442     │
│ hash = 5863442*33 + 108 = 193493694  │
│ hash = 193493694*33 + 108 = 6385292010    │
│ hash = 6385292010*33 + 111 = 210714636441 │
│ (intermediates exceed 32 bits)       │
│ Final: 210714636441 mod 2^32 = 261238937  │
│                                      │
└──────────────────────────────────────┘
│
▼
┌─── Stage 3: Index (m=16) ───┐
│ │
│ index = 261238937 mod 16 │
│ index = 9 │
│ │
└──────────────────────────────┘
"hello" → Slot 9
Example 2: Composite Object Person("Alice", 30)
┌─── Stage 1: Normalization ───┐
│ │
│ name = "Alice" │
│ age = 30 │
│ │
│ Combine, seeding with 17:    │
│ Step 1: 17*31 + hash("Alice") = 527 + 63494359 = 63494886 │
│ Step 2: 63494886*31 + 30 = 1968341466 + 30 = 1968341496   │
│ │
└──────────────────────────────┘
│
▼
┌─── Stage 2: Hash Already Computed ───┐
│ │
│ hash = 1968341496                     │
│ │
└───────────────────────────────────────┘
│
▼
┌─── Stage 3: Index (m=32) ───┐
│ │
│ Java-style spread:           │
│ h = 1968341496               │
│ h >>> 16 = 30034             │
│ h ^ (h >>> 16) = 1968341496 ^ 30034 = 1968370858 │
│ index = 1968370858 & 31 = 10 │
│ │
└──────────────────────────────┘
Person("Alice", 30) → Slot 7
Example 3: Integer Key 42 (Simple Case)
┌─── Stage 1: Normalization ───┐
│ │
│ 42 is already an integer │
│ Normalized value: 42 │
│ │
└──────────────────────────────┘
│
▼
┌─── Stage 2: Hash Function ───┐
│ │
│ Many implementations use │
│ identity for small ints: │
│ hash(42) = 42 │
│ │
│ Better: apply mixing │
│ hash = 42 * 2654435769 │
│ hash = 111486302298 mod 2^32│
│ hash = 4112119898           │
│ │
└──────────────────────────────┘
│
▼
┌─── Stage 3: Index (m=8) ───┐
│ │
│ With identity hash: │
│ index = 42 mod 8 = 2 │
│ │
│ With mixing: │
│ index = 4112119898 & 7 = 2 │
│ │
└─────────────────────────────┘
42 → Slot 2
Important: Using identity for integer keys can cause problems with patterned data. Better implementations always apply some mixing.
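A short Python experiment makes this concrete, assuming a hypothetical 8-slot table. Identity hashing with low-bit masking collapses patterned keys onto a single slot, while multiplicative (Fibonacci) hashing, which keeps the top bits of hash * 2654435769, scatters them:

def identity_index(k, bits=3):
    return k & ((1 << bits) - 1)           # low bits of the raw key

def fib_index(k, bits=3):
    h = (k * 2654435769) & 0xFFFFFFFF      # multiply by ~2^32/phi
    return h >> (32 - bits)                # keep the TOP bits

keys = [8 * i for i in range(8)]           # patterned: multiples of 8
print([identity_index(k) for k in keys])   # [0, 0, 0, 0, 0, 0, 0, 0]
print([fib_index(k) for k in keys])        # spread across several slots

Taking the top bits matters here: multiplication alone cannot repair zeros in the low bits, since a product of multiples of 8 is still a multiple of 8.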
We've traced the complete journey from arbitrary data to array positions, the transformation that makes hash tables work. To consolidate: normalization turns any key type into a canonical byte form, the hash function condenses those bytes into a well-mixed integer, and index computation maps that integer onto the table, with memory layout deciding how fast the final access really is.
What's Next:
Having understood the mechanical transformation from data to indices, we're ready to demystify the perceived "magic" of hashing. The next page reveals why hashing works—the mathematical and intuitive foundations that make this elegant transformation reliable and powerful.
You now understand the end-to-end transformation from arbitrary data to array positions. You can trace any key through normalization, hashing, and index computation. This knowledge enables you to debug performance issues, design custom hash functions, and optimize for modern hardware.