The lookup operation is the raison d'être of hash tables—the entire data structure exists to make this operation fast. When you write value = map[key] or result = dict.get(key), you're invoking the operation that hash tables optimize above all others.
Lookup mirrors insertion in many ways—both compute the same hash, navigate to the same bucket, and follow the same collision resolution path. But lookup has a different goal: it doesn't modify the table; it extracts information. And it must handle a case insertion never faces: the key might not exist.
Understanding lookup deeply means understanding how hash tables fulfill their O(1) promise, and when that promise might not hold.
By the end of this page, you will be able to trace hash table lookups step by step, understand how the table distinguishes 'found' from 'not found', handle collision scenarios during search, and recognize when lookup performance degrades from O(1).
Hash table lookup follows a predictable sequence that closely mirrors insertion, with different logic at the end:
The Universal Lookup Algorithm:
LOOKUP(table, key):
1. HASH: Compute hash_code = hash(key)
2. INDEX: Compute bucket_index = hash_code mod table.capacity
3. LOCATE: Navigate to table.buckets[bucket_index]
4. SEARCH: Check if key exists at this location
5. RETURN: Return value if found, or indicate not found
Steps 1-3 are identical to insertion. The difference lies in step 4's termination conditions and step 5's possible outcomes.
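The five steps can be condensed into a minimal Python sketch. This assumes a chaining table; the `Node` and `Table` classes here are illustrative stand-ins, not a production implementation:

```python
class Node:
    """A single entry in a chained bucket."""
    def __init__(self, key, value, next=None):
        self.key, self.value, self.next = key, value, next

class Table:
    """A bare-bones chaining table: fixed capacity, list of bucket heads."""
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.buckets = [None] * capacity

def lookup(table, key):
    """Return (found, value), following the five lookup steps."""
    hash_code = hash(key)                 # 1. HASH
    index = hash_code % table.capacity    # 2. INDEX
    node = table.buckets[index]           # 3. LOCATE
    while node is not None:               # 4. SEARCH the chain
        if node.key == key:
            return True, node.value       # 5. RETURN: found
        node = node.next
    return False, None                    # 5. RETURN: not found

# Place one entry by hand, then look it up
t = Table()
t.buckets[hash("alice") % t.capacity] = Node("alice", 100)
print(lookup(t, "alice"))   # (True, 100)
print(lookup(t, "bob"))     # (False, None)
```

Note the two distinct exits in step 5: a matching key returns its value, while falling off the end of the chain signals "not found".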
Why Lookup Is Symmetric to Insertion:
The hash function's determinism guarantees that any key always maps to the same bucket. If we inserted (key, value) at bucket i, then looking up key will also navigate to bucket i. The search process that finds where to insert a new key is the same process that finds an existing key.
Lookup is a read-only operation—it never modifies the table. This means lookups can safely occur concurrently in multi-threaded environments (if no writes are happening). Some thread-safe implementations exploit this by allowing lock-free reads.
The first three steps of lookup are mechanically identical to insertion. We compute the same hash, derive the same index, and navigate to the same bucket.
Step 1: Hash the Key
Compute hash_code = hash(key). This must produce the exact same value as when the key was inserted—hash functions must be deterministic.
Important: If the key is mutable and has changed since insertion, its hash will differ, and lookup will fail even though the data exists in the table. This is why mutable objects shouldn't be hash table keys.
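A short demonstration of the pitfall, using a hypothetical `Point` class whose hash (badly) depends on mutable state:

```python
class Point:
    """DON'T do this: the hash depends on mutable attributes."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __hash__(self):
        return hash((self.x, self.y))
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

p = Point(1, 2)
table = {p: "data"}
p.x = 99                       # mutate the key AFTER insertion

# Neither the old coordinates nor the new ones can find the entry:
print(Point(1, 2) in table)    # False: right bucket, but keys no longer equal
print(Point(99, 2) in table)   # False: searches the wrong bucket
print(len(table))              # 1 -- the entry still exists, but is unreachable
```

The entry was stored under the old hash, so the table still holds it, yet no lookup can reach it. This is why Python's built-in mutable types (lists, dicts, sets) are unhashable by design.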
Step 2: Compute the Index
Compute index = hash_code % capacity. This gives us the bucket where the key should be located.
Note that the index computation during lookup must use the table's current capacity, which may have changed due to rehashing since the key was inserted. Rehashing, however, moves every entry to its correct bucket in the new table, so lookup still works correctly.
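A quick illustration of the interplay (the capacities and key are arbitrary): an index computed against the old capacity can differ from one computed against the new capacity, but rehashing recomputes every entry's bucket with the new capacity, so lookup stays consistent.

```python
key = "alice"
h = hash(key)

old_capacity, new_capacity = 8, 16
old_index = h % old_capacity
new_index = h % new_capacity

# The two indices may differ...
print(old_index, new_index)

# ...but rehashing re-inserts every entry using the NEW capacity,
# so lookup (which also uses the new capacity) still finds the key.
new_buckets = [None] * new_capacity
new_buckets[new_index] = (key, 100)    # simulate where rehashing placed it
print(new_buckets[hash(key) % new_capacity])   # ('alice', 100)
```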
Step 3: Navigate to the Bucket
Access table.buckets[index] in O(1) time. At this point, we're at the right bucket—now we need to find if our key is there.
```python
def trace_lookup_navigation(table, key):
    """
    Trace steps 1-3 of hash table lookup.
    """
    print(f"\n{'='*60}")
    print(f"LOOKUP TRACE: Steps 1-3 for key {repr(key)}")
    print(f"{'='*60}")

    # ========================================
    # STEP 1: Hash the key
    # ========================================
    print("\n[STEP 1] HASH THE KEY")
    print("-" * 40)
    hash_code = hash(key)
    print(f"hash({repr(key)}) = {hash_code}")

    # Verify determinism
    assert hash(key) == hash_code, "Hash function is not deterministic!"
    print("Determinism verified: same key → same hash ✓")

    # ========================================
    # STEP 2: Compute bucket index
    # ========================================
    print("\n[STEP 2] COMPUTE INDEX")
    print("-" * 40)
    index = hash_code % table.capacity
    print(f"Table capacity: {table.capacity}")
    print("Index = hash_code % capacity")
    print(f"Index = {hash_code} % {table.capacity} = {index}")

    # ========================================
    # STEP 3: Navigate to bucket
    # ========================================
    print("\n[STEP 3] NAVIGATE TO BUCKET")
    print("-" * 40)
    bucket = table.buckets[index]
    print(f"Accessing table.buckets[{index}]")
    print("Operation: O(1) array access")

    if bucket is None:
        print("Bucket is EMPTY")
        print("Early termination possible: key definitely not in table")
    else:
        print("Bucket is OCCUPIED")
        print("Must search bucket contents for the key")

    return index, bucket, hash_code


# Example: Looking up a key
class MockTable:
    def __init__(self):
        self.capacity = 16
        self.buckets = [None] * 16
        # Simulate some data
        self.buckets[5] = "chain_node_placeholder"

table = MockTable()
trace_lookup_navigation(table, "alice")
```

Steps 1-3 are O(1) and bring us directly to the right bucket. Without hashing, finding a key would require O(n) linear search through all entries. Hashing reduces the search space from 'entire table' to 'single bucket' in constant time.
Once we're at the correct bucket, we must search for the key. This is where the collision resolution strategy matters most. The search logic determines how many entries must be examined, what counts as a match, and when the search can safely conclude the key is absent.
Let's examine search behavior for both major strategies.
In chaining, search means traversing the linked list at the bucket:
SEARCH_CHAIN(bucket, key, hash_code):
    current = bucket.head
    while current is not None:
        if current.hash_code == hash_code:   // Fast check
            if current.key == key:           // Full comparison
                return current.value         // FOUND!
        current = current.next
    return NOT_FOUND
Termination Conditions:
current.key == key → return the value
current == None (end of chain) → return not found

Optimization: Hash Comparison First
Comparing hash codes (integers) is faster than comparing keys (potentially large strings or objects). By checking hash_code == candidate.hash_code first, we quickly skip non-matching entries. Only when hashes match do we perform the more expensive key comparison.
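For comparison, a search under open addressing probes successive slots instead of walking a chain. A minimal sketch with linear probing follows; the `EMPTY`/`TOMBSTONE` markers and tuple-valued slots are illustrative assumptions, not a fixed API:

```python
EMPTY = None          # slot never used
TOMBSTONE = object()  # slot whose entry was deleted

def lookup_open_addressing(slots, key, capacity):
    """Linear-probing search. Returns (found, value)."""
    index = hash(key) % capacity
    for i in range(capacity):
        slot = slots[(index + i) % capacity]
        if slot is EMPTY:
            return False, None       # never-used slot: key cannot lie beyond it
        if slot is not TOMBSTONE and slot[0] == key:
            return True, slot[1]     # matching key found
        # tombstone or different key: keep probing
    return False, None               # scanned the whole table

# Place one entry at its home slot, then search
slots = [EMPTY] * 8
slots[hash("alice") % 8] = ("alice", 100)
print(lookup_open_addressing(slots, "alice", 8))   # (True, 100)
print(lookup_open_addressing(slots, "bob", 8))     # (False, None)
```

Note the asymmetry with chaining: an empty (never-used) slot ends the search immediately, but a tombstone does not, because the key might have been displaced past the deleted entry.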
```python
from typing import Any, Optional, Tuple

class ChainNode:
    def __init__(self, key, value, hash_code: int):
        self.key = key
        self.value = value
        self.hash_code = hash_code
        self.next: Optional['ChainNode'] = None

def lookup_chaining(bucket_head: Optional[ChainNode], key,
                    target_hash: int) -> Tuple[bool, Optional[Any], int]:
    """
    Search a chained bucket for a key.

    Returns:
        (found: bool, value: Optional, comparisons: int)
    """
    print("\n[STEP 4] SEARCHING CHAIN")
    print("-" * 40)

    current = bucket_head
    comparisons = 0

    if current is None:
        print("Chain is empty → Key not in table")
        return False, None, 0

    while current is not None:
        comparisons += 1
        print(f"\nComparison #{comparisons}:")
        print(f"  Candidate key: {repr(current.key)}")

        # Fast path: compare hash codes first
        if current.hash_code == target_hash:
            print(f"  Hash codes match: {target_hash}")
            # Slow path: compare actual keys
            if current.key == key:
                print("  Keys match: FOUND!")
                print(f"  Value: {current.value}")
                return True, current.value, comparisons
            else:
                print("  Hash collision: keys differ despite same hash")
        else:
            print(f"  Hash codes differ: {current.hash_code} ≠ {target_hash}")

        current = current.next

    print(f"\nReached end of chain after {comparisons} comparisons")
    print("Key not found in table")
    return False, None, comparisons


# Example: Chain with 3 nodes
# alice -> bob -> charlie
alice = ChainNode("alice", 100, hash("alice"))
bob = ChainNode("bob", 200, hash("bob"))
charlie = ChainNode("charlie", 300, hash("charlie"))
alice.next = bob
bob.next = charlie

# Search for "bob" (found after 2 comparisons)
found, value, comps = lookup_chaining(alice, "bob", hash("bob"))

# Search for "david" (not found after 3 comparisons)
found, value, comps = lookup_chaining(alice, "david", hash("david"))
```

The search concludes with one of two outcomes: the key is found, or it's not. Different programming languages and APIs handle these outcomes differently.
When Key Is Found:
The lookup returns the associated value. This is the happy path—the hash table delivers on its O(1) promise.
When Key Is Not Found:
The lookup must signal that the key doesn't exist. Common approaches:
| Language | Direct Access | Safe Access | Default Value |
|---|---|---|---|
| Python | dict[key] → KeyError | dict.get(key) → None | dict.get(key, default) |
| JavaScript | obj[key] → undefined | obj[key] → undefined | obj[key] ?? default |
| Java | map.get(key) → null | map.containsKey(key) | map.getOrDefault(key, default) |
| C++ | map[key] → creates entry! | map.find(key) | map.count(key) > 0 |
| Go | m[key] → zero value | v, ok := m[key] | v, ok := m[key] |
| Rust | — (no direct access) | map.get(&key) → Option<V> | map.get(&key).unwrap_or(default) |
In C++, accessing std::map[key] for a non-existent key creates a new entry with a default value. This is a common source of bugs. Use map.find(key) or map.count(key) for safe lookup without side effects.
```python
from typing import TypeVar, Optional, Generic
from dataclasses import dataclass

K = TypeVar('K')
V = TypeVar('V')

@dataclass
class LookupResult(Generic[V]):
    """
    Explicit result type for hash table lookup.
    Avoids ambiguity between 'not found' and 'found with None value'.
    """
    found: bool
    value: Optional[V]
    probes: int  # For performance analysis

    @classmethod
    def success(cls, value: V, probes: int) -> 'LookupResult[V]':
        return cls(found=True, value=value, probes=probes)

    @classmethod
    def not_found(cls, probes: int) -> 'LookupResult[V]':
        return cls(found=False, value=None, probes=probes)


class HashTable(Generic[K, V]):
    """Hash table with explicit lookup result handling."""

    def __init__(self, capacity: int = 16):
        self.capacity = capacity
        self.buckets = [None] * capacity

    def insert(self, key: K, value: V) -> None:
        """Minimal chaining insert (uses ChainNode from the example above)."""
        hash_code = hash(key)
        index = hash_code % self.capacity
        node = ChainNode(key, value, hash_code)
        node.next = self.buckets[index]
        self.buckets[index] = node

    def lookup(self, key: K) -> LookupResult[V]:
        """
        Look up a key and return an explicit result.
        """
        hash_code = hash(key)
        index = hash_code % self.capacity
        bucket = self.buckets[index]

        probes = 0
        current = bucket
        while current is not None:
            probes += 1
            if current.hash_code == hash_code and current.key == key:
                return LookupResult.success(current.value, probes)
            current = current.next

        return LookupResult.not_found(probes)

    # Python dict-style methods

    def __getitem__(self, key: K) -> V:
        """dict[key] - raises KeyError if not found."""
        result = self.lookup(key)
        if not result.found:
            raise KeyError(key)
        return result.value

    def get(self, key: K, default: Optional[V] = None) -> Optional[V]:
        """dict.get(key, default) - returns default if not found."""
        result = self.lookup(key)
        return result.value if result.found else default

    def __contains__(self, key: K) -> bool:
        """key in dict - membership test."""
        return self.lookup(key).found

    def setdefault(self, key: K, default: V) -> V:
        """Return value if exists, else insert and return default."""
        result = self.lookup(key)
        if result.found:
            return result.value
        else:
            self.insert(key, default)
            return default


# Usage patterns
def demonstrate_lookup_patterns():
    ht = HashTable()
    ht.insert("name", "Alice")
    ht.insert("age", 30)

    # Pattern 1: Direct access (raises exception)
    try:
        phone = ht["phone"]
    except KeyError:
        print("KeyError: 'phone' not in table")

    # Pattern 2: Safe access with default
    phone = ht.get("phone", "N/A")
    print(f"Phone: {phone}")  # "N/A"

    # Pattern 3: Check then access
    if "name" in ht:
        print(f"Name: {ht['name']}")

    # Pattern 4: Explicit result
    result = ht.lookup("email")
    if result.found:
        print(f"Email: {result.value}")
    else:
        print(f"Email not found (checked {result.probes} locations)")
```

Let's trace complete lookup operations through all steps, seeing how each scenario plays out from start to finish.
```python
def complete_lookup_trace(table, key):
    """
    Trace every step of hash table lookup.
    """
    print("\n" + "=" * 70)
    print(f"COMPLETE LOOKUP TRACE: key = {repr(key)}")
    print("=" * 70)

    # ========================================
    # STEP 1: Hash the key
    # ========================================
    print("\n[STEP 1] HASH THE KEY")
    print("-" * 50)
    hash_code = hash(key)
    print(f"hash({repr(key)}) = {hash_code}")

    # ========================================
    # STEP 2: Compute bucket index
    # ========================================
    print("\n[STEP 2] COMPUTE INDEX")
    print("-" * 50)
    index = hash_code % table.capacity
    print(f"index = {hash_code} % {table.capacity} = {index}")

    # ========================================
    # STEP 3: Navigate to bucket
    # ========================================
    print("\n[STEP 3] NAVIGATE TO BUCKET")
    print("-" * 50)
    bucket = table.buckets[index]
    print(f"Accessing table.buckets[{index}]")

    if bucket is None:
        print("Bucket is EMPTY")
        print("\n[STEP 4] SEARCH: Skipped (bucket empty)")
        print("\n[STEP 5] RESULT: NOT FOUND")
        print("\n" + "=" * 70)
        print("LOOKUP COMPLETE: Key not in table")
        print("Total comparisons: 0")
        print("=" * 70)
        return None, 0

    print("Bucket contains data - must search")

    # ========================================
    # STEP 4: Search the bucket
    # ========================================
    print("\n[STEP 4] SEARCH THE BUCKET")
    print("-" * 50)

    current = bucket
    comparisons = 0

    while current is not None:
        comparisons += 1
        print(f"\n  Comparison #{comparisons}:")
        print(f"    Examining: key={repr(current.key)}, hash={current.hash_code}")

        # Check hash first
        if current.hash_code == hash_code:
            print("    Hash codes MATCH")
            # Check key
            if current.key == key:
                print("    Keys MATCH → FOUND!")

                # ========================================
                # STEP 5: Return result
                # ========================================
                print("\n[STEP 5] RETURN RESULT")
                print("-" * 50)
                print(f"Value: {current.value}")
                print("\n" + "=" * 70)
                print("LOOKUP COMPLETE: Key found")
                print(f"Total comparisons: {comparisons}")
                print("Time complexity: O(1) achieved" if comparisons <= 2
                      else f"Time complexity: O({comparisons}) - degraded due to collisions")
                print("=" * 70)
                return current.value, comparisons
            else:
                print("    Keys DIFFER (hash collision)")
        else:
            print("    Hash codes differ, skip")

        current = current.next

    # ========================================
    # STEP 5: Return not found
    # ========================================
    print("\n[STEP 5] RETURN RESULT")
    print("-" * 50)
    print("Reached end of chain")
    print("Key not in table")
    print("\n" + "=" * 70)
    print("LOOKUP COMPLETE: Key NOT found")
    print(f"Total comparisons: {comparisons}")
    print("=" * 70)
    return None, comparisons


# Test scenarios
class TestTable:
    def __init__(self):
        self.capacity = 8
        self.buckets = [None] * 8
        # Set up a chain at bucket 3: alice -> bob -> charlie
        alice = ChainNode("alice", 100, hash("alice"))
        bob = ChainNode("bob", 200, hash("bob"))
        charlie = ChainNode("charlie", 300, hash("charlie"))
        alice.next = bob
        bob.next = charlie
        # Assume all hash to bucket 3 for demonstration
        self.buckets[3] = alice

table = TestTable()

# Scenario 1: Key found immediately
# complete_lookup_trace(table, "alice")           # 1 comparison

# Scenario 2: Key found after traversal
# complete_lookup_trace(table, "charlie")         # 3 comparisons

# Scenario 3: Key not found
# complete_lookup_trace(table, "david")           # 3 comparisons, not found

# Scenario 4: Empty bucket
# complete_lookup_trace(table, "empty_bucket_key")  # 0 comparisons
```

Understanding when lookup achieves O(1) and when it degrades is crucial for using hash tables effectively.
Best Case: O(1)
Lookup is O(1) when the target bucket is empty (an immediate "not found") or when the key is the first entry examined, with no collision chain to traverse.
Average Case: O(1 + α), where α is the load factor
With a good hash function and a reasonable load factor, keys spread evenly: each bucket holds about α entries on average, so a lookup examines only a small, constant number of candidates. For open addressing, an unsuccessful search averages about 2 probes at load factor 0.5 and about 4 at 0.75 (the 1/(1 − α) formula analyzed below).
Worst Case: O(n)
Lookup degrades to O(n) when many or all keys collide into the same bucket, whether from a poor hash function or adversarially chosen keys; the "bucket search" then becomes a linear scan over most of the table.
| Scenario | Chaining Cost | Open Addressing Cost | Mitigation |
|---|---|---|---|
| Key in first position | O(1) | O(1) | Already optimal |
| Key after k collisions | O(k) | O(k) | Keep load factor low |
| Key not in table | O(chain length) | O(cluster size) | Good hash function |
| All keys same bucket | O(n) | O(n) | Use randomized or cryptographic hash |
| Table 90% full (open) | N/A | O(10) probes avg | Rehash earlier (75% threshold) |
```python
import math
import random
import time
from typing import List, Tuple

def analyze_lookup_performance():
    """
    Measure lookup performance under different conditions.
    """
    def measure_lookups(table, keys: List[str],
                        existing_keys: List[str]) -> Tuple[float, float]:
        """
        Measure average time for successful and unsuccessful lookups.
        """
        # Successful lookups
        start = time.perf_counter()
        for key in existing_keys[:1000]:
            _ = table.get(key)
        success_time = (time.perf_counter() - start) / 1000

        # Unsuccessful lookups (keys not in table)
        missing_keys = [f"missing_{i}" for i in range(1000)]
        start = time.perf_counter()
        for key in missing_keys:
            _ = table.get(key)
        failure_time = (time.perf_counter() - start) / 1000

        return success_time, failure_time

    print("=" * 60)
    print("HASH TABLE LOOKUP PERFORMANCE ANALYSIS")
    print("=" * 60)

    # Test with different load factors
    for load_factor in [0.25, 0.50, 0.75, 0.90]:
        capacity = 10000
        n_entries = int(capacity * load_factor)

        table = dict()  # Python's dict
        keys = [f"key_{i}" for i in range(n_entries)]

        # Insert entries
        for key in keys:
            table[key] = random.randint(1, 1000)

        # Measure
        success_time, failure_time = measure_lookups(table, keys, keys)

        print(f"\nLoad factor: {load_factor:.0%}")
        print(f"  Entries: {n_entries:,}")
        print(f"  Successful lookup:   {success_time*1e6:.2f} µs avg")
        print(f"  Unsuccessful lookup: {failure_time*1e6:.2f} µs avg")

    # Test pathological case: poor hash function
    print("\n" + "-" * 60)
    print("PATHOLOGICAL CASE: All keys hash to same bucket")
    print("-" * 60)

    class BadHash:
        """Object with a terrible hash function."""
        def __init__(self, value):
            self.value = value
        def __hash__(self):
            return 42  # All objects hash to the same value!
        def __eq__(self, other):
            return isinstance(other, BadHash) and self.value == other.value

    # This would create O(n) lookups!
    bad_keys = [BadHash(i) for i in range(1000)]
    # Lookup time would grow linearly with table size


def expected_probes_formula():
    """
    Show the mathematical relationship between load factor and probes.
    """
    print("\n" + "=" * 60)
    print("EXPECTED PROBES BY LOAD FACTOR (Open Addressing)")
    print("=" * 60)
    print("\nSuccessful search:   E[probes] = (1/α) * ln(1/(1-α))")
    print("Unsuccessful search: E[probes] = 1/(1-α)")
    print("\n{:<15} {:>20} {:>20}".format(
        "Load Factor", "Successful", "Unsuccessful"))
    print("-" * 55)

    for alpha in [0.10, 0.25, 0.50, 0.75, 0.90, 0.95]:
        success_probes = (1/alpha) * math.log(1/(1-alpha))
        failure_probes = 1 / (1 - alpha)
        print("{:<15} {:>20.2f} {:>20.2f}".format(
            f"{alpha:.0%}", success_probes, failure_probes))

# Output:
# Load Factor          Successful         Unsuccessful
# -------------------------------------------------------
# 10%                        1.05                 1.11
# 25%                        1.15                 1.33
# 50%                        1.39                 2.00
# 75%                        1.85                 4.00
# 90%                        2.56                10.00
# 95%                        3.15                20.00
```

At a 75% load factor, an unsuccessful lookup probes about 4 slots on average. At 90%, that jumps to 10; at 95%, to 20. This rapid 1/(1 − α) growth explains why hash tables rehash well before reaching capacity: the performance cliff near fullness is steep.
Real-world hash table implementations include numerous optimizations to make lookup even faster. Understanding these helps you choose implementations and tune performance.
Optimization 1: Hash Caching
Store the hash code alongside each entry. Comparing integers (hashes) is faster than comparing strings (keys). Only perform the expensive key comparison when hashes match.
Optimization 2: Linear Probing Clustering Avoidance
Quadratic probing or double hashing spread probes more evenly, reducing primary clustering in open addressing.
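The difference is easy to see by printing the first few slot indices each strategy would visit. These are the standard textbook probe formulas; the secondary hash `h2` is a made-up constant here, not derived from a real key:

```python
def probe_sequence(hash_code, capacity, strategy, h2=None, n=5):
    """Return the first n slot indices visited for a given strategy."""
    seq = []
    for i in range(n):
        if strategy == "linear":
            offset = i            # 0, 1, 2, ...  (neighbors: clusters easily)
        elif strategy == "quadratic":
            offset = i * i        # 0, 1, 4, 9, ... (spreads probes out)
        elif strategy == "double":
            offset = i * h2       # step size depends on the key itself
        seq.append((hash_code + offset) % capacity)
    return seq

h = 37  # pretend hash code
print(probe_sequence(h, 16, "linear"))        # [5, 6, 7, 8, 9]
print(probe_sequence(h, 16, "quadratic"))     # [5, 6, 9, 14, 5]
print(probe_sequence(h, 16, "double", h2=7))  # [5, 12, 3, 10, 1]
```

Linear probing visits adjacent slots, so occupied runs grow into each other (primary clustering); quadratic and double hashing jump around the table, and double hashing additionally gives different keys different step sizes.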
Optimization 3: Robin Hood Hashing
During insertion, if the new key has probed further than the existing key at a slot, swap them. This bounds the maximum probe distance and makes lookup more predictable.
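A minimal sketch of the Robin Hood insertion rule under linear probing. This is simplified (no resizing, no deletion), and the flat `(key, value, dist)` slot layout is an assumption of this sketch:

```python
def robin_hood_insert(slots, key, value):
    """Insert into slots (a list of (key, value, dist) tuples or None),
    displacing entries that sit closer to their home bucket than we do."""
    capacity = len(slots)
    index = hash(key) % capacity
    dist = 0  # how far we have probed from our home bucket
    while True:
        slot = slots[index]
        if slot is None:
            slots[index] = (key, value, dist)
            return
        _, _, existing_dist = slot
        if existing_dist < dist:
            # The resident is "richer" (closer to home): take its slot,
            # then keep probing to re-place the displaced entry.
            slots[index], (key, value, dist) = (key, value, dist), slot
        index = (index + 1) % capacity
        dist += 1

# Demo: insert a few keys and inspect the stored probe distances
slots = [None] * 8
for k, v in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
    robin_hood_insert(slots, k, v)
print({s[0]: s[1] for s in slots if s is not None})
```

Because no entry is ever much farther from home than its neighbors, the variance of probe distances stays low, which is what makes lookup times predictable.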
Optimization 4: SIMD Probing
Modern implementations (like Swiss Tables in Abseil) use SIMD instructions to check multiple slots in parallel during open addressing lookup.
Optimization 5: Bucket Trees (Java 8+)
When chains grow long (typically 8+ entries), Java HashMap converts them to balanced trees, guaranteeing O(log n) worst case instead of O(n).
```python
_MISSING = object()  # Sentinel for distinguishing None from missing


class OptimizedHashTable:
    """
    Hash table with optimized lookup and various access patterns.
    """
    class _Node:
        __slots__ = ("key", "value", "hash_code", "next")
        def __init__(self, key, value, hash_code):
            self.key, self.value = key, value
            self.hash_code, self.next = hash_code, None

    def __init__(self, capacity: int = 16):
        self.capacity = capacity
        self.buckets = [None] * capacity
        self.size = 0

    # ========================
    # CORE LOOKUP (optimized)
    # ========================

    def _lookup_internal(self, key) -> tuple:
        """
        Internal lookup with hash caching optimization.
        Returns (found, value, bucket_index, node) for flexibility.
        """
        hash_code = hash(key)
        index = hash_code % self.capacity

        current = self.buckets[index]
        while current is not None:
            # OPTIMIZATION: Compare cached hash first (fast int comparison)
            if current.hash_code == hash_code:
                # Only then compare actual keys (potentially slow)
                if current.key == key:
                    return True, current.value, index, current
            current = current.next

        return False, None, index, None

    # ========================
    # INTERNAL HELPERS
    # ========================

    def _insert_at(self, key, value, index):
        """Prepend a new node to the bucket at index."""
        node = OptimizedHashTable._Node(key, value, hash(key))
        node.next = self.buckets[index]
        self.buckets[index] = node
        self.size += 1

    def _remove_node(self, index, node):
        """Unlink node from the bucket's chain."""
        current = self.buckets[index]
        if current is node:
            self.buckets[index] = node.next
        else:
            while current.next is not node:
                current = current.next
            current.next = node.next
        self.size -= 1

    # ========================
    # LOOKUP VARIANTS
    # ========================

    def get(self, key, default=None):
        """Return value or default."""
        found, value, _, _ = self._lookup_internal(key)
        return value if found else default

    def __getitem__(self, key):
        """Direct access - raises KeyError if missing."""
        found, value, _, _ = self._lookup_internal(key)
        if not found:
            raise KeyError(key)
        return value

    def __setitem__(self, key, value):
        """table[key] = value - insert or update."""
        found, _, index, node = self._lookup_internal(key)
        if found:
            node.value = value
        else:
            self._insert_at(key, value, index)

    def __contains__(self, key) -> bool:
        """key in table - membership test."""
        found, _, _, _ = self._lookup_internal(key)
        return found

    def setdefault(self, key, default=None):
        """
        If key exists, return its value.
        Otherwise, insert default and return it.
        """
        found, value, index, _ = self._lookup_internal(key)
        if found:
            return value
        # Insert the default
        self._insert_at(key, default, index)
        return default

    def pop(self, key, default=_MISSING):
        """
        Remove and return value.
        If key missing and default provided, return default.
        If key missing and no default, raise KeyError.
        """
        found, value, index, node = self._lookup_internal(key)
        if found:
            self._remove_node(index, node)
            return value
        elif default is not _MISSING:
            return default
        else:
            raise KeyError(key)

    def items(self):
        """
        Iterate over all key-value pairs.
        Note: This is O(n), not O(1)!
        """
        for bucket in self.buckets:
            current = bucket
            while current is not None:
                yield (current.key, current.value)
                current = current.next


# Usage demonstration
def demonstrate_variants():
    ht = OptimizedHashTable()
    ht["a"] = 1
    ht["b"] = 2

    # Various access patterns
    print(ht.get("a"))             # 1
    print(ht.get("c"))             # None
    print(ht.get("c", "default"))  # "default"
    print("a" in ht)               # True
    print("c" in ht)               # False

    # setdefault pattern (useful for grouping)
    groups = OptimizedHashTable()
    for item, category in [("apple", "fruit"), ("carrot", "veg"),
                           ("banana", "fruit")]:
        # Creates list if category doesn't exist, appends to it
        items = groups.setdefault(category, [])
        items.append(item)
    # Result: {"fruit": ["apple", "banana"], "veg": ["carrot"]}
```

We've thoroughly explored the hash table lookup operation, the core retrieval mechanism that makes hash tables so valuable.
Module Complete:
You've now worked through the core topics of hash table structure and mechanism: the bucket array layout, hashing and indexing, insertion, and lookup.
With this knowledge, you can trace hash table operations step by step, predict performance characteristics, and make informed decisions about when and how to use hash tables in your systems.
Congratulations! You now have a deep understanding of hash table structure and mechanism. You can trace insertions and lookups, understand collision handling, predict performance, and reason about the internal state of hash tables. This foundation prepares you for the next modules covering collision handling strategies in depth.