The lookup operation is the raison d'être of hash tables—the entire data structure exists to make this operation fast. When you write value = map[key] or result = dict.get(key), you're invoking the operation that hash tables optimize above all others.
Lookup mirrors insertion in many ways—both compute the same hash, navigate to the same bucket, and follow the same collision resolution path. But lookup has a different goal: it doesn't modify the table; it extracts information. And it must handle a case insertion never faces: the key might not exist.
Understanding lookup deeply means understanding how hash tables fulfill their O(1) promise, and when that promise might not hold.
By the end of this page, you will be able to trace hash table lookups step by step, understand how the table distinguishes 'found' from 'not found', handle collision scenarios during search, and recognize when lookup performance degrades from O(1).
Hash table lookup follows a predictable sequence that closely mirrors insertion, with different logic at the end:
The Universal Lookup Algorithm:
LOOKUP(table, key):
1. HASH: Compute hash_code = hash(key)
2. INDEX: Compute bucket_index = hash_code mod table.capacity
3. LOCATE: Navigate to table.buckets[bucket_index]
4. SEARCH: Check if key exists at this location
5. RETURN: Return value if found, or indicate not found
Steps 1-3 are identical to insertion. The difference lies in step 4's termination conditions and step 5's possible outcomes.
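The five steps can be condensed into a minimal Python sketch. This assumes a chaining table; the `Node` and `Table` classes here are illustrative stand-ins, not a production implementation:

```python
class Node:
    """A single entry in a chained bucket."""
    def __init__(self, key, value, next=None):
        self.key, self.value, self.next = key, value, next

class Table:
    """A bare-bones chaining table: fixed capacity, list of bucket heads."""
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.buckets = [None] * capacity

def lookup(table, key):
    """Return (found, value), following the five lookup steps."""
    hash_code = hash(key)                 # 1. HASH
    index = hash_code % table.capacity    # 2. INDEX
    node = table.buckets[index]           # 3. LOCATE
    while node is not None:               # 4. SEARCH the chain
        if node.key == key:
            return True, node.value       # 5. RETURN: found
        node = node.next
    return False, None                    # 5. RETURN: not found

# Place one entry by hand, then look it up
t = Table()
t.buckets[hash("alice") % t.capacity] = Node("alice", 100)
print(lookup(t, "alice"))   # (True, 100)
print(lookup(t, "bob"))     # (False, None)
```

Note the two distinct exits in step 5: a matching key returns its value, while falling off the end of the chain signals "not found".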
Why Lookup Is Symmetric to Insertion:
The hash function's determinism guarantees that any key always maps to the same bucket. If we inserted (key, value) at bucket i, then looking up key will also navigate to bucket i. The search process that finds where to insert a new key is the same process that finds an existing key.
Lookup is a read-only operation—it never modifies the table. This means lookups can safely occur concurrently in multi-threaded environments (if no writes are happening). Some thread-safe implementations exploit this by allowing lock-free reads.
The first three steps of lookup are mechanically identical to insertion. We compute the same hash, derive the same index, and navigate to the same bucket.
Step 1: Hash the Key
Compute hash_code = hash(key). This must produce the exact same value as when the key was inserted—hash functions must be deterministic.
Important: If the key is mutable and has changed since insertion, its hash will differ, and lookup will fail even though the data exists in the table. This is why mutable objects shouldn't be hash table keys.
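A short demonstration of the pitfall, using a hypothetical `Point` class whose hash (badly) depends on mutable state:

```python
class Point:
    """DON'T do this: the hash depends on mutable attributes."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __hash__(self):
        return hash((self.x, self.y))
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

p = Point(1, 2)
table = {p: "data"}
p.x = 99                       # mutate the key AFTER insertion

# Neither the old coordinates nor the new ones can find the entry:
print(Point(1, 2) in table)    # False: right bucket, but keys no longer equal
print(Point(99, 2) in table)   # False: searches the wrong bucket
print(len(table))              # 1 -- the entry still exists, but is unreachable
```

The entry was stored under the old hash, so the table still holds it, yet no lookup can reach it. This is why Python's built-in mutable types (lists, dicts, sets) are unhashable by design.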
Step 2: Compute the Index
Compute index = hash_code % capacity. This gives us the bucket where the key should be located.
Note that the index computation during lookup must use the table's current capacity, which may have changed due to rehashing since the key was inserted. Rehashing, however, moves every entry to its correct bucket in the new table, so lookup still works correctly.
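A quick illustration of the interplay (the capacities and key are arbitrary): an index computed against the old capacity can differ from one computed against the new capacity, but rehashing recomputes every entry's bucket with the new capacity, so lookup stays consistent.

```python
key = "alice"
h = hash(key)

old_capacity, new_capacity = 8, 16
old_index = h % old_capacity
new_index = h % new_capacity

# The two indices may differ...
print(old_index, new_index)

# ...but rehashing re-inserts every entry using the NEW capacity,
# so lookup (which also uses the new capacity) still finds the key.
new_buckets = [None] * new_capacity
new_buckets[new_index] = (key, 100)    # simulate where rehashing placed it
print(new_buckets[hash(key) % new_capacity])   # ('alice', 100)
```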
Step 3: Navigate to the Bucket
Access table.buckets[index] in O(1) time. At this point, we're at the right bucket—now we need to find if our key is there.
```python
def trace_lookup_navigation(table, key):
    """
    Trace steps 1-3 of hash table lookup.
    """
    print(f"\n{'='*60}")
    print(f"LOOKUP TRACE: Steps 1-3 for key {repr(key)}")
    print(f"{'='*60}")

    # ========================================
    # STEP 1: Hash the key
    # ========================================
    print("\n[STEP 1] HASH THE KEY")
    print("-" * 40)
    hash_code = hash(key)
    print(f"hash({repr(key)}) = {hash_code}")

    # Verify determinism
    assert hash(key) == hash_code, "Hash function is not deterministic!"
    print("Determinism verified: same key → same hash ✓")

    # ========================================
    # STEP 2: Compute bucket index
    # ========================================
    print("\n[STEP 2] COMPUTE INDEX")
    print("-" * 40)
    index = hash_code % table.capacity
    print(f"Table capacity: {table.capacity}")
    print("Index = hash_code % capacity")
    print(f"Index = {hash_code} % {table.capacity} = {index}")

    # ========================================
    # STEP 3: Navigate to bucket
    # ========================================
    print("\n[STEP 3] NAVIGATE TO BUCKET")
    print("-" * 40)
    bucket = table.buckets[index]
    print(f"Accessing table.buckets[{index}]")
    print("Operation: O(1) array access")

    if bucket is None:
        print("Bucket is EMPTY")
        print("Early termination possible: key definitely not in table")
    else:
        print("Bucket is OCCUPIED")
        print("Must search bucket contents for the key")

    return index, bucket, hash_code


# Example: Looking up a key
class MockTable:
    def __init__(self):
        self.capacity = 16
        self.buckets = [None] * 16
        # Simulate some data
        self.buckets[5] = "chain_node_placeholder"

table = MockTable()
trace_lookup_navigation(table, "alice")
```

Steps 1-3 are O(1) and bring us directly to the right bucket. Without hashing, finding a key would require O(n) linear search through all entries. Hashing reduces the search space from 'entire table' to 'single bucket' in constant time.
Once we're at the correct bucket, we must search for the key. This is where the collision resolution strategy matters most. The search logic determines how many entries must be examined, what counts as a match, and when the search can safely conclude the key is absent.
Let's examine search behavior for both major strategies.
In chaining, search means traversing the linked list at the bucket:
SEARCH_CHAIN(bucket, key, hash_code):
    current = bucket.head
    while current is not None:
        if current.hash_code == hash_code:   // Fast check
            if current.key == key:           // Full comparison
                return current.value         // FOUND!
        current = current.next
    return NOT_FOUND
Termination Conditions:
current.key == key → return the value
current == None (end of chain) → return not found

Optimization: Hash Comparison First
Comparing hash codes (integers) is faster than comparing keys (potentially large strings or objects). By checking hash_code == candidate.hash_code first, we quickly skip non-matching entries. Only when hashes match do we perform the more expensive key comparison.
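For comparison, a search under open addressing probes successive slots instead of walking a chain. A minimal sketch with linear probing follows; the `EMPTY`/`TOMBSTONE` markers and tuple-valued slots are illustrative assumptions, not a fixed API:

```python
EMPTY = None          # slot never used
TOMBSTONE = object()  # slot whose entry was deleted

def lookup_open_addressing(slots, key, capacity):
    """Linear-probing search. Returns (found, value)."""
    index = hash(key) % capacity
    for i in range(capacity):
        slot = slots[(index + i) % capacity]
        if slot is EMPTY:
            return False, None       # never-used slot: key cannot lie beyond it
        if slot is not TOMBSTONE and slot[0] == key:
            return True, slot[1]     # matching key found
        # tombstone or different key: keep probing
    return False, None               # scanned the whole table

# Place one entry at its home slot, then search
slots = [EMPTY] * 8
slots[hash("alice") % 8] = ("alice", 100)
print(lookup_open_addressing(slots, "alice", 8))   # (True, 100)
print(lookup_open_addressing(slots, "bob", 8))     # (False, None)
```

Note the asymmetry with chaining: an empty (never-used) slot ends the search immediately, but a tombstone does not, because the key might have been displaced past the deleted entry.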
```python
from typing import Any, Optional, Tuple

class ChainNode:
    def __init__(self, key, value, hash_code: int):
        self.key = key
        self.value = value
        self.hash_code = hash_code
        self.next: Optional['ChainNode'] = None

def lookup_chaining(bucket_head: Optional[ChainNode], key,
                    target_hash: int) -> Tuple[bool, Optional[Any], int]:
    """
    Search a chained bucket for a key.

    Returns:
        (found: bool, value: Optional, comparisons: int)
    """
    print("\n[STEP 4] SEARCHING CHAIN")
    print("-" * 40)

    current = bucket_head
    comparisons = 0

    if current is None:
        print("Chain is empty → Key not in table")
        return False, None, 0

    while current is not None:
        comparisons += 1
        print(f"\nComparison #{comparisons}:")
        print(f"  Candidate key: {repr(current.key)}")

        # Fast path: compare hash codes first
        if current.hash_code == target_hash:
            print(f"  Hash codes match: {target_hash}")
            # Slow path: compare actual keys
            if current.key == key:
                print("  Keys match: FOUND!")
                print(f"  Value: {current.value}")
                return True, current.value, comparisons
            else:
                print("  Hash collision: keys differ despite same hash")
        else:
            print(f"  Hash codes differ: {current.hash_code} ≠ {target_hash}")

        current = current.next

    print(f"\nReached end of chain after {comparisons} comparisons")
    print("Key not found in table")
    return False, None, comparisons


# Example: Chain with 3 nodes
# alice -> bob -> charlie
alice = ChainNode("alice", 100, hash("alice"))
bob = ChainNode("bob", 200, hash("bob"))
charlie = ChainNode("charlie", 300, hash("charlie"))
alice.next = bob
bob.next = charlie

# Search for "bob" (found after 2 comparisons)
found, value, comps = lookup_chaining(alice, "bob", hash("bob"))

# Search for "david" (not found after 3 comparisons)
found, value, comps = lookup_chaining(alice, "david", hash("david"))
```

The search concludes with one of two outcomes: the key is found, or it's not. Different programming languages and APIs handle these outcomes differently.
When Key Is Found:
The lookup returns the associated value. This is the happy path—the hash table delivers on its O(1) promise.
When Key Is Not Found:
The lookup must signal that the key doesn't exist. Common approaches:
| Language | Direct Access | Safe Access | Default Value |
|---|---|---|---|
| Python | dict[key] → KeyError | dict.get(key) → None | dict.get(key, default) |
| JavaScript | obj[key] → undefined | obj[key] → undefined | obj[key] ?? default |
| Java | map.get(key) → null | map.containsKey(key) | map.getOrDefault(key, default) |
| C++ | map[key] → creates entry! | map.find(key) | map.count(key) > 0 |
| Go | m[key] → zero value | v, ok := m[key] | v, ok := m[key] |
| Rust | — (no direct access) | map.get(&key) → Option<V> | map.get(&key).unwrap_or(default) |
In C++, accessing std::map[key] for a non-existent key creates a new entry with a default value. This is a common source of bugs. Use map.find(key) or map.count(key) for safe lookup without side effects.
```python
from typing import TypeVar, Optional, Generic
from dataclasses import dataclass

K = TypeVar('K')
V = TypeVar('V')

@dataclass
class LookupResult(Generic[V]):
    """
    Explicit result type for hash table lookup.
    Avoids ambiguity between 'not found' and 'found with None value'.
    """
    found: bool
    value: Optional[V]
    probes: int  # For performance analysis

    @classmethod
    def success(cls, value: V, probes: int) -> 'LookupResult[V]':
        return cls(found=True, value=value, probes=probes)

    @classmethod
    def not_found(cls, probes: int) -> 'LookupResult[V]':
        return cls(found=False, value=None, probes=probes)


class HashTable(Generic[K, V]):
    """Hash table with explicit lookup result handling."""

    def __init__(self, capacity: int = 16):
        self.capacity = capacity
        self.buckets = [None] * capacity

    def insert(self, key: K, value: V) -> None:
        """Minimal chaining insert (uses ChainNode from the example above)."""
        hash_code = hash(key)
        index = hash_code % self.capacity
        node = ChainNode(key, value, hash_code)
        node.next = self.buckets[index]
        self.buckets[index] = node

    def lookup(self, key: K) -> LookupResult[V]:
        """
        Look up a key and return an explicit result.
        """
        hash_code = hash(key)
        index = hash_code % self.capacity
        bucket = self.buckets[index]

        probes = 0
        current = bucket
        while current is not None:
            probes += 1
            if current.hash_code == hash_code and current.key == key:
                return LookupResult.success(current.value, probes)
            current = current.next

        return LookupResult.not_found(probes)

    # Python dict-style methods

    def __getitem__(self, key: K) -> V:
        """dict[key] - raises KeyError if not found."""
        result = self.lookup(key)
        if not result.found:
            raise KeyError(key)
        return result.value

    def get(self, key: K, default: Optional[V] = None) -> Optional[V]:
        """dict.get(key, default) - returns default if not found."""
        result = self.lookup(key)
        return result.value if result.found else default

    def __contains__(self, key: K) -> bool:
        """key in dict - membership test."""
        return self.lookup(key).found

    def setdefault(self, key: K, default: V) -> V:
        """Return value if exists, else insert and return default."""
        result = self.lookup(key)
        if result.found:
            return result.value
        else:
            self.insert(key, default)
            return default


# Usage patterns
def demonstrate_lookup_patterns():
    ht = HashTable()
    ht.insert("name", "Alice")
    ht.insert("age", 30)

    # Pattern 1: Direct access (raises exception)
    try:
        phone = ht["phone"]
    except KeyError:
        print("KeyError: 'phone' not in table")

    # Pattern 2: Safe access with default
    phone = ht.get("phone", "N/A")
    print(f"Phone: {phone}")  # "N/A"

    # Pattern 3: Check then access
    if "name" in ht:
        print(f"Name: {ht['name']}")

    # Pattern 4: Explicit result
    result = ht.lookup("email")
    if result.found:
        print(f"Email: {result.value}")
    else:
        print(f"Email not found (checked {result.probes} locations)")
```

Let's trace complete lookup operations through all steps, seeing how each scenario plays out from start to finish.
```python
def complete_lookup_trace(table, key):
    """
    Trace every step of hash table lookup.
    """
    print("\n" + "=" * 70)
    print(f"COMPLETE LOOKUP TRACE: key = {repr(key)}")
    print("=" * 70)

    # ========================================
    # STEP 1: Hash the key
    # ========================================
    print("\n[STEP 1] HASH THE KEY")
    print("-" * 50)
    hash_code = hash(key)
    print(f"hash({repr(key)}) = {hash_code}")

    # ========================================
    # STEP 2: Compute bucket index
    # ========================================
    print("\n[STEP 2] COMPUTE INDEX")
    print("-" * 50)
    index = hash_code % table.capacity
    print(f"index = {hash_code} % {table.capacity} = {index}")

    # ========================================
    # STEP 3: Navigate to bucket
    # ========================================
    print("\n[STEP 3] NAVIGATE TO BUCKET")
    print("-" * 50)
    bucket = table.buckets[index]
    print(f"Accessing table.buckets[{index}]")

    if bucket is None:
        print("Bucket is EMPTY")
        print("\n[STEP 4] SEARCH: Skipped (bucket empty)")
        print("\n[STEP 5] RESULT: NOT FOUND")
        print("\n" + "=" * 70)
        print("LOOKUP COMPLETE: Key not in table")
        print("Total comparisons: 0")
        print("=" * 70)
        return None, 0

    print("Bucket contains data - must search")

    # ========================================
    # STEP 4: Search the bucket
    # ========================================
    print("\n[STEP 4] SEARCH THE BUCKET")
    print("-" * 50)

    current = bucket
    comparisons = 0

    while current is not None:
        comparisons += 1
        print(f"\n  Comparison #{comparisons}:")
        print(f"    Examining: key={repr(current.key)}, hash={current.hash_code}")

        # Check hash first
        if current.hash_code == hash_code:
            print("    Hash codes MATCH")
            # Check key
            if current.key == key:
                print("    Keys MATCH → FOUND!")

                # ========================================
                # STEP 5: Return result
                # ========================================
                print("\n[STEP 5] RETURN RESULT")
                print("-" * 50)
                print(f"Value: {current.value}")
                print("\n" + "=" * 70)
                print("LOOKUP COMPLETE: Key found")
                print(f"Total comparisons: {comparisons}")
                print("Time complexity: O(1) achieved" if comparisons <= 2
                      else f"Time complexity: O({comparisons}) - degraded due to collisions")
                print("=" * 70)
                return current.value, comparisons
            else:
                print("    Keys DIFFER (hash collision)")
        else:
            print("    Hash codes differ, skip")

        current = current.next

    # ========================================
    # STEP 5: Return not found
    # ========================================
    print("\n[STEP 5] RETURN RESULT")
    print("-" * 50)
    print("Reached end of chain")
    print("Key not in table")
    print("\n" + "=" * 70)
    print("LOOKUP COMPLETE: Key NOT found")
    print(f"Total comparisons: {comparisons}")
    print("=" * 70)
    return None, comparisons


# Test scenarios
class TestTable:
    def __init__(self):
        self.capacity = 8
        self.buckets = [None] * 8
        # Set up a chain at bucket 3: alice -> bob -> charlie
        alice = ChainNode("alice", 100, hash("alice"))
        bob = ChainNode("bob", 200, hash("bob"))
        charlie = ChainNode("charlie", 300, hash("charlie"))
        alice.next = bob
        bob.next = charlie
        # Assume all hash to bucket 3 for demonstration
        self.buckets[3] = alice

table = TestTable()

# Scenario 1: Key found immediately
# complete_lookup_trace(table, "alice")           # 1 comparison

# Scenario 2: Key found after traversal
# complete_lookup_trace(table, "charlie")         # 3 comparisons

# Scenario 3: Key not found
# complete_lookup_trace(table, "david")           # 3 comparisons, not found

# Scenario 4: Empty bucket
# complete_lookup_trace(table, "empty_bucket_key")  # 0 comparisons
```

Understanding when lookup achieves O(1) and when it degrades is crucial for using hash tables effectively.
Best Case: O(1)
Lookup is O(1) when the target bucket is empty (an immediate "not found") or when the key is the first entry examined, with no collision chain to traverse.
Average Case: O(1 + α), where α is the load factor
With a good hash function and a reasonable load factor, keys spread evenly: each bucket holds about α entries on average, so a lookup examines only a small, constant number of candidates. For open addressing, an unsuccessful search averages about 2 probes at load factor 0.5 and about 4 at 0.75 (the 1/(1 − α) formula analyzed below).
Worst Case: O(n)
Lookup degrades to O(n) when many or all keys collide into the same bucket, whether from a poor hash function or adversarially chosen keys; the "bucket search" then becomes a linear scan over most of the table.
| Scenario | Chaining Cost | Open Addressing Cost | Mitigation |
|---|---|---|---|
| Key in first position | O(1) | O(1) | Already optimal |
| Key after k collisions | O(k) | O(k) | Keep load factor low |
| Key not in table | O(chain length) | O(cluster size) | Good hash function |
| All keys same bucket | O(n) | O(n) | Use randomized or cryptographic hash |
| Table 90% full (open) | N/A | O(10) probes avg | Rehash earlier (75% threshold) |
```python
import math
import random
import time
from typing import List, Tuple

def analyze_lookup_performance():
    """
    Measure lookup performance under different conditions.
    """
    def measure_lookups(table, keys: List[str],
                        existing_keys: List[str]) -> Tuple[float, float]:
        """
        Measure average time for successful and unsuccessful lookups.
        """
        # Successful lookups
        start = time.perf_counter()
        for key in existing_keys[:1000]:
            _ = table.get(key)
        success_time = (time.perf_counter() - start) / 1000

        # Unsuccessful lookups (keys not in table)
        missing_keys = [f"missing_{i}" for i in range(1000)]
        start = time.perf_counter()
        for key in missing_keys:
            _ = table.get(key)
        failure_time = (time.perf_counter() - start) / 1000

        return success_time, failure_time

    print("=" * 60)
    print("HASH TABLE LOOKUP PERFORMANCE ANALYSIS")
    print("=" * 60)

    # Test with different load factors
    for load_factor in [0.25, 0.50, 0.75, 0.90]:
        capacity = 10000
        n_entries = int(capacity * load_factor)

        table = dict()  # Python's dict
        keys = [f"key_{i}" for i in range(n_entries)]

        # Insert entries
        for key in keys:
            table[key] = random.randint(1, 1000)

        # Measure
        success_time, failure_time = measure_lookups(table, keys, keys)

        print(f"\nLoad factor: {load_factor:.0%}")
        print(f"  Entries: {n_entries:,}")
        print(f"  Successful lookup:   {success_time*1e6:.2f} µs avg")
        print(f"  Unsuccessful lookup: {failure_time*1e6:.2f} µs avg")

    # Test pathological case: poor hash function
    print("\n" + "-" * 60)
    print("PATHOLOGICAL CASE: All keys hash to same bucket")
    print("-" * 60)

    class BadHash:
        """Object with a terrible hash function."""
        def __init__(self, value):
            self.value = value
        def __hash__(self):
            return 42  # All objects hash to the same value!
        def __eq__(self, other):
            return isinstance(other, BadHash) and self.value == other.value

    # This would create O(n) lookups!
    bad_keys = [BadHash(i) for i in range(1000)]
    # Lookup time would grow linearly with table size


def expected_probes_formula():
    """
    Show the mathematical relationship between load factor and probes.
    """
    print("\n" + "=" * 60)
    print("EXPECTED PROBES BY LOAD FACTOR (Open Addressing)")
    print("=" * 60)
    print("\nSuccessful search:   E[probes] = (1/α) * ln(1/(1-α))")
    print("Unsuccessful search: E[probes] = 1/(1-α)")
    print("\n{:<15} {:>20} {:>20}".format(
        "Load Factor", "Successful", "Unsuccessful"))
    print("-" * 55)

    for alpha in [0.10, 0.25, 0.50, 0.75, 0.90, 0.95]:
        success_probes = (1/alpha) * math.log(1/(1-alpha))
        failure_probes = 1 / (1 - alpha)
        print("{:<15} {:>20.2f} {:>20.2f}".format(
            f"{alpha:.0%}", success_probes, failure_probes))

# Output:
# Load Factor          Successful         Unsuccessful
# -------------------------------------------------------
# 10%                        1.05                 1.11
# 25%                        1.15                 1.33
# 50%                        1.39                 2.00
# 75%                        1.85                 4.00
# 90%                        2.56                10.00
# 95%                        3.15                20.00
```

At a 75% load factor, an unsuccessful lookup probes about 4 slots on average. At 90%, that jumps to 10; at 95%, to 20. This rapid 1/(1 − α) growth explains why hash tables rehash well before reaching capacity: the performance cliff near fullness is steep.
Real-world hash table implementations include numerous optimizations to make lookup even faster. Understanding these helps you choose implementations and tune performance.
Optimization 1: Hash Caching
Store the hash code alongside each entry. Comparing integers (hashes) is faster than comparing strings (keys). Only perform the expensive key comparison when hashes match.
Optimization 2: Linear Probing Clustering Avoidance
Quadratic probing or double hashing spread probes more evenly, reducing primary clustering in open addressing.
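The difference is easy to see by printing the first few slot indices each strategy would visit. These are the standard textbook probe formulas; the secondary hash `h2` is a made-up constant here, not derived from a real key:

```python
def probe_sequence(hash_code, capacity, strategy, h2=None, n=5):
    """Return the first n slot indices visited for a given strategy."""
    seq = []
    for i in range(n):
        if strategy == "linear":
            offset = i            # 0, 1, 2, ...  (neighbors: clusters easily)
        elif strategy == "quadratic":
            offset = i * i        # 0, 1, 4, 9, ... (spreads probes out)
        elif strategy == "double":
            offset = i * h2       # step size depends on the key itself
        seq.append((hash_code + offset) % capacity)
    return seq

h = 37  # pretend hash code
print(probe_sequence(h, 16, "linear"))        # [5, 6, 7, 8, 9]
print(probe_sequence(h, 16, "quadratic"))     # [5, 6, 9, 14, 5]
print(probe_sequence(h, 16, "double", h2=7))  # [5, 12, 3, 10, 1]
```

Linear probing visits adjacent slots, so occupied runs grow into each other (primary clustering); quadratic and double hashing jump around the table, and double hashing additionally gives different keys different step sizes.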
Optimization 3: Robin Hood Hashing
During insertion, if the new key has probed further than the existing key at a slot, swap them. This bounds the maximum probe distance and makes lookup more predictable.
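A minimal sketch of the Robin Hood insertion rule under linear probing. This is simplified (no resizing, no deletion), and the flat `(key, value, dist)` slot layout is an assumption of this sketch:

```python
def robin_hood_insert(slots, key, value):
    """Insert into slots (a list of (key, value, dist) tuples or None),
    displacing entries that sit closer to their home bucket than we do."""
    capacity = len(slots)
    index = hash(key) % capacity
    dist = 0  # how far we have probed from our home bucket
    while True:
        slot = slots[index]
        if slot is None:
            slots[index] = (key, value, dist)
            return
        _, _, existing_dist = slot
        if existing_dist < dist:
            # The resident is "richer" (closer to home): take its slot,
            # then keep probing to re-place the displaced entry.
            slots[index], (key, value, dist) = (key, value, dist), slot
        index = (index + 1) % capacity
        dist += 1

# Demo: insert a few keys and inspect the stored probe distances
slots = [None] * 8
for k, v in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
    robin_hood_insert(slots, k, v)
print({s[0]: s[1] for s in slots if s is not None})
```

Because no entry is ever much farther from home than its neighbors, the variance of probe distances stays low, which is what makes lookup times predictable.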
Optimization 4: SIMD Probing
Modern implementations (like Swiss Tables in Abseil) use SIMD instructions to check multiple slots in parallel during open addressing lookup.
Optimization 5: Bucket Trees (Java 8+)
When chains grow long (typically 8+ entries), Java HashMap converts them to balanced trees, guaranteeing O(log n) worst case instead of O(n).
```python
_MISSING = object()  # Sentinel for distinguishing None from missing


class OptimizedHashTable:
    """
    Hash table with optimized lookup and various access patterns.
    """
    class _Node:
        __slots__ = ("key", "value", "hash_code", "next")
        def __init__(self, key, value, hash_code):
            self.key, self.value = key, value
            self.hash_code, self.next = hash_code, None

    def __init__(self, capacity: int = 16):
        self.capacity = capacity
        self.buckets = [None] * capacity
        self.size = 0

    # ========================
    # CORE LOOKUP (optimized)
    # ========================

    def _lookup_internal(self, key) -> tuple:
        """
        Internal lookup with hash caching optimization.
        Returns (found, value, bucket_index, node) for flexibility.
        """
        hash_code = hash(key)
        index = hash_code % self.capacity

        current = self.buckets[index]
        while current is not None:
            # OPTIMIZATION: Compare cached hash first (fast int comparison)
            if current.hash_code == hash_code:
                # Only then compare actual keys (potentially slow)
                if current.key == key:
                    return True, current.value, index, current
            current = current.next

        return False, None, index, None

    # ========================
    # INTERNAL HELPERS
    # ========================

    def _insert_at(self, key, value, index):
        """Prepend a new node to the bucket at index."""
        node = OptimizedHashTable._Node(key, value, hash(key))
        node.next = self.buckets[index]
        self.buckets[index] = node
        self.size += 1

    def _remove_node(self, index, node):
        """Unlink node from the bucket's chain."""
        current = self.buckets[index]
        if current is node:
            self.buckets[index] = node.next
        else:
            while current.next is not node:
                current = current.next
            current.next = node.next
        self.size -= 1

    # ========================
    # LOOKUP VARIANTS
    # ========================

    def get(self, key, default=None):
        """Return value or default."""
        found, value, _, _ = self._lookup_internal(key)
        return value if found else default

    def __getitem__(self, key):
        """Direct access - raises KeyError if missing."""
        found, value, _, _ = self._lookup_internal(key)
        if not found:
            raise KeyError(key)
        return value

    def __setitem__(self, key, value):
        """table[key] = value - insert or update."""
        found, _, index, node = self._lookup_internal(key)
        if found:
            node.value = value
        else:
            self._insert_at(key, value, index)

    def __contains__(self, key) -> bool:
        """key in table - membership test."""
        found, _, _, _ = self._lookup_internal(key)
        return found

    def setdefault(self, key, default=None):
        """
        If key exists, return its value.
        Otherwise, insert default and return it.
        """
        found, value, index, _ = self._lookup_internal(key)
        if found:
            return value
        # Insert the default
        self._insert_at(key, default, index)
        return default

    def pop(self, key, default=_MISSING):
        """
        Remove and return value.
        If key missing and default provided, return default.
        If key missing and no default, raise KeyError.
        """
        found, value, index, node = self._lookup_internal(key)
        if found:
            self._remove_node(index, node)
            return value
        elif default is not _MISSING:
            return default
        else:
            raise KeyError(key)

    def items(self):
        """
        Iterate over all key-value pairs.
        Note: This is O(n), not O(1)!
        """
        for bucket in self.buckets:
            current = bucket
            while current is not None:
                yield (current.key, current.value)
                current = current.next


# Usage demonstration
def demonstrate_variants():
    ht = OptimizedHashTable()
    ht["a"] = 1
    ht["b"] = 2

    # Various access patterns
    print(ht.get("a"))             # 1
    print(ht.get("c"))             # None
    print(ht.get("c", "default"))  # "default"
    print("a" in ht)               # True
    print("c" in ht)               # False

    # setdefault pattern (useful for grouping)
    groups = OptimizedHashTable()
    for item, category in [("apple", "fruit"), ("carrot", "veg"),
                           ("banana", "fruit")]:
        # Creates list if category doesn't exist, appends to it
        items = groups.setdefault(category, [])
        items.append(item)
    # Result: {"fruit": ["apple", "banana"], "veg": ["carrot"]}
```

We've thoroughly explored the hash table lookup operation, the core retrieval mechanism that makes hash tables so valuable.
Module Complete:
You've now worked through the core topics of hash table structure and mechanism: the bucket array layout, hashing and indexing, insertion, and lookup.
With this knowledge, you can trace hash table operations step by step, predict performance characteristics, and make informed decisions about when and how to use hash tables in your systems.
Congratulations! You now have a deep understanding of hash table structure and mechanism. You can trace insertions and lookups, understand collision handling, predict performance, and reason about the internal state of hash tables. This foundation prepares you for the next modules covering collision handling strategies in depth.