We've now explored two fundamentally different approaches to handling hash collisions: chaining (separate chaining with linked lists) and open addressing (probing within the table itself). Both achieve the goal of resolving collisions, but they embody different philosophies and excel in different scenarios.
This page provides a comprehensive comparison across every dimension that matters in practice: memory usage, cache performance, deletion complexity, worst-case behavior, and implementation considerations. By the end, you'll understand not just how each approach works, but when to choose each one — and why production systems make the choices they do.
By the end of this page, you will understand:
• The fundamental philosophical differences between chaining and open addressing
• Memory overhead comparison and when each is more efficient
• Cache performance analysis and its profound practical impact
• Deletion complexity and operational differences
• Load factor tolerance and resize behavior
• Worst-case scenarios and how to mitigate them
• A decision framework for choosing between strategies
• What real-world systems actually use and why
Before diving into metrics, let's understand the core philosophical difference between these approaches:
Chaining says: "When collisions happen, store multiple elements at each bucket using auxiliary data structures. The table is an array of containers."
Open addressing says: "The table is THE data structure. When collisions happen, find another slot within the table itself. No auxiliary structures needed."
These philosophical differences cascade into practical implications for every operation and design decision. Let's examine each dimension systematically.
Memory efficiency depends on load factor, key/value sizes, and platform-specific overheads. Let's break this down precisely.
Chaining Memory Model:
For a table with m buckets and n elements:
Table array: m × sizeof(pointer) = 8m bytes (64-bit)
Linked list nodes: n × (sizeof(key) + sizeof(value) + sizeof(next_ptr))
= n × (K + V + 8) bytes
Total: 8m + n(K + V + 8) bytes
The pointer overhead is significant: each node carries an 8-byte next pointer, and each bucket carries a head pointer.
Open Addressing Memory Model:
For a table with m slots and n elements:
Table array: m × (sizeof(key) + sizeof(value) + sizeof(metadata))
= m × (K + V + 1) bytes [1 byte for empty/tombstone state]
Total: m(K + V + 1) bytes
No per-element pointer overhead, but empty slots consume full key+value space.
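To make the two models concrete, here is a minimal C++ sketch of the layouts assumed above (8-byte keys and values, 64-bit pointers). The struct names are illustrative, and note that alignment padding typically rounds the nominal 17-byte slot up to 24 bytes in practice:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical layouts matching the K = 8, V = 8 scenario used below.
struct ChainNode {
    uint64_t   key;    // K = 8 bytes
    uint64_t   value;  // V = 8 bytes
    ChainNode* next;   // 8-byte pointer on a 64-bit platform
};

struct OpenSlot {
    uint64_t key;      // K = 8 bytes
    uint64_t value;    // V = 8 bytes
    uint8_t  state;    // 1 byte: empty / occupied / tombstone
};

int main() {
    // Chaining: one node per element, plus an 8-byte head pointer per bucket.
    std::printf("chaining node:  %zu bytes\n", sizeof(ChainNode)); // typically 24
    // Open addressing: every slot, occupied or not, pays the full slot size.
    std::printf("open-addr slot: %zu bytes\n", sizeof(OpenSlot));  // typically 24 (17 + padding)
}
```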
Comparative Analysis:
Let's compare for realistic scenarios (K=8 bytes, V=8 bytes):
Chaining: bytes per element = (K + V + 8) + 8·(m/n) = 24 + 8/α
Open Addressing: bytes per element = (K + V + 1)·(m/n) = 17/α
| Load Factor (α) | Chaining (bytes/element) | Open Addressing (bytes/element) | Winner |
|---|---|---|---|
| 0.25 | 24 + 32 = 56 | 68 | Chaining |
| 0.50 | 24 + 16 = 40 | 34 | Open Addressing |
| 0.75 | 24 + 10.7 = 34.7 | 22.7 | Open Addressing |
| 1.00 | 24 + 8 = 32 | 17 | Open Addressing |
| 2.00 | 24 + 4 = 28 | Impossible (α > 1) | Chaining (no alternative) |
At very low load factors, chaining wins because an empty bucket costs only a pointer (8 bytes), while open addressing spends a full slot (17+ bytes) on every empty position.
At moderate and high load factors, open addressing wins because it carries no per-element pointer overhead.
The crossover depends on key/value sizes: for these small 8-byte keys and values it sits near α ≈ 0.4, and it moves toward α = 0.6-0.8 as entries get larger, since the per-node pointers then become a smaller fraction of each entry.
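The crossover can be read off directly from the two per-element formulas above. A small sketch (parameter and function names are ours, not from any library) that prints both costs for a few load factors:

```cpp
#include <cstdio>

// Per-element memory from the formulas above (64-bit pointers, 1-byte metadata).
double chaining_bytes(double K, double V, double alpha) {
    return (K + V + 8) + 8.0 / alpha;   // node plus share of the bucket-pointer array
}
double open_addressing_bytes(double K, double V, double alpha) {
    return (K + V + 1) / alpha;         // full slot cost spread over the occupants
}

int main() {
    for (double alpha : {0.25, 0.5, 0.75, 1.0}) {
        std::printf("alpha=%.2f  chaining=%.1f  open=%.1f bytes/element\n",
                    alpha, chaining_bytes(8, 8, alpha), open_addressing_bytes(8, 8, alpha));
    }
    // Setting the two formulas equal gives the crossover: alpha = (K + V - 7) / (K + V + 8).
    std::printf("crossover for K=V=8: alpha = %.2f\n", (8 + 8 - 7.0) / (8 + 8 + 8.0));
}
```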
Memory Allocation Considerations:
Beyond raw byte counts, memory allocation patterns differ significantly:
Chaining: every insert requires a heap allocation for a new node (and every delete, a free). That means many small allocations, allocator metadata overhead, heap fragmentation, and allocation timing that is hard to predict.
Open Addressing: all storage is allocated up front as one contiguous block; individual inserts and deletes never call the allocator, and only a resize triggers a new allocation.
For embedded systems or real-time applications, open addressing's predictable allocation is often decisive.
Modern CPUs are so fast that memory access is often the bottleneck. Cache behavior can outweigh algorithmic complexity for hash tables. This is where open addressing — particularly linear probing — shines.
Understanding Cache Behavior:
CPU caches are organized in cache lines (typically 64 bytes on modern x86/ARM). When you access a memory location, the entire cache line is loaded. Accessing nearby locations within the same cache line is essentially free.
Chaining memory access pattern:
Search for key K:
1. Load table[hash(K)] → Cache miss (load bucket pointer)
2. Load head→key → Cache miss (node elsewhere in heap)
3. Compare key → Likely cache miss (key may be pointer)
4. If mismatch, load head→next → Cache miss (next node)
5. Repeat until found or end
Each pointer chase is likely a cache miss. For a chain of length L, expect ~L cache misses.
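A minimal sketch of that chained lookup, assuming one separately allocated node per element (the names are illustrative, not from a real library):

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct Node {
    uint64_t key;
    uint64_t value;
    Node*    next;   // each hop follows this pointer into another heap allocation
};

// Every `cur = cur->next` dereference jumps to an unrelated address,
// which is the likely cache miss described in the trace above.
std::optional<uint64_t> find_chained(const std::vector<Node*>& buckets,
                                     uint64_t key, uint64_t hash) {
    for (const Node* cur = buckets[hash % buckets.size()]; cur != nullptr; cur = cur->next) {
        if (cur->key == key) return cur->value;
    }
    return std::nullopt;   // reached the end of the chain
}
```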
Open addressing (linear probing) access pattern:
Search for key K:
1. Load table[hash(K)] → Cache miss (load first slot)
2. Compare key → Cache hit (key in same cache line)
3. If mismatch, check table[hash(K)+1] → Cache hit! (same cache line)
4. Continue until found/empty → Many consecutive hits
With 64-byte cache lines and 17-byte slots, we get ~3-4 slots per cache line. Linear probing clusters accesses spatially, achieving high cache hit rates.
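For contrast, here is a minimal linear-probing lookup over one contiguous slot array, the layout that produces the sequential, cache-friendly pattern above (types and names are again illustrative):

```cpp
#include <cstdint>
#include <optional>
#include <vector>

enum class State : uint8_t { Empty, Occupied, Tombstone };

struct Slot {
    uint64_t key;
    uint64_t value;
    State    state;
};

// Probing walks consecutive slots of a single contiguous vector, so after the
// first access most follow-up probes land in the same or the next cache line.
std::optional<uint64_t> find_linear(const std::vector<Slot>& table,
                                    uint64_t key, uint64_t hash) {
    const size_t m = table.size();
    for (size_t i = 0; i < m; ++i) {
        const Slot& s = table[(hash + i) % m];              // linear probe
        if (s.state == State::Empty) return std::nullopt;   // a hole ends the probe sequence
        if (s.state == State::Occupied && s.key == key) return s.value;
        // Tombstones and key mismatches: keep probing.
    }
    return std::nullopt;   // scanned the whole table
}
```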
| Metric | Chaining | Linear Probing | Double Hashing |
|---|---|---|---|
| Cache misses per probe | ~1 | ~0.25 | ~1 |
| Slots per cache line | N/A | 3-4 | 3-4 |
| Memory access pattern | Random (pointers) | Sequential | Random (jumps) |
| Prefetcher effectiveness | Poor | Excellent | Poor |
| TLB pressure | High | Low | Moderate |
A counterintuitive result from practical benchmarks:
At the same load factor, linear probing often beats double hashing in wall-clock time even though it performs more probes.
Why? At, say, α = 0.75, linear probing's extra probes land in the same one or two cache lines as the first access, so a lookup costs only a couple of cache misses, while double hashing's probes each jump to an unrelated slot and nearly every probe is a separate miss. At ~100ns per cache miss, it is the miss count, not the probe count, that dominates wall-clock time. Double hashing's lower probe counts only start to pay off at extreme load factors (α ≳ 0.9), where linear probing's clusters grow long enough that even cache-friendly scans become expensive; real systems resize well before that point, which is why linear probing's cache advantage usually wins.
SIMD and Modern Optimizations:
Modern hash table implementations exploit SIMD instructions to process multiple slots in parallel:
// Swiss Table approach
1. Load 16 metadata bytes (one cache line)
2. Use SIMD to compare all 16 slots' hash fragments simultaneously
3. Check matches with full key comparison
This turns 16 sequential probes into 1-2 cache accesses plus a few CPU cycles. Libraries like Google's Abseil and Facebook's F14 achieve remarkable performance using these techniques — all built on open addressing foundations.
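As a rough illustration of the group-probing idea (a sketch of the technique, not Abseil's or F14's actual code), the following uses SSE2 intrinsics to compare a 16-byte metadata group against one hash fragment in a handful of instructions:

```cpp
#include <cstdint>
#include <emmintrin.h>   // SSE2 intrinsics

// Each slot stores a 1-byte hash fragment in a separate metadata array.
// One comparison checks 16 slots at once; bit i of the result means slot i
// is a candidate that still needs a full key comparison.
int match_mask(const uint8_t* group_metadata, uint8_t h2_fragment) {
    __m128i group  = _mm_loadu_si128(
        reinterpret_cast<const __m128i*>(group_metadata));   // 16 metadata bytes
    __m128i needle = _mm_set1_epi8(static_cast<char>(h2_fragment));
    __m128i eq     = _mm_cmpeq_epi8(group, needle);          // byte-wise equality
    return _mm_movemask_epi8(eq);                            // 16-bit candidate mask
}
```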
Let's compare the theoretical and practical complexity of core operations:
| Operation | Chaining (Avg) | Chaining (Worst) | Open Address (Avg) | Open Address (Worst) |
|---|---|---|---|---|
| Insert | O(1) | O(n)* | O(1/(1-α)) | O(n) |
| Search (successful) | O(1 + α) | O(n) | O((1/α)ln(1/(1-α))) | O(n) |
| Search (unsuccessful) | O(1 + α) | O(n) | O(1/(1-α)) | O(n) |
| Delete | O(1 + α) | O(n) | O(1) or O(cluster) | O(n) |
| Iteration | O(n + m) | O(n + m) | O(m) | O(m) |
*Chaining insert is O(1) if we insert at the head without checking; it becomes O(chain length), which is O(n) in the worst case, if we first check for duplicates or insert at the tail.
Key observations:
Average case is similar: Both achieve O(1) average for well-distributed keys
Load factor sensitivity differs: Open addressing degrades rapidly past α ≈ 0.7; chaining handles α > 1 gracefully
Worst case is identical: Both degrade to O(n) under pathological hash collisions
Iteration differs: both must scan every bucket or slot, but chaining also chases a pointer per element, while open addressing scans one contiguous array with no pointer chasing
Deletion Complexity Deep Dive:
Deletion deserves special attention as it's the operation where strategies differ most:
Chaining deletion: find the node, unlink it from its chain, and free it. The element is truly gone, the work is local to one bucket, and nothing is left behind to slow down later searches.
Open addressing deletion: the slot cannot simply be marked empty, because that would break the probe sequences of other keys that passed through it. The usual fix is a tombstone marker, which keeps searches correct but accumulates and lengthens probe sequences until a rehash cleans it up; backward-shift deletion avoids tombstones but is more complex and practical mainly for linear probing.
For workloads with frequent deletions, chaining's simpler model is often preferable.
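Reusing the Slot and State types from the linear-probing sketch earlier, a tombstone-based delete might look like this (illustrative only):

```cpp
// The slot is marked Tombstone rather than Empty so that probe sequences of
// other keys that pass through it are not cut short.
bool erase_with_tombstone(std::vector<Slot>& table, uint64_t key, uint64_t hash) {
    const size_t m = table.size();
    for (size_t i = 0; i < m; ++i) {
        Slot& s = table[(hash + i) % m];
        if (s.state == State::Empty) return false;          // key was never here
        if (s.state == State::Occupied && s.key == key) {
            s.state = State::Tombstone;                     // logically deleted
            return true;
        }
    }
    return false;
}
```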
For interactive systems, worst-case latency often matters more than average. Both strategies can hit O(n) worst case, but the triggers differ:
Chaining worst case: Many keys hash to same bucket (adversarial or poor hash)
Open addressing worst case: Table nearly full OR many tombstones OR adversarial clustering
Open addressing has more failure modes, but they're all preventable with proper load factor management.
Load factor tolerance is perhaps the starkest difference between strategies:
Chaining: degrades gracefully. The table keeps working at α = 1, 2, or even 5; chains simply grow longer and searches slow down proportionally, with no hard ceiling.
Open Addressing: requires α < 1 by definition (there must be empty slots to terminate probes), and performance collapses well before that: probe counts rise sharply past α ≈ 0.7, so implementations typically resize at 0.5-0.75.
| Load Factor | Chaining (probes) | Open Addr (probes) | Practical Viability |
|---|---|---|---|
| 0.25 | 1.25 | 1.33 | Both excellent |
| 0.50 | 1.50 | 2.00 | Both good |
| 0.75 | 1.75 | 4.00 | Chaining good, OA acceptable |
| 1.00 | 2.00 | ∞ (full) | Chaining good, OA impossible |
| 2.00 | 3.00 | N/A | Chaining acceptable |
| 5.00 | 6.00 | N/A | Chaining slow but works |
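The probe counts in this table are the standard unsuccessful-search estimates under uniform hashing, 1 + α for chaining and 1/(1 − α) for open addressing; a quick sketch to reproduce them:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    for (double alpha : {0.25, 0.5, 0.75, 1.0, 2.0}) {
        double chaining  = 1.0 + alpha;                        // scan the whole chain
        double open_addr = alpha < 1.0 ? 1.0 / (1.0 - alpha)   // uniform-hashing estimate
                                       : INFINITY;             // table full: no empty slot
        std::printf("alpha=%.2f  chaining=%.2f  open=%.2f probes\n",
                    alpha, chaining, open_addr);
    }
}
```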
Resize Behavior:
Chaining resize: allocate a larger bucket array and rehash every node into it. Nodes are relinked rather than copied, and the trigger is flexible because high load factors remain workable.
Open addressing resize: allocate a larger slot array and reinsert every live element, skipping tombstones. Resizing is not optional here, since the table must stay below its maximum load factor, but the rebuild doubles as cleanup.
Open addressing gets a valuable bonus from resize: tombstone cleanup. A table with 1000 elements and 5000 tombstones may seem to need 6000 slots, but after resize it only needs ~1500 slots (at α=0.67).
This is why triggering resize based on (elements + tombstones) effectively manages tombstone accumulation.
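A minimal sketch of such a trigger, with hypothetical names, counting tombstones toward the load-factor check:

```cpp
#include <cstddef>

struct TableStats {
    std::size_t capacity;    // m slots
    std::size_t elements;    // live entries
    std::size_t tombstones;  // logically deleted slots
};

// Tombstones count as occupied for the load-factor check, so heavy
// delete/insert churn eventually forces a rehash that discards them.
bool needs_rehash(const TableStats& t, double max_load = 0.7) {
    double effective_load =
        static_cast<double>(t.elements + t.tombstones) / static_cast<double>(t.capacity);
    return effective_load > max_load;
}
// During the rehash only live elements are reinserted, so a table with 1000
// elements and 5000 tombstones shrinks back to the size needed for 1000.
```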
Memory Overhead During Resize:
Both strategies need memory for old and new tables simultaneously during resize. For applications with tight memory budgets:
Chaining: only the bucket-pointer array is duplicated (old table = m × 8 bytes, new table = 2m × 8 bytes). The nodes themselves are relinked rather than copied, so the temporary spike is small relative to total memory, which is dominated by the nodes.
Open addressing: both full tables must coexist (old table = m × S bytes, new table = 2m × S bytes, where S is the slot size), a temporary spike of roughly 50% on top of the final footprint.
Open addressing's temporary memory spike is proportionally larger because every slot is duplicated, not just a pointer per bucket.
Let's synthesize everything into a practical decision framework:
Choose Chaining When:
• Deletions are frequent and should be simple and permanent (no tombstones)
• The load factor is unpredictable or may exceed 1
• Keys and values are large, so per-node pointer overhead is relatively cheap
• You need pointers or references to stored entries that stay valid across inserts
• Simplicity and well-understood behavior matter more than peak speed
Choose Open Addressing When:
• The workload is read-heavy and cache performance is critical
• Keys and values are small, so several slots fit in one cache line
• Memory per element must be minimal at moderate-to-high load factors
• Allocation must be predictable (embedded or real-time systems)
• Deletions are rare, or periodic rehashing is acceptable
| Workload | Recommended Strategy | Reasoning |
|---|---|---|
| Database index (read-heavy) | Open addressing | Cache performance critical |
| Session store (high churn) | Chaining | Frequent deletes, variable load |
| Compiler symbol table | Chaining | Never delete, need iteration |
| DNS cache | Open addressing | Read-heavy, known max size |
| Shopping cart | Chaining | Frequent add/remove |
| Configuration lookup | Open addressing | Small, read-only after init |
| Graph adjacency lists | Chaining | Variable degree, frequent updates |
If you're unsure, the modern default is open addressing with linear probing:
• Several mainstream implementations use open addressing (Python's dict, Rust's HashMap, Google's Abseil and Facebook's F14 for C++)
• SIMD-optimized implementations like SwissTable have made it the performance leader
• Memory efficiency benefits often outweigh deletion complexity
• Rehash-based tombstone cleanup is well-understood
Chaining remains valuable for specialized cases, but it's no longer the universal default it once was.
Let's see what major systems and languages have chosen:
| System/Language | Strategy | Notable Features |
|---|---|---|
| Python dict (3.6+) | Open addressing | Compact dict, ordered, 1/3 empty slots |
| Java HashMap | Chaining→Tree | Chains become red-black trees at length 8 |
| C++ std::unordered_map | Chaining | Standard requires chaining semantics |
| Rust HashMap | Open addressing | SwissTable-based, SIMD probing |
| Go map | Chaining (buckets) | 8-element buckets with overflow chaining |
| Google Abseil | Open addressing | SwissTable, flat/node variants |
| Facebook F14 | Open addressing | SIMD, 14 slots per group |
| Redis dict | Chaining | Incremental rehash, two tables |
| PostgreSQL | Chaining | Simple, robust, well-understood |
Notable Trends:
Newer implementations favor open addressing: The shift toward cache-optimized, SIMD-friendly designs has made open addressing dominant in high-performance contexts.
Hybrid approaches are emerging: Java's HashMap uses chaining but converts long chains to trees. Go uses small buckets (open addressing within bucket) with chaining between buckets.
SwissTable is the new standard: Google's SwissTable design (used in Abseil, Rust, and others) combines open addressing with clever metadata encoding for exceptional performance.
Chaining persists where simplicity matters: Database systems, older codebases, and applications prioritizing correctness over raw speed often stick with chaining.
Google's SwissTable (2017) demonstrated that open addressing can be 2-3× faster than traditional implementations through:
• Metadata arrays: Store hash fragments separately for SIMD comparison
• Group probing: Probe 16 slots at once using SSE/AVX instructions
• Tombstone optimization: Special encoding reduces tombstone impact
• Memory layout: Optimal alignment for cache lines
This design has been adopted by Rust, Facebook (F14), and many modern C++ codebases. It represents the current state-of-the-art for hash table performance.
We've completed a comprehensive comparison of chaining and open addressing. Let's consolidate the essential insights:
• Chaining stores colliding elements in per-bucket structures; open addressing stores everything in the table itself
• Chaining is more memory-efficient at low load factors and tolerates α > 1; open addressing is leaner at moderate-to-high load factors but must stay below α < 1
• Open addressing, especially linear probing, wins on cache behavior, which often matters more than probe counts
• Deletion is simple in chaining; open addressing needs tombstones or backward-shift deletion, with rehashing as the cleanup mechanism
• Modern high-performance implementations (SwissTable, F14, Rust's HashMap, Python's dict) favor open addressing; chaining persists where simplicity, frequent deletion, or flexible load factors matter
Module Complete:
You've now mastered collision handling with open addressing. You understand the probing strategies (linear, quadratic, double hashing), how search navigates probe sequences, the challenges of deletion and tombstone management, and when to choose open addressing over chaining.
With this knowledge, you can make informed decisions about hash table implementations, understand the tradeoffs in library choices, and reason about hash table performance in your systems.
Congratulations! You've completed Module 6: Collision Handling — Open Addressing. You now understand probing strategies (linear, quadratic, double hashing), search operations, deletion challenges with tombstones, and can make informed decisions between open addressing and chaining. You're equipped to understand and optimize hash table performance in real-world systems.