We've now explored two fundamentally different approaches to handling hash collisions: chaining (separate chaining with linked lists) and open addressing (probing within the table itself). Both achieve the goal of resolving collisions, but they embody different philosophies and excel in different scenarios.
This page provides a comprehensive comparison across every dimension that matters in practice: memory usage, cache performance, deletion complexity, worst-case behavior, and implementation considerations. By the end, you'll understand not just how each approach works, but when to choose each one — and why production systems make the choices they do.
By the end of this page, you will understand:
• The fundamental philosophical differences between chaining and open addressing
• Memory overhead comparison and when each is more efficient
• Cache performance analysis and its profound practical impact
• Deletion complexity and operational differences
• Load factor tolerance and resize behavior
• Worst-case scenarios and how to mitigate them
• A decision framework for choosing between strategies
• What real-world systems actually use and why
Before diving into metrics, let's understand the core philosophical difference between these approaches:
Chaining says: "When collisions happen, store multiple elements at each bucket using auxiliary data structures. The table is an array of containers."
Open addressing says: "The table is THE data structure. When collisions happen, find another slot within the table itself. No auxiliary structures needed."
These philosophical differences cascade into practical implications for every operation and design decision. Let's examine each dimension systematically.
Memory efficiency depends on load factor, key/value sizes, and platform-specific overheads. Let's break this down precisely.
Chaining Memory Model:
For a table with m buckets and n elements:
Table array: m × sizeof(pointer) = 8m bytes (64-bit)
Linked list nodes: n × (sizeof(key) + sizeof(value) + sizeof(next_ptr))
= n × (K + V + 8) bytes
Total: 8m + n(K + V + 8) bytes
The pointer overhead is significant: each node carries an 8-byte next pointer, and each bucket carries a head pointer.
Open Addressing Memory Model:
For a table with m slots and n elements:
Table array: m × (sizeof(key) + sizeof(value) + sizeof(metadata))
= m × (K + V + 1) bytes [1 byte for empty/tombstone state]
Total: m(K + V + 1) bytes
No per-element pointer overhead, but empty slots consume full key+value space.
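To make the two models concrete, here is a minimal C++ sketch of the layouts assumed above (8-byte keys and values, 64-bit pointers). The struct names are illustrative, and note that alignment padding typically rounds the nominal 17-byte slot up to 24 bytes in practice:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical layouts matching the K = 8, V = 8 scenario used below.
struct ChainNode {
    uint64_t   key;    // K = 8 bytes
    uint64_t   value;  // V = 8 bytes
    ChainNode* next;   // 8-byte pointer on a 64-bit platform
};

struct OpenSlot {
    uint64_t key;      // K = 8 bytes
    uint64_t value;    // V = 8 bytes
    uint8_t  state;    // 1 byte: empty / occupied / tombstone
};

int main() {
    // Chaining: one node per element, plus an 8-byte head pointer per bucket.
    std::printf("chaining node:  %zu bytes\n", sizeof(ChainNode)); // typically 24
    // Open addressing: every slot, occupied or not, pays the full slot size.
    std::printf("open-addr slot: %zu bytes\n", sizeof(OpenSlot));  // typically 24 (17 + padding)
}
```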
Comparative Analysis:
Let's compare for realistic scenarios (K=8 bytes, V=8 bytes):
Chaining: bytes per element = (K + V + 8) + 8·(m/n) = 24 + 8/α
Open Addressing: bytes per element = (K + V + 1)·(m/n) = 17/α
| Load Factor (α) | Chaining (bytes/element) | Open Addressing (bytes/element) | Winner |
|---|---|---|---|
| 0.25 | 24 + 32 = 56 | 68 | Chaining |
| 0.50 | 24 + 16 = 40 | 34 | Open Addressing |
| 0.75 | 24 + 10.7 = 34.7 | 22.7 | Open Addressing |
| 1.00 | 24 + 8 = 32 | 17 | Open Addressing |
| 2.00 | 24 + 4 = 28 | Impossible (α > 1) | Chaining (no alternative) |
At very low load factors, chaining wins because an empty bucket costs only a pointer (8 bytes), while open addressing spends a full slot (17+ bytes) on every empty position.
At moderate and high load factors, open addressing wins because it carries no per-element pointer overhead.
The crossover depends on key/value sizes: for these small 8-byte keys and values it sits near α ≈ 0.4, and it moves toward α = 0.6-0.8 as entries get larger, since the per-node pointers then become a smaller fraction of each entry.
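The crossover can be read off directly from the two per-element formulas above. A small sketch (parameter and function names are ours, not from any library) that prints both costs for a few load factors:

```cpp
#include <cstdio>

// Per-element memory from the formulas above (64-bit pointers, 1-byte metadata).
double chaining_bytes(double K, double V, double alpha) {
    return (K + V + 8) + 8.0 / alpha;   // node plus share of the bucket-pointer array
}
double open_addressing_bytes(double K, double V, double alpha) {
    return (K + V + 1) / alpha;         // full slot cost spread over the occupants
}

int main() {
    for (double alpha : {0.25, 0.5, 0.75, 1.0}) {
        std::printf("alpha=%.2f  chaining=%.1f  open=%.1f bytes/element\n",
                    alpha, chaining_bytes(8, 8, alpha), open_addressing_bytes(8, 8, alpha));
    }
    // Setting the two formulas equal gives the crossover: alpha = (K + V - 7) / (K + V + 8).
    std::printf("crossover for K=V=8: alpha = %.2f\n", (8 + 8 - 7.0) / (8 + 8 + 8.0));
}
```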
Memory Allocation Considerations:
Beyond raw byte counts, memory allocation patterns differ significantly:
Chaining: every insert requires a heap allocation for a new node (and every delete, a free). That means many small allocations, allocator metadata overhead, heap fragmentation, and allocation timing that is hard to predict.
Open Addressing: all storage is allocated up front as one contiguous block; individual inserts and deletes never call the allocator, and only a resize triggers a new allocation.
For embedded systems or real-time applications, open addressing's predictable allocation is often decisive.
Modern CPUs are so fast that memory access is often the bottleneck. Cache behavior can outweigh algorithmic complexity for hash tables. This is where open addressing — particularly linear probing — shines.
Understanding Cache Behavior:
CPU caches are organized in cache lines (typically 64 bytes on modern x86/ARM). When you access a memory location, the entire cache line is loaded. Accessing nearby locations within the same cache line is essentially free.
Chaining memory access pattern:
Search for key K:
1. Load table[hash(K)] → Cache miss (load bucket pointer)
2. Load head→key → Cache miss (node elsewhere in heap)
3. Compare key → Likely cache miss (key may be pointer)
4. If mismatch, load head→next → Cache miss (next node)
5. Repeat until found or end
Each pointer chase is likely a cache miss. For a chain of length L, expect ~L cache misses.
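A minimal sketch of that chained lookup, assuming one separately allocated node per element (the names are illustrative, not from a real library):

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct Node {
    uint64_t key;
    uint64_t value;
    Node*    next;   // each hop follows this pointer into another heap allocation
};

// Every `cur = cur->next` dereference jumps to an unrelated address,
// which is the likely cache miss described in the trace above.
std::optional<uint64_t> find_chained(const std::vector<Node*>& buckets,
                                     uint64_t key, uint64_t hash) {
    for (const Node* cur = buckets[hash % buckets.size()]; cur != nullptr; cur = cur->next) {
        if (cur->key == key) return cur->value;
    }
    return std::nullopt;   // reached the end of the chain
}
```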
Open addressing (linear probing) access pattern:
Search for key K:
1. Load table[hash(K)] → Cache miss (load first slot)
2. Compare key → Cache hit (key in same cache line)
3. If mismatch, check table[hash(K)+1] → Cache hit! (same cache line)
4. Continue until found/empty → Many consecutive hits
With 64-byte cache lines and 17-byte slots, we get ~3-4 slots per cache line. Linear probing clusters accesses spatially, achieving high cache hit rates.
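For contrast, here is a minimal linear-probing lookup over one contiguous slot array, the layout that produces the sequential, cache-friendly pattern above (types and names are again illustrative):

```cpp
#include <cstdint>
#include <optional>
#include <vector>

enum class State : uint8_t { Empty, Occupied, Tombstone };

struct Slot {
    uint64_t key;
    uint64_t value;
    State    state;
};

// Probing walks consecutive slots of a single contiguous vector, so after the
// first access most follow-up probes land in the same or the next cache line.
std::optional<uint64_t> find_linear(const std::vector<Slot>& table,
                                    uint64_t key, uint64_t hash) {
    const size_t m = table.size();
    for (size_t i = 0; i < m; ++i) {
        const Slot& s = table[(hash + i) % m];              // linear probe
        if (s.state == State::Empty) return std::nullopt;   // a hole ends the probe sequence
        if (s.state == State::Occupied && s.key == key) return s.value;
        // Tombstones and key mismatches: keep probing.
    }
    return std::nullopt;   // scanned the whole table
}
```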
| Metric | Chaining | Linear Probing | Double Hashing |
|---|---|---|---|
| Cache misses per probe | ~1 | ~0.25 | ~1 |
| Slots per cache line | N/A | 3-4 | 3-4 |
| Memory access pattern | Random (pointers) | Sequential | Random (jumps) |
| Prefetcher effectiveness | Poor | Excellent | Poor |
| TLB pressure | High | Low | Moderate |
A counterintuitive result from practical benchmarks:
At the same load factor, linear probing often beats double hashing in wall-clock time even though it performs more probes.
Why? At, say, α = 0.75, linear probing's extra probes land in the same one or two cache lines as the first access, so a lookup costs only a couple of cache misses, while double hashing's probes each jump to an unrelated slot and nearly every probe is a separate miss. At ~100ns per cache miss, it is the miss count, not the probe count, that dominates wall-clock time. Double hashing's lower probe counts only start to pay off at extreme load factors (α ≳ 0.9), where linear probing's clusters grow long enough that even cache-friendly scans become expensive; real systems resize well before that point, which is why linear probing's cache advantage usually wins.
SIMD and Modern Optimizations:
Modern hash table implementations exploit SIMD instructions to process multiple slots in parallel:
// Swiss Table approach
1. Load 16 metadata bytes (one cache line)
2. Use SIMD to compare all 16 slots' hash fragments simultaneously
3. Check matches with full key comparison
This turns 16 sequential probes into 1-2 cache accesses plus a few CPU cycles. Libraries like Google's Abseil and Facebook's F14 achieve remarkable performance using these techniques — all built on open addressing foundations.
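As a rough illustration of the group-probing idea (a sketch of the technique, not Abseil's or F14's actual code), the following uses SSE2 intrinsics to compare a 16-byte metadata group against one hash fragment in a handful of instructions:

```cpp
#include <cstdint>
#include <emmintrin.h>   // SSE2 intrinsics

// Each slot stores a 1-byte hash fragment in a separate metadata array.
// One comparison checks 16 slots at once; bit i of the result means slot i
// is a candidate that still needs a full key comparison.
int match_mask(const uint8_t* group_metadata, uint8_t h2_fragment) {
    __m128i group  = _mm_loadu_si128(
        reinterpret_cast<const __m128i*>(group_metadata));   // 16 metadata bytes
    __m128i needle = _mm_set1_epi8(static_cast<char>(h2_fragment));
    __m128i eq     = _mm_cmpeq_epi8(group, needle);          // byte-wise equality
    return _mm_movemask_epi8(eq);                            // 16-bit candidate mask
}
```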
Let's compare the theoretical and practical complexity of core operations:
| Operation | Chaining (Avg) | Chaining (Worst) | Open Address (Avg) | Open Address (Worst) |
|---|---|---|---|---|
| Insert | O(1) | O(n)* | O(1/(1-α)) | O(n) |
| Search (successful) | O(1 + α) | O(n) | O((1/α)ln(1/(1-α))) | O(n) |
| Search (unsuccessful) | O(1 + α) | O(n) | O(1/(1-α)) | O(n) |
| Delete | O(1 + α) | O(n) | O(1) or O(cluster) | O(n) |
| Iteration | O(n + m) | O(n + m) | O(m) | O(m) |
*Chaining insert is O(1) if we insert at the head without checking; it becomes O(chain length), which is O(n) in the worst case, if we first check for duplicates or insert at the tail.
Key observations:
Average case is similar: Both achieve O(1) average for well-distributed keys
Load factor sensitivity differs: Open addressing degrades rapidly past α ≈ 0.7; chaining handles α > 1 gracefully
Worst case is identical: Both degrade to O(n) under pathological hash collisions
Iteration differs: both must scan every bucket or slot, but chaining also chases a pointer per element, while open addressing scans one contiguous array with no pointer chasing
Deletion Complexity Deep Dive:
Deletion deserves special attention as it's the operation where strategies differ most:
Chaining deletion: find the node, unlink it from its chain, and free it. The element is truly gone, the work is local to one bucket, and nothing is left behind to slow down later searches.
Open addressing deletion: the slot cannot simply be marked empty, because that would break the probe sequences of other keys that passed through it. The usual fix is a tombstone marker, which keeps searches correct but accumulates and lengthens probe sequences until a rehash cleans it up; backward-shift deletion avoids tombstones but is more complex and practical mainly for linear probing.
For workloads with frequent deletions, chaining's simpler model is often preferable.
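Reusing the Slot and State types from the linear-probing sketch earlier, a tombstone-based delete might look like this (illustrative only):

```cpp
// The slot is marked Tombstone rather than Empty so that probe sequences of
// other keys that pass through it are not cut short.
bool erase_with_tombstone(std::vector<Slot>& table, uint64_t key, uint64_t hash) {
    const size_t m = table.size();
    for (size_t i = 0; i < m; ++i) {
        Slot& s = table[(hash + i) % m];
        if (s.state == State::Empty) return false;          // key was never here
        if (s.state == State::Occupied && s.key == key) {
            s.state = State::Tombstone;                     // logically deleted
            return true;
        }
    }
    return false;
}
```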
For interactive systems, worst-case latency often matters more than average. Both strategies can hit O(n) worst case, but the triggers differ:
Chaining worst case: Many keys hash to same bucket (adversarial or poor hash)
Open addressing worst case: Table nearly full OR many tombstones OR adversarial clustering
Open addressing has more failure modes, but they're all preventable with proper load factor management.
Load factor tolerance is perhaps the starkest difference between strategies:
Chaining: degrades gracefully. The table keeps working at α = 1, 2, or even 5; chains simply grow longer and searches slow down proportionally, with no hard ceiling.
Open Addressing: requires α < 1 by definition (there must be empty slots to terminate probes), and performance collapses well before that: probe counts rise sharply past α ≈ 0.7, so implementations typically resize at 0.5-0.75.
| Load Factor | Chaining (probes) | Open Addr (probes) | Practical Viability |
|---|---|---|---|
| 0.25 | 1.25 | 1.33 | Both excellent |
| 0.50 | 1.50 | 2.00 | Both good |
| 0.75 | 1.75 | 4.00 | Chaining good, OA acceptable |
| 1.00 | 2.00 | ∞ (full) | Chaining good, OA impossible |
| 2.00 | 3.00 | N/A | Chaining acceptable |
| 5.00 | 6.00 | N/A | Chaining slow but works |
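The probe counts in this table are the standard unsuccessful-search estimates under uniform hashing, 1 + α for chaining and 1/(1 − α) for open addressing; a quick sketch to reproduce them:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    for (double alpha : {0.25, 0.5, 0.75, 1.0, 2.0}) {
        double chaining  = 1.0 + alpha;                        // scan the whole chain
        double open_addr = alpha < 1.0 ? 1.0 / (1.0 - alpha)   // uniform-hashing estimate
                                       : INFINITY;             // table full: no empty slot
        std::printf("alpha=%.2f  chaining=%.2f  open=%.2f probes\n",
                    alpha, chaining, open_addr);
    }
}
```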
Resize Behavior:
Chaining resize: allocate a larger bucket array and rehash every node into it. Nodes are relinked rather than copied, and the trigger is flexible because high load factors remain workable.
Open addressing resize: allocate a larger slot array and reinsert every live element, skipping tombstones. Resizing is not optional here, since the table must stay below its maximum load factor, but the rebuild doubles as cleanup.
Open addressing gets a valuable bonus from resize: tombstone cleanup. A table with 1000 elements and 5000 tombstones may seem to need 6000 slots, but after resize it only needs ~1500 slots (at α=0.67).
This is why triggering resize based on (elements + tombstones) effectively manages tombstone accumulation.
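A minimal sketch of such a trigger, with hypothetical names, counting tombstones toward the load-factor check:

```cpp
#include <cstddef>

struct TableStats {
    std::size_t capacity;    // m slots
    std::size_t elements;    // live entries
    std::size_t tombstones;  // logically deleted slots
};

// Tombstones count as occupied for the load-factor check, so heavy
// delete/insert churn eventually forces a rehash that discards them.
bool needs_rehash(const TableStats& t, double max_load = 0.7) {
    double effective_load =
        static_cast<double>(t.elements + t.tombstones) / static_cast<double>(t.capacity);
    return effective_load > max_load;
}
// During the rehash only live elements are reinserted, so a table with 1000
// elements and 5000 tombstones shrinks back to the size needed for 1000.
```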
Memory Overhead During Resize:
Both strategies need memory for old and new tables simultaneously during resize. For applications with tight memory budgets:
Chaining: only the bucket-pointer array is duplicated (old table = m × 8 bytes, new table = 2m × 8 bytes). The nodes themselves are relinked rather than copied, so the temporary spike is small relative to total memory, which is dominated by the nodes.
Open addressing: both full tables must coexist (old table = m × S bytes, new table = 2m × S bytes, where S is the slot size), a temporary spike of roughly 50% on top of the final footprint.
Open addressing's temporary memory spike is proportionally larger because every slot is duplicated, not just a pointer per bucket.
Let's synthesize everything into a practical decision framework:
Choose Chaining When:
• Deletions are frequent and should be simple and permanent (no tombstones)
• The load factor is unpredictable or may exceed 1
• Keys and values are large, so per-node pointer overhead is relatively cheap
• You need pointers or references to stored entries that stay valid across inserts
• Simplicity and well-understood behavior matter more than peak speed
Choose Open Addressing When:
• The workload is read-heavy and cache performance is critical
• Keys and values are small, so several slots fit in one cache line
• Memory per element must be minimal at moderate-to-high load factors
• Allocation must be predictable (embedded or real-time systems)
• Deletions are rare, or periodic rehashing is acceptable
| Workload | Recommended Strategy | Reasoning |
|---|---|---|
| Database index (read-heavy) | Open addressing | Cache performance critical |
| Session store (high churn) | Chaining | Frequent deletes, variable load |
| Compiler symbol table | Chaining | Never delete, need iteration |
| DNS cache | Open addressing | Read-heavy, known max size |
| Shopping cart | Chaining | Frequent add/remove |
| Configuration lookup | Open addressing | Small, read-only after init |
| Graph adjacency lists | Chaining | Variable degree, frequent updates |
If you're unsure, the modern default is open addressing with linear probing:
• Several mainstream implementations use open addressing (Python's dict, Rust's HashMap, Google's Abseil and Facebook's F14 for C++)
• SIMD-optimized implementations like SwissTable have made it the performance leader
• Memory efficiency benefits often outweigh deletion complexity
• Rehash-based tombstone cleanup is well-understood
Chaining remains valuable for specialized cases, but it's no longer the universal default it once was.
Let's see what major systems and languages have chosen:
| System/Language | Strategy | Notable Features |
|---|---|---|
| Python dict (3.6+) | Open addressing | Compact dict, ordered, 1/3 empty slots |
| Java HashMap | Chaining→Tree | Chains become red-black trees at length 8 |
| C++ std::unordered_map | Chaining | Standard requires chaining semantics |
| Rust HashMap | Open addressing | SwissTable-based, SIMD probing |
| Go map | Chaining (buckets) | 8-element buckets with overflow chaining |
| Google Abseil | Open addressing | SwissTable, flat/node variants |
| Facebook F14 | Open addressing | SIMD, 14 slots per group |
| Redis dict | Chaining | Incremental rehash, two tables |
| PostgreSQL | Chaining | Simple, robust, well-understood |
Notable Trends:
Newer implementations favor open addressing: The shift toward cache-optimized, SIMD-friendly designs has made open addressing dominant in high-performance contexts.
Hybrid approaches are emerging: Java's HashMap uses chaining but converts long chains to trees. Go uses small buckets (open addressing within bucket) with chaining between buckets.
SwissTable is the new standard: Google's SwissTable design (used in Abseil, Rust, and others) combines open addressing with clever metadata encoding for exceptional performance.
Chaining persists where simplicity matters: Database systems, older codebases, and applications prioritizing correctness over raw speed often stick with chaining.
Google's SwissTable (2017) demonstrated that open addressing can be 2-3× faster than traditional implementations through:
• Metadata arrays: Store hash fragments separately for SIMD comparison
• Group probing: Probe 16 slots at once using SSE/AVX instructions
• Tombstone optimization: Special encoding reduces tombstone impact
• Memory layout: Optimal alignment for cache lines
This design has been adopted by Rust, Facebook (F14), and many modern C++ codebases. It represents the current state-of-the-art for hash table performance.
We've completed a comprehensive comparison of chaining and open addressing. Let's consolidate the essential insights:
• Chaining stores colliding elements in per-bucket structures; open addressing stores everything in the table itself
• Chaining is more memory-efficient at low load factors and tolerates α > 1; open addressing is leaner at moderate-to-high load factors but must stay below α < 1
• Open addressing, especially linear probing, wins on cache behavior, which often matters more than probe counts
• Deletion is simple in chaining; open addressing needs tombstones or backward-shift deletion, with rehashing as the cleanup mechanism
• Modern high-performance implementations (SwissTable, F14, Rust's HashMap, Python's dict) favor open addressing; chaining persists where simplicity, frequent deletion, or flexible load factors matter
Module Complete:
You've now mastered collision handling with open addressing. You understand the probing strategies (linear, quadratic, double hashing), how search navigates probe sequences, the challenges of deletion and tombstone management, and when to choose open addressing over chaining.
With this knowledge, you can make informed decisions about hash table implementations, understand the tradeoffs in library choices, and reason about hash table performance in your systems.
Congratulations! You've completed Module 6: Collision Handling — Open Addressing. You now understand probing strategies (linear, quadratic, double hashing), search operations, deletion challenges with tombstones, and can make informed decisions between open addressing and chaining. You're equipped to understand and optimize hash table performance in real-world systems.