Deletion in open-addressed hash tables presents a fundamental problem that doesn't exist with chaining. In chaining, deleting an element simply removes a node from a linked list — the bucket structure remains intact. In open addressing, deleting an element creates a hole in the probe sequence, potentially severing the connection to elements inserted afterward.
This page confronts the deletion problem head-on. We'll understand why naive deletion breaks the data structure, develop the tombstone (or lazy deletion) solution, analyze its tradeoffs, and explore advanced techniques for managing the complexity tombstones introduce.
By the end of this page, you will understand:
• Why naive deletion breaks search in open addressing
• The tombstone mechanism and how it preserves probe sequence integrity
• How tombstones affect insertion, search, and load factor calculations
• The accumulation problem and why tombstones degrade performance over time
• Strategies for tombstone cleanup: rehashing and periodic purging
• Alternative deletion approaches: backward-shift and move-to-fill
• When to choose each deletion strategy
Let's start with the intuitive approach: to delete a key, simply mark its slot as empty. This seems reasonable — after all, that's how we'd remove an element from an array. But in open addressing, this intuition leads to catastrophic failure.
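To make the failure concrete, here is what naive deletion looks like in code — a minimal sketch written in the style of the full implementation later on this page (the keys/values slot arrays and the _probe_sequence helper are assumed from that implementation):

def delete_naive(self, key) -> bool:
    # Find the key along its probe sequence and blank the slot.
    for index in self._probe_sequence(key):
        if self.keys[index] is None:
            return False              # empty slot: key is not in the table
        if self.keys[index] == key:
            self.keys[index] = None   # creates a hole in the probe chain!
            self.values[index] = None
            self.size -= 1
            return True
    return False

The walkthrough below shows exactly how that blanked slot breaks later searches.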
The Problem: Severed Probe Chains
Recall the search invariant: when we encounter an empty slot during search, we conclude the key isn't in the table. But if we've deleted an element that was between the initial hash position and the target key, we create a false empty slot that prematurely terminates search.
Step-by-Step Demonstration:
Consider a hash table with linear probing (m = 11). We insert three keys that all hash to index 3:
Step 1: Insert A (h(A) = 3)
[ _ | _ | _ | A | _ | _ | _ | _ | _ | _ | _ ]
0 1 2 3 4 5 6 7 8 9 10
Step 2: Insert B (h(B) = 3) — collides, probes to 4
[ _ | _ | _ | A | B | _ | _ | _ | _ | _ | _ ]
Step 3: Insert C (h(C) = 3) — collides at 3 and 4, probes to 5
[ _ | _ | _ | A | B | C | _ | _ | _ | _ | _ ]
Now let's naively delete B:
Step 4: Delete B — mark slot 4 as empty
[ _ | _ | _ | A | _ | C | _ | _ | _ | _ | _ ]
^
Empty slot created
Now search for C:
Search C (h(C) = 3):
Probe 0: index 3 → A (not C, continue)
Probe 1: index 4 → EMPTY
→ Search terminates. C not found!
But C is still in the table at index 5!
Naive deletion doesn't just cause search to fail — it corrupts the data structure. Elements remain in the table but become unreachable. This is a severe bug:
• Searching for C returns 'not found' even though C exists
• Inserting C again would create a duplicate at a different slot
• The table now violates its fundamental invariants
This corruption is insidious because it doesn't cause an immediate crash — the table appears to function, but returns incorrect results.
The Root Cause:
The problem stems from the dual meaning of 'empty' in naive implementations. A blank slot can mean 'never used' — the key cannot lie further along this probe sequence — or 'used once, then deleted' — the key may still lie further along.
For search, these two meanings require different actions: stop in the first case, keep probing in the second.
Naive deletion conflates these meanings into a single state, causing the search algorithm to stop early when it should continue.
The solution is to introduce a third state for slots: instead of just 'empty' and 'occupied', we add 'deleted' — also known as a tombstone.
The Three Slot States:
| State | Meaning | Search Behavior | Insert Behavior |
|---|---|---|---|
| Empty | Never used | Stop, key not found | Use this slot |
| Occupied | Contains a key | Compare key, continue if mismatch | Update if key matches, else continue probing |
| Tombstone | Previously occupied, now deleted | Continue probing | Can reuse this slot (once the key is confirmed absent) |
How Tombstones Preserve Correctness:
Revisiting our example with tombstones:
After inserting A, B, C:
[ _ | _ | _ | A | B | C | _ | _ | _ | _ | _ ]
Delete B with tombstone (⊗):
[ _ | _ | _ | A | ⊗ | C | _ | _ | _ | _ | _ ]
Search for C:
Probe 0: index 3 → A (not C, continue)
Probe 1: index 4 → TOMBSTONE (continue probing!)
Probe 2: index 5 → C (found!)
The tombstone preserves the probe chain. Search knows to continue past deleted slots, eventually finding C.
class TombstoneHashTable:
    # Sentinel object for tombstones - distinct from None (empty)
    TOMBSTONE = object()

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.keys = [None] * capacity    # None = empty, TOMBSTONE = deleted
        self.values = [None] * capacity
        self.size = 0                    # Active elements (excludes tombstones)
        self.tombstone_count = 0         # Track tombstones separately

    def _hash(self, key) -> int:
        return hash(key) % self.capacity

    def _probe_sequence(self, key):
        # Linear probing (assumed here): start at the home slot, wrap around,
        # and visit each slot at most once
        start = self._hash(key)
        for i in range(self.capacity):
            yield (start + i) % self.capacity

    def _is_empty(self, index: int) -> bool:
        return self.keys[index] is None

    def _is_tombstone(self, index: int) -> bool:
        return self.keys[index] is self.TOMBSTONE

    def _is_occupied(self, index: int) -> bool:
        return self.keys[index] is not None and not self._is_tombstone(index)

    def search(self, key):
        """
        Search for key, treating tombstones as 'continue probing'.
        """
        for index in self._probe_sequence(key):
            if self._is_empty(index):
                return None          # True empty → key not in table
            if self._is_tombstone(index):
                continue             # Deleted slot → keep probing
            if self.keys[index] == key:
                return self.values[index]
        return None

    def insert(self, key, value) -> bool:
        """
        Insert key-value, reusing tombstone slots when possible.
        """
        first_tombstone = None       # Track first tombstone encountered
        for index in self._probe_sequence(key):
            if self._is_empty(index):
                # Use first tombstone if we found one, else this empty slot
                target = first_tombstone if first_tombstone is not None else index
                self.keys[target] = key
                self.values[target] = value
                if target == first_tombstone:
                    self.tombstone_count -= 1   # Reused a tombstone
                self.size += 1
                return True
            if self._is_tombstone(index):
                if first_tombstone is None:
                    first_tombstone = index     # Remember for potential reuse
                continue
            if self.keys[index] == key:
                self.values[index] = value      # Update existing
                return True
        if first_tombstone is not None:
            # No empty slot anywhere, but a tombstone can still be reused
            self.keys[first_tombstone] = key
            self.values[first_tombstone] = value
            self.tombstone_count -= 1
            self.size += 1
            return True
        return False

    def delete(self, key) -> bool:
        """
        Delete key by replacing it with a tombstone.
        """
        for index in self._probe_sequence(key):
            if self._is_empty(index):
                return False         # Key not found
            if self._is_tombstone(index):
                continue
            if self.keys[index] == key:
                self.keys[index] = self.TOMBSTONE
                self.values[index] = None       # Free the value
                self.size -= 1
                self.tombstone_count += 1
                return True
        return False

The insert algorithm above includes an important optimization: when inserting a new key, we can reuse a tombstone slot.
However, we must still probe to the first empty slot to verify the key doesn't already exist — we can't just stop at the tombstone. This ensures we don't create duplicates.
After confirming the key is new, we insert at the earliest tombstone we encountered, which may shorten future probe sequences for this key.
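As a quick usage sketch (not part of the original walkthrough): the integer keys below are chosen so that 3, 14, 25, and 36 all share home slot 3 under CPython's built-in integer hashing, mirroring the A/B/C example:

table = TombstoneHashTable(capacity=11)
table.insert(3, "A")     # home slot 3
table.insert(14, "B")    # collides, probes to slot 4
table.insert(25, "C")    # collides, probes to slot 5

table.delete(14)                 # slot 4 becomes a tombstone, not empty
print(table.search(25))          # "C" — probing continues past the tombstone
print(table.tombstone_count)     # 1

table.insert(36, "D")            # also hashes to 3; reuses the tombstone at slot 4
print(table.tombstone_count)     # 0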
Tombstones solve the correctness problem but introduce a new issue: tombstones accumulate over time, degrading performance even when the number of active elements is small.
The Problem:
Consider a hash table that undergoes many insert-delete cycles:
Initial: 100 elements, 0 tombstones
After deleting 50: 50 elements, 50 tombstones
After inserting 50 new: 100 elements, ~50 tombstones (some reused)
After deleting 50 different: 50 elements, ~100 tombstones
... repeat ...
Over time, tombstones accumulate. Even if the table contains only a few active elements, it may be littered with tombstones that slow down every operation.
Impact on Performance:
Tombstones affect performance in multiple ways:
1. Search traversal: Search must probe through tombstones, treating them as occupied slots. A table with 50 elements and 200 tombstones performs as if it had 250 elements for unsuccessful search.
2. Load factor calculation: The 'effective load factor' is (elements + tombstones) / capacity, not just elements / capacity. High tombstone counts effectively reduce capacity.
3. Insertion overhead: Insertion must scan past tombstones to verify key uniqueness, then backtrack to use the first tombstone.
4. Memory waste: Tombstones occupy slots but store no useful data.
| Scenario | Active Elements | Tombstones | Effective Load Factor | Avg Probes (Unsuccessful) |
|---|---|---|---|---|
| Optimal | 100 | 0 | 0.50 | 2.00 |
| Light tombstones | 100 | 50 | 0.75 | 4.00 |
| Heavy tombstones | 100 | 100 | 1.00 | ∞ (full) |
| Mostly tombstones | 20 | 180 | 1.00 | ∞ (full) |
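The probe counts in this table follow the standard uniform-hashing estimate for unsuccessful search — roughly 1/(1 − effective load factor) — and assume a capacity of 200. A quick sanity check of those numbers:

def expected_unsuccessful_probes(active: int, tombstones: int, capacity: int) -> float:
    # Uniform-hashing approximation: ~1 / (1 - effective load factor)
    alpha = (active + tombstones) / capacity
    return float("inf") if alpha >= 1.0 else 1.0 / (1.0 - alpha)

print(expected_unsuccessful_probes(100, 0, 200))    # 2.0
print(expected_unsuccessful_probes(100, 50, 200))   # 4.0
print(expected_unsuccessful_probes(100, 100, 200))  # inf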
A critical implementation detail: when checking if the table needs resizing, you must consider both active elements AND tombstones:
effective_load = (self.size + self.tombstone_count) / self.capacity
if effective_load > self.max_load_factor:
self._resize() # Resize also purges tombstones
If you only check self.size, a table with 10 elements and 190 tombstones won't resize, despite being effectively full.
Worst-Case Scenario:
Imagine an adversarial usage pattern: every entry is inserted and then deleted shortly afterward, so each short-lived key leaves a tombstone behind while new keys keep claiming fresh empty slots.
Without cleanup, tombstones accumulate toward 100%, eventually making every operation O(n). The table is effectively full despite being logically empty.
This workload pattern is common in caches with short-lived entries, session stores, and any system with high churn.
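A small churn loop, sketched against the TombstoneHashTable class above (which has no automatic resizing or cleanup), makes the accumulation visible; the key names and capacity are arbitrary:

table = TombstoneHashTable(capacity=101)

for i in range(1000):
    table.insert(f"session-{i}", i)   # short-lived entry
    table.delete(f"session-{i}")      # leaves a tombstone behind
    if i % 200 == 0:
        print(i, "active:", table.size, "tombstones:", table.tombstone_count)

The active count stays at zero while the tombstone count climbs toward the table's capacity, and each operation's probe length grows with it.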
To prevent tombstone accumulation from degrading performance, we need cleanup strategies. The primary approaches are rehashing and lazy cleanup.
Strategy 1: Full Rehash
The most thorough cleanup is to create a new table and reinsert all active elements, discarding tombstones:
def _cleanup_rehash(self):
"""
Create new table with only active elements.
Tombstones are discarded.
"""
old_keys = self.keys
old_values = self.values
# May resize or keep same capacity
new_capacity = self._calculate_new_capacity()
self.keys = [None] * new_capacity
self.values = [None] * new_capacity
self.capacity = new_capacity
self.size = 0
self.tombstone_count = 0 # Reset!
# Reinsert only active elements
for i in range(len(old_keys)):
if old_keys[i] is not None and old_keys[i] is not self.TOMBSTONE:
self.insert(old_keys[i], old_values[i])
Pros: Completely eliminates tombstones, restores optimal performance
Cons: O(n) operation, can cause latency spikes
Strategy 2: Resize Triggers Cleanup
Combine resize with cleanup — whenever we resize (up or down), we get a free tombstone purge:
def _check_load_factor(self):
"""
Check if resize is needed. Consider both elements and tombstones.
"""
effective_load = (self.size + self.tombstone_count) / self.capacity
actual_load = self.size / self.capacity
if effective_load > 0.7:
if actual_load > 0.5:
# Table genuinely full, grow it
self._resize(self.capacity * 2)
else:
# Many tombstones, just cleanup (rehash at same size)
self._resize(self.capacity)
This piggybacks cleanup on resize, avoiding additional operations.
Strategy 3: Periodic Cleanup Based on Tombstone Ratio
Trigger cleanup when tombstones exceed a threshold relative to active elements:
def _maybe_cleanup(self):
"""
Trigger cleanup if tombstones are excessive.
"""
if self.size == 0:
if self.tombstone_count > self.capacity // 4:
self._cleanup_rehash() # Table is mostly tombstones
else:
tombstone_ratio = self.tombstone_count / self.size
if tombstone_ratio > 2.0: # More than 2 tombstones per element
self._cleanup_rehash()
# Call after each delete
def delete(self, key) -> bool:
result = self._delete_internal(key)
if result:
self._maybe_cleanup()
return result
This bounds the tombstone-to-element ratio, preventing unbounded accumulation.
With proper cleanup triggering, the amortized cost per operation remains O(1):
• Each deletion creates one tombstone
• Cleanup is triggered when tombstones reach O(n)
• Cleanup takes O(n) time but 'pays for' all accumulated tombstones
• Average cost per delete: O(1)
This is analogous to dynamic array resizing — occasional expensive operations are amortized across many cheap operations.
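As a rough illustration (numbers invented for the example): with 1,000 active elements and the ratio-2 trigger above, cleanup fires only after roughly 2,000 tombstones have accumulated — at least 2,000 deletes since the last cleanup — and the rehash then touches on the order of a few thousand slots, which amortizes to only one or two extra slot visits per delete.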
Strategy 4: Incremental Cleanup
Spread cleanup work across multiple operations to avoid latency spikes:
def _incremental_cleanup(self, probes_per_op: int = 5):
"""
Clean up a few tombstones each operation.
Spread O(n) work over many operations.
"""
if self.tombstone_count == 0:
return
checked = 0
index = self._cleanup_cursor
while checked < probes_per_op:
if self._is_tombstone(index):
# Try to fill this tombstone from later in its chain
self._try_fill_tombstone(index)
index = (index + 1) % self.capacity
checked += 1
self._cleanup_cursor = index
This 'housekeeping' approach maintains performance without any single expensive operation.
Tombstones aren't the only way to handle deletion. Several alternative approaches avoid tombstones entirely at the cost of more complex delete operations.
Alternative 1: Backward Shift Deletion
After removing an element, shift subsequent elements backward to fill the gap — but only elements that belong in this probe chain:
def delete_backward_shift(self, key) -> bool:
"""
Delete key by shifting elements backward.
No tombstones created.
"""
# Find the key
delete_index = None
for index in self._probe_sequence(key):
if self._is_empty(index):
return False # Not found
if self.keys[index] == key:
delete_index = index
break
if delete_index is None:
return False
    # Shift elements backward to fill the gap left by the deleted key
    gap = delete_index
    scan = (delete_index + 1) % self.capacity
    while not self._is_empty(scan) and scan != delete_index:
        # The element at `scan` may move back only if the gap lies on its
        # probe path (between its home slot and its current slot)
        element_ideal = self._hash(self.keys[scan])
        if self._is_between(element_ideal, gap, scan):
            self.keys[gap] = self.keys[scan]
            self.values[gap] = self.values[scan]
            gap = scan              # the hole moves forward
        # Keep scanning: even if this element stays put, a later element
        # in the cluster may still need to move into the gap
        scan = (scan + 1) % self.capacity
    # Clear the final gap
    self.keys[gap] = None
    self.values[gap] = None
    self.size -= 1
    return True
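The _is_between check is where the correctness of backward shift lives. The helper isn't shown above; a minimal sketch, assuming the argument order used in the call (the element's home slot, the gap, the element's current slot):

def _is_between(self, ideal: int, gap: int, scan: int) -> bool:
    """
    True if `gap` lies cyclically within [ideal, scan): the gap sits on
    the probe path from the element's home slot to its current slot,
    so the element may legally move back into the gap.
    """
    if ideal <= scan:
        return ideal <= gap < scan
    # Probe path wraps around the end of the table
    return gap >= ideal or gap < scan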
Pros: No tombstones, no accumulation problem
Cons: Delete is O(cluster size), complex implementation, cache-unfriendly
Alternative 2: Robin Hood Deletion
Robin Hood hashing tracks the 'probe distance' for each element. During deletion, we can shift elements more intelligently:
def delete_robin_hood(self, key) -> bool:
"""
Robin Hood deletion: shift elements with positive probe distance.
"""
delete_index = self._find(key)
if delete_index is None:
return False
current = delete_index
while True:
next_idx = (current + 1) % self.capacity
if self._is_empty(next_idx):
break
# Elements in Robin Hood have stored probe distances
if self.probe_distances[next_idx] == 0:
break # Element is at ideal position, can't shift
# Move element backward and decrease its probe distance
self.keys[current] = self.keys[next_idx]
self.values[current] = self.values[next_idx]
self.probe_distances[current] = self.probe_distances[next_idx] - 1
current = next_idx
    # Clear final slot
    self.keys[current] = None
    self.values[current] = None
    self.probe_distances[current] = 0
    self.size -= 1
    return True
Robin Hood's probe distance tracking makes the shift logic simpler and more efficient.
Tombstone-free deletion is preferred when:
• Deletions are rare: Backward shift's O(cluster) cost is acceptable
• Read-heavy workloads: Tombstones slow down every search
• Long-running processes: Tombstone accumulation over days/weeks becomes severe
• Memory-constrained systems: Can't afford tombstone cleanup's temporary memory spike
Conversely, tombstones are better when:
• Deletions are frequent: O(1) delete matters
• Latency-sensitive: Can't afford worst-case O(n) backward shift
• Already using Robin Hood: Probe distance makes both approaches viable
Alternative 3: No Deletion Allowed
The simplest approach: don't support deletion at all. This sounds extreme, but it is a valid choice for workloads that only ever add entries, and many high-performance systems use insert-only hash tables when the workload permits.
Let's compare the deletion strategies across key dimensions:
| Characteristic | Tombstones | Backward Shift | Robin Hood Shift |
|---|---|---|---|
| Delete time (average) | O(1) | O(cluster size) | O(cluster size) |
| Delete time (worst) | O(n) search | O(n) | O(n) |
| Search overhead | Must skip tombstones | None added | None added |
| Accumulation problem | Yes | No | No |
| Implementation complexity | Simple | Moderate | Complex |
| Extra memory | None | None | Probe distance array |
| Cache behavior during delete | Excellent | Poor (many writes) | Moderate |
| Cleanup required | Yes (rehash) | No | No |
Decision Framework:
Most production hash tables use tombstones because deletion stays O(1), the implementation remains simple, and cleanup can piggyback on the resizing the table already performs.
Google's Swiss Tables and Facebook's F14 both use tombstones with sophisticated cleanup strategies.
We've thoroughly explored the deletion challenge unique to open addressing: naive deletion severs probe chains, tombstones restore correctness at the cost of gradual accumulation, cleanup strategies keep that accumulation bounded, and shift-based alternatives avoid tombstones entirely by doing more work at delete time.
What's next:
We've now covered the three core operations for open addressing: insertion, search, and deletion. In the final page of this module, we'll compare open addressing with chaining — examining when each collision resolution strategy excels, their performance tradeoffs, and guidance for choosing between them in practice.
You now understand why deletion is problematic in open-addressed hash tables, how tombstones solve the correctness issue, the accumulation problem they create, cleanup strategies to manage them, and alternative deletion approaches that avoid tombstones entirely. Next, we'll synthesize everything with a comprehensive comparison of open addressing versus chaining.