Insertion and lookup receive most of the attention in hash table discussions, but deletion harbors subtle complexities that can fundamentally shape implementation choices and long-term performance. While deletion shares the same O(1) average / O(n) worst-case complexity as insertion and lookup, the mechanisms differ significantly between collision resolution strategies.
With chaining, deletion is straightforward—remove a node from a linked list. With open addressing, deletion threatens to break the very probe sequences that make lookups work, requiring tombstones, which bring their own degradation patterns.
This page provides an exhaustive examination of deletion across both collision strategies, analyzing the trade-offs, performance implications, and engineering decisions that make deletion a hidden source of complexity in hash table implementations.
By the end of this page, you will understand:

- Why deletion is fundamentally different between chaining and open addressing
- The tombstone mechanism and why it's necessary
- How tombstone accumulation degrades performance
- Strategies for managing tombstone overhead
- When deletion patterns should influence hash table choice
With separate chaining, deletion is conceptually straightforward: find the key in its bucket's linked list, remove the node. No special considerations are needed—removing a node from a linked list doesn't affect any other element's accessibility.
The Deletion Algorithm (Chaining):

1. Compute the hash of the key.
2. Map the hash to a bucket index (hash mod capacity).
3. Access the bucket's chain head.
4. Traverse the chain, comparing keys until the target is found.
5. Unlink the node (updating the head or the predecessor's next pointer) and decrement the size.

Every step is O(1) except step 4, which is O(k) where k is the bucket's chain length. Under normal conditions with good hash distribution, k averages to the load factor α, giving O(1) average-case deletion.
```python
from typing import Optional, List, Tuple, TypeVar

K = TypeVar('K')
V = TypeVar('V')


class Node:
    """Linked list node for chaining."""
    def __init__(self, key: K, value: V):
        self.key = key
        self.value = value
        self.next: Optional['Node'] = None


class ChainingHashTable:
    """
    Hash table with separate chaining demonstrating straightforward deletion.
    """
    def __init__(self, capacity: int = 16):
        self.capacity = capacity
        self.buckets: List[Optional[Node]] = [None] * capacity
        self.size = 0
        # Deletion tracking
        self.deletion_comparisons = 0

    def _hash(self, key: K) -> int:
        return hash(key)

    def _bucket_index(self, key: K) -> int:
        return self._hash(key) % self.capacity

    def insert(self, key: K, value: V):
        """Insert a key-value pair."""
        idx = self._bucket_index(key)
        # Check if key exists
        current = self.buckets[idx]
        while current:
            if current.key == key:
                current.value = value  # Update
                return
            current = current.next
        # Insert at head of chain
        new_node = Node(key, value)
        new_node.next = self.buckets[idx]
        self.buckets[idx] = new_node
        self.size += 1

    def delete(self, key: K) -> Tuple[bool, int]:
        """
        Delete a key from the hash table.

        Returns:
        - success: True if key was found and deleted
        - comparisons: Number of key comparisons performed

        Time Complexity:
        - Average case: O(1) when load factor is bounded
        - Worst case: O(n) when all elements in one bucket
        """
        idx = self._bucket_index(key)
        comparisons = 0
        current = self.buckets[idx]
        previous = None
        while current:
            comparisons += 1
            self.deletion_comparisons += 1
            if current.key == key:
                # Found the key - remove the node
                if previous is None:
                    # Removing head of chain
                    self.buckets[idx] = current.next
                else:
                    # Removing middle/end of chain
                    previous.next = current.next
                self.size -= 1
                return (True, comparisons)
            previous = current
            current = current.next
        # Key not found
        return (False, comparisons)

    def get_chain_lengths(self) -> List[int]:
        """Get the length of each bucket's chain."""
        lengths = []
        for bucket in self.buckets:
            length = 0
            current = bucket
            while current:
                length += 1
                current = current.next
            lengths.append(length)
        return lengths


def demonstrate_chaining_deletion():
    """Show chaining deletion behavior."""
    print("Chaining Deletion Demonstration")
    print("=" * 60)
    import random
    # Create table with known capacity
    table = ChainingHashTable(capacity=100)
    # Insert elements
    keys = [f"key_{i}" for i in range(500)]
    for key in keys:
        table.insert(key, f"value_for_{key}")
    print(f"Inserted {table.size} elements into {table.capacity} buckets")
    print(f"Load factor: {table.size / table.capacity:.2f}")
    chain_lengths = table.get_chain_lengths()
    print(f"Avg chain length: {sum(chain_lengths)/len(chain_lengths):.2f}")
    print(f"Max chain length: {max(chain_lengths)}")
    print()
    # Delete random subset
    to_delete = random.sample(keys, 100)
    table.deletion_comparisons = 0
    successful_deletions = 0
    for key in to_delete:
        success, comps = table.delete(key)
        if success:
            successful_deletions += 1
    print(f"Deletion results:")
    print(f"  Attempted: {len(to_delete)}")
    print(f"  Successful: {successful_deletions}")
    print(f"  Total comparisons: {table.deletion_comparisons}")
    print(f"  Avg comparisons: {table.deletion_comparisons / len(to_delete):.2f}")
    print()
    # Try deleting already-deleted keys (should fail fast)
    table.deletion_comparisons = 0
    for key in to_delete[:20]:
        success, comps = table.delete(key)
        # These will fail since we already deleted them
    print(f"Re-deleting (already deleted keys):")
    print(f"  Total comparisons: {table.deletion_comparisons}")
    print(f"  Note: Empty buckets have 0 comparisons, non-empty must scan to confirm absence")


demonstrate_chaining_deletion()
```

With chaining, deleting an element has no side effects on other elements. Each key's probe 'path' is simply the linked list in its bucket—removing one node doesn't affect how other nodes are found. This isolation is chaining's major advantage for workloads with frequent deletions.
With open addressing, deletion introduces a fundamental challenge that doesn't exist with chaining: removing an element can break probe sequences for other elements.
Consider this scenario with linear probing: keys A, B, and C all hash to slot 5. A takes slot 5, B is displaced to slot 6, and C is displaced to slot 7.

Now the probe chain for C is: 5 → 6 → 7. If we naively delete B (slot 6) by marking it empty, a lookup for C probes slot 5 (occupied by A, not C), reaches slot 6, finds it empty, and stops, incorrectly concluding that C is absent.

The deletion broke C's lookup. The empty slot created a false termination point in C's probe sequence. This is the core problem that tombstones solve.
```python
class BrokenOpenAddressing:
    """
    Demonstrates how naive deletion breaks lookups in open addressing.
    THIS IS INTENTIONALLY BROKEN to show the problem.
    """
    EMPTY = None

    def __init__(self, capacity=10):
        self.capacity = capacity
        self.slots = [self.EMPTY] * capacity

    def _hash(self, key) -> int:
        # For demonstration, use a simple hash that causes predictable collisions
        if isinstance(key, str) and key.startswith("collision_"):
            return 5  # All "collision_*" keys hash to slot 5
        return hash(key) % self.capacity

    def insert(self, key, value):
        """Insert using linear probing."""
        start_idx = self._hash(key) % self.capacity
        for offset in range(self.capacity):
            idx = (start_idx + offset) % self.capacity
            if self.slots[idx] is self.EMPTY:
                self.slots[idx] = (key, value)
                return idx  # Return where we inserted
            if self.slots[idx][0] == key:
                self.slots[idx] = (key, value)
                return idx
        raise RuntimeError("Table full")

    def naive_delete(self, key) -> bool:
        """
        BROKEN: Naive deletion that just empties the slot.
        This will break lookups for keys that probed past this slot.
        """
        start_idx = self._hash(key) % self.capacity
        for offset in range(self.capacity):
            idx = (start_idx + offset) % self.capacity
            if self.slots[idx] is self.EMPTY:
                return False  # Not found
            if self.slots[idx][0] == key:
                self.slots[idx] = self.EMPTY  # PROBLEM: Creates hole
                return True
        return False

    def lookup(self, key):
        """Lookup that stops at empty slots."""
        start_idx = self._hash(key) % self.capacity
        for offset in range(self.capacity):
            idx = (start_idx + offset) % self.capacity
            if self.slots[idx] is self.EMPTY:
                return (False, None)  # Stops at empty
            if self.slots[idx][0] == key:
                return (True, self.slots[idx][1])
        return (False, None)

    def __str__(self):
        result = []
        for i, slot in enumerate(self.slots):
            if slot is self.EMPTY:
                result.append(f"[{i}]: empty")
            else:
                result.append(f"[{i}]: {slot[0]}")
        return "\n".join(result)


def demonstrate_broken_deletion():
    """Show how naive deletion breaks lookups."""
    print("Demonstrating Why Naive Deletion Breaks Open Addressing")
    print("=" * 60)
    table = BrokenOpenAddressing(capacity=10)
    # Insert three keys that all hash to slot 5
    keys = ["collision_A", "collision_B", "collision_C"]
    print("Step 1: Insert three keys that all hash to slot 5")
    print("-" * 40)
    for key in keys:
        slot = table.insert(key, f"value_{key}")
        print(f"  Inserted '{key}' at slot {slot}")
    print()
    print("Table state after insertions:")
    print(table)
    print()
    # Verify all keys can be found
    print("Step 2: Verify all keys can be found")
    print("-" * 40)
    for key in keys:
        found, value = table.lookup(key)
        print(f"  Lookup '{key}': found={found}")
    print()
    # Now naively delete the middle key
    print("Step 3: Naively delete 'collision_B' (middle of chain)")
    print("-" * 40)
    table.naive_delete("collision_B")
    print("  Deleted!")
    print()
    print("Table state after deletion:")
    print(table)
    print()
    # Try to find collision_C - IT WILL FAIL
    print("Step 4: Try to lookup 'collision_C'")
    print("-" * 40)
    found, value = table.lookup("collision_C")
    print(f"  Lookup 'collision_C': found={found}")
    print()
    print("  PROBLEM: collision_C IS in the table (slot 7), but lookup")
    print("  stopped at slot 6 (now empty) and concluded it's absent!")
    print()
    print("  This is why we need TOMBSTONES instead of true deletion.")


demonstrate_broken_deletion()
```

The fundamental issue is that open addressing uses the absence of data (empty slots) as a termination signal for probe sequences. Deleting an element by creating an empty slot introduces false termination points that break lookups for elements that were displaced past the deleted slot. This makes naive deletion impossible—we must use a different mechanism.
The standard solution to open addressing deletion is the tombstone (also called a deleted marker or lazy deletion). Instead of truly emptying a slot, we mark it with a special value that says:
"This slot was occupied but is now deleted. Keep probing past me during lookups—I'm not a terminus. But you can reuse me during insertions—I'm available."
Tombstones maintain the integrity of probe sequences while allowing slot reuse. The three slot states are:

- EMPTY: never occupied; probing can safely stop here.
- OCCUPIED: holds a live key-value pair.
- DELETED: a tombstone; lookups must probe past it, but insertions may reuse it.
```python
from typing import Optional, Tuple
from enum import Enum
from dataclasses import dataclass


class SlotState(Enum):
    EMPTY = "EMPTY"        # Never used
    OCCUPIED = "OCCUPIED"  # Contains data
    DELETED = "DELETED"    # Tombstone


@dataclass
class Slot:
    state: SlotState
    key: Optional[str] = None
    value: Optional[str] = None


class TombstoneHashTable:
    """
    Open addressing hash table with proper tombstone-based deletion.
    """
    def __init__(self, capacity: int = 16):
        self.capacity = capacity
        self.slots = [Slot(SlotState.EMPTY) for _ in range(capacity)]
        self.size = 0             # Active elements
        self.tombstone_count = 0  # Deleted markers

    def _hash(self, key) -> int:
        return hash(key)

    def _probe_sequence(self, key):
        """Generate probe indices using linear probing."""
        start = self._hash(key) % self.capacity
        for offset in range(self.capacity):
            yield (start + offset) % self.capacity

    def insert(self, key, value) -> Tuple[int, int]:
        """Insert with tombstone reuse. Returns (slot_index, probes)."""
        probes = 0
        tombstone_slot = None  # Remember first tombstone for reuse
        for idx in self._probe_sequence(key):
            probes += 1
            slot = self.slots[idx]
            if slot.state == SlotState.EMPTY:
                # Truly empty - use this slot (or earlier tombstone)
                target_idx = tombstone_slot if tombstone_slot is not None else idx
                if tombstone_slot is not None:
                    self.tombstone_count -= 1  # Reusing tombstone
                self.slots[target_idx] = Slot(SlotState.OCCUPIED, key, value)
                self.size += 1
                return (target_idx, probes)
            if slot.state == SlotState.DELETED:
                # Remember first tombstone for potential reuse
                if tombstone_slot is None:
                    tombstone_slot = idx
                continue  # Keep probing - key might exist later
            if slot.key == key:
                # Key exists - update value
                self.slots[idx] = Slot(SlotState.OCCUPIED, key, value)
                return (idx, probes)
        if tombstone_slot is not None:
            # No EMPTY slot found, but a tombstone is still reusable
            self.tombstone_count -= 1
            self.slots[tombstone_slot] = Slot(SlotState.OCCUPIED, key, value)
            self.size += 1
            return (tombstone_slot, probes)
        raise RuntimeError("Table is full")

    def delete(self, key) -> Tuple[bool, int]:
        """
        Delete using tombstone.

        Returns:
        - success: True if key was found and deleted
        - probes: Number of slots examined

        Time Complexity:
        - Average: O(1) with good hash distribution
        - Worst: O(n) when heavily clustered
        """
        probes = 0
        for idx in self._probe_sequence(key):
            probes += 1
            slot = self.slots[idx]
            if slot.state == SlotState.EMPTY:
                # Never occupied - key definitely absent
                return (False, probes)
            if slot.state == SlotState.DELETED:
                # Was occupied - keep probing
                continue
            if slot.key == key:
                # Found - replace with tombstone
                self.slots[idx] = Slot(SlotState.DELETED)
                self.size -= 1
                self.tombstone_count += 1
                return (True, probes)
        return (False, probes)

    def lookup(self, key) -> Tuple[bool, Optional[str], int]:
        """Lookup that properly handles tombstones. Returns (found, value, probes)."""
        probes = 0
        for idx in self._probe_sequence(key):
            probes += 1
            slot = self.slots[idx]
            if slot.state == SlotState.EMPTY:
                # Never occupied - key absent
                return (False, None, probes)
            if slot.state == SlotState.DELETED:
                # Was occupied - keep probing
                continue
            if slot.key == key:
                return (True, slot.value, probes)
        return (False, None, probes)

    def effective_load_factor(self) -> float:
        """Load factor including tombstones (affects probing cost)."""
        return (self.size + self.tombstone_count) / self.capacity

    def true_load_factor(self) -> float:
        """Load factor counting only live elements."""
        return self.size / self.capacity

    def __str__(self):
        lines = []
        for i, slot in enumerate(self.slots):
            if slot.state == SlotState.EMPTY:
                lines.append(f"[{i}]: EMPTY")
            elif slot.state == SlotState.DELETED:
                lines.append(f"[{i}]: DELETED (tombstone)")
            else:
                lines.append(f"[{i}]: {slot.key}")
        return "\n".join(lines)


def demonstrate_tombstones():
    """Show correct tombstone behavior."""
    print("Tombstone-Based Deletion Demonstration")
    print("=" * 60)
    table = TombstoneHashTable(capacity=10)
    # Use a hash function that creates collisions for demo
    original_hash = table._hash

    def predictable_hash(key):
        if isinstance(key, str) and key.startswith("collision_"):
            return 5  # Force all collision_* keys to hash to slot 5
        return original_hash(key)

    table._hash = predictable_hash
    # Insert three colliding keys
    keys = ["collision_A", "collision_B", "collision_C"]
    print("Step 1: Insert three keys that all hash to slot 5")
    print("-" * 40)
    for key in keys:
        slot, probes = table.insert(key, f"value_{key}")
        print(f"  Inserted '{key}' at slot {slot} ({probes} probes)")
    print()
    print("Table state after insertions:")
    print(table)
    print()
    # Delete the middle key
    print("Step 2: Delete 'collision_B' using tombstone")
    print("-" * 40)
    success, probes = table.delete("collision_B")
    print(f"  Deleted: {success} ({probes} probes)")
    print(f"  Tombstones now: {table.tombstone_count}")
    print()
    print("Table state after deletion:")
    print(table)
    print()
    # Verify collision_C is still findable
    print("Step 3: Verify 'collision_C' is still accessible")
    print("-" * 40)
    found, value, probes = table.lookup("collision_C")
    print(f"  Lookup 'collision_C': found={found}, probes={probes}")
    print("  SUCCESS: Tombstone allowed probing to continue past deleted slot!")
    print()
    # Insert reuses tombstone
    print("Step 4: Insert 'collision_D' - should reuse tombstone")
    print("-" * 40)
    slot, probes = table.insert("collision_D", "value_D")
    print(f"  Inserted 'collision_D' at slot {slot} ({probes} probes)")
    print(f"  Tombstones now: {table.tombstone_count}")
    print()
    print("Table state after insertion:")
    print(table)


demonstrate_tombstones()
```

EMPTY means 'never occupied'—safe to stop probing. DELETED means 'was occupied'—must keep probing. This distinction is what makes tombstones work. Insertions can treat DELETED as available (since we can overwrite), but lookups must treat it as 'occupied by a ghost' and continue searching.
Tombstones solve the correctness problem but introduce their own performance challenge: tombstone accumulation. In workloads with many deletions, tombstones build up, increasing probe lengths even when the live element count stays low.
The Problem:

Consider a table with capacity 1000: insert 750 elements, then delete 500 of them. The table now holds only 250 live elements, but 500 slots carry tombstones.

Even though only 250 elements exist, probing must traverse tombstones as if they were occupied. The 'effective' load—what matters for probe length—includes both live elements and tombstones.
This creates a paradox: a table can appear lightly loaded while performing like a nearly-full table.
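As a quick arithmetic check, here is a minimal sketch of that paradox using the effective-load definition above and the numbers from the scenario (the same figures the demo below reproduces empirically):

```python
# Effective load counts tombstones as occupied slots.
capacity, live, tombstones = 1000, 250, 500
true_load = live / capacity                      # 0.25 -> looks lightly loaded
effective_load = (live + tombstones) / capacity  # 0.75 -> probes as if nearly full
print(f"true load: {true_load:.0%}, effective load: {effective_load:.0%}")
```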
```python
import random


def demonstrate_tombstone_impact():
    """
    Show how tombstone accumulation degrades performance
    even when live element count is low.
    """
    class MeasuredHashTable:
        EMPTY = "E"
        DELETED = "D"

        def __init__(self, capacity):
            self.capacity = capacity
            self.slots = [self.EMPTY] * capacity
            self.size = 0
            self.tombstones = 0
            self.total_probes = 0
            self.operations = 0

        def insert(self, key, value):
            self.operations += 1
            probes = 0
            start = hash(key) % self.capacity
            first_tombstone = None
            for offset in range(self.capacity):
                idx = (start + offset) % self.capacity
                probes += 1
                slot = self.slots[idx]
                if slot == self.EMPTY:
                    target = first_tombstone if first_tombstone is not None else idx
                    if first_tombstone is not None:
                        self.tombstones -= 1
                    self.slots[target] = (key, value)
                    self.size += 1
                    self.total_probes += probes
                    return probes
                if slot == self.DELETED:
                    if first_tombstone is None:
                        first_tombstone = idx
                    continue
                if slot[0] == key:
                    self.slots[idx] = (key, value)
                    self.total_probes += probes
                    return probes
            raise RuntimeError("Full")

        def delete(self, key):
            self.operations += 1
            probes = 0
            start = hash(key) % self.capacity
            for offset in range(self.capacity):
                idx = (start + offset) % self.capacity
                probes += 1
                slot = self.slots[idx]
                if slot == self.EMPTY:
                    self.total_probes += probes
                    return False
                if slot == self.DELETED:
                    continue
                if slot[0] == key:
                    self.slots[idx] = self.DELETED
                    self.size -= 1
                    self.tombstones += 1
                    self.total_probes += probes
                    return True
            self.total_probes += probes
            return False

        def lookup(self, key):
            self.operations += 1
            probes = 0
            start = hash(key) % self.capacity
            for offset in range(self.capacity):
                idx = (start + offset) % self.capacity
                probes += 1
                slot = self.slots[idx]
                if slot == self.EMPTY:
                    self.total_probes += probes
                    return (False, probes)
                if slot == self.DELETED:
                    continue
                if slot[0] == key:
                    self.total_probes += probes
                    return (True, probes)
            self.total_probes += probes
            return (False, probes)

        def avg_probes(self):
            return self.total_probes / self.operations if self.operations > 0 else 0

    print("Tombstone Accumulation Impact Analysis")
    print("=" * 70)
    print()
    capacity = 1000
    # Scenario 1: Fresh table with 250 elements (25% load)
    print("Scenario 1: Fresh table with 250 elements")
    print("-" * 50)
    table1 = MeasuredHashTable(capacity)
    keys = [f"key_{i}" for i in range(250)]
    for key in keys:
        table1.insert(key, "value")
    table1.total_probes = 0
    table1.operations = 0
    for key in keys:
        table1.lookup(key)
    print(f"  Live elements: {table1.size}")
    print(f"  Tombstones: {table1.tombstones}")
    print(f"  True load: {table1.size / capacity:.0%}")
    print(f"  Effective load: {(table1.size + table1.tombstones) / capacity:.0%}")
    print(f"  Avg probes per lookup: {table1.avg_probes():.2f}")
    print()
    # Scenario 2: 750 inserts, 500 deletes = 250 live + 500 tombstones
    print("Scenario 2: Same 250 elements, but 500 tombstones from deletions")
    print("-" * 50)
    table2 = MeasuredHashTable(capacity)
    all_keys = [f"key_{i}" for i in range(750)]
    for key in all_keys:
        table2.insert(key, "value")
    # Delete 500 of them
    to_delete = random.sample(all_keys, 500)
    for key in to_delete:
        table2.delete(key)
    remaining_keys = [k for k in all_keys if k not in to_delete]
    table2.total_probes = 0
    table2.operations = 0
    for key in remaining_keys:
        table2.lookup(key)
    print(f"  Live elements: {table2.size}")
    print(f"  Tombstones: {table2.tombstones}")
    print(f"  True load: {table2.size / capacity:.0%}")
    print(f"  Effective load: {(table2.size + table2.tombstones) / capacity:.0%}")
    print(f"  Avg probes per lookup: {table2.avg_probes():.2f}")
    print()
    print("Comparison:")
    print(f"  Both have ~250 live elements")
    print(f"  Tombstone-laden table requires "
          f"{table2.avg_probes() / table1.avg_probes():.1f}x more probes!")
    print()
    print("Conclusion: Tombstones count toward probing cost even though")
    print("they hold no data. Periodic cleanup is essential.")


demonstrate_tombstone_impact()
```

| True Load | Tombstones | Effective Load | Expected Probe Increase |
|---|---|---|---|
| 25% | 0% | 25% | 1.0x (baseline) |
| 25% | 25% | 50% | ~1.5x |
| 25% | 50% | 75% | ~3.0x |
| 25% | 65% | 90% | ~6.0x |
| 10% | 80% | 90% | ~6.0x (extremely degraded) |
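These multipliers are rough. For an analytic sanity check, the classic uniform-hashing approximations for linear probing (due to Knuth) estimate expected probe counts from the effective load. A minimal sketch, assuming those approximations hold for your hash function; the exact ratios depend on whether lookups succeed, so this shows the trend rather than reproducing the table exactly:

```python
# Knuth's uniform-hashing approximations for linear probing.
def successful(alpha: float) -> float:
    # Expected probes for a successful search: (1/2) * (1 + 1/(1 - alpha))
    return 0.5 * (1 + 1 / (1 - alpha))

def unsuccessful(alpha: float) -> float:
    # Expected probes for an unsuccessful search: (1/2) * (1 + 1/(1 - alpha)**2)
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)

baseline = successful(0.25)  # the table's 25% effective-load baseline
for eff in (0.25, 0.50, 0.75, 0.90):
    print(f"effective load {eff:.0%}: "
          f"successful ~{successful(eff):.2f} probes "
          f"({successful(eff) / baseline:.1f}x baseline), "
          f"unsuccessful ~{unsuccessful(eff):.2f} probes")
```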
For workloads with high deletion rates—caches with TTL eviction, session stores, temporary data—tombstone accumulation is a real threat. Either implement periodic compaction, track the tombstone ratio and trigger rebuilds when it crosses a threshold, or consider chaining instead.
To prevent tombstone-induced degradation, implementations employ various management strategies. The choice depends on workload characteristics and performance requirements.
```python
class ManagedHashTable:
    """
    Hash table with automatic tombstone management through rebuilding.
    """
    EMPTY = "EMPTY"
    DELETED = "DELETED"

    def __init__(self, capacity=16, tombstone_threshold=0.25):
        self.capacity = capacity
        self.slots = [self.EMPTY] * capacity
        self.size = 0
        self.tombstone_count = 0
        self.tombstone_threshold = tombstone_threshold  # Max tombstone ratio
        self.rebuild_count = 0

    def _needs_rebuild(self) -> bool:
        """Determine if tombstone cleanup is needed."""
        if self.capacity == 0:
            return False
        tombstone_ratio = self.tombstone_count / self.capacity
        return tombstone_ratio > self.tombstone_threshold

    def _rebuild(self):
        """
        Rebuild table without tombstones.
        Time: O(n) where n is current size.
        """
        self.rebuild_count += 1
        old_slots = self.slots
        # Potentially resize based on current size
        # Here we keep same capacity, just remove tombstones
        self.slots = [self.EMPTY] * self.capacity
        self.size = 0
        self.tombstone_count = 0
        for slot in old_slots:
            if slot not in (self.EMPTY, self.DELETED):
                key, value = slot
                self._insert_no_rebuild(key, value)

    def _insert_no_rebuild(self, key, value):
        """Insert without triggering rebuild (used during rebuild)."""
        start = hash(key) % self.capacity
        for offset in range(self.capacity):
            idx = (start + offset) % self.capacity
            slot = self.slots[idx]
            if slot in (self.EMPTY, self.DELETED):
                if slot == self.DELETED:
                    self.tombstone_count -= 1  # Reusing a tombstone slot
                self.slots[idx] = (key, value)
                self.size += 1
                return
            if slot[0] == key:
                self.slots[idx] = (key, value)
                return

    def insert(self, key, value):
        """Insert with automatic tombstone management."""
        # Check load factor for resize (separate from tombstone cleanup)
        if self.size / self.capacity > 0.7:
            self._resize()
        self._insert_no_rebuild(key, value)

    def delete(self, key) -> bool:
        """Delete with automatic tombstone cleanup."""
        start = hash(key) % self.capacity
        for offset in range(self.capacity):
            idx = (start + offset) % self.capacity
            slot = self.slots[idx]
            if slot == self.EMPTY:
                return False
            if slot == self.DELETED:
                continue
            if slot[0] == key:
                self.slots[idx] = self.DELETED
                self.size -= 1
                self.tombstone_count += 1
                # Check if cleanup needed
                if self._needs_rebuild():
                    self._rebuild()
                return True
        return False

    def _resize(self):
        """Double capacity (implicitly removes tombstones)."""
        old_slots = self.slots
        self.capacity *= 2
        self.slots = [self.EMPTY] * self.capacity
        self.size = 0
        self.tombstone_count = 0
        for slot in old_slots:
            if slot not in (self.EMPTY, self.DELETED):
                self._insert_no_rebuild(slot[0], slot[1])

    def stats(self) -> dict:
        return {
            "size": self.size,
            "capacity": self.capacity,
            "tombstones": self.tombstone_count,
            "rebuilds": self.rebuild_count,
            "true_load": f"{self.size / self.capacity:.1%}",
            "effective_load": f"{(self.size + self.tombstone_count) / self.capacity:.1%}",
        }


def demonstrate_managed_table():
    """Show automatic tombstone management."""
    import random
    print("Automatic Tombstone Management Demonstration")
    print("=" * 60)
    table = ManagedHashTable(capacity=100, tombstone_threshold=0.25)
    # Simulate heavy deletion workload
    print("Simulating workload: insert 50, delete 40, repeat 10 times")
    print("-" * 50)
    all_keys = []
    for round_num in range(10):
        # Insert 50
        new_keys = [f"key_{round_num}_{i}" for i in range(50)]
        for key in new_keys:
            table.insert(key, "value")
        all_keys.extend(new_keys)
        # Delete 40 random existing keys
        to_delete = random.sample(all_keys, min(40, len(all_keys)))
        for key in to_delete:
            table.delete(key)
        all_keys = [k for k in all_keys if k not in to_delete]
        stats = table.stats()
        print(f"Round {round_num + 1}: {stats}")
    print()
    print(f"Final state: {table.stats()}")
    print()
    print(f"Table was rebuilt {table.rebuild_count} times to manage tombstones")
    print(f"Without management, tombstones would have accumulated to ~400")


demonstrate_managed_table()
```

Let's consolidate the complexity analysis for deletion across both collision strategies:
| Aspect | Chaining | Open Addressing (Tombstones) |
|---|---|---|
| Average Case | O(1 + α) | O(1/(1-α)) |
| Worst Case | O(n) | O(n) |
| Space Overhead | Pointer per node | Tombstone slots |
| Effect on Other Keys | None | None (with tombstones) |
| Cleanup Needed | No | Yes (tombstone accumulation) |
| Implementation Complexity | Simple | Moderate (3 slot states) |
| Cache Performance | Poor (pointer chasing) | Good (contiguous) |
| High Deletion Rate | Handles well | Degrades without cleanup |
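To see why the average-case row matters so much for deletion-heavy workloads, here is a minimal sketch evaluating the two formulas from the table at several load factors (assuming that for open addressing α includes tombstones, since they extend probe sequences):

```python
# Average-case cost formulas from the table (alpha = load factor).
def chaining_avg(alpha: float) -> float:
    return 1 + alpha          # O(1 + alpha): degrades linearly

def open_addressing_avg(alpha: float) -> float:
    return 1 / (1 - alpha)    # O(1/(1 - alpha)): blows up near full

for alpha in (0.25, 0.50, 0.75, 0.90):
    print(f"alpha={alpha:.2f}: chaining ~{chaining_avg(alpha):.2f}, "
          f"open addressing ~{open_addressing_avg(alpha):.2f}")
```

The divergence as α grows is exactly why tombstone accumulation, which inflates α for open addressing but not for chaining, can dominate real-world deletion performance.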
Deletion is where hash table implementations diverge most significantly. The essential insights:

- With chaining, deletion is a local operation: unlinking a node never affects how other keys are found.
- With open addressing, a naively emptied slot falsely terminates other keys' probe sequences, so deletion must leave a tombstone.
- Tombstones preserve correctness but count toward the effective load, so deletion-heavy workloads degrade unless tombstones are cleaned up.
- Practical implementations track the tombstone ratio and rebuild (or resize) when it crosses a threshold.
- Heavy, sustained deletion patterns alone can justify choosing chaining over open addressing.
What's Next:
We've examined insert, lookup, and delete individually. The final piece of the puzzle is understanding when worst-case behavior actually matters in practice. The next page explores real-world scenarios where O(n) worst case transitions from theoretical concern to production crisis—and how to prevent it.
You now have a thorough understanding of hash table deletion. You can explain why open addressing requires tombstones, articulate the tombstone accumulation problem, describe management strategies, and make informed decisions about collision resolution based on deletion patterns.