Every dynamic hash table eventually faces a critical moment: the load factor has crossed its threshold, performance is degrading, and the only solution is to rebuild the entire table from scratch. This operation—called rehashing (also known as resizing, growing, or table doubling)—is one of the most important behind-the-scenes operations in computer science.
Rehashing is counterintuitive at first. We're expending O(n) work just to maintain the O(1) operations we love. Why would an operation that processes every single element preserve constant-time efficiency? The answer lies in one of the most elegant concepts in algorithm analysis: amortized cost.
This page examines rehashing from every angle: when to trigger it, exactly how it works, the tradeoffs involved, and how to implement it correctly.
By the end of this page, you will understand: (1) The exact conditions that trigger rehashing, (2) The step-by-step rehashing algorithm, (3) Why elements must be re-inserted (not just moved), (4) Growth factor choices and their implications, (5) Shrinking strategies for memory reclamation, and (6) Implementation patterns for both chaining and open addressing.
Rehashing is triggered by specific conditions related to load factor. Understanding these triggers helps you predict when resizing will occur and plan accordingly.
Example trigger calculation:
HashMap with m = 16 buckets and α_max = 0.75: the resize threshold is 16 × 0.75 = 12 elements, so the insertion that would bring the count to 13 triggers a resize.
Secondary triggers:
Some implementations consider additional conditions (a combined check is sketched after this list):
Maximum chain length (chaining): If any bucket's chain exceeds a threshold (e.g., 8 in Java), restructure that bucket into a tree or trigger resize.
Probe sequence length (open addressing): If an insertion requires more than a threshold number of probes, resize proactively.
Memory pressure: External signals indicating memory scarcity might delay resizing or trigger shrinking.
Time-based amortization: Some systems schedule resizing during low-load periods rather than during peak insertions.
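To make these policies concrete, a resize decision that combines the primary load-factor check with a couple of the secondary signals above might look like the following sketch. The thresholds and the `longest_chain` / `memory_pressure` inputs are illustrative assumptions, not taken from any particular library.

```python
def should_resize(size: int, capacity: int, longest_chain: int,
                  memory_pressure: bool,
                  max_load: float = 0.75, max_chain: int = 8) -> bool:
    """Combine the primary load-factor trigger with secondary signals."""
    if memory_pressure:
        return False                          # external signal: postpone growth
    if (size + 1) / capacity > max_load:
        return True                           # primary trigger: load factor
    return longest_chain > max_chain          # secondary trigger: one chain grew too long


# Example: well under the load threshold, but one bucket has degenerated
print(should_resize(size=40, capacity=128, longest_chain=9, memory_pressure=False))  # True
```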
Implementations vary on whether they check the threshold before or after insertion. Checking before ensures the table never exceeds the threshold; checking after may briefly exceed it. For open addressing at high load factors, checking before is safer since even one insertion at near-capacity can cause expensive probing.
```python
class HashTableWithResize:
    """
    Demonstrates resize triggering logic.
    """
    MAX_LOAD_FACTOR = 0.75

    def __init__(self, initial_capacity: int = 16):
        self.capacity = initial_capacity
        self.size = 0
        self.buckets = [[] for _ in range(initial_capacity)]
        self.resize_count = 0

    @property
    def load_factor(self) -> float:
        return self.size / self.capacity

    @property
    def threshold(self) -> int:
        """Maximum elements before resize triggers."""
        return int(self.capacity * self.MAX_LOAD_FACTOR)

    def _should_resize(self) -> bool:
        """Check if insertion would exceed load factor threshold."""
        return self.size + 1 > self.threshold

    def insert(self, key, value):
        """Insert with automatic resize triggering."""
        # Check threshold BEFORE insertion
        if self._should_resize():
            print(f"⚡ Resize triggered! size={self.size}, capacity={self.capacity}, "
                  f"α={self.load_factor:.3f}")
            self._resize()

        # Now insert (simplified - ignoring collision handling)
        bucket_idx = hash(key) % self.capacity
        self.buckets[bucket_idx].append((key, value))
        self.size += 1

    def _resize(self):
        """Double capacity and rehash all elements."""
        old_buckets = self.buckets
        self.capacity *= 2
        self.buckets = [[] for _ in range(self.capacity)]
        self.size = 0
        self.resize_count += 1

        # Re-insert all elements
        for bucket in old_buckets:
            for key, value in bucket:
                bucket_idx = hash(key) % self.capacity
                self.buckets[bucket_idx].append((key, value))
                self.size += 1

        print(f" ✓ Resize complete: new capacity={self.capacity}, α={self.load_factor:.3f}")

    def status(self):
        print(f"Size: {self.size}, Capacity: {self.capacity}, "
              f"Load Factor: {self.load_factor:.3f}, Resizes: {self.resize_count}")


# Demonstrate resize triggers
table = HashTableWithResize(initial_capacity=8)
print("Inserting 30 elements into table with initial capacity 8...")
print(f"Threshold: {table.threshold} elements (α_max = 0.75)\n")

for i in range(30):
    table.insert(f"key_{i}", i)

print("\nFinal status:")
table.status()

# Output:
# Inserting 30 elements into table with initial capacity 8...
# Threshold: 6 elements (α_max = 0.75)
#
# ⚡ Resize triggered! size=6, capacity=8, α=0.750
#  ✓ Resize complete: new capacity=16, α=0.375
# ⚡ Resize triggered! size=12, capacity=16, α=0.750
#  ✓ Resize complete: new capacity=32, α=0.375
# ⚡ Resize triggered! size=24, capacity=32, α=0.750
#  ✓ Resize complete: new capacity=64, α=0.375
#
# Final status:
# Size: 30, Capacity: 64, Load Factor: 0.469, Resizes: 3
```

Rehashing appears simple in concept—make the table bigger—but the implementation requires careful attention to detail. Here's the complete algorithm:
A common misconception is that we can simply copy elements to corresponding positions in the larger array. This is wrong! Hash positions depend on capacity via hash % capacity. When capacity changes, elements hash to different buckets. An element in bucket 5 (hash = 21, capacity = 16, 21 % 16 = 5) moves to bucket 21 after doubling (hash = 21, capacity = 32, 21 % 32 = 21). Every element must be re-hashed and re-inserted.
Visualizing position changes:
Consider elements with hash values 5, 21, 37, 53 in a table of capacity 16:
| Hash | Old Position (hash % 16) | New Position (hash % 32) |
|---|---|---|
| 5 | 5 | 5 |
| 21 | 5 | 21 |
| 37 | 5 | 5 |
| 53 | 5 | 21 |
Notice: These four elements all collided at bucket 5 in the old table. After resize, they split between buckets 5 and 21. This redistribution is exactly the goal—reducing chain lengths by spreading elements across more buckets.
Elegant property of power-of-2 doubling:
When capacity doubles from m to 2m, each element either stays at its current index i or moves to index i + m, depending on whether the hash bit with value m is 0 or 1.
This means we can determine new positions with a single bit check, which some implementations exploit for efficiency.
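A quick way to verify this, reusing the hash values from the table above (a standalone check, assuming power-of-two capacities):

```python
m = 16                       # old capacity (a power of two)
for h in (5, 21, 37, 53):    # hash values from the table above
    old_idx = h % m
    new_idx = h % (2 * m)
    # The deciding bit is the one with value m: if it is 0, the element stays;
    # if it is 1, the element moves to old_idx + m.
    predicted = old_idx if (h & m) == 0 else old_idx + m
    print(h, old_idx, new_idx, predicted == new_idx)   # always True
```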
```typescript
interface Entry<K, V> {
  key: K;
  value: V;
  hash: number; // Cached hash for efficiency
}

class HashTableWithDetailedRehash<K, V> {
  private buckets: Array<Entry<K, V>[]>;
  private _capacity: number;
  private _size: number;
  private readonly maxLoadFactor = 0.75;

  constructor(initialCapacity: number = 16) {
    this._capacity = initialCapacity;
    this._size = 0;
    this.buckets = Array.from({ length: initialCapacity }, () => []);
  }

  get capacity(): number { return this._capacity; }
  get size(): number { return this._size; }
  get loadFactor(): number { return this._size / this._capacity; }

  private hash(key: K): number {
    // Simplified hash - real implementation would be more robust
    const str = String(key);
    let h = 0;
    for (let i = 0; i < str.length; i++) {
      h = (h * 31 + str.charCodeAt(i)) >>> 0;
    }
    return h;
  }

  private rehash(): void {
    console.log(`\n=== REHASH OPERATION ===`);
    console.log(`Before: capacity=${this._capacity}, size=${this._size}, α=${this.loadFactor.toFixed(3)}`);

    // Step 1: Store old buckets reference
    const oldBuckets = this.buckets;
    const oldCapacity = this._capacity;

    // Step 2: Allocate new bucket array (2x capacity)
    this._capacity *= 2;
    this.buckets = Array.from({ length: this._capacity }, () => []);

    // Step 3: Reset size (will recount)
    this._size = 0;

    // Step 4: Re-insert every element
    let movedCount = 0;
    let stayedCount = 0;

    for (let i = 0; i < oldCapacity; i++) {
      for (const entry of oldBuckets[i]) {
        // Compute NEW position (hash % newCapacity)
        const newBucketIdx = entry.hash % this._capacity;

        // Track if element moved
        if (newBucketIdx === i) {
          stayedCount++;
        } else {
          movedCount++;
        }

        // Insert into new bucket
        this.buckets[newBucketIdx].push(entry);
        this._size++;
      }
    }

    // Step 5: Old array released automatically (no reference)

    console.log(`After: capacity=${this._capacity}, size=${this._size}, α=${this.loadFactor.toFixed(3)}`);
    console.log(`Elements stayed: ${stayedCount}, moved: ${movedCount}`);
    console.log(`=== REHASH COMPLETE ===\n`);
  }

  set(key: K, value: V): void {
    // Check if resize needed before insertion
    if ((this._size + 1) / this._capacity > this.maxLoadFactor) {
      this.rehash();
    }

    const h = this.hash(key);
    const bucketIdx = h % this._capacity;

    // Check if key exists (update case)
    const bucket = this.buckets[bucketIdx];
    for (const entry of bucket) {
      if (entry.key === key) {
        entry.value = value;
        return;
      }
    }

    // New key - insert
    bucket.push({ key, value, hash: h });
    this._size++;
  }
}

// Demonstration
const table = new HashTableWithDetailedRehash<string, number>(8);

console.log("Inserting 15 elements...");
for (let i = 0; i < 15; i++) {
  table.set(`key${i}`, i);
}

console.log(`Final state: capacity=${table.capacity}, size=${table.size}, α=${table.loadFactor.toFixed(3)}`);
```

When resizing, by how much should we increase capacity? This growth factor decision has significant implications for both time efficiency and memory usage.
| Growth Factor | Pros | Cons | Used By |
|---|---|---|---|
| 2× (doubling) | Fewer resizes, simple bit operations, predictable | Memory doubles (+100%) | Java HashMap, Python dict |
| 1.5× | Less memory waste (+50%) | More frequent resizes, no bit tricks | Some C++ implementations |
| φ ≈ 1.618 (golden ratio) | Theoretical elegance, memory reuse possible | Complex, marginal benefits | Academic interest |
| Additive (+k) | Constant overhead | Too many resizes for growing tables | Rarely used alone |
Why doubling is dominant:
Amortized O(1) insertions: With doubling, the total work for n insertions is O(n), giving O(1) amortized per insert. Smaller growth factors also achieve O(1) amortized, but with higher constants (see the sketch after this list).
Power-of-2 capacity enables bit masking: hash % (2^k) = hash & (2^k - 1). Bit-AND is faster than modulo division on most processors.
Predictable half-empty tables: After doubling, load factor drops to α/2. If threshold is 0.75, post-resize α is ~0.375—comfortably far from threshold.
Simple implementation: Multiply by 2 or left-shift by 1. No floating-point or complex calculations.
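The sketch below illustrates the first two points: a counter-only simulation of the total copying work that doubling performs over n insertions, and the mask/modulo equivalence for a power-of-two capacity. The initial capacity and threshold are illustrative assumptions.

```python
def total_copies(n: int, initial_capacity: int = 8, max_load: float = 0.75) -> int:
    """Count how many element copies the resizes perform during n insertions."""
    size, capacity, copies = 0, initial_capacity, 0
    for _ in range(n):
        if (size + 1) / capacity > max_load:
            copies += size        # a resize re-inserts every existing element
            capacity *= 2
        size += 1
    return copies

n = 1_000_000
print(total_copies(n) / n)        # ≈ 1.57 copies per insertion: bounded, so O(1) amortized

capacity = 64                     # power of two, so modulo reduces to a bit mask
h = 123_456_789
print(h % capacity == h & (capacity - 1))   # True
```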
When non-doubling makes sense: when memory is at a premium, a smaller factor such as 1.5× trades more frequent resizes for a lower peak memory overhead, and growth factors below the golden ratio can, with some allocators, let a new array reuse memory freed by earlier, smaller arrays.
With α_max = 0.75 and doubling, hash tables typically use about 1.33× to 2.67× the memory of the elements themselves, depending on where in the growth cycle they are: just before a resize the table holds n ≈ 0.75m elements (m/n ≈ 1.33), and just after doubling it holds n ≈ 0.375m elements (m/n ≈ 2.67). This overhead is the price of O(1) operations. For applications where memory is more precious than CPU cycles, consider higher load factor thresholds or smaller growth factors.
Most discussions focus on growing hash tables, but shrinking is equally important for long-lived applications where tables grow and then shrink. Without shrinking, memory from deleted elements is never reclaimed.
When to shrink:
Common strategies for triggering shrink operations:
Load factor minimum threshold: If α drops below α_min (e.g., 0.25), halve the capacity.
Hysteresis window: Only shrink when α < α_min for a sustained period or multiple deletions, avoiding thrashing.
Explicit shrink-to-fit: Provide a method for users to explicitly request compaction.
Never shrink: Some implementations (early Python dicts) never shrink automatically—users must rebuild.
The thrashing problem:
If α_max = 0.75 and α_min = 0.375 (exactly half of the grow threshold), a freshly doubled table sits right at the shrink boundary: a workload that oscillates a couple of elements around this point forces repeated grow-shrink cycles, each one an O(n) rebuild. This is thrashing, and it destroys performance.
Solution: Hysteresis
Set α_min significantly below α_max / growth_factor: with α_max = 0.75 and doubling, a freshly grown table has α ≈ 0.375, so choosing α_min = 0.25 (or lower) leaves a comfortable gap before any shrink can follow a grow.
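To see the difference in numbers, here is a small counter-only simulation (no actual table, thresholds as above): it replays a workload that oscillates a couple of elements around the grow boundary, first with α_min = 0.375 and then with α_min = 0.25.

```python
def count_resizes(alpha_min: float, n_cycles: int = 1000,
                  alpha_max: float = 0.75, min_capacity: int = 16) -> int:
    """Count resizes for a workload oscillating around the grow boundary."""
    size, capacity, resizes = 13, 32, 0        # state just after a grow (13 of 32)
    for _ in range(n_cycles):
        for op in ("delete", "delete", "insert", "insert"):
            if op == "insert":
                if (size + 1) / capacity > alpha_max:   # grow check before insert
                    capacity *= 2
                    resizes += 1
                size += 1
            else:
                size -= 1
                if capacity > min_capacity and size / capacity < alpha_min:
                    capacity //= 2                      # shrink check after delete
                    resizes += 1
    return resizes

print(count_resizes(alpha_min=0.375))   # 2000 (a grow AND a shrink every 4 operations)
print(count_resizes(alpha_min=0.25))    # 0    (the oscillation never crosses a threshold)
```

The α_max = 0.75 / α_min = 0.25 gap used in this simulation is the same one the full implementation below applies.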
```python
class HashTableWithShrinking:
    """
    Hash table that both grows and shrinks to maintain memory efficiency.
    """
    MAX_LOAD_FACTOR = 0.75   # Grow threshold
    MIN_LOAD_FACTOR = 0.25   # Shrink threshold (with hysteresis gap)
    MIN_CAPACITY = 16        # Never shrink below this

    def __init__(self, initial_capacity: int = 16):
        self.capacity = max(initial_capacity, self.MIN_CAPACITY)
        self.size = 0
        self.buckets = [[] for _ in range(self.capacity)]

    @property
    def load_factor(self) -> float:
        return self.size / self.capacity

    def _should_grow(self) -> bool:
        return (self.size + 1) / self.capacity > self.MAX_LOAD_FACTOR

    def _should_shrink(self) -> bool:
        # Only shrink if above minimum capacity and below minimum load factor
        return (self.capacity > self.MIN_CAPACITY and
                self.load_factor < self.MIN_LOAD_FACTOR)

    def _resize(self, new_capacity: int) -> None:
        action = "Growing" if new_capacity > self.capacity else "Shrinking"
        print(f"{action}: {self.capacity} → {new_capacity} "
              f"(α: {self.load_factor:.3f} → {self.size/new_capacity:.3f})")

        old_buckets = self.buckets
        self.capacity = new_capacity
        self.buckets = [[] for _ in range(new_capacity)]
        self.size = 0

        for bucket in old_buckets:
            for key, value in bucket:
                bucket_idx = hash(key) % self.capacity
                self.buckets[bucket_idx].append((key, value))
                self.size += 1

    def insert(self, key, value) -> None:
        if self._should_grow():
            self._resize(self.capacity * 2)

        bucket_idx = hash(key) % self.capacity
        self.buckets[bucket_idx].append((key, value))
        self.size += 1

    def delete(self, key) -> bool:
        bucket_idx = hash(key) % self.capacity
        bucket = self.buckets[bucket_idx]

        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket.pop(i)
                self.size -= 1

                # Check shrink condition after deletion
                if self._should_shrink():
                    self._resize(self.capacity // 2)

                return True
        return False

    def status(self) -> str:
        return f"size={self.size}, capacity={self.capacity}, α={self.load_factor:.3f}"


# Demonstration
table = HashTableWithShrinking(16)

# Grow phase
print("=== GROWTH PHASE ===")
for i in range(50):
    table.insert(f"key_{i}", i)
print(f"After 50 insertions: {table.status()}")

# Shrink phase
print("\n=== SHRINK PHASE ===")
for i in range(45):
    table.delete(f"key_{i}")
print(f"After 45 deletions: {table.status()}")

# Output:
# === GROWTH PHASE ===
# Growing: 16 → 32 (α: 0.750 → 0.375)
# Growing: 32 → 64 (α: 0.750 → 0.375)
# Growing: 64 → 128 (α: 0.750 → 0.375)
# After 50 insertions: size=50, capacity=128, α=0.391
#
# === SHRINK PHASE ===
# Shrinking: 128 → 64 (α: 0.242 → 0.484)
# Shrinking: 64 → 32 (α: 0.234 → 0.469)
# Shrinking: 32 → 16 (α: 0.219 → 0.438)
# After 45 deletions: size=5, capacity=16, α=0.312
```

Many production implementations don't shrink automatically because: (1) Memory freed may not be returned to the OS immediately anyway, (2) Re-growth is likely in many applications, (3) Shrinking is O(n) work that may not be worth it. Consider providing explicit shrink_to_fit() methods instead of automatic shrinking.
Rehashing in open addressing systems has additional complexities compared to chaining, primarily around tombstones and probe sequence integrity.
The tombstone problem:
In open addressing, deletion can't simply remove an element—doing so would break probe sequences for other elements. Instead, deleted slots are marked with a tombstone (special marker indicating 'deleted but probeable').
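To see why plain removal breaks lookups, here is a toy, self-contained linear-probing example (the keys and slot numbers are chosen purely for illustration):

```python
# Keys "A" and "B" both hash to slot 3: "A" occupies slot 3, "B" probed on to slot 4.
slots = {3: "A", 4: "B"}           # slot index -> key; missing indices are EMPTY

def lookup(key, table, home=3, capacity=8):
    """Linear probing that stops at the first EMPTY slot."""
    i = home
    while i in table:              # a present index means the slot is not EMPTY
        if table[i] == key:
            return i
        i = (i + 1) % capacity
    return None                    # reached an EMPTY slot: give up

print(lookup("B", slots))          # 4: found by probing past "A"
del slots[3]                       # delete "A" by truly emptying its slot
print(lookup("B", slots))          # None: "B" is now unreachable!
# A tombstone in slot 3 would keep the probe going, so "B" would still be found.
```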
Over time, tombstones accumulate: they keep occupying slots, so probe sequences stay long and the effective load factor (occupied plus tombstoned slots) keeps climbing even though the number of living elements may be small or shrinking.
Rehashing clears tombstones:
When rehashing an open addressing table, only occupied slots are carried over: each living element is re-inserted into the new array, while tombstones are simply discarded. The rebuilt table starts with clean, minimal probe sequences.
This makes rehashing especially valuable in open addressing—it's not just about capacity, but about table health.
```python
from enum import Enum


class SlotState(Enum):
    EMPTY = "empty"
    OCCUPIED = "occupied"
    TOMBSTONE = "tombstone"


class OpenAddressedHashTable:
    """
    Open addressing hash table with tombstone handling during rehash.
    """
    MAX_LOAD_FACTOR = 0.6   # Lower threshold for open addressing

    def __init__(self, capacity: int = 16):
        self.capacity = capacity
        self.size = 0         # Living elements
        self.tombstones = 0   # Deleted elements (tombstones)
        self.slots = [(SlotState.EMPTY, None, None)] * capacity

    @property
    def load_factor(self) -> float:
        return self.size / self.capacity

    @property
    def effective_load(self) -> float:
        """Load including tombstones - affects probe sequence length."""
        return (self.size + self.tombstones) / self.capacity

    def _probe(self, key, for_insert: bool = False) -> int:
        """Linear probing. Returns slot index."""
        h = hash(key) % self.capacity
        first_tombstone = -1

        for i in range(self.capacity):
            idx = (h + i) % self.capacity
            state, k, v = self.slots[idx]

            if state == SlotState.EMPTY:
                # For insert: use first tombstone if found, else this empty slot
                return first_tombstone if first_tombstone >= 0 and for_insert else idx
            elif state == SlotState.TOMBSTONE:
                if first_tombstone < 0:
                    first_tombstone = idx
            elif k == key:
                return idx  # Found the key

        return first_tombstone if first_tombstone >= 0 else -1

    def _rehash(self) -> None:
        print(f"\n=== REHASH (Open Addressing) ===")
        print(f"Before: capacity={self.capacity}, size={self.size}, "
              f"tombstones={self.tombstones}, effective_α={self.effective_load:.3f}")

        old_slots = self.slots
        self.capacity *= 2
        self.slots = [(SlotState.EMPTY, None, None)] * self.capacity

        # Reset counters
        old_tombstones = self.tombstones
        self.size = 0
        self.tombstones = 0  # Tombstones are NOT transferred!

        # Re-insert only living elements
        transferred = 0
        for state, key, value in old_slots:
            if state == SlotState.OCCUPIED:
                idx = self._probe(key, for_insert=True)
                self.slots[idx] = (SlotState.OCCUPIED, key, value)
                self.size += 1
                transferred += 1

        print(f"After: capacity={self.capacity}, size={self.size}, "
              f"tombstones={self.tombstones}, effective_α={self.effective_load:.3f}")
        print(f"Transferred {transferred} elements, discarded {old_tombstones} tombstones")
        print(f"=== REHASH COMPLETE ===\n")

    def insert(self, key, value) -> None:
        # Simplified: assumes the key is not already present
        if (self.size + 1) / self.capacity > self.MAX_LOAD_FACTOR:
            self._rehash()

        idx = self._probe(key, for_insert=True)
        state, _, _ = self.slots[idx]
        if state == SlotState.TOMBSTONE:
            self.tombstones -= 1  # Reusing a tombstone

        self.slots[idx] = (SlotState.OCCUPIED, key, value)
        self.size += 1

    def delete(self, key) -> bool:
        idx = self._probe(key)
        state, k, _ = self.slots[idx]

        if state == SlotState.OCCUPIED and k == key:
            self.slots[idx] = (SlotState.TOMBSTONE, None, None)
            self.size -= 1
            self.tombstones += 1
            return True
        return False

    def status(self) -> str:
        return (f"size={self.size}, tombstones={self.tombstones}, "
                f"capacity={self.capacity}, α={self.load_factor:.3f}, "
                f"effective_α={self.effective_load:.3f}")


# Demonstration of tombstone cleanup
table = OpenAddressedHashTable(16)

# Insert then delete many elements (creates tombstones)
print("Creating tombstones through insert/delete cycles...")
for i in range(20):
    table.insert(f"temp_{i}", i)
for i in range(15):
    table.delete(f"temp_{i}")

print(f"Status after deletions: {table.status()}")
print("Notice: tombstones bloat effective load factor!")

# Trigger rehash
print("\nTriggering rehash by inserting more elements...")
for i in range(35):  # enough insertions to push size past 0.6 × 64 and trigger a rehash
    table.insert(f"new_{i}", i)

print(f"Final status: {table.status()}")
print("Notice: tombstones were cleared by rehash!")
```

Some implementations trigger rehashing based on tombstone density, not just load factor. If tombstones exceed a threshold (e.g., 25% of capacity), rehash even if the living element count is low. This prevents pathological probe sequences in delete-heavy workloads.
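A sketch of that combined trigger (the 25% figure is the illustrative threshold from the note above, not a standard constant):

```python
def should_rehash(size: int, tombstones: int, capacity: int,
                  max_load: float = 0.6, max_tombstone_ratio: float = 0.25) -> bool:
    """Rehash on high living load OR on high tombstone density."""
    if (size + 1) / capacity > max_load:
        return True                                       # primary: living elements
    return tombstones / capacity > max_tombstone_ratio    # secondary: tombstone bloat


# Few living elements, but a delete-heavy history left many tombstones:
print(should_rehash(size=5, tombstones=20, capacity=64))  # True: rehash to clean up
```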
In multi-threaded environments, rehashing introduces significant complexity. While deep concurrent hash table implementation is beyond our scope, understanding the key challenges is valuable: a full rehash is an O(n) pause during which the table effectively exists in two versions (old and new arrays), and concurrent readers and writers must see a consistent view without blocking for the entire rebuild.
Incremental rehashing example (Redis-style): instead of migrating everything at once, the table keeps both the old and the new bucket arrays while a resize is in progress. Each subsequent insert, lookup, or delete migrates a small batch of buckets from the old array to the new one, and lookups consult both arrays until the migration finishes.
This spreads O(n) work across O(n) operations, maintaining O(1) per-operation cost without blocking.
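Here is a minimal, single-threaded sketch of that idea for a chaining table (a simplified illustration, not Redis's actual C implementation; updates and deletes are omitted for brevity): during a resize it keeps both bucket arrays and migrates one old bucket per operation.

```python
class IncrementalRehashTable:
    """Chaining table that migrates one old bucket per operation during a resize."""

    MAX_LOAD_FACTOR = 0.75

    def __init__(self, capacity: int = 8):
        self.buckets = [[] for _ in range(capacity)]
        self.old_buckets = None     # non-None only while a rehash is in progress
        self.migrate_idx = 0        # next old bucket to migrate
        self.size = 0

    def _start_rehash(self) -> None:
        self.old_buckets = self.buckets
        self.buckets = [[] for _ in range(2 * len(self.old_buckets))]
        self.migrate_idx = 0

    def _step(self) -> None:
        """Migrate one bucket from the old array, if a rehash is in progress."""
        if self.old_buckets is None:
            return
        for key, value in self.old_buckets[self.migrate_idx]:
            self.buckets[hash(key) % len(self.buckets)].append((key, value))
        self.old_buckets[self.migrate_idx] = []
        self.migrate_idx += 1
        if self.migrate_idx == len(self.old_buckets):
            self.old_buckets = None              # migration finished

    def get(self, key):
        self._step()
        if self.old_buckets is not None:         # still migrating: check the old array too
            for k, v in self.old_buckets[hash(key) % len(self.old_buckets)]:
                if k == key:
                    return v
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return None

    def insert(self, key, value) -> None:
        self._step()
        if self.old_buckets is None and self.size + 1 > self.MAX_LOAD_FACTOR * len(self.buckets):
            self._start_rehash()                 # begin a resize without migrating everything now
        self.buckets[hash(key) % len(self.buckets)].append((key, value))  # new entries go to the new array
        self.size += 1


# Usage: no single insert ever pays for a full O(n) rebuild
t = IncrementalRehashTable()
for i in range(100):
    t.insert(f"key_{i}", i)
print(t.get("key_3"), t.get("key_99"))   # 3 99
```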
If you're building a multi-threaded application, you probably shouldn't implement your own concurrent hash table—use battle-tested implementations like Java's ConcurrentHashMap, Go's sync.Map, or concurrent hashmaps from your language's standard library. Understanding these challenges helps you choose the right tool and reason about performance characteristics.
We've comprehensively covered when and how rehashing works. Here are the essential takeaways:
Triggers: rehashing fires when the load factor crosses α_max (commonly 0.75 for chaining, lower for open addressing); some implementations also react to chain length, probe length, or tombstone density.
Mechanics: resizing allocates a new bucket array and re-inserts every element, because bucket positions depend on the current capacity.
Growth factor: doubling dominates because it keeps insertions O(1) amortized, enables bit-mask indexing, and leaves the new table comfortably below the threshold.
Shrinking: reclaiming memory after heavy deletion requires a hysteresis gap between α_min and α_max to avoid thrashing.
Open addressing: rehashing also discards tombstones, restoring short probe sequences.
What's next:
The final page addresses a crucial question: if rehashing is O(n), how can we claim hash table operations are O(1)? The answer—amortized analysis—reveals one of the most elegant arguments in algorithm analysis. We'll prove that the total cost of n insertions with dynamic resizing is O(n), yielding O(1) amortized cost per operation.
You now understand the complete mechanics of rehashing: when it triggers, how it works, and the design decisions involved. This operational understanding prepares you for the amortized analysis that justifies hash tables' O(1) claims despite expensive resizing operations.