Every time you write map[key] = value or dict.put(key, value), you trigger a sophisticated sequence of operations that the hash table executes in microseconds. Understanding this sequence—from the moment the key enters the function to the moment the value is stored—is essential for reasoning about performance, debugging issues, and designing efficient systems.
Insertion is where theory meets practice. All the concepts we've covered—hash functions, buckets, slots, collision resolution—come together in a single, coordinated operation. Let's trace through this operation step by step, examining every decision point and potential outcome.
By the end of this page, you will be able to trace the complete insertion process in your mind, handle collision scenarios with both chaining and open addressing, understand when and why updates occur instead of insertions, and identify the performance implications of each step.
Before diving into details, let's establish the high-level flow of hash table insertion. This algorithm applies regardless of the collision resolution strategy:
The Universal Insertion Algorithm:
INSERT(table, key, value):
1. HASH: Compute hash_code = hash(key)
2. INDEX: Compute bucket_index = hash_code mod table.capacity
3. LOCATE: Navigate to table.buckets[bucket_index]
4. SEARCH: Check if key already exists at this location
5. STORE: Either update existing entry or create new one
6. MAINTAIN: Trigger rehashing if load factor exceeds threshold
Each of these steps has nuances that affect correctness and performance. Let's examine each in depth.
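As a preview, here is a minimal end-to-end sketch of all six steps using chaining. The class name and the list-of-pairs bucket layout are illustrative choices for this sketch, not a canonical implementation:

class SimpleChainingTable:
    """Minimal chaining table showing the six insertion steps."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.size = 0
        self.buckets = [[] for _ in range(capacity)]  # Bucket = list of [key, value] pairs

    def insert(self, key, value):
        hash_code = hash(key)                  # 1. HASH
        index = hash_code % self.capacity      # 2. INDEX
        bucket = self.buckets[index]           # 3. LOCATE
        for pair in bucket:                    # 4. SEARCH
            if pair[0] == key:
                pair[1] = value                # 5. STORE (update in place)
                return
        bucket.append([key, value])            # 5. STORE (new entry)
        self.size += 1
        if self.size / self.capacity > 0.75:   # 6. MAINTAIN
            self._rehash()

    def _rehash(self):
        old_items = [pair for bucket in self.buckets for pair in bucket]
        self.capacity *= 2
        self.buckets = [[] for _ in range(self.capacity)]
        self.size = 0
        for key, value in old_items:
            self.insert(key, value)

table = SimpleChainingTable()
table.insert("alice", 100)
table.insert("alice", 150)  # Exercises the update path
print(table.size)           # 1 - the duplicate key updated, not inserted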
Hash tables enforce a fundamental invariant: each key appears at most once. Insertion with a duplicate key doesn't create a second entry—it replaces the existing value. This invariant simplifies reasoning and prevents inconsistencies.
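Python's built-in dict makes this invariant easy to observe:

inventory = {}
inventory["widget"] = 10
inventory["widget"] = 25    # Same key: the value is replaced

print(len(inventory))       # 1  - still a single entry
print(inventory["widget"])  # 25 - the old value is gone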
The first two steps transform an arbitrary key into a specific array index. This transformation is the "hash" in "hash table" and is central to achieving O(1) access.
Computing the Hash Code:
The hash function takes the key and produces an integer hash code. Different key types require different hashing approaches: strings are hashed by mixing the contribution of each character, integers often hash to themselves or to a simple transform of themselves, and composite keys (such as tuples) combine the hashes of their components.
The hash code is typically a large integer (potentially negative) that doesn't directly correspond to any array index.
def trace_hashing_step(key, table_capacity: int = 16):
    """
    Trace the hashing and indexing steps in detail.
    """
    print(f"\n{'='*60}")
    print(f"INSERTION TRACE: Steps 1-2 (Hashing and Indexing)")
    print(f"{'='*60}")
    print(f"Key: {repr(key)}")
    print(f"Table capacity: {table_capacity}")
    print()

    # STEP 1: Compute hash code
    print("STEP 1: Hashing")
    print("-" * 40)

    # Python's built-in hash
    hash_code = hash(key)
    print(f"  hash({repr(key)}) = {hash_code}")

    # Note on hash code range
    print(f"  Hash code is a potentially large integer")
    print(f"  Can be negative: {hash_code < 0}")
    print(f"  Bit length: {hash_code.bit_length()} bits")
    print()

    # STEP 2: Compute array index
    print("STEP 2: Indexing")
    print("-" * 40)

    # Standard modulo approach
    raw_index = hash_code % table_capacity
    print(f"  hash_code % capacity = {hash_code} % {table_capacity}")
    print(f"  Raw index = {raw_index}")

    # Handle negative results (Python % always returns non-negative for a positive divisor)
    final_index = raw_index if raw_index >= 0 else raw_index + table_capacity
    print(f"  Final index = {final_index}")
    print()

    # Demonstrate why the same key always gives the same index
    print("VERIFICATION: Determinism")
    print("-" * 40)
    repeat_hash = hash(key)
    repeat_index = repeat_hash % table_capacity
    print(f"  Re-hash: {repeat_hash}, Re-index: {repeat_index}")
    print(f"  Consistent: {repeat_index == final_index}")

    return final_index

# Examples with different key types
trace_hashing_step("alice")
trace_hashing_step(42)
trace_hashing_step(("composite", "key", 123))

# Sample output:
# ============================================================
# INSERTION TRACE: Steps 1-2 (Hashing and Indexing)
# ============================================================
# Key: 'alice'
# Table capacity: 16
#
# STEP 1: Hashing
# ----------------------------------------
#   hash('alice') = -6512894263891412622
#   Hash code is a potentially large integer
#   Can be negative: True
#   Bit length: 63 bits
#
# STEP 2: Indexing
# ----------------------------------------
#   hash_code % capacity = -6512894263891412622 % 16
#   Raw index = 2
#   Final index = 2

Index Computation Details:
The modulo operation hash_code % capacity constrains the hash code to the range [0, capacity-1]. There are important implementation details:
Positive Modulo: Some languages (C, C++, Java) can return negative remainders for negative dividends. Hash table implementations must ensure the result is non-negative:
int index = (hashCode % capacity + capacity) % capacity;
// or
int index = hashCode & (capacity - 1); // Only works for power-of-2 capacity
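In Python, no fix-up is needed: the % operator already returns a non-negative result for a positive divisor. math.fmod shows what C-style truncation would have produced instead:

import math

print(-7 % 16)                 # 9  (Python: remainder takes the divisor's sign)
print(int(math.fmod(-7, 16)))  # -7 (C-style: remainder takes the dividend's sign)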
Power-of-Two Optimization: When capacity is a power of 2, modulo can be replaced with bitwise AND:
hash % 16 = hash & 15 // 16 = 2^4, so mask is 1111 in binary
hash % 32 = hash & 31 // 32 = 2^5, so mask is 11111
This is faster because a bitwise AND is much cheaper than integer division. Many implementations, including Java's HashMap and CPython's dict, use power-of-two sizes for this reason.
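A quick sanity check of the mask equivalence, as a short sketch:

# Sanity check: for power-of-two capacities, masking and modulo agree
# on non-negative hash codes.
for capacity in (16, 32, 64):
    mask = capacity - 1
    for h in (0, 1, 7, 12345, 2**61 - 1):
        assert h % capacity == h & mask
print("h % capacity == h & (capacity - 1) holds for all tested values")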
If the hash function produces clustered values (many keys mapping to similar indices), insertion performance degrades. A good hash function spreads keys uniformly across all buckets, minimizing collisions and keeping insertion O(1) on average.
With the index computed, we access the corresponding bucket. This step is deceptively simple: the O(1) nature of this single array access is where the power of hash tables lies.
Array Indexing:
Array access by index is a constant-time operation because the memory address is computed directly:
address = base_address + (index × element_size)
There's no searching, no iteration, no comparison—just arithmetic followed by memory access.
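A worked example of that arithmetic, with an assumed base address and element size:

# Assumed values for illustration: the array starts at 0x1000 and
# each bucket reference occupies 8 bytes.
base_address = 0x1000
element_size = 8

def slot_address(index: int) -> int:
    # One multiply and one add - no loops, no comparisons.
    return base_address + index * element_size

print(hex(slot_address(0)))  # 0x1000
print(hex(slot_address(5)))  # 0x1028  (0x1000 + 5 * 8)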
What We Find at the Bucket:
Depending on the table's state and collision strategy, we may find: an empty bucket (insert directly), a bucket holding the same key (update in place), a bucket holding a different key (a collision to resolve), or a tombstone left by a deletion (a reusable slot, though the search must continue). The inspection routine below distinguishes these cases:
from typing import Optional, Any
from dataclasses import dataclass
from enum import Enum

class BucketState(Enum):
    EMPTY = "empty"        # Never occupied
    OCCUPIED = "occupied"  # Contains entry
    DELETED = "deleted"    # Tombstone

@dataclass
class BucketInspection:
    """Result of inspecting a bucket during insertion."""
    index: int
    state: BucketState
    existing_key: Optional[Any]
    action_needed: str

def trace_bucket_location(table, key, index: int) -> BucketInspection:
    """
    Step 3: Navigate to and inspect the computed bucket.
    Determines what action is needed.
    """
    print(f"\nSTEP 3: Locating Bucket")
    print("-" * 40)
    print(f"  Computed index: {index}")
    print(f"  Accessing table.buckets[{index}]...")

    # Access the bucket (O(1) operation)
    bucket = table.buckets[index]

    # Analyze bucket state
    if bucket is None or bucket.state == BucketState.EMPTY:
        print(f"  Bucket state: EMPTY")
        print(f"  Action: Can insert directly at this index")
        return BucketInspection(
            index=index,
            state=BucketState.EMPTY,
            existing_key=None,
            action_needed="INSERT_NEW"
        )
    elif bucket.state == BucketState.DELETED:
        print(f"  Bucket state: DELETED (tombstone)")
        print(f"  Action: Can reuse for insertion, but must continue search")
        return BucketInspection(
            index=index,
            state=BucketState.DELETED,
            existing_key=None,
            action_needed="CHECK_FURTHER_THEN_INSERT"
        )
    elif bucket.state == BucketState.OCCUPIED:
        existing_key = bucket.entry.key
        print(f"  Bucket state: OCCUPIED")
        print(f"  Existing key: {repr(existing_key)}")
        if existing_key == key:
            print(f"  Key match: YES")
            print(f"  Action: Update value in place")
            return BucketInspection(
                index=index,
                state=BucketState.OCCUPIED,
                existing_key=existing_key,
                action_needed="UPDATE_VALUE"
            )
        else:
            print(f"  Key match: NO (collision!)")
            print(f"  Action: Need collision resolution")
            return BucketInspection(
                index=index,
                state=BucketState.OCCUPIED,
                existing_key=existing_key,
                action_needed="RESOLVE_COLLISION"
            )

# The key insight: this step is always O(1).
# The complexity comes in what we find and what happens next.

Before storing a value, we must check if the key already exists in the table. This check enforces the unique-key invariant and determines whether we're performing an insertion or an update.
Why This Step Exists:
The unique-key invariant demands it: table[key] = value should update an existing entry, not create a duplicate. Without this check, repeated assignments to the same key would silently accumulate stale entries.

The Search Process Differs by Strategy:
Chaining: search the linked list at the bucket.
Open Addressing: follow the probe sequence until the key is found or an empty slot is reached.
In chaining, searching means traversing the linked list at the bucket. We compare each node's key until we find a match or reach the end:
SEARCH_CHAIN(bucket, key):
current = bucket.head
while current is not None:
if current.key == key:
return current // Found existing entry
current = current.next
return None // Key not in chain
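For comparison, a sketch of the open-addressing search with linear probing. The list-of-tuples slot layout, with None marking an empty slot, is an assumption of this sketch; tombstone handling appears in the fuller implementation later in this page.

def search_probe_sequence(slots, key):
    """Linear-probing search sketch.
    slots: list where each entry is None (empty) or a (key, value) tuple.
    Returns the index holding key, or None if the key is absent."""
    capacity = len(slots)
    start = hash(key) % capacity
    for i in range(capacity):
        index = (start + i) % capacity
        if slots[index] is None:
            return None        # Empty slot: the key cannot be further along
        if slots[index][0] == key:
            return index       # Found the existing entry
    return None                # Every slot probed without a match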
Performance: each node costs one key comparison, so the chain search is O(chain length). With a good hash function the expected chain length equals the load factor α, making the average search O(1 + α); in open addressing the analogous cost is the length of the probe sequence.
class ChainNode:
    def __init__(self, key, value, hash_code):
        self.key = key
        self.value = value
        self.hash_code = hash_code
        self.next = None

def search_chain(bucket_head, key, target_hash):
    """
    Search for a key in a chained bucket.
    Returns (found_node, predecessor) tuple.
    """
    predecessor = None
    current = bucket_head
    comparisons = 0

    print(f"\n  Searching chain for key: {repr(key)}")

    while current is not None:
        comparisons += 1
        # Optimization: compare hash first (fast integer comparison)
        if current.hash_code == target_hash:
            # Hash matches - now compare actual key
            if current.key == key:
                print(f"    [Node {comparisons}] Key {repr(current.key)}: MATCH!")
                print(f"    Found after {comparisons} comparison(s)")
                return current, predecessor
            else:
                print(f"    [Node {comparisons}] Key {repr(current.key)}: hash match but key differs")
        else:
            print(f"    [Node {comparisons}] Key {repr(current.key)}: hash mismatch")
        predecessor = current
        current = current.next

    print(f"    Reached end of chain after {comparisons} comparison(s)")
    print(f"    Key not found - this is a new entry")
    return None, predecessor  # predecessor useful for appending

# Example chain traversal
# bucket: ("alice", A) -> ("charlie", C) -> ("eve", E) -> None
# Searching for "charlie" requires 2 comparisons
# Searching for "new_key" requires 3 comparisons (full chain)

The search step determines what happens next. Based on the search result, we take one of several storage actions:
Case 1: Key Found (Update)
If the key already exists, we simply replace the old value with the new one. No structural changes to the table are needed—we're modifying an existing entry in place.
Case 2: Key Not Found, Empty Slot Available (Insert)
If the key doesn't exist and we found an empty slot (or, in open addressing, a tombstone), we create a new entry at that location.
Case 3: Key Not Found, Collision (Resolve and Insert)
If the key doesn't exist but the target slot is occupied by a different key, we must apply collision resolution before inserting.
from typing import Optional, Any
from dataclasses import dataclass
from enum import Enum

class SlotState(Enum):
    EMPTY = "empty"
    DELETED = "deleted"
    OCCUPIED = "occupied"

@dataclass
class Slot:
    """A single open-addressing slot."""
    state: SlotState = SlotState.EMPTY
    key: Any = None
    value: Any = None
    hash_code: int = 0

@dataclass
class InsertResult:
    """Describes what happened during insertion."""
    action: str               # "INSERTED" or "UPDATED"
    index: int                # Where the entry ended up
    probes: int               # Number of probes/comparisons needed
    old_value: Optional[Any]  # Previous value if updated

class ChainingHashTable:
    """Chaining implementation focusing on insertion.
    Uses the ChainNode class from the search example above."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.size = 0
        self.buckets = [None] * capacity

    def insert(self, key, value) -> InsertResult:
        """
        Complete insertion implementation with detailed tracking.
        """
        hash_code = hash(key)
        index = hash_code % self.capacity

        # Search existing chain
        current = self.buckets[index]
        comparisons = 0
        while current is not None:
            comparisons += 1
            if current.hash_code == hash_code and current.key == key:
                # CASE 1: Key found - UPDATE
                old_value = current.value
                current.value = value
                return InsertResult(
                    action="UPDATED",
                    index=index,
                    probes=comparisons,
                    old_value=old_value
                )
            current = current.next

        # CASE 2: Key not found - INSERT at head of chain
        new_node = ChainNode(key, value, hash_code)
        new_node.next = self.buckets[index]  # Prepend to existing chain
        self.buckets[index] = new_node
        self.size += 1
        return InsertResult(
            action="INSERTED",
            index=index,
            probes=comparisons + 1,  # +1 for the insert itself
            old_value=None
        )

class OpenAddressingHashTable:
    """Open addressing implementation focusing on insertion."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.size = 0
        self.slots = [Slot() for _ in range(capacity)]

    def insert(self, key, value) -> InsertResult:
        """
        Complete insertion with linear probing.
        """
        if self.size >= self.capacity * 0.75:
            self._rehash()  # Prevent high load factor

        hash_code = hash(key)
        start_index = hash_code % self.capacity
        first_available = None
        probes = 0

        for i in range(self.capacity):
            probes += 1
            probe_index = (start_index + i) % self.capacity
            slot = self.slots[probe_index]

            if slot.state == SlotState.EMPTY:
                # CASE 2: Empty slot - INSERT here (or at an earlier tombstone)
                insert_index = first_available if first_available is not None else probe_index
                self._store_at(insert_index, key, value, hash_code)
                self.size += 1
                return InsertResult(
                    action="INSERTED",
                    index=insert_index,
                    probes=probes,
                    old_value=None
                )
            elif slot.state == SlotState.DELETED:
                # Remember first deleted slot as potential insertion point
                if first_available is None:
                    first_available = probe_index
                # Continue probing - key might exist ahead
            elif slot.state == SlotState.OCCUPIED:
                if slot.key == key:
                    # CASE 1: Key found - UPDATE
                    old_value = slot.value
                    slot.value = value
                    return InsertResult(
                        action="UPDATED",
                        index=probe_index,
                        probes=probes,
                        old_value=old_value
                    )
                # Different key - continue probing (CASE 3: collision)

        # Table is full - should never happen if we rehash properly
        raise RuntimeError("Hash table is full - rehash failed")

    def _store_at(self, index: int, key, value, hash_code: int):
        """Store a new entry at the given index."""
        slot = self.slots[index]
        slot.state = SlotState.OCCUPIED
        slot.key = key
        slot.value = value
        slot.hash_code = hash_code  # Cache for fast comparisons

    def _rehash(self):
        """Double capacity and re-insert all occupied slots."""
        old_slots = self.slots
        self.capacity *= 2
        self.slots = [Slot() for _ in range(self.capacity)]
        self.size = 0
        for slot in old_slots:
            if slot.state == SlotState.OCCUPIED:
                self.insert(slot.key, slot.value)

# Demonstration
def trace_full_insertion():
    """Trace a complete insertion sequence."""
    table = ChainingHashTable(capacity=8)

    insertions = [
        ("alice", 100),
        ("bob", 200),
        ("alice", 150),  # Update!
        ("charlie", 300),
    ]

    print("=" * 60)
    print("COMPLETE INSERTION TRACE")
    print("=" * 60)

    for key, value in insertions:
        print(f"\nInserting ({repr(key)}, {value}):")
        result = table.insert(key, value)
        print(f"  Action: {result.action}")
        print(f"  Final index: {result.index}")
        print(f"  Probes: {result.probes}")
        if result.old_value is not None:
            print(f"  Replaced value: {result.old_value} → {value}")

# trace_full_insertion()

In chaining, new entries are typically inserted at the head of the chain (prepending) rather than at the tail (appending). Prepending is O(1) because we already hold the head pointer; appending would require O(chain_length) to traverse to the end. It also means recently inserted keys sit at the front of the chain, where subsequent searches find them first.
After successful insertion, the hash table may need maintenance. The primary concern is the load factor—if too many entries crowd the table, performance degrades. The solution is rehashing: creating a larger table and re-inserting all entries.
Rehashing Trigger:
if (size / capacity) > load_factor_threshold:
rehash()
Typical thresholds: Java's HashMap defaults to 0.75, CPython's dict resizes when it is about 2/3 full, and open-addressing tables often use lower thresholds (around 0.5 to 0.7) because probe sequences lengthen quickly as the table fills.
The Rehashing Process:
class HashTableWithRehash:
    """
    Hash table demonstrating the rehashing process.
    Uses the ChainNode class from earlier examples.
    """
    LOAD_FACTOR_THRESHOLD = 0.75

    def __init__(self, initial_capacity: int = 16):
        self.capacity = initial_capacity
        self.size = 0
        self.buckets = [None] * self.capacity
        print(f"Created table with capacity {self.capacity}")

    def _load_factor(self) -> float:
        return self.size / self.capacity

    def _needs_rehash(self) -> bool:
        return self._load_factor() > self.LOAD_FACTOR_THRESHOLD

    def _rehash(self):
        """
        Resize the table and re-insert all entries.
        """
        old_capacity = self.capacity
        old_buckets = self.buckets

        # Double the capacity
        self.capacity *= 2
        self.buckets = [None] * self.capacity
        self.size = 0  # Reset - will be incremented during re-insertion

        print(f"\n{'='*50}")
        print(f"REHASHING: {old_capacity} → {self.capacity}")
        print(f"{'='*50}")

        entries_moved = 0
        # Re-insert all existing entries
        for bucket in old_buckets:
            if bucket is None:
                continue
            # For chaining: iterate through chain
            current = bucket
            while current is not None:
                old_index = hash(current.key) % old_capacity
                new_index = hash(current.key) % self.capacity
                print(f"  Moving ({current.key}): index {old_index} → {new_index}")
                # Perform insertion (simplified - direct store)
                self._insert_internal(current.key, current.value)
                entries_moved += 1
                current = current.next

        print(f"Rehashing complete: {entries_moved} entries moved")
        print(f"New load factor: {self._load_factor():.2%}")

    def insert(self, key, value):
        """Insert with automatic rehashing."""
        # Check BEFORE insert if we need to rehash
        if self._needs_rehash():
            print(f"\nLoad factor {self._load_factor():.2%} exceeds threshold "
                  f"{self.LOAD_FACTOR_THRESHOLD:.0%}")
            self._rehash()
        self._insert_internal(key, value)

    def _insert_internal(self, key, value):
        """Internal insertion (no rehash check)."""
        hash_code = hash(key)
        index = hash_code % self.capacity
        # Simplified: prepend to chain (ignoring update case)
        new_node = ChainNode(key, value, hash_code)
        new_node.next = self.buckets[index]
        self.buckets[index] = new_node
        self.size += 1

# Demonstrate rehashing trigger
def demo_rehashing():
    table = HashTableWithRehash(initial_capacity=4)
    keys = ["a", "b", "c", "d", "e", "f", "g", "h"]
    for key in keys:
        print(f"\nInserting '{key}'...")
        print(f"  Before: size={table.size}, capacity={table.capacity}, "
              f"load={table._load_factor():.2%}")
        table.insert(key, key.upper())
        print(f"  After: size={table.size}, capacity={table.capacity}")

# Output shows rehashing at the load factor threshold (indices are
# illustrative - Python randomizes string hashes between runs):
# Inserting 'a'...
#   Before: size=0, capacity=4, load=0.00%
#   After: size=1, capacity=4
# ...
# Inserting 'e'...
#   Before: size=4, capacity=4, load=100.00%
#
# Load factor 100.00% exceeds threshold 75%
# ==================================================
# REHASHING: 4 → 8
# ==================================================
#   Moving (d): index 3 → 3
#   Moving (c): index 2 → 2
#   Moving (b): index 1 → 1
#   Moving (a): index 1 → 5
# Rehashing complete: 4 entries moved
# New load factor: 50.00%
#   After: size=5, capacity=8

Rehashing is O(n)—we must re-insert every entry. But because the table doubles in size, the next rehash won't occur until n more insertions. Spreading the O(n) cost over those n insertions gives O(1) amortized time per insertion. This is the same principle that makes dynamic arrays (Python lists, Java ArrayList) efficient.
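A small counting experiment (illustrative and separate from the class above) shows the amortized bound directly: total entry moves stay within a small constant multiple of the number of insertions.

def total_rehash_moves(n_insertions: int, initial_capacity: int = 4,
                       threshold: float = 0.75) -> int:
    """Count how many entry moves rehashing causes over n insertions."""
    size, capacity, moves = 0, initial_capacity, 0
    for _ in range(n_insertions):
        if size / capacity > threshold:
            moves += size     # A rehash re-inserts every existing entry
            capacity *= 2
        size += 1
    return moves

for n in (100, 1_000, 10_000):
    print(f"{n:>6} insertions: {total_rehash_moves(n)} moves "
          f"({total_rehash_moves(n) / n:.2f} per insertion)")
# The moves-per-insertion ratio stays bounded (around 2 here),
# which is the amortized O(1) guarantee in action.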
Let's trace a complete insertion through all six steps, combining everything we've learned into one comprehensive example.
def complete_insertion_trace(key: str, value: int, table):
    """
    Trace every step of hash table insertion.
    Uses the ChainNode class from earlier examples.
    """
    print("\n" + "=" * 70)
    print(f"COMPLETE INSERTION TRACE: ({repr(key)}, {value})")
    print("=" * 70)

    # ========================================
    # STEP 1: Hash the key
    # ========================================
    print("\n[STEP 1] HASHING THE KEY")
    print("-" * 50)
    hash_code = hash(key)
    print(f"Input key: {repr(key)}")
    print(f"hash({repr(key)}) = {hash_code}")
    print(f"Hash computation: O(k) where k = len(key)")

    # ========================================
    # STEP 2: Compute bucket index
    # ========================================
    print("\n[STEP 2] COMPUTING BUCKET INDEX")
    print("-" * 50)
    index = hash_code % table.capacity
    print(f"Table capacity: {table.capacity}")
    print(f"Index = {hash_code} % {table.capacity} = {index}")
    print(f"Index computation: O(1)")

    # ========================================
    # STEP 3: Navigate to bucket
    # ========================================
    print("\n[STEP 3] NAVIGATING TO BUCKET")
    print("-" * 50)
    bucket = table.buckets[index]
    bucket_state = "EMPTY" if bucket is None else "OCCUPIED"
    print(f"Accessing table.buckets[{index}]")
    print(f"Bucket state: {bucket_state}")
    print(f"Array access: O(1)")

    # ========================================
    # STEP 4: Search for existing key
    # ========================================
    print("\n[STEP 4] SEARCHING FOR EXISTING KEY")
    print("-" * 50)
    found_node = None
    search_steps = 0
    current = bucket
    while current is not None:
        search_steps += 1
        print(f"  Comparing with node {search_steps}: key={repr(current.key)}")
        if current.hash_code == hash_code and current.key == key:
            found_node = current
            print(f"  → MATCH FOUND at node {search_steps}")
            break
        current = current.next
    if found_node is None:
        print(f"  → Key not found after {search_steps} comparison(s)")
    print(f"Search complexity: O(chain_length) = O({search_steps})")

    # ========================================
    # STEP 5: Store the entry
    # ========================================
    print("\n[STEP 5] STORING THE ENTRY")
    print("-" * 50)
    if found_node is not None:
        # UPDATE existing entry
        old_value = found_node.value
        found_node.value = value
        print(f"Action: UPDATE existing entry")
        print(f"Old value: {old_value}")
        print(f"New value: {value}")
        action = "UPDATED"
    else:
        # INSERT new entry at head
        new_node = ChainNode(key, value, hash_code)
        new_node.next = table.buckets[index]
        table.buckets[index] = new_node
        table.size += 1
        print(f"Action: INSERT new entry at head of chain")
        print(f"Table size: {table.size - 1} → {table.size}")
        action = "INSERTED"
    print(f"Storage: O(1)")

    # ========================================
    # STEP 6: Check for rehashing
    # ========================================
    print("\n[STEP 6] MAINTENANCE CHECK")
    print("-" * 50)
    load_factor = table.size / table.capacity
    threshold = 0.75
    print(f"Current load factor: {table.size}/{table.capacity} = {load_factor:.2%}")
    print(f"Threshold: {threshold:.0%}")
    if load_factor > threshold:
        print(f"→ REHASH NEEDED!")
        print(f"  This would trigger O(n) rehashing")
        # In real code: table._rehash()
    else:
        remaining = int(table.capacity * threshold) - table.size
        print(f"→ No rehash needed")
        print(f"  {remaining} more insertions before rehash")

    # ========================================
    # Summary
    # ========================================
    print("\n" + "=" * 70)
    print("INSERTION COMPLETE")
    print("=" * 70)
    print(f"Result: {action}")
    print(f"Final position: bucket[{index}]")
    print(f"Total comparisons: {search_steps}")
    print(f"Time complexity: O(1) average, O(n) worst case")
    return action, index

# Run the trace
class SimpleTable:
    def __init__(self):
        self.capacity = 8
        self.size = 2
        # Pre-populate one bucket with a two-node chain: bob -> charlie
        self.buckets = [None] * 8
        charlie = ChainNode("charlie", 300, hash("charlie"))
        bob = ChainNode("bob", 200, hash("bob"))
        bob.next = charlie
        self.buckets[hash("bob") % 8] = bob

# table = SimpleTable()
# complete_insertion_trace("alice", 100, table)  # Insert new
# complete_insertion_trace("bob", 250, table)    # Update existing

We've traced the complete journey of hash table insertion from key to stored value. Let's consolidate the essential knowledge:
The destination bucket is always computed as hash(key) % capacity. This must be deterministic for correct retrieval.

What's Next:
Now that you understand insertion, the next page explores the symmetric operation: lookup. We'll trace how the hash table finds values by key, handling the same collision scenarios but with different decision logic for not-found cases.
You can now trace hash table insertion step by step, from initial hashing through collision resolution to final storage. This detailed understanding of "what happens when" is essential for debugging, optimization, and designing efficient systems.