Chaining is the unsung hero of hash-based indexing. While the hash function gets the glory for its O(1) lookup promise, chaining quietly handles the messy reality of buckets that exceed capacity. Every production hash index relies on chaining to gracefully degrade rather than catastrophically fail when the ideal case breaks down.
In this deep dive, we move beyond the conceptual introduction to examine the engineering details that make chaining work in real database systems. We'll explore how chains are structured at the byte level, how concurrent access is managed, how chains interact with the buffer pool, and what optimizations distinguish amateur implementations from production-grade ones.
Mastering chaining implementation is essential for anyone working on storage engines, key-value stores, or database internals. Even if you never implement a hash index yourself, understanding these details helps you reason about performance characteristics and troubleshoot production issues.
By the end of this page, you will understand: (1) Detailed chain structure and page linkage, (2) Insertion, deletion, and lookup algorithms with chains, (3) Chain compaction and reorganization strategies, (4) Concurrency control for chained buckets, and (5) Production optimizations for chain performance.
A chain in a hash index is a singly-linked list of pages, all belonging to the same logical bucket. Understanding the precise structure is essential for implementing correct and efficient operations.
A chain comprises two types of pages:
Primary Page — The first page of the bucket, whose location is determined by the bucket ID. Always exists (even if empty).
Overflow Pages — Additional pages linked from the primary page when it fills. May be zero or more.
Each page in the chain has identical structure; the only difference is how they're located:
Primary pages are located by direct computation (page_offset = bucket_id × page_size); overflow pages are located by following the overflow_page_id link stored in the previous page's header.
```c
/* Detailed bucket page layout for chained hash index */

#define PAGE_SIZE        8192        /* 8 KB pages */
#define PAGE_HEADER_SIZE 32          /* Fixed header overhead */
#define SLOT_SIZE        2           /* 2 bytes per slot (offset) */
#define NULL_PAGE_ID     0xFFFFFFFF

/* Page header structure - exactly 32 bytes */
typedef struct ChainPageHeader {
    uint32_t page_id;            /* This page's unique identifier */
    uint32_t bucket_id;          /* Logical bucket this page belongs to */
    uint32_t overflow_page_id;   /* Next page in chain, or NULL_PAGE_ID */
    uint16_t entry_count;        /* Number of valid entries */
    uint16_t free_space_offset;  /* Offset where free space begins */
    uint16_t slot_count;         /* Total slots (may exceed entry_count after deletes) */
    uint16_t flags;              /* Page state flags */
    uint32_t checksum;           /* CRC32 for integrity */
    uint32_t reserved[2];        /* Alignment padding / future use (pads header to 32 bytes) */
} ChainPageHeader;

/* Flag definitions */
#define PAGE_FLAG_PRIMARY  0x0001   /* This is the primary bucket page */
#define PAGE_FLAG_OVERFLOW 0x0002   /* This is an overflow page */
#define PAGE_FLAG_LEAF     0x0004   /* Has no overflow (end of chain) */
#define PAGE_FLAG_SORTED   0x0008   /* Entries are sorted (optimization) */

/* Complete page structure */
typedef struct ChainPage {
    ChainPageHeader header;

    /* Slot directory: grows downward from after header.
     * Each slot is 2 bytes containing the offset to an entry.
     * Slot value 0x0000 indicates a deleted slot (tombstone). */
    uint16_t slots[(PAGE_SIZE - PAGE_HEADER_SIZE) / SLOT_SIZE];

    /* Entry data: grows upward from end of page.
     * Format: [key_len:2][key_data:variable][value_len:2][value_data:variable]
     * Or for fixed-size: [key:key_size][rid:rid_size] */
} ChainPage;

/* Calculate available space for entries */
static inline size_t page_usable_space(void) {
    return PAGE_SIZE - PAGE_HEADER_SIZE;
}

/* Check if page is end of chain */
static inline bool is_chain_end(const ChainPage *page) {
    return page->header.overflow_page_id == NULL_PAGE_ID;
}
```

Each page must track available space to determine when overflow is needed:
Usable Space = PAGE_SIZE - HEADER_SIZE
Used Space = (slot_count × 2) + Σ(entry_sizes)
Free Space = Usable Space - Used Space
The slot directory and entry region grow toward each other:
| Header | Slots → | ← Free Space → | ← Entries |
| 32B | ... | | ... |
When free space is insufficient for a new entry plus its slot, the page is considered full.
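The insertion code later on this page relies on a page_has_space check that the source never defines. The following is a minimal sketch that applies the formulas above directly; load_entry and compute_entry_size are the same helpers the page's other code assumes, and a production page would cache a running used-bytes counter instead of rescanning the slot directory:

```c
/* Sketch: does this page have room for `required` more bytes (new slot + entry)?
 * Walks the slot directory and sums live entry sizes per the formulas above.
 * Simplification: bytes still owned by tombstoned entries are not counted,
 * even though they remain allocated until the page is compacted. */
static bool page_has_space(ChainPage *page, size_t required) {
    size_t used = (size_t)page->header.slot_count * SLOT_SIZE;   /* Slot directory */

    for (uint16_t i = 0; i < page->header.slot_count; i++) {
        if (page->slots[i] == 0) continue;                       /* Skip tombstones */
        Entry entry = load_entry(page, page->slots[i]);
        used += compute_entry_size(&entry);                      /* Σ(entry_sizes) */
    }
    return page_usable_space() >= used + required;
}
```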
Deletions can create internal fragmentation—slots marked as deleted leave gaps in the entry region. Page compaction can reclaim this space without allocating overflow. Track fragmented space separately and compact when fragmentation exceeds a threshold (e.g., 25%).
Insertion into a chained bucket involves finding appropriate space for the new entry, potentially allocating overflow pages, and maintaining correct linkage.
```c
/* Insert entry into chained bucket */
typedef enum {
    INSERT_SUCCESS,
    INSERT_DUPLICATE,
    INSERT_NO_SPACE,
    INSERT_ERROR
} InsertStatus;

InsertStatus chain_insert(HashIndex *index,
                          const void *key, size_t key_len,
                          const void *value, size_t value_len,
                          bool allow_duplicates) {
    /* Calculate required space: slot + key header + key + value header + value */
    size_t entry_size = 2 + key_len + 2 + value_len;   /* Simplified */
    size_t required = SLOT_SIZE + entry_size;

    /* Compute target bucket */
    uint32_t bucket_id = compute_bucket(index, key, key_len);

    /* Acquire bucket lock (see concurrency section) */
    acquire_bucket_lock(index, bucket_id, LOCK_EXCLUSIVE);

    /* Load primary page */
    ChainPage *primary = load_primary_page(index, bucket_id);
    ChainPage *current = primary;
    ChainPage *last_with_space = NULL;

    /* Traverse chain */
    while (current != NULL) {
        /* Check for duplicate key if not allowed */
        if (!allow_duplicates) {
            int found = search_page_for_key(current, key, key_len);
            if (found >= 0) {
                release_bucket_lock(index, bucket_id);
                return INSERT_DUPLICATE;
            }
        }

        /* Track last page with sufficient space */
        if (page_has_space(current, required)) {
            last_with_space = current;
        }

        /* Move to next page in chain */
        if (!is_chain_end(current)) {
            current = load_page(index, current->header.overflow_page_id);
        } else {
            current = NULL;
        }
    }

    /* Determine insertion target */
    ChainPage *target;
    if (last_with_space != NULL) {
        /* Insert into existing page with space */
        target = last_with_space;
    } else {
        /* All pages full - need overflow */
        target = allocate_overflow_page(index, bucket_id);

        /* Find end of chain and link new page */
        ChainPage *tail = find_chain_tail(index, primary);
        tail->header.overflow_page_id = target->header.page_id;
        tail->header.flags &= ~PAGE_FLAG_LEAF;
        mark_dirty(tail);
    }

    /* Insert entry into target page */
    insert_entry_into_page(target, key, key_len, value, value_len);
    mark_dirty(target);

    /* Update statistics */
    index->metadata.record_count++;
    mark_metadata_dirty(index);

    release_bucket_lock(index, bucket_id);
    return INSERT_SUCCESS;
}
```

Several strategies exist for choosing where in the chain to insert:
Strategy 1: First-Fit (Insert at first page with space)
Strategy 2: Tail-Insert (Always insert at end of chain)
Strategy 3: Sorted-Insert (Maintain sorted order within pages)
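To make the difference between the first two strategies concrete, here is a minimal sketch of a policy-driven target selector. PlacementPolicy and choose_insert_target are hypothetical names, and the helpers are the ones used elsewhere on this page; chain_insert above effectively implements a "last page with space" variant.

```c
/* Sketch: pick the target page for an insert according to a placement policy.
 * Returns NULL if no existing page has room, in which case the caller
 * allocates an overflow page exactly as chain_insert does. */
typedef enum { PLACE_FIRST_FIT, PLACE_TAIL_INSERT } PlacementPolicy;

static ChainPage *choose_insert_target(HashIndex *index, ChainPage *primary,
                                       size_t required, PlacementPolicy policy) {
    ChainPage *current = primary;
    ChainPage *tail = primary;

    while (current != NULL) {
        if (policy == PLACE_FIRST_FIT && page_has_space(current, required)) {
            return current;                     /* First page with enough room */
        }
        tail = current;
        current = is_chain_end(current) ? NULL
                : load_page(index, current->header.overflow_page_id);
    }

    /* Tail-insert only ever considers the last page in the chain */
    if (policy == PLACE_TAIL_INSERT && page_has_space(tail, required)) {
        return tail;
    }
    return NULL;
}
```

Tail-insert avoids checking earlier pages for space but tends to grow chains faster; first-fit keeps pages fuller at the cost of a longer scan.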
If duplicates are allowed (e.g., a secondary index on a non-unique column), skip the duplicate-check traversal and insert directly. This reduces the insertion itself from O(chain_length) to O(1), though finding a page with space or allocating overflow may still traverse the chain.
Lookup is the most frequent operation on hash indexes, and chain traversal is its critical path. Optimizing lookup through chains directly impacts query performance.
```c
/* Lookup entry in chained bucket */
typedef struct {
    bool found;
    void *value;
    size_t value_len;
    uint32_t pages_scanned;
    uint32_t comparisons;
} LookupResult;

LookupResult chain_lookup(HashIndex *index, const void *key, size_t key_len) {
    LookupResult result = {0};

    /* Compute target bucket */
    uint32_t bucket_id = compute_bucket(index, key, key_len);

    /* Acquire shared lock (concurrent reads allowed) */
    acquire_bucket_lock(index, bucket_id, LOCK_SHARED);

    /* Load primary page */
    ChainPage *current = load_primary_page(index, bucket_id);

    while (current != NULL) {
        result.pages_scanned++;

        /* Search within current page */
        for (uint16_t i = 0; i < current->header.slot_count; i++) {
            uint16_t slot = current->slots[i];

            /* Skip deleted slots */
            if (slot == 0) continue;

            /* Load entry */
            Entry entry = load_entry(current, slot);
            result.comparisons++;

            /* Compare key */
            if (entry.key_len == key_len &&
                memcmp(entry.key, key, key_len) == 0) {
                /* Found! Copy value to result */
                result.found = true;
                result.value = malloc(entry.value_len);
                memcpy(result.value, entry.value, entry.value_len);
                result.value_len = entry.value_len;

                release_bucket_lock(index, bucket_id);
                return result;
            }
        }

        /* Move to next page */
        if (!is_chain_end(current)) {
            current = load_page(index, current->header.overflow_page_id);
        } else {
            current = NULL;
        }
    }

    /* Key not found */
    release_bucket_lock(index, bucket_id);
    return result;
}
```

Bloom Filter Pre-check:
Before traversing the chain, check a Bloom filter to determine if the key might exist:
```c
if (!bloom_might_contain(bucket->bloom, key, key_len)) {
    return NOT_FOUND;   /* Guaranteed not present */
}
/* Proceed with chain traversal */
```
Bloom filters have false positives but no false negatives—a negative result saves the entire chain traversal.
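A per-bucket filter can be as small as a few hundred bytes. The sketch below is illustrative only and not from the source; BucketBloom, hash1, and hash2 are assumed names (any two independent hash functions work). Note that a plain Bloom filter cannot unset bits, so deletions require periodic rebuilds or a counting variant.

```c
#define BLOOM_BITS 1024                      /* 128 bytes of filter per bucket */

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} BucketBloom;

/* Called on every insert into the bucket */
static void bloom_add(BucketBloom *b, const void *key, size_t key_len) {
    uint32_t h1 = hash1(key, key_len) % BLOOM_BITS;
    uint32_t h2 = hash2(key, key_len) % BLOOM_BITS;
    b->bits[h1 / 8] |= (uint8_t)(1u << (h1 % 8));
    b->bits[h2 / 8] |= (uint8_t)(1u << (h2 % 8));
}

/* Called before chain traversal; false means the key is definitely absent */
static bool bloom_might_contain(const BucketBloom *b, const void *key, size_t key_len) {
    uint32_t h1 = hash1(key, key_len) % BLOOM_BITS;
    uint32_t h2 = hash2(key, key_len) % BLOOM_BITS;
    return (b->bits[h1 / 8] & (1u << (h1 % 8))) &&
           (b->bits[h2 / 8] & (1u << (h2 % 8)));
}
```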
Hash Signature Comparison:
Store a truncated hash signature with each entry. Compare signatures before full key comparison:
```c
uint32_t looking_for = compute_signature(key, key_len);   /* e.g. truncated hash */

for (uint16_t i = 0; i < page->header.slot_count; i++) {
    if (page->slots[i] == 0) continue;                     /* Skip deleted slots */
    Entry entry = load_entry(page, page->slots[i]);
    if (entry.signature == looking_for &&
        full_key_match(entry, key, key_len)) {
        return FOUND;
    }
}
```
Signature comparison (4-byte integer) is much faster than variable-length key comparison.
Binary Search Within Sorted Pages:
If pages maintain sorted order:
```c
int slot = binary_search(page, key, key_len);
if (slot >= 0) {
    return &page->entries[slot];
}
```
Reduces per-page comparisons from O(entries) to O(log entries).
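A sketch of that per-page search, assuming PAGE_FLAG_SORTED pages keep their slot directory free of tombstones (otherwise the ordering breaks) and that compare_entry_key is a hypothetical comparator over the stored key bytes:

```c
/* Sketch: binary search over a sorted slot directory; returns the slot index
 * of the matching entry or -1 if the key is not on this page. */
static int page_binary_search(ChainPage *page, const void *key, size_t key_len) {
    int lo = 0, hi = (int)page->header.slot_count - 1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        Entry entry = load_entry(page, page->slots[mid]);
        int cmp = compare_entry_key(&entry, key, key_len);   /* <0, 0, >0 */

        if (cmp == 0) return mid;
        if (cmp < 0)  lo = mid + 1;   /* Stored key sorts before search key */
        else          hi = mid - 1;
    }
    return -1;
}
```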
When searching for a unique key (e.g., primary key index), return immediately upon finding the first match. Don't traverse the rest of the chain. For non-unique indexes searching for all matches, the entire chain must be traversed.
Deletion from chains is more complex than insertion because it must locate the entry anywhere in the chain, remove it from its page (by tombstone or immediate compaction), and unlink overflow pages that become empty.
```c
/* Delete entry from chained bucket */
typedef enum {
    DELETE_SUCCESS,
    DELETE_NOT_FOUND,
    DELETE_ERROR
} DeleteStatus;

DeleteStatus chain_delete(HashIndex *index, const void *key, size_t key_len) {
    /* Compute target bucket */
    uint32_t bucket_id = compute_bucket(index, key, key_len);

    /* Acquire exclusive lock */
    acquire_bucket_lock(index, bucket_id, LOCK_EXCLUSIVE);

    /* Traverse chain to find entry */
    ChainPage *current = load_primary_page(index, bucket_id);
    ChainPage *prev = NULL;

    while (current != NULL) {
        for (uint16_t i = 0; i < current->header.slot_count; i++) {
            uint16_t slot = current->slots[i];
            if (slot == 0) continue;   /* Skip deleted */

            Entry entry = load_entry(current, slot);

            if (entry.key_len == key_len &&
                memcmp(entry.key, key, key_len) == 0) {
                /* Found - delete it */
                delete_entry_from_page(current, i);
                mark_dirty(current);

                /* Check if page is now empty (and not primary) */
                if (current->header.entry_count == 0 &&
                    !(current->header.flags & PAGE_FLAG_PRIMARY)) {
                    /* Unlink and deallocate this page */
                    unlink_overflow_page(index, prev, current);
                }

                /* Update statistics */
                index->metadata.record_count--;
                mark_metadata_dirty(index);

                release_bucket_lock(index, bucket_id);
                return DELETE_SUCCESS;
            }
        }

        prev = current;
        if (!is_chain_end(current)) {
            current = load_page(index, current->header.overflow_page_id);
        } else {
            current = NULL;
        }
    }

    release_bucket_lock(index, bucket_id);
    return DELETE_NOT_FOUND;
}

/* Remove entry from page using tombstone or compaction */
void delete_entry_from_page(ChainPage *page, uint16_t slot_index) {
    /* Option 1: Tombstone (mark slot as deleted) */
    page->slots[slot_index] = 0;   /* 0 indicates deleted */
    page->header.entry_count--;

    /* Note: Entry space is not reclaimed until compaction.
     * Track fragmented space: page->fragmented += entry_size
     * Trigger compaction when fragmented > threshold */
}

/* Unlink overflow page from chain */
void unlink_overflow_page(HashIndex *index, ChainPage *prev, ChainPage *to_remove) {
    if (prev != NULL) {
        /* Update prev to skip removed page */
        prev->header.overflow_page_id = to_remove->header.overflow_page_id;
        if (prev->header.overflow_page_id == NULL_PAGE_ID) {
            prev->header.flags |= PAGE_FLAG_LEAF;
        }
        mark_dirty(prev);
    }

    /* Return page to free list */
    free_overflow_page(index, to_remove->header.page_id);
}
```

Tombstone Deletion (Lazy): Mark the slot as deleted (as delete_entry_from_page does above) and leave the entry bytes in place; space is reclaimed only when the page is later compacted.
Immediate Compaction: Slide the remaining entries on every delete so the freed bytes are reclaimed right away. Each delete costs more, but pages never accumulate dead space.
Hybrid Approach: Tombstone by default and compact a page once its fragmented space crosses a threshold, trading a little wasted space for consistently cheap deletes.
For non-unique indexes, decide whether DELETE removes one matching entry or all matching entries. The algorithm above removes the first match; to remove all, continue traversing after each deletion and track the number removed (a sketch follows). This choice affects both semantics and performance.
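A minimal sketch of the delete-all variant, reusing the helpers above; for brevity it leaves empty overflow pages in place instead of unlinking them as chain_delete does, and it reports how many entries were removed:

```c
/* Sketch: remove every entry matching `key` and return the number removed */
size_t chain_delete_all(HashIndex *index, const void *key, size_t key_len) {
    uint32_t bucket_id = compute_bucket(index, key, key_len);
    size_t removed = 0;

    acquire_bucket_lock(index, bucket_id, LOCK_EXCLUSIVE);

    ChainPage *current = load_primary_page(index, bucket_id);
    while (current != NULL) {
        size_t removed_here = 0;

        for (uint16_t i = 0; i < current->header.slot_count; i++) {
            if (current->slots[i] == 0) continue;            /* Skip tombstones */
            Entry entry = load_entry(current, current->slots[i]);
            if (entry.key_len == key_len &&
                memcmp(entry.key, key, key_len) == 0) {
                delete_entry_from_page(current, i);          /* Tombstone it */
                removed_here++;                              /* Keep scanning */
            }
        }

        if (removed_here > 0) mark_dirty(current);
        removed += removed_here;
        current = is_chain_end(current) ? NULL
                : load_page(index, current->header.overflow_page_id);
    }

    if (removed > 0) {
        index->metadata.record_count -= removed;
        mark_metadata_dirty(index);
    }
    release_bucket_lock(index, bucket_id);
    return removed;
}
```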
Over time, deletion operations create fragmented chains with wasted space. Compaction consolidates entries to reclaim space and potentially reduce chain length.
Within a single page, slide entries to eliminate gaps left by deletions:
```c
/* Compact entries within a single page */
void compact_page(ChainPage *page) {
    /* Skip if already compact */
    if (!page_needs_compaction(page)) return;

    /* Temporary buffer for rebuilding entry region */
    char temp_entries[PAGE_SIZE];
    size_t write_offset = PAGE_SIZE;   /* Build from end backward */
    uint16_t new_slot_count = 0;

    /* Copy live entries to temp buffer */
    for (uint16_t i = 0; i < page->header.slot_count; i++) {
        if (page->slots[i] == 0) continue;   /* Skip deleted */

        Entry entry = load_entry(page, page->slots[i]);
        size_t entry_size = compute_entry_size(&entry);

        write_offset -= entry_size;
        memcpy(temp_entries + write_offset, &entry, entry_size);
        page->slots[new_slot_count++] = write_offset;
    }

    /* Copy compacted entries back to page */
    memcpy(page->data + write_offset,
           temp_entries + write_offset,
           PAGE_SIZE - write_offset);

    /* Update page header */
    page->header.slot_count = new_slot_count;
    page->header.free_space_offset = PAGE_HEADER_SIZE +
                                     (new_slot_count * SLOT_SIZE);

    /* Clear fragmentation tracking */
    page->fragmented_space = 0;

    mark_dirty(page);
}
```

Move entries between pages in a chain to reduce the number of overflow pages:
```c
/* Compact entire chain, merging underfull pages */
void compact_chain(HashIndex *index, uint32_t bucket_id) {
    acquire_bucket_lock(index, bucket_id, LOCK_EXCLUSIVE);

    ChainPage *primary = load_primary_page(index, bucket_id);

    /* First pass: compact individual pages */
    ChainPage *current = primary;
    while (current != NULL) {
        compact_page(current);
        current = is_chain_end(current) ? NULL :
                  load_page(index, current->header.overflow_page_id);
    }

    /* Second pass: merge underfull pages */
    current = primary;
    ChainPage *prev = NULL;

    while (current != NULL && !is_chain_end(current)) {
        ChainPage *next = load_page(index, current->header.overflow_page_id);

        /* Try to merge next into current */
        size_t current_used = page_used_space(current);
        size_t next_used = page_used_space(next);
        size_t available = page_usable_space() - current_used;

        if (next_used <= available) {
            /* Merge next into current */
            move_all_entries(next, current);

            /* Unlink and free next */
            current->header.overflow_page_id = next->header.overflow_page_id;
            if (is_chain_end(current)) {
                current->header.flags |= PAGE_FLAG_LEAF;
            }
            mark_dirty(current);
            free_overflow_page(index, next->header.page_id);

            /* Don't advance current - try to merge more */
            continue;
        }

        prev = current;
        current = next;
    }

    release_bucket_lock(index, bucket_id);
}
```

| Strategy | When to Use | Cost | Benefit |
|---|---|---|---|
| Intra-page only | High delete rate, low space pressure | O(entries) | Reclaims space within pages |
| Full chain compact | Periodic maintenance | O(chain × entries) | Reduces chain length |
| Background compact | Continuous, low priority | Distributed over time | Minimal foreground impact |
| Deferred (never) | Read-heavy, few deletes | None | Maximum read performance |
Typical compaction thresholds: (1) page fragmentation above 30% triggers an intra-page compact, (2) a potential chain-length reduction of more than 20% triggers a chain merge, (3) avoid compacting during peak load—schedule maintenance windows or use background threads.
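compact_page above guards on a page_needs_compaction check the source does not define. Here is a minimal sketch, assuming each page tracks fragmented_space (bytes owned by tombstoned entries), as the deletion notes and compact_page itself imply:

```c
#define FRAG_THRESHOLD_PCT 30   /* Compact once >30% of usable space is dead */

/* Sketch: trigger intra-page compaction when tombstoned bytes cross the threshold */
static bool page_needs_compaction(const ChainPage *page) {
    return (page->fragmented_space * 100) >= (page_usable_space() * FRAG_THRESHOLD_PCT);
}
```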
In multi-threaded or multi-process database systems, concurrent access to chained buckets requires careful synchronization to prevent race conditions and maintain consistency.
The simplest approach: one lock per bucket controlling access to the entire chain.
```c
/* Bucket-level read-write locks */
typedef struct {
    pthread_rwlock_t *bucket_locks;   /* Array of N locks, one per bucket */
    uint32_t num_buckets;
} BucketLockManager;

void init_bucket_locks(BucketLockManager *mgr, uint32_t num_buckets) {
    mgr->num_buckets = num_buckets;
    mgr->bucket_locks = calloc(num_buckets, sizeof(pthread_rwlock_t));
    for (uint32_t i = 0; i < num_buckets; i++) {
        pthread_rwlock_init(&mgr->bucket_locks[i], NULL);
    }
}

void acquire_bucket_lock(BucketLockManager *mgr, uint32_t bucket_id, LockMode mode) {
    pthread_rwlock_t *lock = &mgr->bucket_locks[bucket_id];
    if (mode == LOCK_SHARED) {
        pthread_rwlock_rdlock(lock);   /* Multiple readers OK */
    } else {
        pthread_rwlock_wrlock(lock);   /* Exclusive for writes */
    }
}

void release_bucket_lock(BucketLockManager *mgr, uint32_t bucket_id) {
    pthread_rwlock_unlock(&mgr->bucket_locks[bucket_id]);
}

/* Usage pattern */
LookupResult safe_lookup(HashIndex *index, const void *key, size_t key_len) {
    uint32_t bucket = compute_bucket(index, key, key_len);

    acquire_bucket_lock(&index->lock_mgr, bucket, LOCK_SHARED);
    LookupResult result = chain_lookup(index, key, key_len);
    release_bucket_lock(&index->lock_mgr, bucket);

    return result;
}
```

With many buckets, per-bucket locks consume significant memory. Striped locking maps multiple buckets to fewer locks:
```c
#define NUM_LOCK_STRIPES 1024

uint32_t get_lock_stripe(uint32_t bucket_id) {
    return bucket_id % NUM_LOCK_STRIPES;
}
```
Tradeoff: Fewer locks = less memory but more false contention (unrelated buckets share locks).
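A sketch of how the acquire path changes under striping, assuming the lock array from init_bucket_locks is sized to NUM_LOCK_STRIPES rather than the bucket count:

```c
/* Sketch: striped acquire; unrelated buckets that hash to the same stripe
 * serialize on the same rwlock (the false contention noted above). */
void acquire_bucket_lock_striped(BucketLockManager *mgr, uint32_t bucket_id, LockMode mode) {
    pthread_rwlock_t *lock = &mgr->bucket_locks[get_lock_stripe(bucket_id)];
    if (mode == LOCK_SHARED) pthread_rwlock_rdlock(lock);
    else                     pthread_rwlock_wrlock(lock);
}
```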
For finer-grained concurrency, latch individual pages rather than whole buckets:
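The source does not spell out the latching protocol, so the following is a minimal sketch of latch coupling ("crabbing") along a chain: hold the current page's latch while acquiring the next page's, then release the first. latch_page_shared and unlatch_page are hypothetical buffer-pool calls.

```c
/* Sketch: read-only chain traversal with per-page shared latches */
bool lookup_with_page_latches(HashIndex *index, const void *key, size_t key_len) {
    uint32_t bucket_id = compute_bucket(index, key, key_len);

    ChainPage *current = load_primary_page(index, bucket_id);
    latch_page_shared(index->buffer_pool, current);

    while (current != NULL) {
        if (search_page_for_key(current, key, key_len) >= 0) {
            unlatch_page(index->buffer_pool, current);
            return true;
        }

        ChainPage *next = NULL;
        if (!is_chain_end(current)) {
            next = load_page(index, current->header.overflow_page_id);
            latch_page_shared(index->buffer_pool, next);   /* Latch next page first... */
        }
        unlatch_page(index->buffer_pool, current);          /* ...then release the current one */
        current = next;
    }
    return false;   /* Key not found; the last latch was released in the loop */
}
```

Writers take exclusive latches the same way; the benefit is that readers and writers working on different pages of the same long chain no longer block each other.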
When multiple locks are held simultaneously (e.g., bucket lock + overflow allocation lock), always acquire in a consistent order: (1) Bucket lock first, (2) Overflow manager lock second. Violating this order risks deadlock.
Production hash indexes employ numerous optimizations beyond the basic algorithms. These optimizations can provide order-of-magnitude improvements for specific workloads.
When traversing a chain, prefetch subsequent pages while processing the current one:
```c
/* Prefetch-optimized chain traversal */
LookupResult lookup_with_prefetch(HashIndex *index, const void *key, size_t key_len) {
    uint32_t bucket_id = compute_bucket(index, key, key_len);
    ChainPage *current = load_primary_page(index, bucket_id);

    while (current != NULL) {
        /* Issue async prefetch for next page while processing current */
        if (!is_chain_end(current)) {
            buffer_prefetch_async(index->buffer_pool,
                                  current->header.overflow_page_id);
        }

        /* Search current page */
        int found = search_page_for_key(current, key, key_len);
        if (found >= 0) {
            return make_result(current, found);
        }

        /* Next page should now be in buffer pool */
        current = is_chain_end(current) ? NULL :
                  load_page(index, current->header.overflow_page_id);
    }

    return NOT_FOUND_RESULT;
}
```

For frequently accessed entries, maintain a small LRU cache in front of the hash index:
```c
LookupResult cached_lookup(HashIndex *index, const void *key, size_t key_len) {
    Entry *cached = lru_cache_get(index->hot_cache, key, key_len);
    if (cached != NULL) {
        return make_result_from_entry(cached);
    }

    LookupResult result = chain_lookup(index, key, key_len);
    if (result.found) {
        lru_cache_put(index->hot_cache, key, key_len, result.value);
    }
    return result;
}
```
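One caveat the snippet leaves implicit: the hot cache must be invalidated whenever the underlying entry changes, or reads can return stale values. A minimal sketch, assuming an lru_cache_remove call exists alongside lru_cache_get/put (it is not shown in the source):

```c
/* Sketch: invalidate the hot cache before mutating the index */
DeleteStatus cached_delete(HashIndex *index, const void *key, size_t key_len) {
    lru_cache_remove(index->hot_cache, key, key_len);   /* Hypothetical invalidation call */
    return chain_delete(index, key, key_len);
}
```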
For bulk inserts, sort entries by bucket first, then process each bucket once:
```c
/* Bulk insert optimization */
void bulk_insert_sorted(HashIndex *index, Entry *entries, size_t count) {
    /* Compute bucket for each entry */
    for (size_t i = 0; i < count; i++) {
        entries[i].bucket_id = compute_bucket(index,
                                              entries[i].key,
                                              entries[i].key_len);
    }

    /* Sort by bucket ID */
    qsort(entries, count, sizeof(Entry), compare_by_bucket);

    /* Process each bucket's entries together */
    size_t i = 0;
    while (i < count) {
        uint32_t bucket_id = entries[i].bucket_id;

        acquire_bucket_lock(&index->lock_mgr, bucket_id, LOCK_EXCLUSIVE);

        /* Insert all entries for this bucket */
        while (i < count && entries[i].bucket_id == bucket_id) {
            insert_entry_no_lock(index, &entries[i]);
            i++;
        }

        release_bucket_lock(&index->lock_mgr, bucket_id);
    }
}
```

We have taken a deep dive into chaining—the dominant overflow handling mechanism in database hash indexes. From page structure to production optimizations, these details form the engineering foundation for high-performance hash-based indexing.
What's Next:
With the complete picture of static hashing—buckets, hash functions, overflow handling, and chaining—we now turn to performance analysis. We'll examine how to model, measure, and predict the performance characteristics of static hash indexes across different workloads.
You now possess deep knowledge of chaining implementation in hash indexes. This understanding enables you to work on storage engine internals, diagnose performance issues, and make informed decisions about hash index design and tuning.