If directory doubling expands the addressing capacity, bucket splitting is the operation that actually uses that capacity. When a bucket overflows, splitting divides its records into two buckets, each handling a more specific subset of hash values.
Bucket splitting is the workhorse of extendible hashing—it happens far more frequently than directory doubling and is the mechanism that keeps load balanced across the data structure. Understanding splitting is essential for predicting performance, debugging issues, and implementing extendible hashing correctly.
By the end of this page, you will understand the complete bucket splitting algorithm, how records are redistributed based on hash bits, directory pointer updates, handling of edge cases like pathological splits, and performance characteristics of split operations.
Bucket splitting is triggered when an insertion would exceed bucket capacity:
The Trigger:
An insertion targets a bucket B that has no room for additional records.
Two scenarios:
| Scenario | Condition | Required Action |
|---|---|---|
| Local split | B.local_depth < global_depth | Split B immediately |
| Split with doubling | B.local_depth = global_depth | Double directory first, then split B |
The vast majority of splits are local splits (no directory doubling), and as the table matures, the fraction of splits that require doubling shrinks rapidly.
Pre-split decision flow:
```
function handleOverflow(bucket, key, value):
    // Check if we can split locally
    if bucket.local_depth < global_depth:
        splitBucket(bucket)
        insert(key, value)  // Retry after split
    else if bucket.local_depth == global_depth:
        // Need more addressing bits
        doubleDirectory()
        splitBucket(bucket)
        insert(key, value)  // Retry after split

// Key insight:
// After split, the new key might go to either the original bucket
// or the sibling bucket. The retry recalculates the correct target.
```

With global_depth d and assuming uniform hash distribution, the probability of a random split requiring directory doubling is approximately 1/(2^(d-1)). At d=16, only about 0.003% of splits trigger doubling. Local splits dominate overwhelmingly.
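As a quick check on that figure, here is a standalone sketch (not part of the hashing code) that simply evaluates the formula from the text at a few depths:

```cpp
#include <cstdio>
#include <cmath>

// Evaluate P(split needs doubling) ~ 1/2^(d-1) from the text above.
int main() {
    const int depths[] = {4, 8, 16};
    for (int d : depths) {
        double p = 1.0 / std::pow(2.0, d - 1);
        std::printf("global_depth=%2d  P(doubling) ~ %.6f%%\n", d, p * 100.0);
    }
    return 0;
}
```

At d = 16 this prints roughly 0.003%, matching the claim above.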
The bucket splitting algorithm consists of four phases:
Complete algorithm:
```
function splitBucket(bucket):
    old_local_depth = bucket.local_depth
    new_local_depth = old_local_depth + 1

    // Phase 1: Create sibling bucket
    sibling = allocateNewBucket()
    sibling.local_depth = new_local_depth
    bucket.local_depth = new_local_depth

    // Phase 2: Redistribute records
    discriminating_bit_position = new_local_depth
    for each record in bucket:
        hash = computeHash(record.key)
        bit = getBit(hash, discriminating_bit_position)  // 0 or 1
        if bit == 1:
            sibling.add(record)
            bucket.remove(record)
        // else: record stays in bucket

    // Phase 3: Update directory pointers
    // Find all entries pointing to bucket
    // Half will now point to sibling
    updateDirectoryPointers(bucket, sibling, old_local_depth)

    // Phase 4: Verification (debug mode)
    assert bucket.local_depth == new_local_depth
    assert sibling.local_depth == new_local_depth
    verifyRecordDistribution(bucket, sibling)

function updateDirectoryPointers(bucket, sibling, old_local_depth):
    num_pointers = 2 ^ (global_depth - old_local_depth)

    // Find starting index for this bucket's pointers
    start_index = bucket.hash_prefix << (global_depth - old_local_depth)

    // Half of pointers stay with bucket, half go to sibling
    half = num_pointers / 2
    for i = start_index + half to start_index + num_pointers - 1:
        directory[i] = sibling
```

Key observation: new_local_depth determines which bucket each record goes to.

The record redistribution phase is the core of splitting. It uses a single hash bit to partition records.
The discriminating bit:
For a bucket with local_depth d' splitting to local_depth d' + 1, the discriminating bit is bit number d' + 1 (counting from the most significant bit, 1-indexed).
Visual example:
```
Scenario:
- Bucket B with local_depth = 2, hash prefix = "01"
- Bucket overflows, splitting to local_depth = 3
- Discriminating bit: position 3

Records in bucket B (showing first 4 hash bits):
Record A: hash = 0b0100... → bit 3 = 0 → stays in B
Record B: hash = 0b0111... → bit 3 = 1 → moves to sibling
Record C: hash = 0b0101... → bit 3 = 0 → stays in B
Record D: hash = 0b0110... → bit 3 = 1 → moves to sibling
Record E: hash = 0b0100... → bit 3 = 0 → stays in B
Record F: hash = 0b0111... → bit 3 = 1 → moves to sibling

After split:
Bucket B (prefix "010"): Records A, C, E
Sibling S (prefix "011"): Records B, D, F

Verification:
- All records in B have hashes starting with 010
- All records in S have hashes starting with 011
- Both buckets have local_depth = 3
```

Bit extraction implementation:
```cpp
// Extract the nth bit from a 32-bit hash (1-indexed from MSB)
inline bool getBitMSB(uint32_t hash, int n) {
    // n=1: most significant bit
    // n=32: least significant bit
    int shift = 32 - n;
    return (hash >> shift) & 1;
}

// Example usage during split:
void redistributeRecords(Bucket& bucket, Bucket& sibling, int new_local_depth) {
    std::vector<Record> to_move;

    for (auto& record : bucket.records) {
        uint32_t hash = computeHash(record.key);
        bool bit = getBitMSB(hash, new_local_depth);
        if (bit == 1) {
            to_move.push_back(record);
        }
    }

    // Move records to sibling
    for (auto& record : to_move) {
        bucket.remove(record);
        sibling.add(record);
    }
}
```

With a good hash function, records split roughly 50-50 between bucket and sibling. Significant deviation (e.g., 90-10) suggests hash function weaknesses or pathological data patterns.
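Building on that observation, a post-split balance check can flag suspicious distributions. This is a sketch; the 10/90 thresholds are illustrative assumptions, not prescribed values:

```cpp
#include <cstdio>
#include <cstddef>

// Sketch: warn when a split is badly skewed. 'kept' and 'moved' are the
// record counts left in the original bucket and moved to the sibling.
void checkSplitBalance(std::size_t kept, std::size_t moved) {
    std::size_t total = kept + moved;
    if (total == 0) return;
    double moved_fraction = static_cast<double>(moved) / total;
    // With a good hash function we expect roughly 0.5; flag extremes.
    if (moved_fraction < 0.1 || moved_fraction > 0.9) {
        std::fprintf(stderr,
                     "warning: skewed split (%zu kept, %zu moved); "
                     "check hash quality or key patterns\n",
                     kept, moved);
    }
}
```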
After records are redistributed, the directory must be updated so lookups find the correct bucket.
The pointer update pattern:
Before the split, the bucket had 2^(d - d') directory entries, where d is the global depth and d' is the old local depth. After the split, those entries are partitioned by the discriminating bit of their index: entries where bit d' + 1 is 0 keep pointing to the original bucket, while entries where that bit is 1 point to the sibling.
Algorithm for pointer update (a worked example appears in the comments):
```
function updatePointers(bucket, sibling, old_local_depth):
    // Compute the range of directory indices pointing to bucket
    prefix_bits = bucket.hash_prefix  // First old_local_depth bits
    suffix_bits_count = global_depth - old_local_depth

    start_index = prefix_bits << suffix_bits_count
    end_index = start_index + (1 << suffix_bits_count) - 1

    // Partition point: indices where discriminating bit = 1
    num_entries = end_index - start_index + 1
    split_point = start_index + (num_entries / 2)

    // Update second half to point to sibling
    for i = split_point to end_index:
        directory[i] = sibling

    // First half remains pointing to bucket (no change)

// Example walkthrough:
// - old_local_depth = 2, global_depth = 4
// - bucket prefix = "01" (binary)
// - start_index = 01 << 2 = 0100 = 4
// - end_index = 4 + 4 - 1 = 7
// - Range: indices 4, 5, 6, 7 (binary: 0100, 0101, 0110, 0111)
// - split_point = 4 + 2 = 6
// - Indices 4,5 stay with bucket
// - Indices 6,7 point to sibling
```

Directory updates must happen atomically or be ordered correctly with record redistribution. If a lookup occurs between record movement and pointer update, it might find an empty bucket (records moved but pointer not updated) or miss a record (pointer updated before record moved).
A pathological split occurs when, after splitting, all records end up in the same bucket—the split doesn't distribute load at all.
Why this happens:
If all keys in a bucket happen to have the same bit value at position new_local_depth, one bucket gets everything and the other remains empty. The next insertion to the full bucket would trigger another split, potentially with the same result.
Example scenario:
```
Pathological case:

Initial bucket (local_depth = 2, prefix "01"):
Record 1: hash = 0b0100_1111...
Record 2: hash = 0b0100_0011...
Record 3: hash = 0b0100_1010...
Record 4: hash = 0b0100_0101...  // All have bit 3 = 0!
Record 5: hash = 0b0100_1100...

Split attempt (new_local_depth = 3):
Discriminating bit = 3
All records have bit 3 = 0

After split:
Bucket (prefix "010"): All 5 records
Sibling (prefix "011"): 0 records  ← Empty!

New insertion with hash 0b0100_0000... still goes to full bucket.
Another split attempt...

Worst case: If all keys hash to the same 32-bit value (collision),
no split can ever redistribute. You'll hit maximum depth and overflow.
```

Handling pathological splits:
Pathological splits are rare with cryptographic-quality hash functions. They typically indicate one of three things: (1) a weak hash function, (2) an intentional key attack (hash flooding), or (3) extremely unlucky key patterns. Production systems use overflow chains as a fallback but log warnings when triggered, as in the sketch below.
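A minimal sketch of that fallback policy follows. Every type and helper here (Bucket, Record, splitAndRelocate, addOverflowPage, the depth cap) is an illustrative assumption, not an interface from this page:

```cpp
#include <cstdio>
#include <cstddef>

// Illustrative interfaces, assumed to exist in the surrounding system.
struct Record { /* key, value */ };
struct Bucket {
    int local_depth;
    bool isFull() const;
    std::size_t size() const;
    void add(const Record&);
};
Bucket* splitAndRelocate(Bucket*, const Record&);  // split, return target bucket
void addOverflowPage(Bucket*);                     // chain an overflow page

const int MAX_LOCAL_DEPTH = 32;  // capped here by 32-bit hashes

// Keep splitting while splits make progress; fall back to an overflow
// chain once the depth cap is hit or a split separates no records.
void insertWithFallback(Bucket* bucket, const Record& rec) {
    while (bucket->isFull()) {
        if (bucket->local_depth >= MAX_LOCAL_DEPTH) {
            addOverflowPage(bucket);
            std::fprintf(stderr, "warning: overflow chain used (depth cap)\n");
            break;
        }
        std::size_t before = bucket->size();
        bucket = splitAndRelocate(bucket, rec);
        if (bucket->size() == before) {
            // Pathological split: nothing moved; retrying would loop forever.
            addOverflowPage(bucket);
            std::fprintf(stderr, "warning: pathological split detected\n");
            break;
        }
    }
    bucket->add(rec);
}
```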
Understanding the costs of splitting is essential for performance tuning.
Time complexity components:
| Operation | Complexity | I/O Operations | Notes |
|---|---|---|---|
| Allocate sibling | O(1) | 1 write (new page) | Allocate from free list |
| Read original bucket | O(1) | 1 read | Already in buffer for insert |
| Compute destinations | O(B) | 0 | B = records in bucket |
| Write sibling records | O(B/2) | Included in allocation | Average half move |
| Rewrite original | O(B/2) | 1 write | Remaining records |
| Update directory | O(2^(d-d')) | 0-1 (if disk) | Pointer updates |
| Total | O(B + 2^(d-d')) | 2-3 I/Os | Independent of table size |
I/O analysis:
The critical observation is that splitting requires only constant I/O operations (typically 2-3 page accesses), regardless of table size. This is what makes extendible hashing practical for disk-based databases.
Comparison with static hashing full rehash: rehashing a static table reads and rewrites every page, an O(N) I/O operation that blocks access while it runs, whereas a split touches the same 2-3 pages regardless of how large the table grows.
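A back-of-the-envelope model makes the contrast concrete; the page count and per-operation I/O figures below are assumptions for illustration:

```cpp
#include <cstdio>

// Rough I/O model: a static-hash rehash rewrites every data page, while
// an extendible-hash split touches a constant handful of pages.
int main() {
    const long long pages = 1000000;         // assumed table size: 1M pages
    const long long rehash_ios = 2 * pages;  // read + write each page
    const long long split_ios = 3;           // read bucket, write bucket + sibling
    std::printf("full rehash: ~%lld I/Os; one split: ~%lld I/Os\n",
                rehash_ios, split_ios);
    return 0;
}
```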
Some systems batch nearby splits or preemptively split buckets approaching capacity. This smooths latency spikes and can improve I/O patterns by writing multiple related pages together.
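A sketch of such a preemptive trigger, assuming a simple bucket interface; the 0.9 fill factor is a hypothetical tuning knob, not a recommended constant:

```cpp
#include <cstddef>

// Illustrative bucket layout, assumed from the surrounding system.
struct Bucket {
    std::size_t record_count;
    std::size_t capacity;
};

// Split early once a bucket crosses a fill-factor threshold, so the
// split runs on our schedule rather than inside a latency-sensitive insert.
bool shouldSplitPreemptively(const Bucket& b, double fill_threshold = 0.9) {
    return static_cast<double>(b.record_count) >=
           fill_threshold * static_cast<double>(b.capacity);
}
```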
Bucket splitting must coordinate with concurrent lookups and other insertions.
Concurrency scenarios:
- A lookup reads a bucket while a split is redistributing its records.
- Two insertions overflow the same bucket and both attempt to split it.
- A split that requires directory doubling races with directory readers.
Bucket-level locking approach:
```
// Fine-grained locking for bucket split
function splitWithLocking(bucket):
    // Lock bucket exclusively for split
    bucket.writeLock()
    try:
        // Re-check if split still needed (might have split due to another thread)
        if bucket.hasSpace():
            return  // Another split freed space

        // Allocate sibling (no lock needed - fresh page)
        sibling = allocateNewBucket()
        sibling.writeLock()  // Lock sibling before publication

        // Redistribute records (both buckets locked)
        redistribute(bucket, sibling)

        // Update directory with latching
        directoryLatch.acquire()
        updateDirectoryPointers(bucket, sibling)
        directoryLatch.release()

        // Now safe to release bucket locks
        sibling.writeUnlock()
        bucket.writeUnlock()
    finally:
        if bucket.isLocked():
            bucket.writeUnlock()

// Lookup with optimistic concurrency
function lookup(key):
    retry:
        bucket = findBucket(key)
        bucket.readLock()

        // Verify bucket is correct (might have split)
        expected_bucket = findBucket(key)  // Recalculate
        if expected_bucket != bucket:
            bucket.readUnlock()
            goto retry

        result = bucket.find(key)
        bucket.readUnlock()
        return result
```

To prevent deadlocks, always acquire locks in a consistent order: directory latch → bucket locks in page ID order. Some systems use lock-free techniques with hazard pointers or epoch-based reclamation to eliminate traditional locking overhead.
Alternative: lock-free splits. Advanced implementations use compare-and-swap (CAS) operations: the sibling bucket is fully built off to the side, the affected directory entries are then swung to it with CAS, and a reader that loses the race recomputes its target and retries (as in the optimistic lookup above).
This approach eliminates blocking but requires careful memory ordering.
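A minimal sketch of the pointer-swing step using std::atomic, with allocation and memory reclamation omitted (epoch or hazard-pointer reclamation is assumed, as noted above); the directory size is a placeholder:

```cpp
#include <atomic>
#include <cstddef>

struct Bucket;  // sibling is fully populated before publication

constexpr std::size_t DIR_SIZE = 1024;     // assumed directory size
std::atomic<Bucket*> directory[DIR_SIZE];  // readers load, writers CAS

// Swing one directory entry from the old bucket to the fully built
// sibling. If another thread changed the entry first, report failure
// so the caller can re-examine the directory and retry.
bool swingEntry(std::size_t index, Bucket* old_bucket, Bucket* sibling) {
    Bucket* expected = old_bucket;
    return directory[index].compare_exchange_strong(
        expected, sibling,
        std::memory_order_release,   // publish sibling's contents
        std::memory_order_relaxed);  // on failure, no ordering needed
}
```

The release ordering on success is what makes the sibling's records visible to any reader that subsequently loads the entry with acquire semantics.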
Bucket splitting must be crash-safe. If the system fails mid-split, recovery must restore consistency.
Crash points and their implications:
| Crash Point | State on Disk | Recovery Action |
|---|---|---|
| Before sibling allocated | Original bucket full | Retry split on next insert |
| After sibling allocated, before records move | Empty sibling exists | Deallocate sibling, retry split |
| Mid record redistribution | Some records in both buckets | Undo partial move, retry split |
| After redistribution, before directory update | Records split, directory points to original only | Complete directory update |
| After directory update | Complete split | No recovery needed |
WAL-based recovery:
Most database systems use Write-Ahead Logging (WAL) to ensure split atomicity:
```
// WAL records for split operation
function splitWithLogging(bucket):
    // Log intention to split
    WAL.write(SplitStartRecord {
        bucket_id: bucket.id,
        old_local_depth: bucket.local_depth
    })

    sibling = allocateBucket()
    WAL.write(AllocateRecord { page_id: sibling.id })

    // Log each record movement
    for each record to move:
        WAL.write(MoveRecord {
            record_key: record.key,
            from_bucket: bucket.id,
            to_bucket: sibling.id
        })

    // Log directory updates
    WAL.write(DirectoryUpdateRecord {
        entries: [affected_indices],
        new_target: sibling.id
    })

    // Log local depth changes
    WAL.write(DepthUpdateRecord { bucket: bucket.id, new_depth: bucket.local_depth + 1 })
    WAL.write(DepthUpdateRecord { bucket: sibling.id, new_depth: bucket.local_depth + 1 })

    // Log completion
    WAL.write(SplitCompleteRecord { bucket_id: bucket.id })

    // Force WAL to disk before committing changes
    WAL.flush()
    commitChangesToDisk()

// On crash recovery:
// - Find incomplete SplitStartRecord without matching SplitCompleteRecord
// - Redo or undo based on logged operations
```

Modern databases often use physiological logging—page-level physical REDO with logical UNDO. This allows efficient recovery while supporting operations like split that span multiple pages. The entire split can be treated as a single logical operation for undo purposes.
Bucket splitting is the core growth mechanism of extendible hashing. To consolidate the key concepts:
- An overflow triggers a local split when local_depth < global_depth; otherwise the directory doubles first.
- Records are redistributed by a single discriminating bit at position new_local_depth.
- The directory entries of the split bucket are partitioned: half keep pointing to it, half move to the sibling.
- Pathological splits (all records landing on one side) are handled with repeated splits and overflow-chain fallbacks.
- A split costs O(B) work and 2-3 I/Os, independent of table size.
- Splits coordinate with concurrent operations via locking or CAS, and rely on WAL for crash safety.
Module Complete:
You have now worked through the complete architecture of extendible hashing: the directory and its hash-prefix addressing, global and local depth, directory doubling, and bucket splitting.
These concepts work together to create an index structure that provides O(1) lookup while gracefully handling arbitrary growth without the catastrophic reorganization of static hashing.
You have achieved a comprehensive understanding of extendible hashing. Dynamic hash indexing of this kind, together with its close relative linear hashing (the basis of PostgreSQL's hash indexes), is foundational to database systems and many key-value stores. The principles you've learned (directory indirection, depth management, and incremental splitting) apply broadly to adaptive data structures throughout computer science.