If the directory is the backbone of extendible hashing, then global depth is its control dial. This single integer value determines how large the directory is, how many hash bits are examined for bucket lookup, and when the entire directory must expand.
Think of global depth as the precision setting on a measurement instrument. At low precision (small global depth), you make coarse distinctions—multiple different hash values map to the same bucket. At high precision (large global depth), every nuance in the hash value leads to a different bucket.
By the end of this page, you will understand how global depth controls directory sizing, why it increases during directory doubling, its relationship to hash bit usage, and how it provides an upper bound on the local depth of any bucket.
Formal Definition:
The global depth (denoted as d or GD) is a non-negative integer that specifies how many hash bits are examined during a lookup and, consequently, how many entries the directory contains (2^d).
The fundamental relationship:
directory_size = 2^(global_depth)
directory_index = first_d_bits(hash(key))
This relationship is invariant throughout the lifetime of the extendible hash structure. Whenever you see a directory of size 16, you immediately know the global depth is 4. When global depth increases by 1, the directory doubles in size.
| Global Depth (d) | Directory Size (2^d) | Hash Bits Examined | Max Addressable Values |
|---|---|---|---|
| 1 | 2 | 1 bit | 0, 1 |
| 2 | 4 | 2 bits | 00, 01, 10, 11 |
| 3 | 8 | 3 bits | 000...111 |
| 4 | 16 | 4 bits | 0000...1111 |
| 8 | 256 | 8 bits | 00000000...11111111 |
| 10 | 1,024 | 10 bits | First 10 bits |
| 16 | 65,536 | 16 bits | First 16 bits |
| 20 | 1,048,576 | 20 bits | First 20 bits |
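As a quick sanity check of the 2^d relationship in the table, here is a minimal Python sketch; the directory_size helper and the 8-byte pointer size are illustrative assumptions, not part of any particular implementation:

```python
# A quick check of the size relationship in the table above.
# The helper name and the 8-byte pointer size are illustrative assumptions.

def directory_size(global_depth: int) -> int:
    """Number of directory entries for a given global depth (2^d)."""
    return 2 ** global_depth

for d in (1, 2, 3, 4, 8, 10, 16, 20):
    entries = directory_size(d)
    print(f"d={d:2d}: {entries:>9,} entries, {entries * 8:>11,} bytes of pointers")
```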
What the global depth tells you:
The global depth encapsulates the history of directory growth. A global depth of 5 means the directory has doubled (at least) 4 times from its initial size of 2. It represents the current level of addressing precision available to the hash structure.
If you start with global depth 1 (two buckets) and observe a table with global depth 12, you know the directory has doubled 11 times to accommodate data growth. That's 4,096 directory entries, though the actual number of distinct buckets may be far fewer due to sharing.
Understanding exactly how global depth interacts with hash values is essential for implementing extendible hashing correctly.
The bit extraction process:
Given a 32-bit hash value h(k) and global depth d, the directory index is the integer formed by the first (most significant) d bits of the hash.
Example walkthrough:
```
Given:
  - Key: "customer_12345"
  - Hash function produces: h(key) = 0xB7A3F1E2
  - In binary: 1011 0111 1010 0011 1111 0001 1110 0010
  - Global depth: d = 4

Extraction:
  Step 1: Take first 4 bits      → 1011 (binary)
  Step 2: Convert to decimal     → 11
  Step 3: directory_index = 11

Therefore:
  - This key maps to directory[11]
  - directory[11] points to some bucket B
  - The key is stored in bucket B

If global depth were 6 instead:
  Step 1: Take first 6 bits      → 101101 (binary)
  Step 2: Convert to decimal     → 45
  Step 3: directory_index = 45
```
Why most significant bits?
The choice of most significant bits (MSB) rather than least significant bits (LSB) is a deliberate design decision with important implications:
While MSB is most common, some implementations (notably certain versions of linear hashing) use LSB. The key is consistency—once chosen, the bit selection strategy must remain fixed for the lifetime of the hash structure.
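The walkthrough above can be reproduced in a few lines. The following is a minimal sketch assuming a 32-bit hash and MSB extraction; the function name is illustrative:

```python
# Minimal sketch of MSB bit extraction, assuming a 32-bit hash value.
# The name extract_msb_index is illustrative, not a standard API.

HASH_BITS = 32

def extract_msb_index(hash_value: int, global_depth: int) -> int:
    """Directory index = the most significant global_depth bits of the hash."""
    return hash_value >> (HASH_BITS - global_depth)

h = 0xB7A3F1E2                      # 1011 0111 1010 0011 1111 0001 1110 0010
print(extract_msb_index(h, 4))      # 11 -> directory[11]
print(extract_msb_index(h, 6))      # 45 -> directory[45]
```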
One of the most important properties of global depth is its relationship to local depth:
The Fundamental Invariant:
For every bucket B with local depth d' and global depth d: d' ≤ d (local depth never exceeds global depth).
This invariant is maintained at all times and has deep implications for how extendible hashing operates.
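One way to express the invariant concretely: a bucket with local depth d' is referenced by 2^(d - d') directory entries (the same formula that appears in the doubling discussion below), which is only well defined when d' ≤ d. A small hypothetical helper as a sketch:

```python
# Minimal sketch of the invariant check. A bucket with local depth d' is
# shared by 2^(d - d') directory entries, which only makes sense when d' <= d.
# The helper name is illustrative, not part of any particular implementation.

def pointers_to_bucket(global_depth: int, local_depth: int) -> int:
    assert local_depth <= global_depth, "invariant violated: d' > d"
    return 2 ** (global_depth - local_depth)

print(pointers_to_bucket(4, 4))   # 1 -> the bucket owns its directory entry alone
print(pointers_to_bucket(4, 2))   # 4 -> four directory entries share this bucket
```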
Interpretation of the d = d' case:
When a bucket's local depth equals the global depth (d' = d), that bucket is pointed to by exactly one directory entry and already uses every hash bit the directory can distinguish, so it cannot split without more directory entries.
This is the critical case that triggers directory doubling. When an at-capacity bucket with d' = d receives a new insertion that would overflow it, the system must first double the directory to make room for the split.
An overflow in a bucket where d' = d is the only scenario that forces directory doubling. All other overflow cases (d' < d) can be resolved by a local bucket split without touching the directory size.
When global depth must increase, the directory undergoes a doubling operation. This is the most expensive operation in extendible hashing, but it's designed to be as efficient as possible.
The doubling algorithm:
```
function doubleDirectory():
    old_size = 2^global_depth          // Current directory size
    new_size = 2 * old_size            // New size after doubling

    // Step 1: Allocate new directory array
    new_directory = allocate(new_size)

    // Step 2: Duplicate all pointers
    for i = 0 to old_size - 1:
        new_directory[i] = directory[i]             // Original position
        new_directory[i + old_size] = directory[i]  // Duplicated position

    // Step 3: Replace old directory with new
    directory = new_directory
    global_depth = global_depth + 1

    // Step 4: Now the triggering bucket can split
    // (handled by subsequent split operation)

// Visual example:
// Before (d=2): [B0] [B1] [B2] [B2]                      (buckets B0, B1, B2)
//                 0    1    2    3
//
// After  (d=3): [B0] [B1] [B2] [B2] [B0] [B1] [B2] [B2]
//                 0    1    2    3    4    5    6    7
//
// Note: All pointers duplicated, no data moved
```
Key observations:
No data moves during doubling — Only pointers are copied. All buckets remain in their original locations.
Each entry is duplicated — Entry i appears at positions i and i + old_size in the new directory.
Bucket local depths unchanged — The doubling doesn't affect any bucket's local depth or contents; only the directory grows.
All buckets now have doubled pointers — After doubling, every bucket's pointer count doubles (since 2^((d+1) - d') = 2 × 2^(d - d')).
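The same duplication pattern can be written as a short runnable sketch; the bucket objects here are placeholders, and a real implementation would also increment global_depth when installing the new array:

```python
# A runnable sketch of the doubling step from the pseudocode above.
# Buckets are opaque placeholder objects, since no bucket data moves.

def double_directory(directory: list) -> list:
    """Entry i of the old directory reappears at positions i and i + old_size."""
    old_size = len(directory)
    return [directory[i % old_size] for i in range(2 * old_size)]

b0, b1, b2 = object(), object(), object()
directory = [b0, b1, b2, b2]                 # d = 2; B2 is shared by two entries
doubled = double_directory(directory)        # corresponds to d = 3
assert doubled == [b0, b1, b2, b2, b0, b1, b2, b2]
assert doubled.count(b2) == 2 * directory.count(b2)   # every pointer count doubles
```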
Cost analysis:
| Operation | Cost | Notes |
|---|---|---|
| Memory allocation | O(2^d) time | Allocate array of 2^(d+1) pointers |
| Pointer copying | O(2^d) time | Copy each pointer exactly once |
| Memory overhead | 2^d × ptr_size → 2^(d+1) × ptr_size | Old and new arrays briefly coexist during the transition |
| I/O operations | 0 bucket I/Os | No data pages are read or written |
| Lock duration | Proportional to 2^d | May require exclusive lock on directory |
If your buckets hold ~100 records each and you have 1 million records, you need about 10,000 buckets. With global depth 14, you can address 16,384 buckets. The directory size is only 16K × 8 bytes = 128 KB. Directory doubling copies just 128 KB of pointers—trivial compared to moving any actual data.
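The sizing arithmetic in this example is easy to reproduce. A short sketch under the same assumptions (about 100 records per bucket, 8-byte directory pointers):

```python
# Reproducing the sizing arithmetic above, assuming ~100 records per bucket
# and 8-byte directory pointers.
import math

records = 1_000_000
records_per_bucket = 100
buckets_needed = records // records_per_bucket            # 10,000
global_depth = math.ceil(math.log2(buckets_needed))       # 14 (2^14 = 16,384 >= 10,000)
directory_entries = 2 ** global_depth
print(directory_entries, directory_entries * 8 // 1024, "KB")   # 16384 128 KB
```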
Global depth increases under one specific condition:
The Trigger Condition:
Global depth increases when a bucket with local_depth = global_depth (i.e., d' = d) overflows and needs to split.
Let's trace through the decision logic:
```
function insert(key, value):
    bucket = findBucket(key)            // O(1) via directory

    if bucket.hasSpace():
        bucket.insert(key, value)
        return SUCCESS

    // Bucket overflow case
    if bucket.local_depth < global_depth:
        // Case 1: Can split without directory growth
        splitBucket(bucket)
        insert(key, value)              // Retry after split
    else:
        // Case 2: d' = d, must grow directory first
        assert bucket.local_depth == global_depth
        doubleDirectory()               // Increase global_depth
        splitBucket(bucket)             // Now we can split
        insert(key, value)              // Retry after split

// Key insight:
// - Case 1 is local:  affects only one bucket
// - Case 2 is global: affects entire directory structure
```
Frequency of directory doubling:
Directory doubling becomes increasingly rare as the table grows: most buckets have local_depth < global_depth and can split locally, so the probability that a random overflow lands in a d' = d bucket keeps shrinking.
Although each directory doubling costs O(2^d), doublings happen at most O(log N) times for N insertions. The amortized cost per insertion is O(1), making extendible hashing efficient for dynamic datasets.
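One way to see the amortization is to total the pointer copies across all doublings. The sketch below assumes the directory starts at global depth 1 and picks an arbitrary final depth of 20; the geometric sum comes out to roughly one final directory's worth of copies:

```python
# Why the doubling cost amortizes: total pointer-copy work across all
# doublings up to depth D is a geometric sum, roughly one final directory's
# worth of pointers. D = 20 is an arbitrary example; the start depth of 1
# follows the earlier examples.

D = 20
total_copies = sum(2 ** d for d in range(1, D))   # copies made at each doubling
final_directory_size = 2 ** D
print(total_copies, final_directory_size)         # 1048574 vs 1048576
```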
While theoretically global depth could grow indefinitely, practical systems impose limits.
Theoretical maximum:
If using a 32-bit hash function, the maximum meaningful global depth is 32 (extracting all bits). At depth 32, the directory alone holds 2^32 entries, about 32 GiB of pointers at 8 bytes each, before a single record is stored.
For 64-bit hashes, the theoretical maximum is 64, though the directory alone would then require roughly 128 EiB (2^64 eight-byte pointers), which is clearly impractical.
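These figures follow directly from the 2^d relationship; a quick sketch, assuming 8-byte directory pointers:

```python
# Directory size at the theoretical maxima, assuming 8-byte pointers.
POINTER_BYTES = 8

print((2 ** 32) * POINTER_BYTES / 2 ** 30, "GiB")   # 32.0 GiB for a 32-bit hash
print((2 ** 64) * POINTER_BYTES / 2 ** 60, "EiB")   # 128.0 EiB for a 64-bit hash
```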
Practical limits:
| System Type | Typical Max GD | Memory Limit | Rationale |
|---|---|---|---|
| Embedded system | 12-14 | < 1 MB directory | Severe memory constraints |
| Standard server | 18-22 | < 64 MB directory | Directory should fit in L3 cache or RAM |
| Large-memory server | 24-26 | < 1 GB directory | Bulk of RAM for data, not metadata |
| Disk-based directory | 28-32 | Multi-GB on disk | Directory I/O becomes a factor |
What happens at the limit?
When global depth reaches the system's configured maximum, the directory stops doubling. A full bucket with d' = d can no longer split, so new records go into overflow pages chained from that bucket, and lookups there become short sequential scans instead of a single probe.
This is identical to the degradation seen in static hashing, showing that extendible hashing degrades gracefully rather than failing catastrophically.
Production systems should monitor global depth and alert when it approaches configured limits. A sudden spike in global depth often indicates poor hash function distribution or unexpected data patterns requiring investigation.
```sql
-- Hypothetical monitoring query for extendible hash index
SELECT
    index_name,
    global_depth,
    max_allowed_depth,
    (max_allowed_depth - global_depth) AS depth_headroom,
    bucket_count,
    directory_size_bytes,
    CASE
        WHEN global_depth >= max_allowed_depth - 2 THEN 'CRITICAL'
        WHEN global_depth >= max_allowed_depth - 4 THEN 'WARNING'
        ELSE 'OK'
    END AS status
FROM sys.extendible_hash_indexes
WHERE database_name = 'production';

-- Alert if any index is within 2 levels of maximum
-- This gives time to:
-- 1. Investigate data distribution
-- 2. Consider larger bucket size
-- 3. Plan for potential overflow handling
```
In multi-threaded database systems, global depth requires special handling because it affects the interpretation of all lookups.
The concurrency challenge:
Consider this scenario: a reader loads the current global depth and computes a directory index from it; before the reader dereferences the directory entry, a concurrent writer doubles the directory and increments the global depth. The reader is now holding an index computed against a stale depth and may consult a directory array that has already been replaced.
Solutions used in practice:
```
// Versioned directory for lock-free reads
class VersionedDirectory {
    atomic<Version> current_version

    struct Version {
        Directory* directory
        uint32_t   global_depth
        uint64_t   version_number
    }
}

function lookup(key):
    // Optimistic read
    v = current_version.load()
    index = extractBits(hash(key), v.global_depth)
    bucket_ptr = v.directory[index]

    // Validate version hasn't changed
    if current_version.load().version_number != v.version_number:
        return lookup(key)     // Retry with new version

    return bucket_ptr

function doubleDirectory():
    old_v = current_version.load()
    new_dir = allocateDoubledDirectory(old_v.directory)

    new_v = Version {
        directory:      new_dir,
        global_depth:   old_v.global_depth + 1,
        version_number: old_v.version_number + 1
    }

    // Atomic swap to new version
    current_version.compare_and_swap(old_v, new_v)

    // Schedule old directory for deferred cleanup
    deferredFree(old_v.directory)
```
Since directory doublings are rare in mature systems, optimistic locking with retry is highly effective. Most reads succeed without contention. The system only pays the retry cost during actual doublings, which happen O(log N) times over the table's lifetime.
Global depth is the master control parameter of extendible hashing: it fixes the directory size at 2^d, determines how many hash bits are used to index the directory, bounds every bucket's local depth, and increases by exactly one each time the directory doubles.
What's next:
Global depth works in tandem with local depth—the per-bucket counter that indicates how many hash bits distinguish that bucket's contents. Next, we'll explore how local depth enables localized splitting without global reorganization.
You now understand global depth—the system-wide counter that controls directory size and hash bit interpretation. The interplay between global depth and local depth is what makes extendible hashing dynamic without being disruptive.