If the directory is the backbone of extendible hashing, then global depth is its control dial. This single integer value determines how large the directory is, how many hash bits are examined for bucket lookup, and when the entire directory must expand.
Think of global depth as the precision setting on a measurement instrument. At low precision (small global depth), you make coarse distinctions—multiple different hash values map to the same bucket. At high precision (large global depth), every nuance in the hash value leads to a different bucket.
By the end of this page, you will understand how global depth controls directory sizing, why it increases during directory doubling, its relationship to hash bit usage, and how it provides an upper bound on the local depth of any bucket.
Formal Definition:
The global depth (denoted as d or GD) is a non-negative integer that specifies how many hash bits are examined during a lookup and, consequently, how many entries the directory contains (2^d).
The fundamental relationship:
directory_size = 2^(global_depth)
directory_index = first_d_bits(hash(key))
This relationship is invariant throughout the lifetime of the extendible hash structure. Whenever you see a directory of size 16, you immediately know the global depth is 4. When global depth increases by 1, the directory doubles in size.
| Global Depth (d) | Directory Size (2^d) | Hash Bits Examined | Max Addressable Values |
|---|---|---|---|
| 1 | 2 | 1 bit | 0, 1 |
| 2 | 4 | 2 bits | 00, 01, 10, 11 |
| 3 | 8 | 3 bits | 000...111 |
| 4 | 16 | 4 bits | 0000...1111 |
| 8 | 256 | 8 bits | 00000000...11111111 |
| 10 | 1,024 | 10 bits | First 10 bits |
| 16 | 65,536 | 16 bits | First 16 bits |
| 20 | 1,048,576 | 20 bits | First 20 bits |
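As a quick sanity check of the 2^d relationship in the table, here is a minimal Python sketch; the directory_size helper and the 8-byte pointer size are illustrative assumptions, not part of any particular implementation:

```python
# A quick check of the size relationship in the table above.
# The helper name and the 8-byte pointer size are illustrative assumptions.

def directory_size(global_depth: int) -> int:
    """Number of directory entries for a given global depth (2^d)."""
    return 2 ** global_depth

for d in (1, 2, 3, 4, 8, 10, 16, 20):
    entries = directory_size(d)
    print(f"d={d:2d}: {entries:>9,} entries, {entries * 8:>11,} bytes of pointers")
```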
What the global depth tells you:
The global depth encapsulates the history of directory growth. A global depth of 5 means the directory has doubled (at least) 4 times from its initial size of 2. It represents the current level of addressing precision available to the hash structure.
If you start with global depth 1 (two buckets) and observe a table with global depth 12, you know the directory has doubled 11 times to accommodate data growth. That's 4,096 directory entries, though the actual number of distinct buckets may be far fewer due to sharing.
Understanding exactly how global depth interacts with hash values is essential for implementing extendible hashing correctly.
The bit extraction process:
Given a 32-bit hash value h(k) and global depth d, the directory index is the integer formed by the first (most significant) d bits of the hash.
Example walkthrough:
```
Given:
  - Key: "customer_12345"
  - Hash function produces: h(key) = 0xB7A3F1E2
  - In binary: 1011 0111 1010 0011 1111 0001 1110 0010
  - Global depth: d = 4

Extraction:
  Step 1: Take first 4 bits      → 1011 (binary)
  Step 2: Convert to decimal     → 11
  Step 3: directory_index = 11

Therefore:
  - This key maps to directory[11]
  - directory[11] points to some bucket B
  - The key is stored in bucket B

If global depth were 6 instead:
  Step 1: Take first 6 bits      → 101101 (binary)
  Step 2: Convert to decimal     → 45
  Step 3: directory_index = 45
```
Why most significant bits?
The choice of most significant bits (MSB) rather than least significant bits (LSB) is a deliberate design decision with important implications:
While MSB is most common, some implementations (notably certain versions of linear hashing) use LSB. The key is consistency—once chosen, the bit selection strategy must remain fixed for the lifetime of the hash structure.
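The walkthrough above can be reproduced in a few lines. The following is a minimal sketch assuming a 32-bit hash and MSB extraction; the function name is illustrative:

```python
# Minimal sketch of MSB bit extraction, assuming a 32-bit hash value.
# The name extract_msb_index is illustrative, not a standard API.

HASH_BITS = 32

def extract_msb_index(hash_value: int, global_depth: int) -> int:
    """Directory index = the most significant global_depth bits of the hash."""
    return hash_value >> (HASH_BITS - global_depth)

h = 0xB7A3F1E2                      # 1011 0111 1010 0011 1111 0001 1110 0010
print(extract_msb_index(h, 4))      # 11 -> directory[11]
print(extract_msb_index(h, 6))      # 45 -> directory[45]
```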
One of the most important properties of global depth is its relationship to local depth:
The Fundamental Invariant:
For every bucket B with local depth d' and global depth d: d' ≤ d (local depth never exceeds global depth).
This invariant is maintained at all times and has deep implications for how extendible hashing operates.
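One way to express the invariant concretely: a bucket with local depth d' is referenced by 2^(d - d') directory entries (the same formula that appears in the doubling discussion below), which is only well defined when d' ≤ d. A small hypothetical helper as a sketch:

```python
# Minimal sketch of the invariant check. A bucket with local depth d' is
# shared by 2^(d - d') directory entries, which only makes sense when d' <= d.
# The helper name is illustrative, not part of any particular implementation.

def pointers_to_bucket(global_depth: int, local_depth: int) -> int:
    assert local_depth <= global_depth, "invariant violated: d' > d"
    return 2 ** (global_depth - local_depth)

print(pointers_to_bucket(4, 4))   # 1 -> the bucket owns its directory entry alone
print(pointers_to_bucket(4, 2))   # 4 -> four directory entries share this bucket
```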
Interpretation of the d = d' case:
When a bucket's local depth equals the global depth (d' = d), that bucket is pointed to by exactly one directory entry and already uses every hash bit the directory can distinguish, so it cannot split without more directory entries.
This is the critical case that triggers directory doubling. When an at-capacity bucket with d' = d receives a new insertion that would overflow it, the system must first double the directory to make room for the split.
An overflow in a bucket where d' = d is the only scenario that forces directory doubling. All other overflow cases (d' < d) can be resolved by a local bucket split without touching the directory size.
When global depth must increase, the directory undergoes a doubling operation. This is the most expensive operation in extendible hashing, but it's designed to be as efficient as possible.
The doubling algorithm:
```
function doubleDirectory():
    old_size = 2^global_depth          // Current directory size
    new_size = 2 * old_size            // New size after doubling

    // Step 1: Allocate new directory array
    new_directory = allocate(new_size)

    // Step 2: Duplicate all pointers
    for i = 0 to old_size - 1:
        new_directory[i] = directory[i]             // Original position
        new_directory[i + old_size] = directory[i]  // Duplicated position

    // Step 3: Replace old directory with new
    directory = new_directory
    global_depth = global_depth + 1

    // Step 4: Now the triggering bucket can split
    // (handled by subsequent split operation)

// Visual example:
// Before (d=2): [B0] [B1] [B2] [B2]                      (buckets B0, B1, B2)
//                 0    1    2    3
//
// After  (d=3): [B0] [B1] [B2] [B2] [B0] [B1] [B2] [B2]
//                 0    1    2    3    4    5    6    7
//
// Note: All pointers duplicated, no data moved
```
Key observations:
No data moves during doubling — Only pointers are copied. All buckets remain in their original locations.
Each entry is duplicated — Entry i appears at positions i and i + old_size in the new directory.
Bucket local depths unchanged — The doubling doesn't affect any bucket's local depth or contents; only the directory grows.
All buckets now have doubled pointers — After doubling, every bucket's pointer count doubles (since 2^((d+1) - d') = 2 × 2^(d - d')).
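The same duplication pattern can be written as a short runnable sketch; the bucket objects here are placeholders, and a real implementation would also increment global_depth when installing the new array:

```python
# A runnable sketch of the doubling step from the pseudocode above.
# Buckets are opaque placeholder objects, since no bucket data moves.

def double_directory(directory: list) -> list:
    """Entry i of the old directory reappears at positions i and i + old_size."""
    old_size = len(directory)
    return [directory[i % old_size] for i in range(2 * old_size)]

b0, b1, b2 = object(), object(), object()
directory = [b0, b1, b2, b2]                 # d = 2; B2 is shared by two entries
doubled = double_directory(directory)        # corresponds to d = 3
assert doubled == [b0, b1, b2, b2, b0, b1, b2, b2]
assert doubled.count(b2) == 2 * directory.count(b2)   # every pointer count doubles
```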
Cost analysis:
| Operation | Cost | Notes |
|---|---|---|
| Memory allocation | O(2^d) time | Allocate array of 2^(d+1) pointers |
| Pointer copying | O(2^d) time | Copy each pointer exactly once |
| Memory overhead | 2^d × ptr_size → 2^(d+1) × ptr_size | Old and new arrays briefly coexist during the transition |
| I/O operations | 0 bucket I/Os | No data pages are read or written |
| Lock duration | Proportional to 2^d | May require exclusive lock on directory |
If your buckets hold ~100 records each and you have 1 million records, you need about 10,000 buckets. With global depth 14, you can address 16,384 buckets. The directory size is only 16K × 8 bytes = 128 KB. Directory doubling copies just 128 KB of pointers—trivial compared to moving any actual data.
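The sizing arithmetic in this example is easy to reproduce. A short sketch under the same assumptions (about 100 records per bucket, 8-byte directory pointers):

```python
# Reproducing the sizing arithmetic above, assuming ~100 records per bucket
# and 8-byte directory pointers.
import math

records = 1_000_000
records_per_bucket = 100
buckets_needed = records // records_per_bucket            # 10,000
global_depth = math.ceil(math.log2(buckets_needed))       # 14 (2^14 = 16,384 >= 10,000)
directory_entries = 2 ** global_depth
print(directory_entries, directory_entries * 8 // 1024, "KB")   # 16384 128 KB
```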
Global depth increases under one specific condition:
The Trigger Condition:
Global depth increases when a bucket with local_depth = global_depth (i.e., d' = d) overflows and needs to split.
Let's trace through the decision logic:
```
function insert(key, value):
    bucket = findBucket(key)            // O(1) via directory

    if bucket.hasSpace():
        bucket.insert(key, value)
        return SUCCESS

    // Bucket overflow case
    if bucket.local_depth < global_depth:
        // Case 1: Can split without directory growth
        splitBucket(bucket)
        insert(key, value)              // Retry after split
    else:
        // Case 2: d' = d, must grow directory first
        assert bucket.local_depth == global_depth
        doubleDirectory()               // Increase global_depth
        splitBucket(bucket)             // Now we can split
        insert(key, value)              // Retry after split

// Key insight:
// - Case 1 is local:  affects only one bucket
// - Case 2 is global: affects entire directory structure
```
Frequency of directory doubling:
Directory doubling becomes increasingly rare as the table grows: most buckets have local_depth < global_depth and can split locally, so the probability that a random overflow lands in a d' = d bucket keeps shrinking.
Although each directory doubling costs O(2^d), doublings happen at most O(log N) times for N insertions. The amortized cost per insertion is O(1), making extendible hashing efficient for dynamic datasets.
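One way to see the amortization is to total the pointer copies across all doublings. The sketch below assumes the directory starts at global depth 1 and picks an arbitrary final depth of 20; the geometric sum comes out to roughly one final directory's worth of copies:

```python
# Why the doubling cost amortizes: total pointer-copy work across all
# doublings up to depth D is a geometric sum, roughly one final directory's
# worth of pointers. D = 20 is an arbitrary example; the start depth of 1
# follows the earlier examples.

D = 20
total_copies = sum(2 ** d for d in range(1, D))   # copies made at each doubling
final_directory_size = 2 ** D
print(total_copies, final_directory_size)         # 1048574 vs 1048576
```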
While theoretically global depth could grow indefinitely, practical systems impose limits.
Theoretical maximum:
If using a 32-bit hash function, the maximum meaningful global depth is 32 (extracting all bits). At depth 32, the directory alone holds 2^32 entries, about 32 GiB of pointers at 8 bytes each, before a single record is stored.
For 64-bit hashes, the theoretical maximum is 64, though the directory alone would then require roughly 128 EiB (2^64 eight-byte pointers), which is clearly impractical.
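These figures follow directly from the 2^d relationship; a quick sketch, assuming 8-byte directory pointers:

```python
# Directory size at the theoretical maxima, assuming 8-byte pointers.
POINTER_BYTES = 8

print((2 ** 32) * POINTER_BYTES / 2 ** 30, "GiB")   # 32.0 GiB for a 32-bit hash
print((2 ** 64) * POINTER_BYTES / 2 ** 60, "EiB")   # 128.0 EiB for a 64-bit hash
```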
Practical limits:
| System Type | Typical Max GD | Memory Limit | Rationale |
|---|---|---|---|
| Embedded system | 12-14 | < 1 MB directory | Severe memory constraints |
| Standard server | 18-22 | < 64 MB directory | Directory should fit in L3 cache or RAM |
| Large-memory server | 24-26 | < 1 GB directory | Bulk of RAM for data, not metadata |
| Disk-based directory | 28-32 | Multi-GB on disk | Directory I/O becomes a factor |
What happens at the limit?
When global depth reaches the system's configured maximum, the directory stops doubling. A full bucket with d' = d can no longer split, so new records go into overflow pages chained from that bucket, and lookups there become short sequential scans instead of a single probe.
This is identical to the degradation seen in static hashing, showing that extendible hashing degrades gracefully rather than failing catastrophically.
Production systems should monitor global depth and alert when it approaches configured limits. A sudden spike in global depth often indicates poor hash function distribution or unexpected data patterns requiring investigation.
```sql
-- Hypothetical monitoring query for extendible hash index
SELECT
    index_name,
    global_depth,
    max_allowed_depth,
    (max_allowed_depth - global_depth) AS depth_headroom,
    bucket_count,
    directory_size_bytes,
    CASE
        WHEN global_depth >= max_allowed_depth - 2 THEN 'CRITICAL'
        WHEN global_depth >= max_allowed_depth - 4 THEN 'WARNING'
        ELSE 'OK'
    END AS status
FROM sys.extendible_hash_indexes
WHERE database_name = 'production';

-- Alert if any index is within 2 levels of maximum
-- This gives time to:
-- 1. Investigate data distribution
-- 2. Consider larger bucket size
-- 3. Plan for potential overflow handling
```
In multi-threaded database systems, global depth requires special handling because it affects the interpretation of all lookups.
The concurrency challenge:
Consider this scenario: a reader loads the current global depth and computes a directory index from it; before the reader dereferences the directory entry, a concurrent writer doubles the directory and increments the global depth. The reader is now holding an index computed against a stale depth and may consult a directory array that has already been replaced.
Solutions used in practice:
```
// Versioned directory for lock-free reads
class VersionedDirectory {
    atomic<Version> current_version

    struct Version {
        Directory* directory
        uint32_t   global_depth
        uint64_t   version_number
    }
}

function lookup(key):
    // Optimistic read
    v = current_version.load()
    index = extractBits(hash(key), v.global_depth)
    bucket_ptr = v.directory[index]

    // Validate version hasn't changed
    if current_version.load().version_number != v.version_number:
        return lookup(key)     // Retry with new version

    return bucket_ptr

function doubleDirectory():
    old_v = current_version.load()
    new_dir = allocateDoubledDirectory(old_v.directory)

    new_v = Version {
        directory:      new_dir,
        global_depth:   old_v.global_depth + 1,
        version_number: old_v.version_number + 1
    }

    // Atomic swap to new version
    current_version.compare_and_swap(old_v, new_v)

    // Schedule old directory for deferred cleanup
    deferredFree(old_v.directory)
```
Since directory doublings are rare in mature systems, optimistic locking with retry is highly effective. Most reads succeed without contention. The system only pays the retry cost during actual doublings, which happen O(log N) times over the table's lifetime.
Global depth is the master control parameter of extendible hashing: it fixes the directory size at 2^d, determines how many hash bits are used to index the directory, bounds every bucket's local depth, and increases by exactly one each time the directory doubles.
What's next:
Global depth works in tandem with local depth—the per-bucket counter that indicates how many hash bits distinguish that bucket's contents. Next, we'll explore how local depth enables localized splitting without global reorganization.
You now understand global depth—the system-wide counter that controls directory size and hash bit interpretation. The interplay between global depth and local depth is what makes extendible hashing dynamic without being disruptive.