While global depth is the system-wide counter controlling directory size, local depth is the per-bucket counter that remembers each bucket's individual split history. Every bucket carries its local depth as metadata, and this single integer encodes crucial information about the bucket's creation, its relationship to the directory, and whether it can split independently.
If global depth is the orchestra conductor, local depth is each musician's individual tempo marking—together they coordinate to produce a harmonious, adaptive data structure.
By the end of this page, you will understand how local depth tracks bucket-specific history, why the difference (global_depth - local_depth) determines pointer count, how local depth governs split behavior, and the invariants that local depth must satisfy.
Formal Definition:
The local depth of a bucket B (denoted as d' or LD) is a non-negative integer that specifies how many leading bits of the hash value are shared by every key stored in B, and therefore how many bits the directory needs in order to address B.
The fundamental relationships:
```
Given:
- Global depth: d
- Local depth of bucket B: d'

Then:
- Number of directory entries pointing to B = 2^(d - d')
- The first d' bits of the hash of every key in B are identical
- B was created by (d' - initial_depth) splits from the original bucket

Examples (assuming d = 4, initial_depth = 1):

Bucket with d' = 4:
- Directory entries pointing here: 2^(4-4) = 1
- Keys share first 4 bits
- Created by 3 splits

Bucket with d' = 2:
- Directory entries pointing here: 2^(4-2) = 4
- Keys share first 2 bits
- Created by 1 split

Bucket with d' = 1:
- Directory entries pointing here: 2^(4-1) = 8
- Keys share first 1 bit
- This is an original bucket (never split)
```

What local depth means intuitively:
Local depth answers the question: "How specific is this bucket about which keys it accepts?"
As buckets split, they become more specific—each split doubles the specificity by considering one more bit of the hash value.
If you start with global_depth = 1 (two buckets, each with local_depth = 1), then after n splits on a particular bucket lineage, the resulting bucket has local_depth = 1 + n. Local depth literally counts the split history.
The most operationally important property of local depth is how it determines the number of directory entries pointing to a bucket.
The pointer count formula:
pointer_count(bucket) = 2^(global_depth - local_depth)
This formula always yields a whole number of pointers (at least one) because of the invariant 0 ≤ d' ≤ d: the exponent is never negative.
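A minimal sketch in Python, using the depths from the examples above, makes the arithmetic concrete (the function name is illustrative):

```python
def pointer_count(global_depth: int, local_depth: int) -> int:
    """Number of directory entries pointing at a bucket with the given local depth."""
    assert 0 <= local_depth <= global_depth, "invariant: local_depth <= global_depth"
    return 2 ** (global_depth - local_depth)

# Values from the d = 4 examples above
assert pointer_count(4, 4) == 1   # fully split bucket: exactly one pointer
assert pointer_count(4, 2) == 4   # four directory entries share this bucket
assert pointer_count(4, 1) == 8   # an original, never-split bucket
```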
Visual demonstration:
Picture a directory with global_depth = 3 (eight entries) pointing to three buckets: Bucket A with d' = 2, Bucket B with d' = 2, and Bucket C with d' = 1. The key observation is that each bucket's pointer count follows directly from the formula: A and B each receive 2^(3-2) = 2 directory entries, while C, having split less, receives 2^(3-1) = 4.
Verifying the invariant:
The total pointer count across all buckets must equal the directory size:
2^(3-2) + 2^(3-2) + 2^(3-1) = 2 + 2 + 4 = 8 = 2^3 ✓
For a bucket with local_depth d', the directory entries pointing to it are all indices where the first d' bits match the bucket's hash prefix. For Bucket C (d'=1, prefix '1'), entries 100, 101, 110, 111 all start with '1', so all four point to C.
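To make the prefix-matching rule concrete, here is a small illustrative sketch (the helper name and MSB-first indexing are assumptions carried over from this page, not a specific implementation's API) that enumerates the directory indices pointing to a bucket:

```python
def entries_pointing_to(global_depth: int, local_depth: int, prefix: int) -> list[int]:
    """Directory indices whose first `local_depth` bits equal `prefix`.

    `prefix` is the bucket's local_depth-bit hash prefix, e.g. 0b1 for Bucket C.
    """
    suffix_bits = global_depth - local_depth
    # Every combination of the remaining suffix bits maps to this bucket.
    return [(prefix << suffix_bits) | suffix for suffix in range(2 ** suffix_bits)]

# Bucket C from the example: global_depth = 3, local_depth = 1, prefix '1'
print([format(i, "03b") for i in entries_pointing_to(3, 1, 0b1)])
# ['100', '101', '110', '111']  -> all four entries point to Bucket C
```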
Local depth not only affects directory pointers but also constrains which keys can reside in a bucket.
The key constraint:
All keys in a bucket with local_depth d' must have the same first d' bits in their hash values.
This is what makes the bucket addressable—any directory entry that points to this bucket shares those d' leading bits.
Example walkthrough:
```
Scenario: Bucket with local_depth = 3, hash prefix = "010"

Valid keys (their hashes start with 010):
- hash("alice")   = 0b01001101... → prefix 010 ✓
- hash("zebra")   = 0b01011111... → prefix 010 ✓
- hash("data")    = 0b01000001... → prefix 010 ✓

Invalid keys (wrong prefix):
- hash("bob")     = 0b11001101... → prefix 110 ✗
- hash("cat")     = 0b00011111... → prefix 000 ✗
- hash("epsilon") = 0b01100001... → prefix 011 ✗

Key insight: After the first 3 bits (which must be 010), the remaining
hash bits can be anything. The bucket may contain keys with hashes
01000000..., 01001111..., 01010101..., etc.

When this bucket splits, bit 4 will differentiate:
- Bit 4 = 0: Keys stay in the bucket with prefix 0100
- Bit 4 = 1: Keys move to the new sibling with prefix 0101
```

Why this matters for correctness:
The local depth ensures that every key is findable. When searching for a key, the hash's first global_depth bits select a directory entry, and that entry's pointer leads to the bucket whose d'-bit prefix matches the start of the key's hash. Because every key was placed in the bucket matching its own prefix, the lookup always lands on the bucket that holds it.
If the invariant were violated (a key with wrong prefix in a bucket), lookups would fail to find existing keys.
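A lookup sketch, assuming a 32-bit hash and a directory stored as a plain list of dictionary-like buckets, shows why the prefix invariant guarantees the key is reachable:

```python
HASH_BITS = 32  # assumed hash width for this sketch

def lookup(directory: list, global_depth: int, key, hash_fn):
    """Find the value for `key` by indexing the directory with the first global_depth hash bits."""
    h = hash_fn(key)
    index = h >> (HASH_BITS - global_depth)      # first global_depth bits of the hash
    bucket = directory[index]
    # Invariant: every key in `bucket` shares its first local_depth bits with `index`,
    # so if the key exists anywhere, it must be in exactly this bucket.
    return bucket.get(key)
```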
A bucket containing a key whose hash prefix doesn't match the bucket's expected prefix indicates data corruption. Database systems can validate this invariant during consistency checks (e.g., DBCC CHECKDB in SQL Server, pg_amcheck in PostgreSQL).
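A generic version of such a check might look like the following sketch (purely illustrative; this is not the actual logic of DBCC CHECKDB or pg_amcheck, and the bucket attributes are assumed):

```python
def check_bucket_invariant(bucket, hash_fn, hash_bits: int = 32) -> bool:
    """Verify that every key's hash starts with the bucket's local_depth-bit prefix."""
    for key in bucket.keys():
        actual_prefix = hash_fn(key) >> (hash_bits - bucket.local_depth)
        if actual_prefix != bucket.prefix:
            return False   # corruption: this key is not addressable through this bucket
    return True
```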
Local depth increases when a bucket splits. This is the fundamental mechanism of extendible hashing evolution.
The split operation and local depth:
```
function splitBucket(bucket):
    old_local_depth = bucket.local_depth
    new_local_depth = old_local_depth + 1

    // Create sibling bucket with the same new depth
    sibling = createNewBucket(local_depth = new_local_depth)
    bucket.local_depth = new_local_depth

    // Redistribute keys based on the (old_local_depth + 1)th bit
    discriminating_bit = new_local_depth
    for each key in a copy of bucket.keys:
        h = hash(key)
        bit = extractBit(h, discriminating_bit)   // 0 or 1
        if bit == 1:
            sibling.insert(key, bucket.getValue(key))
            bucket.remove(key)

    // Update directory pointers
    updateDirectoryPointers(bucket, sibling)

// After split:
// - Original bucket has local_depth = old + 1
// - Sibling bucket has local_depth = old + 1
// - Each has half the original pointer count
//   (2^(d - (old+1)) = 2^(d - old) / 2)

// Example:
// Before: Bucket B with d' = 2, global d = 4
// - Pointers: 2^(4-2) = 4 entries point to B
//
// After split: Bucket B and Sibling S, both with d' = 3
// - Pointers to B: 2^(4-3) = 2
// - Pointers to S: 2^(4-3) = 2
// - Total: 2 + 2 = 4 (same as before) ✓
```

Pointer redistribution during split:
When a bucket with local_depth d' splits, both resulting buckets get local_depth d' + 1, and the 2^(d - d') directory entries that pointed to the original are divided evenly: half keep pointing to it and half are redirected to the new sibling.
The total pointer count is preserved: 2^(d - d') = 2^(d - d' - 1) + 2^(d - d' - 1)
| Before Split | Operation | After Split |
|---|---|---|
| Bucket A, d'=2 | Split triggered by overflow | Bucket A: d'=3, Bucket A': d'=3 |
| Entries 0100, 0101, 0110, 0111 → A | Bit 3 partitions entries | 0100,0101 → A; 0110,0111 → A' |
| 4 pointers to A | Each bucket gets half | 2 pointers to A, 2 pointers to A' |
When local_depth equals global_depth, the bucket has only 1 pointer. Splitting would require 0.5 pointers each—impossible! This is why d' = d overflow triggers directory doubling first, which increases d, creating room for the split.
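The resulting decision logic on overflow can be sketched as follows (the table and bucket methods are hypothetical names mirroring the pseudocode above, not a specific library's API):

```python
def handle_overflow(table, bucket) -> None:
    """When an insert overflows `bucket`, split it, doubling the directory first if needed."""
    if bucket.local_depth == table.global_depth:
        # Only one directory entry points here; make room by doubling the directory,
        # which increments global_depth and duplicates every existing pointer.
        table.double_directory()
    # Now local_depth < global_depth, so at least two entries point to the bucket
    # and the split can redistribute them between the bucket and its new sibling.
    table.split_bucket(bucket)
```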
A key feature of extendible hashing is that different buckets can have different local depths. This asymmetry reflects non-uniform data distribution.
Why asymmetry occurs:
Consider a scenario where keys hash non-uniformly, with most hash values starting with '1':
The '1...' bucket fills and splits repeatedly, increasing its local depth. The '0...' bucket remains small and never splits, keeping its original local depth.
Real-world example:
```
Scenario: Customer ID hashing where most customers are recent

Hash function tends to cluster recent IDs:
- Old customer IDs    → hash prefixes 00..., 01...
- Recent customer IDs → hash prefixes 10..., 11...

Initial state (d=2):
  Bucket 00: d'=2,  20 records
  Bucket 01: d'=2,  25 records
  Bucket 10: d'=2, 150 records → overflow!
  Bucket 11: d'=2, 180 records → overflow!

After multiple splits (d=4):
  Bucket 00:   d'=2, 20 records (4 directory ptrs)
  Bucket 01:   d'=2, 25 records (4 directory ptrs)
  Bucket 1000: d'=4, 35 records (1 directory ptr)
  Bucket 1001: d'=4, 38 records (1 directory ptr)
  Bucket 1010: d'=4, 40 records (1 directory ptr)
  Bucket 1011: d'=4, 37 records (1 directory ptr)
  Bucket 1100: d'=4, 42 records (1 directory ptr)
  Bucket 1101: d'=4, 45 records (1 directory ptr)
  Bucket 1110: d'=4, 48 records (1 directory ptr)
  Bucket 1111: d'=4, 43 records (1 directory ptr)

Observations:
- Sparse regions (00, 01) share pointers, saving buckets
- Dense regions (10.., 11..) are fully split
- Directory has 16 entries but only 10 buckets exist
- Space adapts to the actual data distribution
```

Benefits of asymmetric local depths:
The variance in local depths across buckets indicates data skew. In a perfectly uniform distribution, all buckets would have similar local depths. Large variance suggests hash function issues or inherently skewed data that might benefit from a different hash strategy.
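One way to surface this signal is to compute the spread of local depths across live buckets. A rough sketch, assuming an iterable of bucket objects that expose a local_depth attribute:

```python
from statistics import pvariance

def local_depth_skew(buckets) -> tuple[float, dict[int, int]]:
    """Return the variance of local depths and a histogram of depth -> bucket count."""
    depths = [b.local_depth for b in buckets]
    histogram: dict[int, int] = {}
    for d in depths:
        histogram[d] = histogram.get(d, 0) + 1
    return pvariance(depths), histogram

# A large variance (e.g. buckets at d' = 2 alongside buckets at d' = 4, as in the
# customer-ID example above) suggests skewed data or a weak hash function.
```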
Just as splitting increases local depth, merging (the inverse operation) decreases it. Merging is used when buckets become sparse after many deletions.
When can buckets merge?
Two buckets can merge if they have the same local depth, they are siblings (their prefixes differ only in the final bit), and their combined contents fit within a single bucket's capacity.
Identifying siblings:
Two buckets are siblings if their hash prefixes differ only in the last bit (the bit at position local_depth). Bucket with prefix 01010 and bucket with prefix 01011 are siblings if both have local_depth = 5.
```
function considerMerge(bucket):
    if bucket.local_depth == 0:
        return   // Cannot merge original bucket

    // Find sibling by flipping the last bit of prefix
    sibling_prefix = bucket.prefix XOR (1 << (total_bits - bucket.local_depth))
    sibling = findBucketByPrefix(sibling_prefix)

    if sibling == null:
        return   // Sibling doesn't exist (maybe already merged)
    if sibling.local_depth != bucket.local_depth:
        return   // Not actually siblings
    if bucket.size + sibling.size > BUCKET_CAPACITY:
        return   // Merged bucket would overflow

    // Perform merge
    mergeBuckets(bucket, sibling)

function mergeBuckets(bucket, sibling):
    // Move all entries from sibling to bucket
    for each entry in sibling:
        bucket.insert(entry)

    // Decrease local depth
    bucket.local_depth = bucket.local_depth - 1

    // Update directory: pointers that went to sibling now go to bucket
    for each directory_entry pointing to sibling:
        directory_entry = bucket

    // Free sibling bucket
    deallocate(sibling)

// After merge:
// - Bucket has local_depth decreased by 1
// - Bucket has 2x the pointer count
// - Sibling is deallocated

// Note: Directory shrinking is less common
// Even if all buckets could merge, the directory usually stays large
```

Directory shrinking (halving):
Theoretically, when every bucket has local_depth < global_depth, the directory could shrink. However, most systems don't implement automatic directory shrinking: the directory is already small relative to the data it indexes, checking eligibility means inspecting every bucket's local depth, and a workload that just shrank often grows right back.
Unlike splits (which are triggered by overflow), merges are often deferred or triggered by explicit compaction commands. Aggressive merging can lead to thrashing if data patterns fluctuate. Many systems skip automatic merging entirely.
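For systems that do shrink, the eligibility test is simple. A hedged sketch, assuming the MSB-first directory indexing used on this page and a directory held as a Python list of bucket references:

```python
def try_shrink_directory(table) -> bool:
    """Halve the directory if no bucket uses all global_depth bits."""
    if table.global_depth == 0:
        return False
    if any(b.local_depth == table.global_depth for b in table.buckets()):
        return False   # some bucket needs every directory bit; cannot shrink
    # With MSB-first indexing, entries 2i and 2i+1 differ only in the last bit,
    # so they point to the same bucket and keeping every other entry is safe.
    table.directory = table.directory[::2]
    table.global_depth -= 1
    return True
```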
In a physical implementation, local depth must be stored somewhere persistent. There are two common approaches:
Approach 1: Store in bucket header
Each bucket page contains a header with metadata including local depth:
```
┌──────────────────────────────────────────────────┐
│ Bucket Page Header                                │
│   ─ page_id: 4 bytes                              │
│   ─ local_depth: 1 byte      ← Local depth here   │
│   ─ record_count: 2 bytes                         │
│   ─ free_space: 2 bytes                           │
│   ─ next_overflow: 4 bytes (if using chains)      │
│   ─ checksum: 4 bytes                             │
├──────────────────────────────────────────────────┤
│ Slot Directory                                    │
│   [offset1, len1] [offset2, len2] ...             │
├──────────────────────────────────────────────────┤
│                                                   │
│ Record Storage Area                               │
│ (grows from bottom, slots from top)               │
│                                                   │
└──────────────────────────────────────────────────┘

Advantages:
- Local depth always available when reading bucket
- No additional metadata lookup required
- Self-describing pages for recovery

Disadvantages:
- Extra I/O to read local depth if only checking metadata
- Slight space overhead per page
```

Approach 2: Store in directory structure
The directory entry can store both the bucket pointer and local depth:
```c
#include <stdint.h>   // for uint32_t / uint8_t

// Directory entry structure with embedded local depth
struct DirectoryEntry {
    uint32_t bucket_page_id;   // 4 bytes: physical page location
    uint8_t  local_depth;      // 1 byte: bucket's local depth
    uint8_t  padding[3];       // 3 bytes: alignment
};
// Total: 8 bytes per entry

// Advantages:
// - Quick access to local depth without bucket I/O
// - Useful for split planning without reading bucket contents
//
// Disadvantages:
// - Redundant if local depth also in bucket header
// - Must keep both copies in sync

// Alternative: Compute from directory pattern
// If entries i and i+1 both point to same bucket, local_depth < global_depth
// Count consecutive identical pointers to derive local_depth
// But this is O(log n) lookup vs O(1) with stored depth
```

Most implementations store local depth in the bucket header and optionally cache it in the directory structure. The bucket header is authoritative; directory cache is for optimization. During recovery, local depths are verified by reading bucket headers.
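The "compute from the directory pattern" alternative mentioned in the comments can be sketched as follows (a linear-scan version for clarity; it assumes MSB-first indexing and a directory stored as a list of bucket object references):

```python
def derive_local_depth(directory: list, global_depth: int, index: int) -> int:
    """Infer a bucket's local depth by counting how many entries share its pointer."""
    bucket = directory[index]
    span = 1
    # The entries pointing to one bucket form a contiguous, power-of-two-sized, aligned run.
    while span < len(directory):
        start = (index // (span * 2)) * (span * 2)   # align to the next larger run
        if all(directory[i] is bucket for i in range(start, start + span * 2)):
            span *= 2
        else:
            break
    return global_depth - span.bit_length() + 1      # span = 2^(d - d')
```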
Local depth is the per-bucket counter that enables extendible hashing's adaptive behavior. The key concepts to carry forward:
- A bucket's local depth d' records how many hash bits it is specific about; all of its keys share those d' leading bits.
- Exactly 2^(global_depth - local_depth) directory entries point to the bucket, and 0 ≤ d' ≤ global_depth always holds.
- Splitting increments local depth and halves the pointer count; merging does the reverse.
- When d' = global_depth, an overflow forces directory doubling before the bucket can split.
- Buckets can have different local depths, letting dense hash regions split deeply while sparse regions stay coarse.
What's next:
Now that we understand both global depth and local depth, we'll explore directory doubling—the operation that increases global depth when a maximally-split bucket overflows. This is the most structurally significant operation in extendible hashing.
You now understand local depth—the per-bucket counter that enables adaptive, localized growth. The interplay between global_depth and local_depth is what makes extendible hashing simultaneously dynamic and efficient.