If B-trees offer O(log N) lookups, and hash tables offer O(1) lookups, why would anyone use B-trees? This seemingly obvious question leads to a nuanced understanding of database indexing that separates superficial knowledge from deep expertise.
Hash indexes leverage hash tables—the same data structure you use for dictionaries and maps in application code—to provide constant-time exact-match lookups. In theory, this makes them the fastest possible index type for equality queries. In practice, their adoption in database systems is surprisingly limited.
This page explores hash indexes comprehensively: their internal mechanics, their compelling strengths, their critical limitations, and the specific scenarios where they represent the optimal indexing choice.
By the end of this page, you'll understand how hash indexes work internally, why O(1) isn't always better than O(log N), when hash indexes are the right choice, which databases support them, and how to make informed decisions between hash and B-tree indexes.
A hash index is built on a hash table data structure. Understanding how hash tables work is essential for understanding hash index behavior.
The Hash Function:
At the core of any hash index is a hash function that maps keys to bucket locations:
hash(key) → bucket_number
Good hash functions have these properties:
- Deterministic: the same key always produces the same hash value
- Uniform: output values spread evenly across the range, minimizing collisions
- Fast: computation takes constant time per key, independent of table size
Databases typically use fast, non-cryptographic hash functions with strong distribution properties (like MurmurHash, CityHash, or xxHash).
Bucket Structure:
The hash table consists of an array of buckets (also called slots). Each bucket can store one or more entries. The number of buckets is typically a power of 2, which lets the modulo reduce to a single bitwise AND:
bucket_number = hash(key) mod num_buckets
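To make the mapping concrete, here's a minimal Python sketch (illustrative only; the bucket numbers it prints will differ from the example values in the diagram below):

```python
import hashlib

NUM_BUCKETS = 8  # a power of 2, so "mod" can also be a bitwise AND

def bucket_for(key: str) -> int:
    # Hash the key to a big integer, then reduce it to a bucket number.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % NUM_BUCKETS           # equivalently: h & (NUM_BUCKETS - 1)

for key in ["alice@email.com", "bob@email.com", "carol@email.com"]:
    print(key, "-> bucket", bucket_for(key))
```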
Hash Index Structure

Hash function: hash(key) mod 8 (8 buckets)
Keys to index: 'alice@email.com', 'bob@email.com', 'carol@email.com'

hash('alice@email.com') = 0xA3F2... mod 8 = 5
hash('bob@email.com')   = 0x7B21... mod 8 = 2
hash('carol@email.com') = 0x2E45... mod 8 = 5  ← Collision with alice!

Bucket Array:
┌──────────┬─────────────────────────────────────────────┐
│ Bucket 0 │ (empty)                                     │
├──────────┼─────────────────────────────────────────────┤
│ Bucket 1 │ (empty)                                     │
├──────────┼─────────────────────────────────────────────┤
│ Bucket 2 │ 'bob@email.com' → row_ptr_42                │
├──────────┼─────────────────────────────────────────────┤
│ Bucket 3 │ (empty)                                     │
├──────────┼─────────────────────────────────────────────┤
│ Bucket 4 │ (empty)                                     │
├──────────┼─────────────────────────────────────────────┤
│ Bucket 5 │ 'alice@email.com' → row_ptr_17              │
│          │ 'carol@email.com' → row_ptr_89  ← Chained   │
├──────────┼─────────────────────────────────────────────┤
│ Bucket 6 │ (empty)                                     │
├──────────┼─────────────────────────────────────────────┤
│ Bucket 7 │ (empty)                                     │
└──────────┴─────────────────────────────────────────────┘

Lookup 'bob@email.com':
1. Compute hash('bob@email.com') mod 8 = 2
2. Go directly to bucket 2
3. Find 'bob@email.com' → return row_ptr_42
4. Total operations: 1 hash + 1 bucket access = O(1)

Collision Handling:
When two different keys hash to the same bucket, we have a collision. Databases handle collisions through chaining or open addressing:
Chaining (most common in databases): each bucket holds a small list (or chain of overflow pages) of all entries that hash to it. A lookup lands in the bucket and scans the short chain (see the sketch after this list).
Open Addressing (less common): colliding entries are placed in other slots found by probing (e.g., checking successive slots until an empty one is found). This keeps everything in the array but degrades badly at high load factors.
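A toy version of chaining in Python (a sketch for intuition, not any database's actual layout):

```python
import hashlib

class ChainedHashIndex:
    """Toy hash index with collision handling by chaining."""

    def __init__(self, num_buckets: int = 8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key: str) -> list:
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.buckets[h % len(self.buckets)]

    def insert(self, key: str, row_ptr: int) -> None:
        self._bucket(key).append((key, row_ptr))  # colliding keys share a chain

    def lookup(self, key: str):
        # O(1) on average: one hash, one bucket, then a scan of a short chain.
        for k, row_ptr in self._bucket(key):
            if k == key:
                return row_ptr
        return None

idx = ChainedHashIndex()
idx.insert("alice@email.com", 17)
idx.insert("carol@email.com", 89)  # may collide with alice; the chain absorbs it
print(idx.lookup("carol@email.com"))  # -> 89
```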
Load Factor:
The load factor (entries / buckets) determines collision probability:
- Load factor ≈ 0.5: collisions are rare and chains stay very short
- Load factor ≈ 0.75: collisions are common, but average chains remain short
- Load factor > 1: every bucket holds multiple entries on average and performance degrades
Databases typically resize the hash table to maintain a load factor between 0.5 and 0.75.
O(1) lookup assumes the hash computation is O(1) relative to N, collisions are bounded (the average case), and the hash table doesn't need resizing. In the worst case (all keys in one bucket), hash table lookup degrades to O(n). With a good hash function, that outcome is vanishingly unlikely for real data.
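A quick experiment shows how load factor drives chain length (a rough simulation with random keys, not a database benchmark):

```python
import hashlib
import random
import string
from collections import Counter

def bucket(key: str, num_buckets: int) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_buckets

def longest_chain(num_keys: int, num_buckets: int) -> int:
    keys = ("".join(random.choices(string.ascii_lowercase, k=12))
            for _ in range(num_keys))
    counts = Counter(bucket(k, num_buckets) for k in keys)
    return max(counts.values())

# Same 1,024 keys, shrinking bucket counts -> rising load factor, longer chains.
for num_buckets in (2048, 1024, 512):
    load = 1024 / num_buckets
    print(f"load factor {load:.2f}: longest chain = {longest_chain(1024, num_buckets)}")
```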
Understanding the exact lookup process reveals why hash indexes are so fast for equality queries.
Step-by-Step Lookup:
-- Query using hash index
SELECT * FROM users WHERE email = 'alice@example.com';

-- If there's a HASH index on email:

-- PostgreSQL execution:
-- 1. Calculate hash('alice@example.com')
-- 2. Locate bucket in hash index
-- 3. Read bucket page (1 I/O or cache hit)
-- 4. Find matching entry in bucket
-- 5. Use row pointer to read table row (1 I/O or cache hit)
-- Total: 2 I/O operations (both often cached → <1ms)

-- Compare to B-tree:
-- 1. Read root page
-- 2. Read internal page(s)
-- 3. Read leaf page
-- 4. Use row pointer to read table row
-- Total: 3-4 I/O operations (still very fast)

Why O(1) vs O(log N) Doesn't Matter as Much as You'd Think:
Let's compare hash vs B-tree for a 1-billion row table:
| Index Type | Index Traversal | I/O Operations |
|---|---|---|
| Hash index | 1 bucket access | 1 |
| B-tree index | log₅₀₀(1B) ≈ 3-4 levels | 3-4 |
The hash index saves 2-3 I/O operations per query. At 100μs per SSD read:
- Hash index: ~1 read ≈ 100μs
- B-tree index: ~3-4 reads ≈ 300-400μs
This is a 3-4x improvement—significant, but not the orders-of-magnitude difference that "O(1) vs O(log N)" might suggest. The logarithm grows so slowly that for practical database sizes, it's effectively a small constant.
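The arithmetic is easy to verify (a back-of-the-envelope calculation assuming a 100μs page read and a B-tree fanout of 500):

```python
import math

SSD_READ_US = 100          # assumed cost of one uncached page read
ROWS = 1_000_000_000
FANOUT = 500               # typical B-tree branching factor

btree_levels = math.ceil(math.log(ROWS, FANOUT))       # log_500(1B) ≈ 3.3 → 4
print(f"B-tree: {btree_levels} levels ≈ {btree_levels * SSD_READ_US}μs")
print(f"Hash:   1 bucket access ≈ {1 * SSD_READ_US}μs")
# ~3-4x, not orders of magnitude: log N is effectively a small constant here.
```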
When Cache Matters:
With good caching (buffer pool holds hot pages):
- B-tree: upper levels are nearly always in memory, so a lookup often costs just one uncached read (the leaf)
- Hash: the single bucket access still costs one read unless that particular bucket page happens to be cached
- Net effect: both approaches converge toward ~1 I/O per lookup
Hash indexes also benefit from caching, but B-tree's upper levels are reused across all queries, while hash buckets are more dispersed.
In real-world systems, the difference between hash and B-tree index lookups is often measured in tens of microseconds. Both are 'fast enough' for most applications. The choice should be driven by query patterns (equality vs range), not theoretical complexity classes.
The fundamental limitation of hash indexes is their inability to support range queries, ordering, or prefix matching. This single limitation explains why B-trees dominate despite their theoretical inferiority for point lookups.
Why Hash Indexes Can't Support Ranges:
Hash functions are designed to produce pseudo-random output. Two similar keys will hash to completely different buckets:
hash('alice') → bucket 42
hash('aliceb') → bucket 7
hash('alicea') → bucket 128
There's no relationship between key proximity and bucket location. To find all keys between 'alice' and 'bob', you'd need to scan every bucket—which defeats the purpose of indexing entirely.
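You can observe the scattering directly (exact bucket numbers depend on the hash function, so treat these as illustrative):

```python
import hashlib

def bucket(key: str, num_buckets: int = 256) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_buckets

# Lexicographically adjacent keys land in unrelated buckets:
for key in ["alice", "alicea", "aliceb", "bob"]:
    print(f"{key!r:10} -> bucket {bucket(key)}")

# So "all keys between 'alice' and 'bob'" has no bucket locality:
# answering it means visiting every bucket, i.e. a full scan.
```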
| Query Pattern | Hash Index | B-tree Index | Example |
|---|---|---|---|
| Exact equality | ✅ O(1) | ✅ O(log N) | WHERE id = 42 |
| IN list | ✅ O(k) | ✅ O(k log N) | WHERE id IN (1,2,3) |
| Range (BETWEEN) | ❌ Full scan | ✅ O(log N + k) | WHERE price BETWEEN 10 AND 50 |
| Greater/Less than | ❌ Full scan | ✅ O(log N + k) | WHERE created_at > '2024-01-01' |
| ORDER BY | ❌ Extra sort | ✅ Index order | ORDER BY name ASC |
| MIN/MAX | ❌ Full scan | ✅ O(log N) | SELECT MAX(price) |
| Prefix LIKE | ❌ Full scan | ✅ Range scan | WHERE name LIKE 'Al%' |
| Suffix LIKE | ❌ Full scan | ❌ Full scan | WHERE name LIKE '%son' |
| GROUP BY | ❌ Hash+sort | ✅ Often avoids sort | GROUP BY category |
The Practical Impact:
Most real-world queries involve more than pure equality:
-- Dashboard query: needs ranges and ordering
SELECT * FROM orders
WHERE created_at > CURRENT_DATE - INTERVAL '30 days'
ORDER BY created_at DESC
LIMIT 100;
-- Analytics query: needs aggregation with ordering
SELECT status, COUNT(*)
FROM orders
GROUP BY status
ORDER BY COUNT(*) DESC;
-- Search query: needs prefix matching
SELECT * FROM products
WHERE name LIKE 'Apple%';
None of these queries can use a hash index effectively. A single B-tree index on created_at or name handles all of them.
The Versatility Tax:
B-trees pay a small performance tax on equality queries (roughly 3x slower than hash for uncached lookups) but support every query pattern. Hash indexes are faster for one pattern and useless for everything else. For most systems, B-tree versatility wins.
If you create a hash index and later need range queries, you must create an additional B-tree index—doubling maintenance overhead and storage. Starting with B-tree avoids this trap. Choose hash only when you're certain you'll never need range support on that column.
Hash index maintenance involves different trade-offs compared to B-trees. Understanding these helps evaluate the total cost of hash index ownership.
Insert Operations:
Hash index inserts are typically O(1):
1. Compute the hash of the key
2. Locate the target bucket
3. Append the (key → row pointer) entry to the bucket or its chain
However, if the table needs resizing (load factor too high), the cost spikes dramatically.
Resizing (Rehashing):
When the hash table grows past its load factor threshold, it must:
1. Allocate a new, larger bucket array (typically double the size)
2. Recompute the hash of every existing entry
3. Move each entry into its new bucket
4. Free the old array
This resize operation is expensive and causes latency spikes. Some hash implementations use incremental resizing (rehash a few entries per operation) to amortize this cost.
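In miniature, a naive resize looks like this (a sketch of the rehash-everything strategy; real indexes work page-by-page and use a stable hash function rather than Python's built-in `hash`):

```python
def resize(buckets: list[list], new_size: int) -> list[list]:
    """Rehash every entry into a larger bucket array: an O(n) stop-the-world step."""
    new_buckets = [[] for _ in range(new_size)]
    for chain in buckets:
        for key, row_ptr in chain:          # every single entry is touched once
            new_buckets[hash(key) % new_size].append((key, row_ptr))
    return new_buckets

# A table with a million entries rehashes a million entries in one burst --
# exactly the latency spike that incremental schemes spread out.
```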
Space Utilization:
Hash indexes can have significant space overhead:
- Empty buckets: maintaining a load factor of 0.5-0.75 means 25-50% of bucket slots sit unused
- Overflow pages: entries that don't fit in their bucket spill into extra pages
- Growth headroom: tables are often sized ahead of demand to postpone resizing
The total space is often similar to or greater than B-trees, despite storing the same information.
Fragmentation:
Hash indexes can fragment when:
- Heavy deletion leaves buckets and overflow pages sparsely filled
- Skewed key distributions build long overflow chains in some buckets while others stay empty
- Space freed inside overflow chains isn't automatically compacted
Unlike B-trees (where REINDEX is well-understood), hash index maintenance is less standardized across databases.
PostgreSQL's hash indexes were historically not WAL-logged and unsafe for crash recovery. As of PostgreSQL 10, they're fully WAL-logged but still have limitations (no concurrent bulk loading, for example). Always check your database's specific hash index implementation and guarantees.
To address the resize problem, database systems use sophisticated hashing schemes that grow incrementally.
Extendible Hashing:
Extendible hashing uses a directory that doubles in size independently of the bucket pages:
- The first d bits of a key's hash (d = global depth) select a directory slot, which points to a bucket page
- When a bucket overflows, only that bucket splits, and only its entries are rehashed
- The directory doubles only when a splitting bucket's local depth already equals the global depth

This approach:
- Never rehashes the whole table, so growth causes no large latency spike
- Keeps splits localized to a single bucket page
- Adds one level of indirection (the directory lookup) to every access
Extendible Hashing Example

Initial state (global depth = 1, using 1 bit of hash):
┌──────────────┐
│  Directory   │
├──────────────┤     ┌─────────────────┐
│ 0 → Bucket A │────→│ Bucket A (d=1)  │
│ 1 → Bucket B │     │ [entries...]    │
└──────────────┘     └─────────────────┘
                     ┌─────────────────┐
                ────→│ Bucket B (d=1)  │
                     │ [entries...]    │
                     └─────────────────┘

After Bucket A overflows (split):
- Bucket A splits into A' and A''
- Directory depth increases to 2 bits
- Only the overflowing bucket was rehashed!

┌─────────────────┐
│    Directory    │
├─────────────────┤      ┌──────────────────┐
│ 00 → Bucket A'  │─────→│ Bucket A' (d=2)  │
│ 01 → Bucket A'' │      │ [entries w/ 00]  │
│ 10 → Bucket B   │      └──────────────────┘
│ 11 → Bucket B   │      ┌──────────────────┐
└─────────────────┘ ────→│ Bucket A'' (d=2) │
                         │ [entries w/ 01]  │
Note: 10 and 11 still    └──────────────────┘
point to the same        ┌──────────────────┐
bucket!             ────→│ Bucket B (d=1)   │ ← Not split, not rehashed
                         │ [entries...]     │
                         └──────────────────┘

Linear Hashing:
Linear hashing provides even smoother growth (see the sketch after this list):
- Buckets split one at a time, in fixed round-robin order, tracked by a split pointer
- A split can be triggered by load factor or overflow, but the bucket that splits is the one at the split pointer, not necessarily the one that overflowed
- Lookups pick between two hash functions (mod N and mod 2N) depending on whether the target bucket has been split in the current round
Advantages:
- No directory structure needed (unlike extendible hashing)
- Growth cost is spread evenly: each trigger splits exactly one bucket
- No stop-the-world rehashing, so latency stays predictable
Trade-offs:
- An overflowing bucket usually isn't the one that splits, so overflow pages are still needed temporarily
- Lookup logic is slightly more complex (two candidate hash functions per round)
Database Usage:
These techniques are used in:
- PostgreSQL hash indexes (a form of linear hashing)
- Berkeley DB's hash access method (linear hashing)
- Many in-memory and embedded systems that need growth without rehash pauses
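A compact Python sketch of linear hashing's split-pointer mechanism (simplified for intuition; real implementations add overflow pages, locking, and on-disk layout):

```python
class LinearHashTable:
    """Toy linear hashing: buckets split one at a time in round-robin order."""

    def __init__(self, initial_buckets: int = 4, max_load: float = 0.75):
        self.n = initial_buckets      # bucket count at the start of this round
        self.split = 0                # next bucket to split
        self.max_load = max_load
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0

    def _bucket_index(self, key) -> int:
        b = hash(key) % self.n
        if b < self.split:            # already split this round: use wider function
            b = hash(key) % (2 * self.n)
        return b

    def insert(self, key, value) -> None:
        self.buckets[self._bucket_index(key)].append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._split_one()         # amortized growth: one bucket per trigger

    def _split_one(self) -> None:
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])       # new bucket lives at index split + n
        for key, value in old:        # only this one bucket is rehashed
            self.buckets[hash(key) % (2 * self.n)].append((key, value))
        self.split += 1
        if self.split == self.n:      # round complete: double the base size
            self.n *= 2
            self.split = 0

    def lookup(self, key):
        for k, v in self.buckets[self._bucket_index(key)]:
            if k == key:
                return v
        return None
```

Each insert pays at most one extra bucket split, so growth cost arrives as many tiny increments instead of one large rehash.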
Even if you don't use hash indexes explicitly, databases use hash tables internally for: hash joins (building hash table from smaller table), hash aggregation (GROUP BY), detecting duplicates (DISTINCT), and building temporary result sets. Understanding hash table behavior helps you tune these operations.
Hash index support varies significantly across database systems, reflecting different design philosophies and use cases.
| Database | Hash Index Support | Notes |
|---|---|---|
| PostgreSQL | ✅ Yes (since v10) | Fully WAL-logged, but limited. Use for pure equality on unique columns. |
| MySQL (InnoDB) | ❌ No | InnoDB uses only B+ tree indexes. MEMORY engine supports hash. |
| MySQL (MEMORY) | ✅ Yes | MEMORY engine default. Fast but data is not persistent. |
| SQL Server | ❌ No disk-based | Only for memory-optimized tables (Hekaton). |
| Oracle | ❌ No traditional | Index-organized tables use B-tree. Hash partitioning available. |
| SQLite | ❌ No | B-tree only for indexes. |
| MongoDB | ✅ Yes (hashed) | Hashed indexes for shard keys. Cannot support range queries. |
| Redis | ✅ Inherent | Core data structure is hash table. O(1) key lookups. |
| DynamoDB | ✅ Inherent | Partition key lookup is hash-based. O(1) for key access. |
PostgreSQL Hash Indexes:
PostgreSQL provides explicit hash index support:
-- Create hash index
CREATE INDEX idx_users_email_hash
ON users USING HASH (email);
-- Only useful for equality
SELECT * FROM users WHERE email = 'alice@example.com'; -- Uses hash
SELECT * FROM users WHERE email LIKE 'alice%'; -- Can't use hash
When to Use (PostgreSQL):
- Columns queried exclusively with equality (=), never ranges or sorting
- Very long key values: hash index entries store a fixed-size hash code rather than the full key, so the index can be much smaller than a B-tree
- PostgreSQL 10 or later (earlier versions were not crash-safe)
MySQL/InnoDB Workaround:
Since InnoDB doesn't support hash indexes, you can simulate them:
-- Add a hash column
ALTER TABLE users ADD COLUMN email_hash BIGINT UNSIGNED;
UPDATE users SET email_hash = CRC32(email);
CREATE INDEX idx_email_hash ON users(email_hash);
-- Query using hash (still need email check for collision handling)
SELECT * FROM users
WHERE email_hash = CRC32('alice@example.com')
AND email = 'alice@example.com';
This gives hash-like performance with B-tree storage.
Key-value stores (Redis, DynamoDB) are fundamentally hash-based. This is why they have O(1) key lookups but struggle with range queries. DynamoDB requires sort keys for ranges; Redis requires Sorted Sets. The hash vs B-tree trade-off applies across all data systems.
Given the trade-offs, when should you choose a hash index over a B-tree? The decision should be based on clear, measurable criteria.
Real-World Hash Index Use Cases:
1. Session Lookup Tables:
-- Session ID lookup is always exact match
CREATE TABLE sessions (
session_id VARCHAR(64) PRIMARY KEY,
user_id INTEGER,
created_at TIMESTAMP
);
CREATE INDEX idx_session_hash ON sessions USING HASH (session_id);
-- Query pattern: always equality
SELECT * FROM sessions WHERE session_id = 'abc123...';
2. Cache Key Tables:
-- Cache keys are arbitrary strings, looked up by exact match
CREATE TABLE cache (
cache_key VARCHAR(255) PRIMARY KEY,
value JSONB,
expires_at TIMESTAMP
);
CREATE INDEX idx_cache_key_hash ON cache USING HASH (cache_key);
3. UUID/GUID Lookups:
-- UUIDs are random, equality-only, high cardinality
CREATE TABLE documents (
doc_id UUID PRIMARY KEY,
content JSONB
);
CREATE INDEX idx_doc_hash ON documents USING HASH (doc_id);
When in doubt, use B-tree. It handles all query patterns well. Only switch to hash when you have clear evidence that: (1) queries are exclusively equality, (2) B-tree is a measured bottleneck, and (3) hash indexes provide verified improvement on your workload.
In distributed databases, hashing plays a crucial role beyond indexing—it determines data distribution across nodes.
Hash Partitioning (Sharding):
Distributed databases use hash functions to determine which node stores each record:
node = hash(shard_key) mod num_nodes
This ensures even data distribution across the cluster. Examples:
- DynamoDB: the partition key is hashed to select a partition
- Cassandra: the partitioner (Murmur3 by default) hashes the partition key to a token
- Redis Cluster: keys map to one of 16384 hash slots via CRC16
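A quick sketch exposes the weakness of plain modulo routing (hypothetical key names; the percentage follows from the arithmetic of mod 4 vs mod 5):

```python
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_nodes

keys = [f"user_{i}" for i in range(10_000)]

# Routing with 4 nodes vs 5 nodes: how many keys stay on the same node?
moved = sum(node_for(k, 4) != node_for(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys move when a 5th node is added")
# With plain modulo, ~80% of keys relocate (a key stays only when
# hash mod 4 == hash mod 5) -- the problem consistent hashing solves.
```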
Consistent Hashing:
Basic modulo hashing has a problem: when you add a node, every record potentially moves. Consistent hashing solves this:
Consistent Hashing Ring

Hash ring (0 to 360 degrees for simplicity):

                  0°/360°
                     ·
          N4 ●               ● N1
           (315°)           (45°)

        (270°) ·               ● N2 (90°)

           (225°) ·         ● N3 (135°)
                     ·
                  (180°)

Nodes sit at hashed positions: N1 (45°), N2 (90°), N3 (135°), N4 (315°).
Unmarked positions are empty.

Keys are hashed and stored on the next clockwise node:
- hash('user_1') = 30°  → Stored on N1 (45°)
- hash('user_2') = 100° → Stored on N3 (135°)
- hash('user_3') = 350° → Stored on N1 (45°, wraps around)

Adding a new node N5 at 60°:
- Only keys between 45° and 60° move from N2 to N5
- All other data stays in place!

Virtual nodes: each physical node has multiple points on the ring
for better distribution.
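The ring logic itself is short enough to sketch (bisect-based, without the virtual nodes a real system would use):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent hash ring: keys go to the next node clockwise."""

    def __init__(self, nodes):
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        h = self._hash(key)
        points = [p for p, _ in self.ring]
        i = bisect.bisect_right(points, h) % len(self.ring)  # wrap around at 0
        return self.ring[i][1]

    def add_node(self, node: str) -> None:
        bisect.insort(self.ring, (self._hash(node), node))   # only nearby keys move

ring = ConsistentHashRing(["N1", "N2", "N3", "N4"])
before = {f"user_{i}": ring.node_for(f"user_{i}") for i in range(1000)}
ring.add_node("N5")
moved = sum(ring.node_for(k) != v for k, v in before.items())
print(f"{moved / len(before):.0%} of keys moved")
# Roughly 1/5 on average, versus ~80% for modulo routing; variance is
# high without virtual nodes.
```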
Hash Indexes vs Hash Partitioning:
These are related but distinct concepts:
| Concept | Purpose | Scope |
|---|---|---|
| Hash Index | Fast equality lookup within a table | Single node |
| Hash Partition | Distribute data across cluster nodes | Cluster-wide |
| Consistent Hashing | Minimize data movement on cluster changes | Cluster topology |
Query Routing with Hash Partitioning:
When data is hash-partitioned:
- Equality on the shard key routes the query to exactly one node: O(1) routing
- Queries that don't include the shard key must fan out to every node (scatter-gather)
- Range queries on the shard key also fan out, because hashing destroys key order
This is why hash-partitioned systems (DynamoDB, Cassandra) have excellent single-key performance but struggle with range queries—the limitation mirrors local hash indexes.
When range queries are essential, databases use range partitioning instead of hash partitioning. Data is split by key ranges, enabling efficient range scans within a single partition. The trade-off is potential hotspots if keys are accessed unevenly. Many systems (CockroachDB, TiDB) use range partitioning with automatic splitting.
We've explored hash indexes comprehensively, from their theoretical advantages through their practical limitations. Let's consolidate the essential insights:
- O(1) vs O(log N) matters less than it sounds: for practical table sizes, a B-tree needs only 3-4 page reads, and caching narrows the gap further
- Hash indexes support only equality; ranges, ordering, prefix matching, and MIN/MAX all need B-trees
- Resizing is the operational weak spot; extendible and linear hashing exist to smooth it out
- Support varies widely across databases; PostgreSQL's hash indexes are production-ready only since version 10
- The same hash-vs-order trade-off reappears at cluster scale in hash vs range partitioning
What's Next:
With single-column indexes covered (B-tree and hash), we move to a more complex topic in the next page: Composite indexes. You'll learn how to design indexes on multiple columns, understand column order importance, and master the prefix rule that governs composite index usage.
You now understand when hash indexes shine, when they fail, and how to make informed decisions between hash and B-tree indexes. This understanding extends to distributed systems where hash-based partitioning follows similar principles. Use B-tree as your default; switch to hash only with solid justification.