Loading content...
Throughout this module, we've examined hash and tree indexes from multiple angles: fundamental architecture, point query performance, range query capabilities, and space utilization. Now we synthesize this knowledge into a practical decision framework that you can apply to real-world index selection challenges.
The goal isn't to memorize rules, but to develop the analytical capability to evaluate any indexing scenario systematically. By the end of this page, you'll possess a repeatable methodology for making confident, well-reasoned index type decisions.
By the end of this page, you will possess: a systematic workload analysis methodology, clear decision criteria for hash vs tree selection, awareness of common anti-patterns and their consequences, real-world case studies illustrating decision processes, and confidence to make and defend index type choices.
The index selection decision can be approached as a structured evaluation process. Here's the framework we'll develop:
Step 1: Workload Characterization
Step 2: Query Pattern Analysis
Step 3: Performance Priority Assessment
Step 4: Operational Considerations
Step 5: Decision and Documentation
When in doubt, choose B+tree. It's the safer default that handles the widest range of workloads. Only choose hash indexes when you have strong evidence that your workload falls into the narrow category where hash excels AND will continue to do so.
Rigorous workload characterization is the foundation of sound index selection. Begin by gathering quantitative data about your access patterns.
Query Type Distribution
Classify queries that will use the proposed index:
| Query Type | Example | Hash Suitability | B+Tree Suitability |
|---|---|---|---|
| Point lookup (=) | WHERE id = 123 | ✓ Excellent | ✓ Very Good |
| Bounded range | WHERE x BETWEEN a AND b | ✗ Cannot use | ✓ Excellent |
| Unbounded range | WHERE x > a | ✗ Cannot use | ✓ Excellent |
| Prefix match | WHERE name LIKE 'Smi%' | ✗ Cannot use | ✓ Excellent |
| MIN/MAX | SELECT MIN(x) | ✗ Full scan | ✓ Excellent |
| ORDER BY | ORDER BY created_at | ✗ External sort | ✓ Excellent |
| IN list | WHERE id IN (1,2,3) | ✓ Multiple lookups | ✓ Multiple lookups |
| NOT EQUAL | WHERE status != 'done' | ✗ Full scan | ✗ Usually full scan |
Critical Question: What percentage of queries on this column are pure equality lookups?
123456789101112131415161718192021
-- PostgreSQL: Analyze query patterns from pg_stat_statementsSELECT query, calls, total_time / calls as avg_time_ms, CASE WHEN query ~ 'WHERE.*user_id\s*=' AND query !~ 'user_id\s*[<>]' THEN 'equality' WHEN query ~ 'WHERE.*user_id\s*BETWEEN' OR query ~ 'WHERE.*user_id\s*[<>]' THEN 'range' ELSE 'other' END as query_typeFROM pg_stat_statementsWHERE query LIKE '%user_id%'ORDER BY calls DESCLIMIT 20; -- Result interpretation:-- If 100% 'equality' → Hash may be viable-- If ANY 'range' queries → B+tree requiredDon't just analyze current queries—anticipate future needs. A hash index that works today becomes a liability when new features require range queries. Consult with product and business stakeholders about upcoming requirements.
Hash indexes are appropriate only when all of the following conditions are met. A single 'No' answer means B+tree is the better choice.
= value)? No BETWEEN, <, >, <=, >=, LIKE 'prefix%'?This isn't a scoring system—it's a checklist. If ANY answer is 'No' or 'Maybe', the conservative and usually optimal choice is B+tree. Hash indexes are a specialized tool for specialized situations.
Checklist Application Examples
Example 1: User Session Lookup Table
WHERE session_id = ? (100% equality) ✓Verdict: Hash is potentially viable—verify database support.
Example 2: Order ID Lookup
WHERE order_id = ? (currently 100%) ✓Verdict: B+tree—even the possibility of ordering needs disqualifies hash.
The following decision tree encapsulates the selection logic in a visual, step-by-step format. Follow from top to bottom.
┌─────────────────────────────────────┐
│ Do ANY queries need range/order? │
│ (BETWEEN, <, >, ORDER BY, MIN/MAX) │
└─────────────────┬───────────────────┘
│
┌───────────Yes─────────┼─────────No────────┐
▼ │ ▼
┌─────────────────┐ │ ┌─────────────────────┐
│ USE B+TREE │ │ │ Is key distribution │
│ (Only option) │ │ │ uniform/random? │
└─────────────────┘ │ └──────────┬──────────┘
│ │
│ ┌───Yes────┼───No─────┐
│ ▼ │ ▼
│ ┌──────────┐ │ ┌─────────────┐
│ │ Is data │ │ │ USE B+TREE │
│ │ in-memory│ │ │ (Skew risks)│
│ │ or SSD? │ │ └─────────────┘
│ └────┬─────┘ │
│ │ │
│ Yes │ No │
│ ▼ │ ▼ │
│ ┌────┴─────┐ │
│ │ Database │ │
│ │ supports │ │
│ │ hash? │ │
│ └────┬─────┘ │
│ │ │
│ Yes │ No │
│ ▼ │ ▼ │
│ ┌────┴────────────┐
│ │ Hash viable! │
│ │ But B+tree also │
│ │ works well here │
│ └─────────────────┘
Notice that every path eventually leads to B+tree being at least acceptable. The decision tree doesn't identify scenarios where hash is required—only scenarios where it's a viable alternative. This reflects the reality that B+tree is the universal solvent of indexing.
| Scenario | Recommended Index | Confidence |
|---|---|---|
| Any range query requirements | B+tree | High (100%) |
| Pure equality on skewed data | B+tree | High (95%) |
| Pure equality, uniform, disk-based | B+tree | High (90%) |
| Pure equality, uniform, in-memory | Hash or B+tree | Medium (both viable) |
| Unknown or uncertain requirements | B+tree | High (safe default) |
Understanding common mistakes helps avoid costly missteps. Here are anti-patterns we've observed in production systems.
These anti-patterns don't just cause performance issues—they create operational emergencies. A hash index that worked fine for months suddenly becoming a bottleneck is harder to debug and more disruptive to fix than starting with the right index type.
Let's walk through detailed case studies showing the decision framework in action.
Case Study 1: Redis-Like Key-Value Cache Layer
Context: Building an in-memory caching layer as a PostgreSQL extension for session data.
Analysis:
GET session_123)Decision: Hash table — This is exactly the scenario hash excels at. In-memory operation eliminates I/O concerns, pure equality workload, uniform keys. This is what Redis does internally.
Outcome: O(1) lookups achieved, sub-microsecond latency, perfect for caching use case.
Case Study 2: E-Commerce Order Lookup
Context: Primary index on order_id column in orders table.
Analysis:
WHERE order_id = ?)WHERE customer_id = ? ORDER BY order_date)WHERE order_date BETWEEN ...)Decision: B+tree — Even though most lookups are equality, the ordering and range requirements are non-negotiable. Sequential IDs would create hash hot spots anyway.
Outcome: Slightly slower point lookups (still <1ms), but all query patterns supported efficiently.
Case Study 3: IOT Sensor Data Ingestion
Context: High-velocity sensor readings with sensor_id + timestamp keys.
Analysis:
sensor_id = X AND timestamp BETWEEN ...)Decision: B+tree on (sensor_id, timestamp) — Time-range queries are essential for analysis. Composite B+tree handles both patterns.
Alternative considered: Partitioned by time with hash on sensor_id within partition — overly complex for marginal gain.
Outcome: Point and range queries efficient; range queries on time windows critical for dashboards.
Notice how even in Case Study 1, which favored hash, it was for an in-memory, specialized structure—not a standard database index. For general-purpose database indexing, B+tree wins the vast majority of cases.
Different databases handle hash indexes differently. Here's practical guidance for major systems.
PostgreSQL
1234567891011121314151617181920
-- PostgreSQL hash index support-- Pre-v10: NOT recommended (no WAL logging - crash unsafe)-- v10+: WAL-logged, but still limited optimization -- Creating a hash indexCREATE INDEX idx_session_id_hash ON sessions USING hash (session_id); -- When hash might help (rare):-- - Very high equality lookup rate-- - In-memory working set (fits in shared_buffers)-- - PostgreSQL 10+ -- Usually better: Just use B-tree (default)CREATE INDEX idx_session_id ON sessions (session_id); -- Check if hash index is being usedEXPLAIN (ANALYZE, BUFFERS) SELECT * FROM sessions WHERE session_id = 'abc123'; -- PostgreSQL optimizer may still choose seq scan over hash index!MySQL/InnoDB
123456789101112131415161718
-- MySQL/InnoDB: B+tree only for standard tables-- Hash indexes only available for MEMORY storage engine -- Standard InnoDB table: B+tree automaticCREATE TABLE orders ( order_id BIGINT PRIMARY KEY, -- Clustered B+tree customer_id BIGINT, INDEX idx_customer (customer_id) -- Secondary B+tree); -- MEMORY table with hash (not persistent!)CREATE TABLE sessions ( session_id VARCHAR(64) PRIMARY KEY) ENGINE = MEMORY;-- Default for MEMORY is HASH, but data lost on restart -- Practical MySQL recommendation: -- Use InnoDB with B+tree - it's highly optimized and well-testedOracle
123456789101112131415161718192021
-- Oracle: Hash clusters and hash-partitioned indexes available -- Hash cluster (tables organized by hash)CREATE CLUSTER sessions_cluster (session_id VARCHAR2(64)) HASHKEYS 10000 SIZE 200; CREATE TABLE sessions ( session_id VARCHAR2(64) PRIMARY KEY, data CLOB) CLUSTER sessions_cluster (session_id); -- Hash-partitioned index (for partitioning, not lookups)CREATE INDEX idx_orders_hash_part ON orders (order_id) GLOBAL PARTITION BY HASH (order_id) PARTITIONS 16; -- Oracle recommendation:-- B-tree indexes for general use-- Hash clusters only for specific, validated use cases-- Hash partitioning for partition management, not performanceEvery major database has invested heavily in B-tree optimization—it's their default for good reason. Hash index support varies from partial (PostgreSQL) to nonexistent for general use (MySQL/InnoDB). When in doubt, trust the database's default: B-tree.
This module has provided a comprehensive comparison of hash and tree indexes. Let's consolidate the core insights into actionable wisdom.
| Dimension | Hash Index | B+Tree Index | Winner |
|---|---|---|---|
| Point queries | O(1) | O(log n) | Hash (slightly) |
| Range queries | Cannot support | O(log n + k) | B+Tree (decisively) |
| Ordering | No support | Natural | B+Tree |
| Space overhead | 5-15% | 10-20% | Hash (slightly) |
| Memory efficiency | Poor (uniform) | Excellent (hierarchical) | B+Tree |
| Flexibility | Very limited | Highly flexible | B+Tree |
| Database support | Variable/limited | Universal | B+Tree |
| Overall verdict | B+Tree for 95%+ of cases |
You now possess the analytical framework to make informed hash vs tree index decisions. Remember: B+tree is the safe default; hash is a specialized optimization for narrow use cases. When you do choose hash, document your reasoning—future maintainers will thank you.
Final Thoughts
The hash versus tree index decision exemplifies a broader engineering principle: flexibility usually beats raw performance for specialized operations. The theoretical O(1) advantage of hashing rarely justifies losing range query capability, ordered access, and prefix matching in real-world systems.
B+trees aren't just 'good enough'—they're excellent general-purpose structures refined over decades. Trust that excellence unless you have compelling, validated evidence for a specialized alternative.
Congratulations on completing the Hash vs Tree Indexes module! You've mastered: fundamental architectural differences, point and range query performance analysis, space utilization trade-offs, and a practical decision framework. Apply this knowledge to make confident, well-reasoned index selection decisions in your database systems.