Hash tables are extraordinary. O(1) average-time insert, lookup, and delete. Constant-time membership testing. Blazing fast key-value storage. With performance like this, why would you ever use anything else?
The answer reveals a fundamental limitation that no amount of clever hashing can overcome: hash tables destroy ordering information.
The very mechanism that makes hash tables fast—transforming keys into pseudo-random bucket indices—makes it impossible to efficiently answer questions like: What is the smallest key? What key comes immediately before or after this one? Which keys fall between A and B?
These operations require ordering, and hash tables fundamentally cannot provide it. For such requirements, balanced binary search trees (BSTs) aren't just an alternative—they're the correct choice.
By the end of this page, you will understand exactly which operations hash tables cannot efficiently support, why BSTs excel at ordered operations, how to recognize when your problem requires ordering, and how to make the right data structure choice for your specific requirements.
To understand why hash tables cannot support ordered operations, we need to revisit how they work at a fundamental level.
Hash Tables Destroy Order by Design:
A hash function is designed to spread keys uniformly across buckets, regardless of their natural ordering. Consider keys 1, 2, 3, 4, 5. In a sorted array, they'd be stored in positions 0, 1, 2, 3, 4. In a hash table with 10 buckets:
```
NATURAL ORDER: 1 < 2 < 3 < 4 < 5

AFTER HASHING (example hash function):
═══════════════════════════════════════════════════════════════════
 Key │ hash(key) │ Bucket Index (hash % 10)
─────┼───────────┼─────────────────────────
  1  │ 7839462   │ 2
  2  │ 3284751   │ 1
  3  │ 9571638   │ 8
  4  │ 1426395   │ 5
  5  │ 6847293   │ 3

Hash Table Layout:
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│ 0  │ 1  │ 2  │ 3  │ 4  │ 5  │ 6  │ 7  │ 8  │ 9  │
├────┼────┼────┼────┼────┼────┼────┼────┼────┼────┤
│    │ 2  │ 1  │ 5  │    │ 4  │    │    │ 3  │    │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

PHYSICAL ORDER IN TABLE: 2, 1, 5, 4, 3

There is NO relationship between bucket position and key order!
To find keys in sorted order, you must:
1. Collect all keys: O(n)
2. Sort them: O(n log n)
3. Total: O(n log n) — much worse than BST's O(n) in-order traversal
```

The Core Issue: No Spatial Locality
In a hash table, keys that are "close" in value (like 100 and 101) can end up in completely different buckets. There's no way to find "nearby" keys without examining all keys.
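You can see the scrambling concretely with a few lines of Python. The multiplicative hash below is a toy stand-in (Knuth's constant, mod 2³²) chosen only so the run is deterministic; the specific bucket numbers are an artifact of this made-up function, not of any particular hash table implementation:

```python
# Toy multiplicative hash — a deterministic stand-in for a real hash function.
def toy_hash(key):
    return (key * 2654435761) % 2**32

keys = [1, 2, 3, 4, 5]                     # perfectly ordered input
buckets = [toy_hash(k) % 10 for k in keys]  # 10-bucket table
print(buckets)  # [1, 6, 7, 2, 7] — consecutive keys scattered across buckets
```

Adjacent keys land in unrelated buckets (and two of them even collide), which is exactly the uniform spreading a good hash function is designed to produce.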
Contrast this with a BST: the ordering invariant (smaller keys in the left subtree, larger keys in the right) means keys that are close in value are close in the tree, an in-order traversal yields every key in sorted order, and each comparison during a search eliminates half the remaining candidates.
Operations That Require Order:
| Operation | Hash Table | Balanced BST |
|---|---|---|
| Find minimum | O(n) — scan all elements | O(log n) — leftmost node |
| Find maximum | O(n) — scan all elements | O(log n) — rightmost node |
| Find predecessor (next smaller) | O(n) — scan all elements | O(log n) — tree traversal |
| Find successor (next larger) | O(n) — scan all elements | O(log n) — tree traversal |
| Range query [a, b] | O(n) — scan all elements | O(log n + k) — k = result size |
| Sorted iteration | O(n log n) — collect + sort | O(n) — in-order traversal |
| Rank query (k-th element) | O(n log n) — sort first | O(log n) — with augmentation |
When you need ordered operations on hash table data, you're forced to scan all elements—O(n). This completely negates the O(1) advantage hash tables provide for point lookups. If your workload is dominated by ordered operations, a hash table is actively harmful to performance.
Let's examine specific use cases where balanced BSTs are the clearly superior choice, even though their O(log n) operations are theoretically slower than hash tables' O(1).
Use Case 1: Price Books in Trading Systems
Stock exchanges maintain "order books" that track buy and sell orders at various prices. Critical operations include finding the best bid (highest buy price), finding the best ask (lowest sell price), matching orders when bid ≥ ask, and searching for orders within a price range.
```
TRADING ORDER BOOK STRUCTURE:
═══════════════════════════════════════════════════════════════════

BUY ORDERS (bids):
├── $150.25: 1000 shares   ← Best bid (max)
├── $150.20: 2500 shares
├── $150.15: 800 shares
├── $150.10: 3000 shares
└── $150.00: 5000 shares

SELL ORDERS (asks):
├── $150.50: 500 shares    ← Best ask (min)
├── $150.55: 750 shares
├── $150.60: 1200 shares
└── $150.75: 2000 shares

TYPICAL OPERATIONS:
─────────────────────────────────────────────────────────────────────

1. GET BEST BID (highest buy price):
   Hash Table: Scan all bids       → O(n)
   BST:        Find rightmost node → O(log n)

2. GET BEST ASK (lowest sell price):
   Hash Table: Scan all asks       → O(n)
   BST:        Find leftmost node  → O(log n)

3. MATCH ORDERS (when bid ≥ ask):
   Need to frequently access min/max → BST wins decisively

4. PRICE IMPROVEMENT SEARCH (find orders within $0.05 of target):
   Hash Table: Scan all, filter by price               → O(n)
   BST:        Range query [target-0.05, target+0.05]  → O(log n + k)

REAL-WORLD IMPACT:
─────────────────────────────────────────────────────────────────────
Exchanges process millions of orders per second.
BST operations:  millions × O(log n) = millions × ~20 comparisons
Hash operations: millions × O(n)     = millions × thousands of comparisons

BST is 100-1000x faster for this workload!
```

Use Case 2: Leaderboards and Rankings
Gaming leaderboards, performance dashboards, and ranked lists require fast score updates, retrieval of the top k entries, and rank queries ("what place is this player in?").
```python
# HASH TABLE APPROACH (poor for leaderboards)
class HashLeaderboard:
    def __init__(self):
        self.scores = {}  # player_id -> score

    def update_score(self, player_id, score):
        self.scores[player_id] = score  # O(1) ✓

    def get_score(self, player_id):
        return self.scores.get(player_id)  # O(1) ✓

    def get_top_k(self, k):
        # Must sort ALL scores to find top k!
        sorted_scores = sorted(self.scores.items(),
                               key=lambda x: x[1], reverse=True)
        return sorted_scores[:k]  # O(n log n) ✗

    def get_rank(self, player_id):
        player_score = self.scores.get(player_id)
        if player_score is None:
            return None
        # Must count ALL players with higher score!
        rank = sum(1 for s in self.scores.values() if s > player_score) + 1
        return rank  # O(n) ✗


# BST APPROACH (designed for leaderboards)
# Using a balanced BST with augmented nodes
class BSTLeaderboard:
    def __init__(self):
        self.tree = BalancedBST()  # e.g., Red-Black Tree
        self.player_nodes = {}     # player_id -> tree node

    def update_score(self, player_id, new_score):
        if player_id in self.player_nodes:
            self.tree.delete(self.player_nodes[player_id])
        node = self.tree.insert(new_score, player_id)
        self.player_nodes[player_id] = node  # O(log n)

    def get_top_k(self, k):
        # In-order traversal (reverse) yields sorted order
        return self.tree.get_largest_k(k)  # O(log n + k) ✓

    def get_rank(self, player_id):
        node = self.player_nodes.get(player_id)
        if node is None:
            return None
        # With size augmentation, rank is computable via tree structure
        return self.tree.get_rank(node)  # O(log n) ✓


# PERFORMANCE COMPARISON (1 million players, k=10):
# ─────────────────────────────────────────────────────────────────
# get_top_k(10):
#   Hash: Sort 1M elements → ~13 million comparisons
#   BST:  Navigate to rightmost → ~20 comparisons + 10 traversal steps
#
# get_rank(player_id):
#   Hash: Count larger scores → ~1 million comparisons
#   BST:  Tree traversal → ~20 comparisons
```

Use Case 3: Time-Series Data and Event Logs
Systems that store timestamped events often need to retrieve all events in a time window, find the earliest or latest event, and iterate through events in chronological order.
Databases use B-trees (a generalized BST) for indexed columns precisely because queries need ordering: WHERE date BETWEEN x AND y, ORDER BY timestamp, MIN/MAX aggregations. Hash indexes exist but are rarely the default because so many queries require order.
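A time-window query over an ordered index can be sketched with the standard library's `bisect` module. A sorted array stands in for the tree here: lookups have the same O(log n + k) shape as a B-tree range scan, though inserting into an array is O(n), which is why real indexes use trees. The timestamps below are made up for illustration:

```python
import bisect

# Event timestamps kept in sorted order (illustrative values).
timestamps = [1000, 1005, 1012, 1020, 1033, 1047, 1050, 1061]

def events_in_window(sorted_ts, start, end):
    """All timestamps t with start <= t <= end, in chronological order."""
    lo = bisect.bisect_left(sorted_ts, start)  # first t >= start: O(log n)
    hi = bisect.bisect_right(sorted_ts, end)   # first t > end:    O(log n)
    return sorted_ts[lo:hi]                    # k results:        O(k)

print(events_in_window(timestamps, 1010, 1048))  # [1012, 1020, 1033, 1047]
```

This is exactly the access pattern behind `WHERE date BETWEEN x AND y`: two binary searches to find the window's edges, then a contiguous scan of the results.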
Range queries are perhaps the clearest case where BSTs dominate. The operation "find all elements between A and B" appears constantly in real applications.
The Range Query Problem:
Given a data structure holding n elements, find all elements x with low ≤ x ≤ high. Throughout this section, k denotes the number of results.
```
RANGE QUERY: Find all x where 25 ≤ x ≤ 75
═══════════════════════════════════════════════════════════════════

DATA: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
EXPECTED RESULT: [30, 40, 50, 60, 70]  (k=5 elements)

HASH TABLE APPROACH:
─────────────────────────────────────────────────────────────────────

Hash Table (no ordering):
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│ 60 │    │ 100│ 30 │    │ 80 │ 20 │    │ 40 │ 70 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
  ↑         ↑    ↑         ↑    ↑         ↑    ↑
Must check every occupied slot:
  25≤60≤75? ✓   25≤100≤75? ✗   25≤30≤75? ✓   ...   25≤70≤75? ✓

Algorithm:
  for each element e in table:
      if low ≤ e ≤ high: add to result

Time: O(n) — must check EVERY element
For n=1,000,000, k=100: must examine 1,000,000 elements to find 100

BST APPROACH:
─────────────────────────────────────────────────────────────────────

Balanced BST:
                 [50]
               /      \
           [30]        [70]
          /    \      /    \
       [20]  [40]  [60]   [80]
       /                     \
    [10]                    [90]
                               \
                              [100]

Algorithm:
  1. Find first element ≥ 25:
     Navigate: 50 → 30 → 20 → (20 < 25) → back to 30 ✓
     Comparisons: O(log n) = 4

  2. In-order traverse until > 75:
     30 → 40 → 50 → 60 → 70 → (80 > 75) → stop
     Comparisons: O(k) = 5

Time: O(log n + k)
For n=1,000,000, k=100: examine ~20 + 100 = 120 elements

SPEEDUP: 1,000,000 / 120 ≈ 8,000×
```

The O(log n + k) Guarantee:
The BST range query complexity deserves scrutiny: the O(log n) term covers descending to the first qualifying element, and the O(k) term covers the in-order walk that visits each of the k results. This is optimal—you can't do better than O(k) because you must at least output k elements. The O(log n) overhead for finding where to start is negligible for most workloads.
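The two-phase shape of the query—descend to the start, then walk in order—can be sketched in a few lines of Python. For brevity this uses a plain unbalanced BST; a self-balancing tree would guarantee the O(log n) descent:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def range_query(node, low, high, out):
    if node is None:
        return
    if node.key > low:            # left subtree may still hold qualifying keys
        range_query(node.left, low, high, out)
    if low <= node.key <= high:   # in-order position: result comes out sorted
        out.append(node.key)
    if node.key < high:           # right subtree may still hold qualifying keys
        range_query(node.right, low, high, out)

root = None
for k in [50, 30, 70, 20, 40, 60, 80, 10, 90, 100]:
    root = insert(root, k)

result = []
range_query(root, 25, 75, result)
print(result)  # [30, 40, 50, 60, 70]
```

The two pruning tests are what keep the visit count near O(log n + k): whole subtrees that cannot contain qualifying keys are never entered.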
Contrast with Hash Table:
A hash table must check every element because qualifying elements could be in any bucket. There's no structure to exploit—the hash function destroyed all ordering information.
| Result Size (k) | Hash Table (O(n)) | BST (O(log n + k)) | Speedup |
|---|---|---|---|
| k = 10 | 1,000,000 ops | 30 ops | 33,000× |
| k = 100 | 1,000,000 ops | 120 ops | 8,300× |
| k = 1,000 | 1,000,000 ops | 1,020 ops | 980× |
| k = 10,000 | 1,000,000 ops | 10,020 ops | 100× |
| k = 100,000 | 1,000,000 ops | 100,020 ops | 10× |
| k = 500,000 | 1,000,000 ops | 500,020 ops | 2× |
BST advantage is greatest when k << n (result set is small relative to total data). This is exactly the common case! Most range queries return a tiny fraction of total data. Finding this week's orders in a database of years of orders? k is tiny compared to n.
Another operation hash tables fundamentally cannot support efficiently is finding the predecessor (largest element smaller than X) or successor (smallest element larger than X).
Real-World Applications:

Predecessor and successor queries appear whenever you need the "nearest" value rather than an exact match: the highest price below a budget, the next event after a given timestamp, or the closest available value to a target.
```python
# HASH TABLE: Predecessor requires scanning ALL elements
def hash_predecessor(hash_table, target):
    """
    Find largest element smaller than target.
    With a hash table, must check every element.
    """
    predecessor = None
    for key in hash_table:
        if key < target:
            if predecessor is None or key > predecessor:
                predecessor = key
    return predecessor  # O(n) — must scan everything

# Usage: Find highest price < $100
# prices = {99.99, 149.99, 79.99, 199.99, 49.99, 129.99}
# hash_predecessor(prices, 100)  # → Checks all 6 prices, returns 99.99


# BST: Predecessor is O(log n) via tree navigation
class BSTNode:
    def __init__(self, key, left=None, right=None, parent=None):
        self.key = key
        self.left = left
        self.right = right
        self.parent = parent

def bst_predecessor(root, target):
    """
    Find largest element smaller than target in O(log n).
    """
    predecessor = None
    node = root
    while node is not None:
        if node.key < target:
            # This node qualifies; look for larger in right subtree
            predecessor = node.key
            node = node.right
        else:
            # Node too large; look in left subtree
            node = node.left
    return predecessor  # O(log n) — follows single path


# VISUALIZATION OF BST PREDECESSOR SEARCH:
# ═══════════════════════════════════════════════════════════════════
#
# Find predecessor of 75 in BST:
#
#         [50]            ← 50 < 75, save as candidate, go right
#        /    \
#     [30]    [70]        ← 70 < 75, save as candidate, go right
#     /  \    /  \
#  [20] [40][60] [80]     ← 80 ≥ 75, go left
#                  ↓
#                [null]   ← done!
#
# Predecessor = 70 (last saved candidate)
# Comparisons: 3 (not 6)
```

The Floor and Ceiling Operations:
Related to predecessor/successor are floor (largest element ≤ X) and ceiling (smallest element ≥ X):
```
DATA: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

FLOOR(45):   Largest element ≤ 45   → 40 ✓
CEILING(45): Smallest element ≥ 45  → 50 ✓

FLOOR(50):   Largest element ≤ 50   → 50 ✓ (exact match)
CEILING(50): Smallest element ≥ 50  → 50 ✓ (exact match)

FLOOR(5):     Largest element ≤ 5    → None (no element ≤ 5)
CEILING(105): Smallest element ≥ 105 → None (no element ≥ 105)

USE CASE: Price matching with constraints
───────────────────────────────────────────────────────────────────
User budget: $75
Available prices: [29.99, 49.99, 69.99, 89.99, 109.99]

FLOOR(75)   = 69.99 → "Best option within your budget"
CEILING(75) = 89.99 → "Cheapest option above your budget"

Hash table: O(n) to find each
BST: O(log n) for each — crucial when prices change frequently
```

BST operations like predecessor, successor, floor, and ceiling all work by navigating the tree structure. Each comparison eliminates half the remaining candidates. This is impossible with hash tables because the structure provides no navigational information—elements are scattered into buckets with no relationship to their values.
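Floor and ceiling can be implemented with the same single-path navigation as the predecessor search; the only difference is that an exact match qualifies. A minimal sketch on a plain (unbalanced) BST, using the budget example above:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def bst_floor(root, x):
    """Largest key <= x, or None if every key is larger."""
    best, node = None, root
    while node:
        if node.key == x:
            return node.key       # exact match qualifies
        if node.key < x:
            best = node.key       # candidate; try to find a larger one
            node = node.right
        else:
            node = node.left
    return best

def bst_ceiling(root, x):
    """Smallest key >= x, or None if every key is smaller."""
    best, node = None, root
    while node:
        if node.key == x:
            return node.key
        if node.key > x:
            best = node.key       # candidate; try to find a smaller one
            node = node.left
        else:
            node = node.right
    return best

root = None
for price in [29.99, 49.99, 69.99, 89.99, 109.99]:
    root = insert(root, price)

print(bst_floor(root, 75))    # 69.99 — best option within budget
print(bst_ceiling(root, 75))  # 89.99 — cheapest option above budget
```

Both functions follow a single root-to-leaf path, so each runs in O(h) time—O(log n) when the tree is balanced.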
A common requirement is iterating through elements in sorted order. This arises in report generation, ranked displays, data exports, and paginated listings.
The Iteration Complexity Difference:
```
REQUIREMENT: Iterate through all elements in sorted order

HASH TABLE APPROACH:
═══════════════════════════════════════════════════════════════════

Step 1: Collect all elements into a list    Time: O(n)
Step 2: Sort the list                       Time: O(n log n)
Step 3: Iterate through sorted list         Time: O(n)

Total: O(n log n) — dominated by sorting step

BST APPROACH (In-Order Traversal):
═══════════════════════════════════════════════════════════════════

             [50]
            /    \
        [30]      [70]
       /    \    /    \
    [20]  [40] [60]  [80]

In-order traversal: Left → Root → Right
Visit order: 20 → 30 → 40 → 50 → 60 → 70 → 80

def inorder(node):
    if node is None:
        return
    yield from inorder(node.left)   # Visit left subtree
    yield node.key                  # Process current node
    yield from inorder(node.right)  # Visit right subtree

Each node is visited exactly once.
Total: O(n)

SPACE COMPARISON:
═══════════════════════════════════════════════════════════════════

Hash Table sorted iteration:
- Must allocate O(n) additional space for the sorted copy
- Cannot iterate "in place"

BST in-order traversal:
- O(h) stack space for recursion (h = tree height)
- For balanced tree: O(log n) space
- Can be made O(1) with Morris traversal (modifies tree temporarily)
```

Incremental Sorted Access:
Beyond one-time iteration, consider scenarios where you need sorted access incrementally:
```python
# HASH TABLE: No good option for incremental sorted access
class HashTableSortedIterator:
    def __init__(self, hash_table):
        # Must sort EVERYTHING upfront even if we only need a few elements
        self.sorted_keys = sorted(hash_table.keys())  # O(n log n)
        self.index = 0

    def has_next(self):
        return self.index < len(self.sorted_keys)

    def next(self):
        key = self.sorted_keys[self.index]
        self.index += 1
        return key  # O(1) per call, but O(n log n) setup


# BST: True incremental sorted access
class BSTSortedIterator:
    def __init__(self, root):
        # Only O(log n) setup: go to minimum
        self.stack = []
        self._push_left(root)  # O(log n)

    def _push_left(self, node):
        while node:
            self.stack.append(node)
            node = node.left

    def has_next(self):
        return len(self.stack) > 0

    def next(self):
        node = self.stack.pop()
        self._push_left(node.right)
        return node.key  # Amortized O(1) per call, O(log n) setup


# USAGE PATTERN: "Get first 10 sorted elements"
# ───────────────────────────────────────────────────────────────────
#
# Hash table: Pay O(n log n) upfront, then O(1) per element
#   Total for 10 elements: O(n log n) + O(10) = O(n log n)
#
# BST: Pay O(log n) setup, then amortized O(1) per element
#   Total for 10 elements: O(log n) + O(10) = O(log n)
#
# For n = 1,000,000 and k = 10:
#   Hash: ~20 million operations
#   BST:  ~30 operations
#
# The BST is almost a MILLION times faster!
```

A common mistake is using hash tables for data that needs pagination ("show next 20 items"). Each page request sorts all data to find the right 20 items: O(n log n) per page! With a BST, you navigate directly to the starting point and collect 20 items: O(log n + 20) per page.
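The pagination pattern can be sketched concisely with `bisect` over a maintained sorted list (standing in for the tree): finding the page start is a single O(log n) binary search, and collecting the page costs only the page size, with no per-page sort. The keyset pattern below—"give me the next page after the last key I saw"—is an assumption for illustration:

```python
import bisect

def next_page(sorted_keys, after, page_size):
    """Return the next page_size keys strictly greater than `after`."""
    start = bisect.bisect_right(sorted_keys, after)  # O(log n) seek
    return sorted_keys[start:start + page_size]      # O(page_size) collect

items = sorted(range(0, 1000, 7))       # a maintained sorted structure
page1 = next_page(items, -1, 5)         # first page
page2 = next_page(items, page1[-1], 5)  # resume after last key seen
print(page1)  # [0, 7, 14, 21, 28]
print(page2)  # [35, 42, 49, 56, 63]
```

This is the same "keyset pagination" idea databases use with B-tree indexes: seek to the last key served, then scan forward one page.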
With a clear understanding of when each data structure excels, let's formalize the decision process.
The Operation Mix Question:
The right choice depends on your operation mix—which operations dominate your workload?
```
START: What operations does your application need?
═══════════════════════════════════════════════════════════════════

        ┌─────────────────────────────────────────────────────────┐
        │              PRIMARY OPERATION ANALYSIS                 │
        └────────────────────────────┬────────────────────────────┘
                                     │
                ┌────────────────────┴────────────────────┐
                ▼                                         ▼
     ┌───────────────────┐                     ┌───────────────────┐
     │ Need ordering?    │                     │ Only need exact   │
     │ - Range queries   │                     │ match lookup?     │
     │ - Min/Max         │                     │ - Contains key?   │
     │ - Sorted access   │                     │ - Get by key      │
     │ - Predecessor/    │                     │ - Insert/delete   │
     │   Successor       │                     │   by key          │
     │ - k-th element    │                     │ - Count distinct  │
     └────────┬──────────┘                     └────────┬──────────┘
              │                                         │
              ▼                                         ▼
  ┌────────────────────┐                   ┌────────────────────────┐
  │ Use BST            │                   │ Use Hash Table         │
  │ (TreeMap/TreeSet)  │                   │ (HashMap/HashSet)      │
  │                    │                   │                        │
  │ O(log n) all ops   │                   │ O(1) avg all ops       │
  │ O(log n + k)       │                   │ O(n) ordered ops       │
  │   range queries    │                   │                        │
  │ O(n) sorted        │                   │ O(n log n) sorted      │
  │   iteration        │                   │   iteration            │
  └─────────┬──────────┘                   └───────────┬────────────┘
            │                                          │
            │         ┌────────────────────────┐       │
            └────────►│    MIXED WORKLOAD?     │◄──────┘
                      │   Both ordered AND     │
                      │   point operations     │
                      └───────────┬────────────┘
                                  │
                                  ▼
                      ┌────────────────────────┐
                      │ CONSIDER:              │
                      │                        │
                      │ • Use both structures  │
                      │ • BST if ordering wins │
                      │ • SkipList as middle   │
                      │   ground               │
                      │ • Database indexes     │
                      │   (they face this too) │
                      └────────────────────────┘
```

Quick Reference Table:
| Use Case | Best Choice | Reason |
|---|---|---|
| User session lookup | Hash Table | Point lookup only, O(1) wins |
| Cache implementation | Hash Table | Fast lookup by key |
| Configuration storage | Hash Table | Key-value access pattern |
| Leaderboard/ranking | BST | Needs sorted access, rank queries |
| Time-series queries | BST | Range queries by timestamp |
| Order book (trading) | BST | Min/max, range queries |
| Database index | B-tree (BST family) | Range scans, sorted access |
| Spell checker | Either (depends) | Hash for existence, BST for suggestions |
| Priority queue | Heap or BST | Need min/max extraction |
| Unique ID set | Hash Table | Only existence checks |
If you're unsure, prototype both and measure with realistic data. The 'slower' O(log n) BST operations are still fast in absolute terms—often microseconds. For many applications, the difference between O(1) and O(log n) is imperceptible, but the difference between O(log n) and O(n) for ordered operations is dramatic.
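A throwaway harness like the following makes that advice concrete. It compares "find the minimum key" on a hash table against an ordered structure (a sorted list stands in for the tree; its minimum is the leftmost element, just like a BST's). Absolute numbers will vary by machine—the point is the shape of the gap, not the exact figures:

```python
import random
import timeit

n = 100_000
keys = random.sample(range(10 * n), n)

table = {k: None for k in keys}   # hash table: finding min is an O(n) scan
ordered = sorted(keys)            # ordered structure: min is the leftmost
                                  # element (O(log n) leftmost walk in a tree,
                                  # O(1) here in a sorted array)

t_hash = timeit.timeit(lambda: min(table), number=20)
t_tree = timeit.timeit(lambda: ordered[0], number=20)

print(f"min via hash-table scan:   {t_hash:.5f}s")
print(f"min via ordered structure: {t_tree:.5f}s")
```

Both approaches agree on the answer; only the ordered structure makes it cheap to keep asking.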
Modern languages provide ordered containers alongside hash-based ones. Knowing which to use is essential.
Java:
```java
import java.util.*;

// HASH-BASED: O(1) operations, no ordering
HashMap<String, Integer> hashMap = new HashMap<>();
HashSet<String> hashSet = new HashSet<>();

// TREE-BASED: O(log n) operations, full ordering
TreeMap<String, Integer> treeMap = new TreeMap<>();
TreeSet<String> treeSet = new TreeSet<>();

// TreeMap-specific ordered operations:
treeMap.put("apple", 1);
treeMap.put("banana", 2);
treeMap.put("cherry", 3);
treeMap.put("date", 4);

treeMap.firstKey();           // → "apple"  (minimum)
treeMap.lastKey();            // → "date"   (maximum)
treeMap.lowerKey("cherry");   // → "banana" (predecessor)
treeMap.higherKey("banana");  // → "cherry" (successor)
treeMap.floorKey("cat");      // → "banana" (floor: largest key ≤ "cat")
treeMap.ceilingKey("cat");    // → "cherry" (ceiling: smallest key ≥ "cat")

// Range view (subMap, upper bound exclusive):
SortedMap<String, Integer> range = treeMap.subMap("banana", "date");
// Contains "banana" and "cherry"
// Returns a VIEW, not a copy — cheap to create!

// NavigableMap extends SortedMap with more methods:
NavigableMap<String, Integer> nav = treeMap;
nav.descendingMap();          // Reverse-order view
nav.headMap("cherry", true);  // All keys ≤ "cherry"
nav.tailMap("banana", true);  // All keys ≥ "banana"
```

Python:
```python
# HASH-BASED (built-in): O(1) operations
hash_dict = {}
hash_set = set()

# Python has no built-in sorted container!
# Use the 'sortedcontainers' package (pip install sortedcontainers)

from sortedcontainers import SortedDict, SortedSet, SortedList

# SORTED CONTAINERS: O(log n) operations with ordering
sorted_dict = SortedDict()
sorted_set = SortedSet()
sorted_list = SortedList()

# SortedDict operations:
sorted_dict['apple'] = 1
sorted_dict['banana'] = 2
sorted_dict['cherry'] = 3
sorted_dict['date'] = 4

sorted_dict.peekitem(0)   # → ('apple', 1) - minimum
sorted_dict.peekitem(-1)  # → ('date', 4)  - maximum

# Index-based access (O(log n)):
sorted_dict.peekitem(2)   # → ('cherry', 3) - 3rd element

# Range operations via irange:
list(sorted_dict.irange('banana', 'date'))
# → ['banana', 'cherry', 'date']

# SortedList specific operations:
sl = SortedList([30, 10, 50, 20, 40])
sl.bisect_left(25)   # → 2 (index where 25 would go)
sl.bisect_right(30)  # → 3 (index after existing 30s)
sl[2]                # → 30 (3rd smallest)

# IMPORTANT: sortedcontainers uses a different implementation
# than a traditional BST, but provides similar O(log n) guarantees
```

C++:
```cpp
#include <map>
#include <set>
#include <string>
#include <unordered_map>
#include <unordered_set>

// HASH-BASED: O(1) average operations
std::unordered_map<std::string, int> hash_map;
std::unordered_set<std::string> hash_set;

// TREE-BASED (Red-Black Tree): O(log n) operations with ordering
std::map<std::string, int> tree_map;
std::set<std::string> tree_set;

// std::map ordered operations:
tree_map["apple"] = 1;
tree_map["banana"] = 2;
tree_map["cherry"] = 3;
tree_map["date"] = 4;

tree_map.begin()->first;   // "apple" (minimum)
tree_map.rbegin()->first;  // "date"  (maximum)

// lower_bound: iterator to first element ≥ key
auto it = tree_map.lower_bound("cat");     // points to "cherry"

// upper_bound: iterator to first element > key
auto it2 = tree_map.upper_bound("banana"); // points to "cherry"

// Range iteration:
for (auto it = tree_map.lower_bound("banana");
     it != tree_map.upper_bound("cherry"); ++it) {
    // Iterates: banana, cherry
}

// equal_range: pair of (lower_bound, upper_bound)
auto [lo, hi] = tree_map.equal_range("banana");
```

| Language | Hash-Based | Ordered (BST) |
|---|---|---|
| Java | HashMap, HashSet | TreeMap, TreeSet |
| Python | dict, set | sortedcontainers.SortedDict/Set |
| C++ | unordered_map, unordered_set | map, set |
| JavaScript | Map, Set, Object | No built-in (use libraries) |
| Go | map | No built-in (use btree package) |
| Rust | HashMap, HashSet | BTreeMap, BTreeSet |
While we use 'BST' as shorthand, actual implementations vary. Java uses Red-Black trees, C++ typically uses Red-Black trees, and some libraries use B-trees or skip lists. The interface and complexity guarantees are similar, but cache performance and constant factors differ.
Hash tables are powerful, but they cannot do everything. When your problem involves ordering, they are fundamentally the wrong tool.
What's Next:
We've covered security vulnerabilities (collision attacks), quality issues (poor hash functions), and capability limitations (ordered operations). The final consideration is practical: space overhead. Hash tables use more memory than their contents strictly require, and in some contexts, this overhead is unacceptable.
You now understand when hash tables are the wrong choice due to ordering requirements. This knowledge prevents the common mistake of forcing hash tables where BSTs belong, resulting in O(n) operations that should be O(log n). Next, we'll examine the space overhead of hash tables and when it becomes problematic.