We've established that different data structures have different cost profiles. Now comes the practical application: given a real problem, how do you evaluate multiple viable solutions and choose the best one?
This page presents comprehensive case studies where we solve the same problem using multiple data structures. For each approach, we'll analyze the trade-offs in detail. This process—evaluating alternatives rather than jumping to the first solution—is the hallmark of experienced engineering.
The key insight: Most problems have multiple valid solutions. The "best" solution depends on your specific requirements, constraints, and usage patterns. Learning to reason through these trade-offs is more valuable than memorizing which structure to use for which problem.
Through detailed case studies, you'll develop the skill of trade-off analysis: identifying viable solutions, quantifying their costs and benefits, and making reasoned decisions. This skill applies far beyond data structures—it's fundamental to all engineering.
Problem Description:
You're building a leaderboard for an online game with millions of players. The system must support: updating a player's score, retrieving a player's rank by ID, retrieving the top N players, and retrieving the players around a given rank K.
Constraints: on the order of 10 million players, with thousands of score updates per second, and queries that must feel instant.
Let's analyze multiple approaches:
Approach 1: Sorted Array
Store all players in an array sorted by score (descending).
```python
# Data structure
leaderboard = [(player_id, score), ...]  # sorted by score, descending

def update_score(player_id, new_score):
    # Find player by ID: O(n) scan (the array is sorted by score, not ID)
    # Remove from current position: O(n) shift
    # Find new position: O(log n) binary search
    # Insert at new position: O(n) shift
    # Total: O(n)
    ...

def get_rank(player_id):
    # Cannot binary search: the array is sorted by score, not player_id
    # Must scan until found: O(n)
    # Once found, index = rank
    # Total: O(n)
    ...

def get_top_n(n):
    return leaderboard[:n]  # slice cost is proportional to N, the number of items returned

def get_around_rank_k(k):
    return leaderboard[k-10:k+10]  # O(1) indexed access + constant-size slice = O(1)
```
| Operation | Complexity | At 10M Players |
|---|---|---|
| Update score | O(n) | ~10 million operations 😰 |
| Get rank by ID | O(n) | ~10 million operations 😰 |
| Get top N | O(N) | Proportional to N requested; acceptable ✓ |
| Get around rank K | O(1) | Instant ✓ |
O(n) per score update with thousands of updates per second and 10M players means billions of operations per second. System would collapse immediately. Sorted array is disqualified despite excellent range access.
Summary: Leaderboard Data Structure Comparison
| Requirement | Sorted Array | Hash+Sort | Balanced BST | Skip List |
|---|---|---|---|---|
| Real-time updates | ✗ | ✗ | ✓ | ✓ |
| Real-time ranks | ✗ | ✗ | ✓ | ✓ |
| Implementation complexity | Low | Low | High | Medium |
| Memory overhead | Low | Medium | High | Medium |
Best choice depends on requirements: the sorted array and hash+sort approaches fail the real-time requirements outright, so a balanced BST or skip list is needed. The skip list offers comparable performance with lower implementation complexity, at the cost of probabilistic rather than worst-case guarantees.
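To make the winning combination concrete, here is a minimal sketch of the hash-plus-ordered-structure idea, showing only `update_score` and `get_rank`. It uses Python's `bisect` on a plain list as a stand-in for a balanced BST or skip list: the list's O(n) insert and delete would be O(log n) in those structures, but the shape of the queries is identical.

```python
import bisect

class Leaderboard:
    """Sketch: hash map (player_id -> score) for O(1) lookup,
    plus an ascending score list for rank queries."""

    def __init__(self):
        self.score_of = {}  # player_id -> score
        self.scores = []    # all scores, ascending

    def update_score(self, player_id, new_score):
        old = self.score_of.get(player_id)
        if old is not None:
            # Remove the old score (O(log n) in a BST; O(n) here)
            i = bisect.bisect_left(self.scores, old)
            self.scores.pop(i)
        self.score_of[player_id] = new_score
        bisect.insort(self.scores, new_score)

    def get_rank(self, player_id):
        # Rank 1 = highest score: count scores strictly greater, add 1
        score = self.score_of[player_id]
        higher = len(self.scores) - bisect.bisect_right(self.scores, score)
        return higher + 1
```

The hash map answers "what is this player's score?" in O(1); the ordered structure answers "how many players score higher?" without scanning everyone.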
Problem Description:
Build an autocomplete system for a search engine. As users type, suggest completions ranked by popularity.
Constraints: roughly 10 million candidate terms, a response within a few milliseconds per keystroke, and the top 10 suggestions returned per query.
Let's analyze multiple approaches:
Approach 1: Array with Linear Filtering
```python
# Data structure
terms = [(term, frequency), ...]  # all search terms

def get_suggestions(prefix):
    # Filter all terms starting with prefix: O(n)
    matches = [t for t in terms if t[0].startswith(prefix)]
    # Sort matches by frequency: O(m log m) where m = number of matches
    matches.sort(key=lambda x: -x[1])
    return matches[:10]

# Total: O(n) filtering + O(m log m) sorting
# At 10M terms: 10 million string comparisons per query 😰
```
O(n) per query with millions of terms and millisecond requirements is impossible. This approach might work for a personal notes app with 100 notes, not a search engine.
Summary: Autocomplete Data Structure Comparison
| Approach | Query Time | Insert Time | Space | Best For |
|---|---|---|---|---|
| Array Filter | O(n) | O(1) or O(n) | O(n) | Toy projects |
| Sorted Array | O(log n + k) | O(n) | O(n) | Small, static dictionaries |
| Trie + Cache | O(L) | O(L) | O(total chars) | Production autocomplete |
| Hash per Prefix | O(1) | O(L × updates) | O(n × avg_len) | Small dictionaries, memory abundant |
Industry solution: Trie with caching, often with additional optimizations like compressed tries (radix trees) and distributed storage for web-scale systems.
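As a rough sketch of the trie approach (without the caching or compression layers a production system would add), the query descends the prefix in O(L) and then collects completions beneath that node:

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.freq = 0       # > 0 iff this node ends a stored term

class Autocomplete:
    """Sketch of the plain trie; production systems cache results
    for popular prefixes instead of re-walking the subtree."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, term, freq):
        node = self.root
        for ch in term:  # O(L) where L = len(term)
            node = node.children.setdefault(ch, TrieNode())
        node.freq = freq

    def suggest(self, prefix, k=10):
        # Walk down the prefix: O(L)
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Collect every completion under this node, then rank
        results = []

        def dfs(n, word):
            if n.freq:
                results.append((word, n.freq))
            for ch, child in n.children.items():
                dfs(child, word + ch)

        dfs(node, prefix)
        results.sort(key=lambda x: -x[1])
        return [w for w, _ in results[:k]]
```

The subtree walk is what the per-prefix cache eliminates: for hot prefixes, the top 10 is precomputed and the query becomes a dictionary lookup.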
Problem Description:
Implement a Least Recently Used (LRU) cache with fixed capacity.
Constraints: fixed capacity, eviction of the least recently used entry when full, and O(1) average time for both get and put.
This is a classic interview problem that beautifully illustrates data structure combination.
Approach 1: Array Sorted by Access Time
```python
# Store entries sorted by last access time (most recent at the end)
cache = [(key, value, timestamp), ...]  # sorted by timestamp

def get(key):
    for i, (k, v, _) in enumerate(cache):  # O(n) linear search
        if k == key:
            # Found: move to the end (most recently used)
            cache.pop(i)                 # O(n) shift
            cache.append((k, v, now()))  # O(1)
            return v
    return None  # not found

def put(key, value):
    # Check if the key already exists (update)
    for i, (k, _, _) in enumerate(cache):  # O(n) linear search
        if k == key:
            cache.pop(i)                       # O(n)
            cache.append((key, value, now()))  # O(1)
            return
    # New entry
    if len(cache) >= capacity:
        cache.pop(0)  # evict the oldest (front): O(n) shift
    cache.append((key, value, now()))
```
Both get and put are O(n) due to linear search and array shifting. The problem requires O(1). This approach is fundamentally unsuitable.
Key Insight: Combining Data Structures
This case study demonstrates a crucial principle: when no single structure meets all requirements, combine them.
| Requirement | Hash Table Alone | DLL Alone | Hash + DLL |
|---|---|---|---|
| O(1) key lookup | ✓ | ✗ (O(n)) | ✓ |
| O(1) access LRU | ✗ (O(n)) | ✓ | ✓ |
| O(1) move to MRU | ✗ (no order) | ✓ | ✓ |
| O(1) evict LRU | ✗ (O(n) find) | ✓ | ✓ |
Neither structure alone suffices. Together, they provide all required O(1) operations.
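A minimal sketch of the combined design in Python. `collections.OrderedDict` is itself a hash table threaded through a doubly linked list, so it supplies both halves of the combination directly: `move_to_end` and `popitem` are O(1).

```python
from collections import OrderedDict

class LRUCache:
    """Hash + doubly linked list, via OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order = recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # O(1): mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # O(1): evict least recently used
```

Hand-rolling the linked list (as interviews often require) changes the amount of code but not the design: the hash table maps keys to list nodes, and the list orders nodes by recency.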
Many optimal solutions combine structures: Hash + Heap for top-k frequent elements, Hash + Tree for ordered dictionary with O(1) lookup, Array + Hash for O(1) random access with O(1) contains check. When interviewing, if one structure doesn't work, ask: 'What if I combined two?'
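For instance, the hash + heap combination for top-k frequent elements can be sketched in a few lines (the function name is illustrative):

```python
import heapq
from collections import Counter

def top_k_frequent(items, k):
    # Hash table (Counter) for O(1)-per-item counting,
    # heap (nlargest) to select the k highest frequencies
    counts = Counter(items)
    best = heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])
    return [item for item, _ in best]
```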
The case studies above demonstrate a systematic approach to data structure selection. Let's codify this into a reusable framework.
Step 1: Enumerate All Required Operations
List every operation the system must support, not just the obvious ones. For each, note how often it runs and how fast it must be.
Step 2: Identify Dominant Operations
Which operations are the most frequent, the most latency-sensitive, or the hardest to make fast? These are your dominant operations. The solution MUST optimize for these.
Step 3: Generate Candidate Structures
For each dominant operation, identify structures that optimize for it: hash tables for key lookup, heaps for min/max access, balanced trees for ordered queries, and so on.
Step 4: Evaluate Each Candidate
For each candidate structure, analyze its complexity for every required operation, its memory overhead, and its implementation cost.
Step 5: Consider Combinations
If no single structure meets all requirements, look for a combination in which each structure covers the other's weakness, as the hash + doubly linked list LRU cache does.
Step 6: Make a Reasoned Decision
Document your choice with rationale: which operations dominated, which alternatives you rejected, and why.
Senior engineers don't jump to solutions. They analyze requirements, enumerate options, evaluate trade-offs, and document decisions. Even if the 'obvious' solution is correct, the analysis process catches edge cases and builds confidence that the choice is right for the specific context.
Even experienced developers fall into predictable traps, such as defaulting to a familiar structure or applying generic rules of thumb. Being aware of them helps you avoid costly mistakes.
For each pitfall, the antidote is the same: analyze the specific problem deeply. Generic advice ('always use X') fails. Understanding your exact requirements, constraints, and usage patterns leads to correct decisions.
We've explored multiple approaches to three significant problems, demonstrating that data structure selection is not about memorizing mappings but about systematic trade-off analysis.
The Professional Mindset:
When faced with a new problem: enumerate the required operations, identify the dominant ones, generate candidate structures, evaluate each one's trade-offs, consider combinations, and document a reasoned decision.
This systematic approach works for any problem. It transforms data structure selection from guesswork into engineering.
You've completed Module 2: Why We Need Different Data Structures. You now understand not just that different structures have different costs, but how to systematically analyze problems, evaluate trade-offs, and select optimal solutions. This foundation will serve you throughout your study of specific data structures and algorithms in the chapters ahead.