Every experienced engineer has witnessed it: a complex algorithmic solution reduced to simplicity by choosing the right data structure. What seems like a challenging problem requiring intricate logic becomes trivial when data is organized correctly. Conversely, the wrong data structure can make even simple problems frustratingly difficult.
This page develops your ability to map problem requirements to data structure choices—a skill that separates struggling developers who write complicated code from expert engineers who write elegant solutions. By the end, you'll have a systematic framework for data structure selection that applies to virtually any problem you encounter.
You will learn to analyze problems in terms of the operations they require, match those operations to data structures that perform them efficiently, and recognize the common patterns that suggest specific data structure choices.
The fundamental principle of data structure selection is deceptively simple: choose based on what you need to do, not what the data looks like.
Beginners often select data structures based on superficial similarity to the input. "The input is a list of numbers, so I'll use an array." This approach ignores the critical question: what operations does the algorithm need to perform on this data?
The operation-first approach:
1. Identify every operation the algorithm must perform on the data.
2. Determine which operations are on the critical path (inside loops).
3. Select the structure that performs those critical operations most efficiently.
This reversal—from "what is the data?" to "what must I do with the data?"—is transformative.
Operations inside loops (especially nested loops) dominate your time complexity. An O(n) operation performed n times creates O(n²) overall complexity. Focus your data structure optimization on operations in the innermost loops—these are your critical path operations.
Example: The Two-Sum Problem
Consider finding two numbers in an array that sum to a target.
The naive approach checks every pair with nested loops, an O(n²) strategy. The operation-focused engineer recognizes instead: I need an O(1) lookup, performed O(n) times. That's a hash table.
The result: O(n) instead of O(n²). Same problem, same data, completely different performance—because of data structure choice.
```typescript
// Step 1: Identify required operations
// - Iterate through array once ✓ (any structure works)
// - For each element: check if complement exists (CRITICAL - inside loop)
// - For each element: retrieve complement's index (CRITICAL - inside loop)

// Step 2: Match operations to data structures
// - Check existence: Hash Set O(1), Array O(n), BST O(log n)
// - Retrieve by value: Hash Map O(1), Array O(n), BST O(log n)

// Step 3: Select based on critical path
// Hash Map wins: O(1) for both critical operations

function twoSum(nums: number[], target: number): [number, number] {
  const seen = new Map<number, number>(); // value -> index

  for (let i = 0; i < nums.length; i++) {
    const complement = target - nums[i];

    // O(1) existence check (critical path)
    if (seen.has(complement)) {
      // O(1) retrieval (critical path)
      return [seen.get(complement)!, i];
    }

    seen.set(nums[i], i); // O(1) insert
  }

  return [-1, -1]; // No solution found
}
```

Here is the definitive reference for mapping operations to optimal data structures. For each operation type, we identify which structures excel and when to use each.
Lookup and access operations:

| Operation | Best Structure | Complexity | Alternative | Notes |
|---|---|---|---|---|
| Access by index | Array | O(1) | — | Arrays are unbeatable for indexed access |
| Lookup by key | Hash Map | O(1) avg | BST O(log n) | Hash when keys aren't ordered |
| Existence check | Hash Set | O(1) avg | BST O(log n) | Sets when you only need membership |
| Find min/max | Heap | O(1) | BST O(log n) | Heap when you only need extreme |
| Find k-th element | Order Statistic Tree | O(log n) | Quickselect O(n) | Use quickselect for one-time query |
| Range lookup | BST / Sorted Array | O(log n + k) | — | BST for dynamic, sorted array for static |
Insertion and deletion operations:

| Operation | Best Structure | Complexity | Alternative | Notes |
|---|---|---|---|---|
| Insert at end | Dynamic Array | O(1) amortized | Linked List O(1) | Array has better cache locality |
| Insert at front | Deque / Linked List | O(1) | Array O(n) | Avoid array for frequent front inserts |
| Insert in sorted order | BST / Skip List | O(log n) | Sorted Array O(n) | Never use array for frequent sorted inserts |
| Delete by value | Hash Set | O(1) avg | BST O(log n) | Hash when order doesn't matter |
| Delete min/max | Heap | O(log n) | BST O(log n) | Heap is simpler when you only need extreme |
| Delete arbitrary | Linked List + Hash | O(1) | BST O(log n) | LRU cache pattern uses this |
Ordering and range operations:

| Operation | Best Structure | Complexity | Alternative | Notes |
|---|---|---|---|---|
| Maintain sorted order | BST / TreeSet | O(log n) ops | Sorted Array O(n) insert | BST for dynamic, array for static |
| Range sum query | Fenwick Tree / Segment Tree | O(log n) | Prefix Sum O(1) | Prefix sum only works without updates |
| Range min/max query | Segment Tree / Sparse Table | O(log n) / O(1) | — | Sparse table for static data |
| Prefix operations | Prefix Array | O(1) query, O(n) build | — | Classic for cumulative queries on static data; sketched after this table |
| Sliding window aggregate | Monotonic Deque | O(1) amortized | Heap O(log n) | Deque when window slides one element |
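As promised in the table, here is a minimal prefix-sum sketch (the function names are illustrative, not from the original text): O(n) to build, then any range-sum query on static data in O(1).

```typescript
// Build prefix sums: prefix[i] = sum of nums[0..i-1], so prefix[0] = 0
function buildPrefixSums(nums: number[]): number[] {
  const prefix = new Array(nums.length + 1).fill(0);
  for (let i = 0; i < nums.length; i++) {
    prefix[i + 1] = prefix[i] + nums[i];
  }
  return prefix;
}

// Sum of nums[left..right] inclusive, answered in O(1)
function rangeSum(prefix: number[], left: number, right: number): number {
  return prefix[right + 1] - prefix[left];
}

// Example: nums = [3, 1, 4, 1, 5], sum over indices 1..3 = 1 + 4 + 1
const prefix = buildPrefixSums([3, 1, 4, 1, 5]);
console.log(rangeSum(prefix, 1, 3)); // 6
```

Note the trade-off called out in the table: a single update to `nums` invalidates the whole prefix array, which is why structures like the Fenwick tree exist for dynamic data.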
Hash structures (HashMap, HashSet) offer O(1) average operations but don't maintain order. Tree structures (BST, TreeMap) have O(log n) operations but support ordered operations like predecessor, successor, and range queries. Choose hash when order doesn't matter; choose tree when you need ordering or range operations.
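JavaScript has no built-in ordered map, so on mostly-static data a sorted array plus binary search is the usual stand-in for tree-style ordered operations. A minimal sketch (the `successor` name is illustrative) of exactly the kind of query a hash map cannot answer:

```typescript
// Find the smallest element >= target in a sorted array (the "successor"),
// an ordered operation hash structures cannot support. O(log n).
function successor(sorted: number[], target: number): number | null {
  let lo = 0, hi = sorted.length; // Search window [lo, hi)
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] >= target) hi = mid;
    else lo = mid + 1;
  }
  return lo < sorted.length ? sorted[lo] : null;
}

console.log(successor([2, 5, 8, 13], 6));  // 8
console.log(successor([2, 5, 8, 13], 14)); // null (no successor)
```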
Beyond individual operations, certain problem patterns strongly suggest specific data structures. Learn these mappings for instant recognition:
```typescript
// Pattern: For each element, find the next greater element to the right
// Signal: "next greater" → Monotonic Stack

// Why monotonic stack?
// - We need to track elements waiting to find their next greater
// - When we find a greater element, it resolves all waiting smaller elements
// - Stack naturally handles this "waiting" with LIFO order

function nextGreaterElements(nums: number[]): number[] {
  const n = nums.length;
  const result = new Array(n).fill(-1);
  const stack: number[] = []; // Stack of indices (not values)

  for (let i = 0; i < n; i++) {
    // While current element is greater than stack top,
    // it is the "next greater" for all smaller waiting elements
    while (stack.length > 0 && nums[i] > nums[stack[stack.length - 1]]) {
      const waitingIdx = stack.pop()!;
      result[waitingIdx] = nums[i];
    }
    // Current element now waits for its next greater
    stack.push(i);
  }

  // Elements remaining in stack have no next greater (result stays -1)
  return result;
}

// Example: [2, 1, 2, 4, 3]
// Stack evolution:
// i=0: push 0, stack=[0]
// i=1: 1 < 2, push 1, stack=[0,1]
// i=2: 2 > 1, pop 1 (result[1]=2), push 2, stack=[0,2]
// i=3: 4 > 2, pop 2 (result[2]=4), 4 > 2, pop 0 (result[0]=4), push 3, stack=[3]
// i=4: 3 < 4, push 4, stack=[3,4]
// Result: [4, 2, 4, -1, -1]
```

Hash-based structures are the workhorses of algorithm design, providing O(1) average-case operations for the most common needs. Understanding their variants and when to apply each is essential.
Hash tables have O(n) worst-case behavior when many collisions occur. In practice this is rare with good hash functions, but if an adversary controls the input, they can craft collision-heavy inputs. Also remember: classic hash tables don't maintain insertion order (though JavaScript's Map and Java's LinkedHashMap do) and don't support ordered operations like 'find all elements between A and B'.
```typescript
// Problem 1: Check if array has duplicates
// Need: Membership test
// Choice: Hash Set
function containsDuplicate(nums: number[]): boolean {
  const seen = new Set<number>();
  for (const num of nums) {
    if (seen.has(num)) return true;
    seen.add(num);
  }
  return false;
}

// Problem 2: Group anagrams together
// Need: Group by normalized form
// Choice: Hash Map (key → list)
function groupAnagrams(strs: string[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const str of strs) {
    // Normalize: sort characters to create canonical form
    const key = str.split('').sort().join('');
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key)!.push(str);
  }
  return Array.from(groups.values());
}

// Problem 3: Find majority element (appears > n/2 times)
// Need: Count frequencies
// Choice: Hash Map as counter (though Boyer-Moore is O(1) space)
function majorityElement(nums: number[]): number {
  const counts = new Map<number, number>();
  const threshold = nums.length / 2;
  for (const num of nums) {
    const newCount = (counts.get(num) || 0) + 1;
    if (newCount > threshold) return num;
    counts.set(num, newCount);
  }
  return -1; // Unreachable if a majority element exists
}
```

Heaps (implemented as priority queues in most languages) are the go-to structure when you need efficient access to extreme values—the minimum or maximum element. Their unique property is O(1) access to the extreme with O(log n) updates.
| Operation | Time Complexity | Notes |
|---|---|---|
| Find min/max | O(1) | The defining advantage of heaps |
| Insert | O(log n) | Bubble up to maintain heap property |
| Extract min/max | O(log n) | Remove root, bubble down replacement |
| Build heap | O(n) | Bottom-up heapify is faster than n inserts |
| Arbitrary delete | O(n) or O(log n) | O(n) to find, O(log n) with index tracking |
| Decrease/increase key | O(log n) | Requires index tracking for efficient update |
To find the k-th largest element: use a min-heap of size k. For each new element, if it's larger than heap minimum, replace the minimum. After processing all elements, the heap minimum is the k-th largest. This is O(n log k), better than sorting O(n log n) when k is small.
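A minimal sketch of that technique, with a small inline array-based min-heap (the names are illustrative; a library priority queue works just as well):

```typescript
// k-th largest via a min-heap capped at size k: O(n log k)
function findKthLargest(nums: number[], k: number): number {
  const heap: number[] = []; // Array-based min-heap; heap[0] is the minimum
  const swap = (i: number, j: number) => { [heap[i], heap[j]] = [heap[j], heap[i]]; };

  const push = (val: number) => {
    heap.push(val);
    let i = heap.length - 1;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (heap[parent] <= heap[i]) break;
      swap(i, parent);
      i = parent;
    }
  };

  const popMin = () => {
    swap(0, heap.length - 1);
    heap.pop();
    let i = 0;
    while (true) {
      const left = 2 * i + 1, right = 2 * i + 2;
      let smallest = i;
      if (left < heap.length && heap[left] < heap[smallest]) smallest = left;
      if (right < heap.length && heap[right] < heap[smallest]) smallest = right;
      if (smallest === i) break;
      swap(i, smallest);
      i = smallest;
    }
  };

  for (const num of nums) {
    push(num);
    if (heap.length > k) popMin(); // Evict the smallest; heap keeps the k largest
  }
  return heap[0]; // Minimum of the k largest = k-th largest
}

console.log(findKthLargest([3, 2, 1, 5, 6, 4], 2)); // 5
```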
```typescript
// Problem: Merge k sorted linked lists into one sorted list
// Pattern: Need global minimum from k sources repeatedly
// Choice: Min-Heap

interface ListNode {
  val: number;
  next: ListNode | null;
}

// Min-heap implementation (or use a library)
class MinHeap<T> {
  private heap: T[] = [];

  constructor(private compare: (a: T, b: T) => number) {}

  push(item: T) {
    this.heap.push(item);
    this.bubbleUp(this.heap.length - 1);
  }

  pop(): T | undefined {
    if (this.heap.length === 0) return undefined;
    const top = this.heap[0];
    const last = this.heap.pop()!;
    if (this.heap.length > 0) {
      this.heap[0] = last;
      this.bubbleDown(0);
    }
    return top;
  }

  get size() { return this.heap.length; }

  private bubbleUp(i: number) {
    while (i > 0) {
      const parent = Math.floor((i - 1) / 2);
      if (this.compare(this.heap[i], this.heap[parent]) >= 0) break;
      [this.heap[i], this.heap[parent]] = [this.heap[parent], this.heap[i]];
      i = parent;
    }
  }

  private bubbleDown(i: number) {
    while (true) {
      let smallest = i;
      const left = 2 * i + 1, right = 2 * i + 2;
      if (left < this.heap.length && this.compare(this.heap[left], this.heap[smallest]) < 0) {
        smallest = left;
      }
      if (right < this.heap.length && this.compare(this.heap[right], this.heap[smallest]) < 0) {
        smallest = right;
      }
      if (smallest === i) break;
      [this.heap[i], this.heap[smallest]] = [this.heap[smallest], this.heap[i]];
      i = smallest;
    }
  }
}

function mergeKLists(lists: (ListNode | null)[]): ListNode | null {
  const heap = new MinHeap<ListNode>((a, b) => a.val - b.val);

  // Initialize heap with first node from each list
  for (const list of lists) {
    if (list) heap.push(list);
  }

  const dummy: ListNode = { val: 0, next: null };
  let current = dummy;

  while (heap.size > 0) {
    const smallest = heap.pop()!;
    current.next = smallest;
    current = smallest;

    // Push next node from same list
    if (smallest.next) {
      heap.push(smallest.next);
    }
  }

  return dummy.next;
}
```

Stacks and queues enforce specific access orders that match many natural problem structures. Recognizing when a problem has LIFO (Last-In-First-Out) or FIFO (First-In-First-Out) semantics immediately suggests these structures.
The Monotonic Stack Pattern:
A monotonic stack maintains elements in sorted order (either increasing or decreasing). It's the key to solving problems about finding the next/previous greater/smaller element efficiently.
```typescript
// Problem: Find largest rectangle that can be formed in a histogram
// Pattern: For each bar, we need to know how far left and right it extends
//          (until a shorter bar is encountered)
// Choice: Monotonic increasing stack

function largestRectangleArea(heights: number[]): number {
  const n = heights.length;
  const stack: number[] = []; // Stack of indices; heights are increasing
  let maxArea = 0;

  // Process each bar plus a sentinel at the end
  for (let i = 0; i <= n; i++) {
    // Use 0 as sentinel to flush remaining bars
    const currentHeight = i < n ? heights[i] : 0;

    // Pop bars that are taller than current
    // Each popped bar forms a rectangle with current as right boundary
    while (stack.length > 0 && currentHeight < heights[stack[stack.length - 1]]) {
      const height = heights[stack.pop()!];
      // Width: from previous stack element to current position
      const width = stack.length === 0 ? i : i - stack[stack.length - 1] - 1;
      maxArea = Math.max(maxArea, height * width);
    }
    stack.push(i);
  }

  return maxArea;
}

// Example: heights = [2, 1, 5, 6, 2, 3]
// At i=4 (height 2), pop the taller bars:
// Pop index 3 (height 6): width = 4 - 2 - 1 = 1, area = 6
// Pop index 2 (height 5): left bound is stack top (index 1), width = 4 - 1 - 1 = 2, area = 10
// Continue... Final answer: 10
```

A deque (double-ended queue) supports O(1) operations at both ends. It's essential for the sliding window maximum problem: maintain a monotonic decreasing deque of indices; the front always holds the maximum in the current window. Elements are removed from the front when they leave the window, and from the back when a larger element enters.
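A minimal sketch of that sliding-window-maximum pattern, using a plain array as the deque (shift() is not a true O(1) dequeue, but it keeps the sketch short):

```typescript
// Maximum of every window of size k, O(n) total via a monotonic deque
function maxSlidingWindow(nums: number[], k: number): number[] {
  const result: number[] = [];
  const deque: number[] = []; // Indices; their values are decreasing front to back

  for (let i = 0; i < nums.length; i++) {
    // Drop the front index once it has slid out of the window [i - k + 1, i]
    if (deque.length > 0 && deque[0] <= i - k) deque.shift();

    // Drop smaller elements from the back; they can never be a window max
    while (deque.length > 0 && nums[deque[deque.length - 1]] <= nums[i]) {
      deque.pop();
    }

    deque.push(i);
    if (i >= k - 1) result.push(nums[deque[0]]); // Front holds the window max
  }
  return result;
}

console.log(maxSlidingWindow([1, 3, -1, -3, 5, 3, 6, 7], 3)); // [3, 3, 5, 5, 6, 7]
```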
Trees model hierarchical data and enable O(log n) operations on ordered data. Different tree variants serve different purposes—choosing the right one depends on what operations you need beyond basic ordered storage.
| Tree Type | Primary Use Case | Key Operations | When to Choose |
|---|---|---|---|
| Binary Search Tree | Ordered storage with O(log n) ops | Insert, delete, search, predecessor/successor | When you need ordered operations on dynamic data |
| AVL / Red-Black Tree | Self-balancing BST | Same as BST with guaranteed balance | When worst-case O(log n) is required |
| B-Tree / B+ Tree | Disk-based storage | Optimized for block reads | Database indices, file systems |
| Trie (Prefix Tree) | String prefix operations | Prefix search, autocomplete | Dictionary, autocomplete, IP routing |
| Segment Tree | Range queries with updates | Range sum/min/max, point updates | When you need both range queries and updates |
| Fenwick Tree (BIT) | Prefix sums with updates | Prefix sum, point update | Simpler alternative to segment tree for sums; sketched after this table |
| Suffix Tree/Array | String pattern matching | Find all occurrences, LCP queries | Multiple pattern searches, string algorithms |
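Since the Fenwick tree is compact enough to write from scratch, here is a minimal sketch of the "prefix sums with updates" row (the class and method names are illustrative):

```typescript
// Fenwick tree (Binary Indexed Tree): prefix sums with point updates, both O(log n)
class FenwickTree {
  private tree: number[];

  constructor(size: number) {
    this.tree = new Array(size + 1).fill(0); // 1-indexed internally
  }

  // Add delta to the element at index i (0-indexed)
  update(i: number, delta: number): void {
    for (let x = i + 1; x < this.tree.length; x += x & -x) {
      this.tree[x] += delta;
    }
  }

  // Sum of elements [0..i] inclusive
  prefixSum(i: number): number {
    let sum = 0;
    for (let x = i + 1; x > 0; x -= x & -x) {
      sum += this.tree[x];
    }
    return sum;
  }

  // Sum of elements [left..right] inclusive
  rangeSum(left: number, right: number): number {
    return this.prefixSum(right) - (left > 0 ? this.prefixSum(left - 1) : 0);
  }
}

// Usage: unlike a plain prefix array, updates stay cheap
const fenwick = new FenwickTree(5);
[3, 1, 4, 1, 5].forEach((v, i) => fenwick.update(i, v));
console.log(fenwick.rangeSum(1, 3)); // 6
fenwick.update(2, 10);               // nums[2] += 10
console.log(fenwick.rangeSum(1, 3)); // 16
```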
```typescript
// Pattern: Need to efficiently find/suggest words by prefix
// Choice: Trie (Prefix Tree)

class TrieNode {
  children = new Map<string, TrieNode>();
  isEndOfWord = false;
  word: string | null = null; // Store complete word at end nodes
}

class Trie {
  private root = new TrieNode();

  insert(word: string): void {
    let node = this.root;
    for (const char of word) {
      if (!node.children.has(char)) {
        node.children.set(char, new TrieNode());
      }
      node = node.children.get(char)!;
    }
    node.isEndOfWord = true;
    node.word = word;
  }

  search(word: string): boolean {
    const node = this.findNode(word);
    return node !== null && node.isEndOfWord;
  }

  startsWith(prefix: string): boolean {
    return this.findNode(prefix) !== null;
  }

  // Autocomplete: find all words with given prefix
  autocomplete(prefix: string, limit: number = 10): string[] {
    const node = this.findNode(prefix);
    if (!node) return [];
    const results: string[] = [];
    this.collectWords(node, results, limit);
    return results;
  }

  private findNode(prefix: string): TrieNode | null {
    let node = this.root;
    for (const char of prefix) {
      if (!node.children.has(char)) return null;
      node = node.children.get(char)!;
    }
    return node;
  }

  private collectWords(node: TrieNode, results: string[], limit: number): void {
    if (results.length >= limit) return;
    if (node.isEndOfWord && node.word) {
      results.push(node.word);
    }
    for (const child of node.children.values()) {
      this.collectWords(child, results, limit);
    }
  }
}

// Usage: Search engine autocomplete
const trie = new Trie();
trie.insert("algorithm");
trie.insert("algebra");
trie.insert("alphabetical");
console.log(trie.autocomplete("alg")); // ["algorithm", "algebra"]
```

Graphs require careful consideration of representation—the choice between adjacency list and adjacency matrix dramatically affects both space and time complexity for different operations.
| Operation | Adjacency List | Adjacency Matrix | Winner |
|---|---|---|---|
| Space | O(V + E) | O(V²) | List (unless dense) |
| Add edge | O(1) | O(1) | Tie |
| Remove edge | O(E) or O(degree) | O(1) | Matrix |
| Check if edge exists | O(degree) | O(1) | Matrix |
| Get all neighbors | O(degree) | O(V) | List |
| Iterate all edges | O(E) | O(V²) | List |
Use adjacency list for sparse graphs (E << V²), which is most real-world graphs. Use adjacency matrix for dense graphs (E ≈ V²), or when you need O(1) edge existence checks. A useful heuristic: if the graph represents relationships (social networks, web links), it's usually sparse. If it represents quantitative relationships (distances between all city pairs), it might be dense.
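For contrast with the adjacency-list code below, here is a minimal adjacency-matrix sketch (the class name is illustrative); note the O(1) hasEdge check against the O(V) neighbor scan:

```typescript
// Adjacency matrix: O(V²) space, O(1) edge existence checks
class GraphAdjMatrix {
  private matrix: boolean[][];

  constructor(private numVertices: number) {
    this.matrix = Array.from({ length: numVertices }, () =>
      new Array(numVertices).fill(false)
    );
  }

  addEdge(u: number, v: number, directed = false): void {
    this.matrix[u][v] = true;
    if (!directed) this.matrix[v][u] = true;
  }

  hasEdge(u: number, v: number): boolean {
    return this.matrix[u][v]; // O(1), the matrix's key advantage
  }

  getNeighbors(v: number): number[] {
    const neighbors: number[] = [];
    for (let u = 0; u < this.numVertices; u++) {
      if (this.matrix[v][u]) neighbors.push(u); // O(V) even when the row is sparse
    }
    return neighbors;
  }
}
```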
Specialized graph structures:
```typescript
// Adjacency List (most common for algorithms)
class GraphAdjList {
  private adjacency = new Map<number, number[]>();

  addEdge(u: number, v: number, directed = false): void {
    if (!this.adjacency.has(u)) this.adjacency.set(u, []);
    this.adjacency.get(u)!.push(v);
    if (!directed) {
      if (!this.adjacency.has(v)) this.adjacency.set(v, []);
      this.adjacency.get(v)!.push(u);
    }
  }

  getNeighbors(v: number): number[] {
    return this.adjacency.get(v) || [];
  }
}

// Adjacency List with weights
class WeightedGraph {
  private adjacency = new Map<number, Array<[number, number]>>();

  addEdge(u: number, v: number, weight: number, directed = false): void {
    if (!this.adjacency.has(u)) this.adjacency.set(u, []);
    this.adjacency.get(u)!.push([v, weight]);
    if (!directed) {
      if (!this.adjacency.has(v)) this.adjacency.set(v, []);
      this.adjacency.get(v)!.push([u, weight]);
    }
  }

  getNeighbors(v: number): Array<[number, number]> {
    return this.adjacency.get(v) || [];
  }
}

// Building from edge list (common input format)
function buildGraph(n: number, edges: number[][]): Map<number, number[]> {
  const graph = new Map<number, number[]>();
  for (let i = 0; i < n; i++) {
    graph.set(i, []);
  }
  for (const [u, v] of edges) {
    graph.get(u)!.push(v);
    graph.get(v)!.push(u); // Remove for directed graph
  }
  return graph;
}
```

Many elegant solutions combine multiple data structures to achieve properties that no single structure provides. Learning these combinations opens new solution possibilities.
```typescript
// Problem: Implement a set with O(1) insert, delete, and getRandom
// Challenge: No single structure provides all three in O(1)
// Solution: Hash Map + Array composite

class RandomizedSet {
  private map = new Map<number, number>(); // value -> index in array
  private array: number[] = [];

  insert(val: number): boolean {
    if (this.map.has(val)) return false;
    // Add to end of array, track index in map
    this.map.set(val, this.array.length);
    this.array.push(val);
    return true;
  }

  remove(val: number): boolean {
    if (!this.map.has(val)) return false;
    const idx = this.map.get(val)!;
    const lastVal = this.array[this.array.length - 1];
    // Swap with last element (O(1) removal from array end)
    this.array[idx] = lastVal;
    this.map.set(lastVal, idx);
    // Remove last element
    this.array.pop();
    this.map.delete(val);
    return true;
  }

  getRandom(): number {
    // O(1) random access from array
    const randomIdx = Math.floor(Math.random() * this.array.length);
    return this.array[randomIdx];
  }
}

// Key insight: Array gives O(1) random access, but O(n) delete by value
// Map gives O(1) delete by value, but no random access
// Combined: Map tracks where each value is in array
// Delete: swap target with last, then pop (both O(1))
```

When no standard structure meets all requirements, consider: What operations do I need? Which structures excel at each? Can I combine them, using one to compensate for another's weakness? This design thinking is crucial for system design interviews and real-world engineering.
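As a second worked composite, the LRU cache pattern from the deletion table can be sketched in TypeScript without a hand-rolled linked list, because Map iterates keys in insertion order; this is a minimal sketch under that assumption, not the canonical linked-list implementation:

```typescript
// LRU cache: O(1) get and put using Map's insertion-order iteration
class LRUCache {
  private map = new Map<number, number>();

  constructor(private capacity: number) {}

  get(key: number): number {
    if (!this.map.has(key)) return -1;
    const value = this.map.get(key)!;
    this.map.delete(key); // Re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  put(key: number, value: number): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // First key in iteration order is the least recently used
      const oldest = this.map.keys().next().value!;
      this.map.delete(oldest);
    }
  }
}

const cache = new LRUCache(2);
cache.put(1, 1);
cache.put(2, 2);
cache.get(1);              // 1; key 1 is now most recent
cache.put(3, 3);           // Evicts key 2
console.log(cache.get(2)); // -1
```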
We've developed a comprehensive framework for selecting data structures based on operational requirements. This skill—perhaps more than any other—separates efficient solutions from inefficient ones.
What's next:
Now that we can map problems to data structures, the next page explores mapping problems to algorithms—the techniques and paradigms that transform data structures into solutions.
You now have a systematic approach to data structure selection. Practice analyzing problems operation-first, and you'll find that efficient solutions emerge naturally from the right structural choices.