Loading content...
Imagine you're a search engine processing results from a hundred different index servers, each returning a sorted list of relevant documents. Or you're a database merging sorted runs from an external sort operation. Or you're a logging system combining timestamp-ordered events from dozens of microservices.
The common thread: you have k sorted sequences and need to produce one unified sorted output.
Merging two sorted lists is simple—the classic merge operation from MergeSort runs in O(n) time. But what happens when k grows large? Naively merging pairs iteratively becomes increasingly inefficient. This is where heaps provide an elegant, optimal solution.
By the end of this page, you will understand why the naive pairwise merge approach is suboptimal, master the heap-based k-way merge that achieves O(n log k) time, explore real-world applications from external sorting to distributed systems, and implement solutions for both linked list and array-based variations.
The K-Way Merge Problem:
Given k sorted sequences (lists, arrays, or streams), produce a single sorted sequence containing all elements from all inputs.
Formal Statement:
Example:
Input:
List 1: [1, 4, 5]
List 2: [1, 3, 4]
List 3: [2, 6]
Output: [1, 1, 2, 3, 4, 4, 5, 6]
Throughout this page, N refers to the total number of elements across all k lists (N = n₁ + n₂ + ... + nₖ). This distinction matters for complexity analysis—we want algorithms that scale well with both N and k.
Why This Problem Matters:
The k-way merge is not just a textbook exercise—it's a fundamental operation in:
Before tackling k lists, let's revisit the simpler case: merging exactly two sorted lists. This is the merge step from MergeSort.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364
from typing import Optional class ListNode: "Definition for singly-linked list." def __init__(self, val=0, next=None): self.val = val self.next = next def merge_two_lists(l1: Optional[ListNode], l2: Optional[ListNode]) -> Optional[ListNode]: """ Merge two sorted linked lists. Time: O(n + m) where n, m are list lengths Space: O(1) extra (just rewiring pointers) The classic two-pointer merge technique. """ # Dummy head simplifies edge cases dummy = ListNode(-1) current = dummy while l1 and l2: if l1.val <= l2.val: current.next = l1 l1 = l1.next else: current.next = l2 l2 = l2.next current = current.next # Attach remaining elements (one list exhausted) current.next = l1 if l1 else l2 return dummy.next def merge_two_arrays(arr1: list[int], arr2: list[int]) -> list[int]: """ Merge two sorted arrays. Time: O(n + m) Space: O(n + m) for the result array """ result = [] i = j = 0 while i < len(arr1) and j < len(arr2): if arr1[i] <= arr2[j]: result.append(arr1[i]) i += 1 else: result.append(arr2[j]) j += 1 # Append remaining elements result.extend(arr1[i:]) result.extend(arr2[j:]) return result # Exampleprint(merge_two_arrays([1, 4, 5], [1, 3, 4]))# Output: [1, 1, 3, 4, 4, 5]Key Properties of Two-Way Merge:
This is optimal for two lists. But how do we extend to k lists?
Let's analyze two intuitive but suboptimal approaches before introducing the heap solution.
Why is this O(k × N)?
Consider k lists, each with n elements (N = k × n total):
Total work: 2n + 3n + 4n + ... + kn = n(2 + 3 + ... + k) = n × O(k²) = O(k × N)
A better pairwise approach uses divide-and-conquer: pair up all k lists, merge each pair in parallel, repeat until one list remains. This gives O(N log k) time—already much better! But heaps offer an equally efficient solution with different trade-offs.
The heap-based solution uses a fundamentally different approach: instead of merging pairs, we maintain a min-heap of k elements representing the current smallest unprocessed element from each list.
The Key Insight:
At any moment, the global minimum among all remaining elements must be the minimum of the k 'frontier' elements—one from each list. A min-heap lets us find this global minimum in O(1) and update in O(log k).
Algorithm:
Why This Works:
The heap always contains at most k elements (one frontier element per list). Since lists are sorted, the smallest remaining element is always at the frontier. The heap efficiently maintains access to the smallest frontier element.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293
import heapqfrom typing import List, Optional class ListNode: def __init__(self, val=0, next=None): self.val = val self.next = next def merge_k_lists(lists: List[Optional[ListNode]]) -> Optional[ListNode]: """ LeetCode 23: Merge k Sorted Lists Merge k sorted linked lists using a min-heap. Time: O(N log k) where N = total elements, k = number of lists Space: O(k) for the heap The heap contains at most k elements (one from each list). Each of N elements is pushed and popped exactly once. """ # Min-heap storing (value, list_index, node) # list_index breaks ties to avoid comparing ListNode objects min_heap = [] # Initialize: add first element from each non-empty list for i, head in enumerate(lists): if head: heapq.heappush(min_heap, (head.val, i, head)) # Dummy head for result list dummy = ListNode(-1) current = dummy while min_heap: # Extract minimum val, list_idx, node = heapq.heappop(min_heap) # Add to result current.next = node current = current.next # If this list has more elements, add the next one if node.next: heapq.heappush(min_heap, (node.next.val, list_idx, node.next)) return dummy.next def merge_k_arrays(arrays: List[List[int]]) -> List[int]: """ Merge k sorted arrays using a min-heap. Time: O(N log k) Space: O(k) for heap + O(N) for result """ result = [] # Heap stores (value, array_index, element_index) min_heap = [] # Initialize with first element from each non-empty array for i, arr in enumerate(arrays): if arr: heapq.heappush(min_heap, (arr[0], i, 0)) while min_heap: val, arr_idx, elem_idx = heapq.heappop(min_heap) result.append(val) # If this array has more elements if elem_idx + 1 < len(arrays[arr_idx]): next_elem = arrays[arr_idx][elem_idx + 1] heapq.heappush(min_heap, (next_elem, arr_idx, elem_idx + 1)) return result # Example usagearrays = [[1, 4, 5], [1, 3, 4], [2, 6]]print(merge_k_arrays(arrays))# Output: [1, 1, 2, 3, 4, 4, 5, 6] # Step-by-step trace:# Initial heap: [(1,0,0), (1,1,0), (2,2,0)]# Extract (1,0,0): result=[1], push (4,0,1)# Extract (1,1,0): result=[1,1], push (3,1,1)# Extract (2,2,0): result=[1,1,2], push (6,2,1)# Extract (3,1,1): result=[1,1,2,3], push (4,1,2)# Extract (4,0,1): result=[1,1,2,3,4], push (5,0,2)# Extract (4,1,2): result=[1,1,2,3,4,4], no more in list 1# Extract (5,0,2): result=[1,1,2,3,4,4,5], no more in list 0# Extract (6,2,1): result=[1,1,2,3,4,4,5,6], no more in list 2# Heap empty, done!Let's rigorously analyze the heap-based k-way merge and compare it to alternatives.
Time Complexity: O(N log k)
Breakdown:
Why the heap size is always ≤ k:
Space Complexity: O(k)
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Sequential Pairwise | O(k × N) | O(N) | Each element touched up to k times |
| Collect and Sort | O(N log N) | O(N) | Ignores pre-sorted structure |
| Divide & Conquer Pairwise | O(N log k) | O(N) | log k merge rounds, N work each |
| Heap-Based | O(N log k) | O(k) | Optimal time, minimal space |
When k is small compared to N, the difference is significant. For 10 million elements across 100 lists: O(N log N) ≈ 230 million ops vs O(N log k) ≈ 67 million ops—a 3.4x improvement. As the number of elements per list grows (N/k increases), the advantage grows.
Is O(N log k) Optimal?
Yes, for comparison-based algorithms! Here's the information-theoretic argument:
Therefore, O(N log k) is optimal for comparison-based k-way merge.
While the heap solution is elegant, there's an equally efficient divide-and-conquer approach worth understanding. This method pairs up lists and merges them, repeating until one list remains.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091
from typing import List, Optional class ListNode: def __init__(self, val=0, next=None): self.val = val self.next = next def merge_k_lists_divide_conquer(lists: List[Optional[ListNode]]) -> Optional[ListNode]: """ Merge k sorted lists using divide and conquer. Time: O(N log k) - same as heap approach Space: O(log k) - recursion stack for divide and conquer This approach may have better cache locality than heap-based. """ if not lists: return None def merge_two(l1: Optional[ListNode], l2: Optional[ListNode]) -> Optional[ListNode]: """Standard two-list merge.""" dummy = ListNode(-1) current = dummy while l1 and l2: if l1.val <= l2.val: current.next = l1 l1 = l1.next else: current.next = l2 l2 = l2.next current = current.next current.next = l1 if l1 else l2 return dummy.next def merge_range(start: int, end: int) -> Optional[ListNode]: """Recursively merge lists[start:end+1].""" if start > end: return None if start == end: return lists[start] mid = (start + end) // 2 left = merge_range(start, mid) right = merge_range(mid + 1, end) return merge_two(left, right) return merge_range(0, len(lists) - 1) # Iterative version (avoids recursion)def merge_k_lists_iterative(lists: List[Optional[ListNode]]) -> Optional[ListNode]: """ Iterative divide and conquer - pairs up lists and merges. Round 1: Merge pairs -> k/2 lists Round 2: Merge pairs -> k/4 lists ... Final: 1 list Time: O(N log k) Space: O(1) extra (modifies input list) """ if not lists: return None def merge_two(l1, l2): dummy = ListNode(-1) current = dummy while l1 and l2: if l1.val <= l2.val: current.next = l1 l1 = l1.next else: current.next = l2 l2 = l2.next current = current.next current.next = l1 or l2 return dummy.next # Reduce lists by half each round while len(lists) > 1: merged = [] for i in range(0, len(lists), 2): l1 = lists[i] l2 = lists[i + 1] if i + 1 < len(lists) else None merged.append(merge_two(l1, l2)) lists = merged return lists[0]The k-way merge pattern is fundamental to many critical systems. Understanding these applications helps you recognize when to apply this pattern.
Modern microservice architectures generate logs from dozens or hundreds of services. Each service produces timestamp-ordered log entries. A centralized logging system must merge these streams in timestamp order for querying. This is exactly k-way merge—often operating on potentially infinite streams.
The k-way merge pattern appears in several disguised forms. Recognizing these variations is key to applying the pattern effectively.
| Problem | Input Structure | Heap Contents | Key Insight |
|---|---|---|---|
| Merge K Sorted Lists | k linked lists | (value, list_idx, node) | Standard k-way merge |
| Merge K Sorted Arrays | k arrays | (value, arr_idx, elem_idx) | Track element index |
| K-th Smallest in Matrix | n×n sorted matrix | (value, row, col) | Matrix = n sorted rows |
| Smallest Range Covering K Lists | k lists | (value, list_idx, elem_idx) | Track max too, minimize range |
| Find K Pairs with Smallest Sums | 2 arrays | (sum, i, j) | Virtual k lists of pair sums |
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990
import heapqfrom typing import List def kth_smallest_in_matrix(matrix: List[List[int]], k: int) -> int: """ LeetCode 378: Kth Smallest Element in a Sorted Matrix Matrix where each row and column is sorted. Find k-th smallest element overall. Approach: View as k-way merge of n sorted rows. Use min-heap with first element of each row. Time: O(k log n) where n = number of rows Space: O(n) for the heap """ n = len(matrix) # Heap: (value, row, col) min_heap = [(matrix[i][0], i, 0) for i in range(n)] heapq.heapify(min_heap) # Extract k times for _ in range(k): val, row, col = heapq.heappop(min_heap) # This is the answer on k-th extraction if _ == k - 1: return val # Push next element from same row if exists if col + 1 < n: heapq.heappush(min_heap, (matrix[row][col + 1], row, col + 1)) return -1 # Should never reach def smallest_range_covering_k_lists(nums: List[List[int]]) -> List[int]: """ LeetCode 632: Smallest Range Covering Elements from K Lists Find smallest range [a, b] that includes at least one number from each of the k lists. Approach: K-way merge while tracking current max. Range = [heap_min, current_max]. Minimize (max - min). Time: O(N log k) Space: O(k) """ # Heap: (value, list_idx, elem_idx) min_heap = [] current_max = float('-inf') # Initialize with first element from each list for i, lst in enumerate(nums): heapq.heappush(min_heap, (lst[0], i, 0)) current_max = max(current_max, lst[0]) best_range = [min_heap[0][0], current_max] while True: min_val, list_idx, elem_idx = heapq.heappop(min_heap) # Current range: [min_val, current_max] if current_max - min_val < best_range[1] - best_range[0]: best_range = [min_val, current_max] # Try to advance in this list if elem_idx + 1 < len(nums[list_idx]): next_val = nums[list_idx][elem_idx + 1] heapq.heappush(min_heap, (next_val, list_idx, elem_idx + 1)) current_max = max(current_max, next_val) else: # One list exhausted - cannot have a valid range anymore break return best_range # Examplesmatrix = [ [1, 5, 9], [10, 11, 13], [12, 13, 15]]print(f"8th smallest: {kth_smallest_in_matrix(matrix, 8)}") # 13 nums = [[4, 10, 15, 24, 26], [0, 9, 12, 20], [5, 18, 22, 30]]print(f"Smallest range: {smallest_range_covering_k_lists(nums)}") # [20, 24]Implementing k-way merge correctly requires attention to several subtle details that can cause bugs.
elem_idx + 1 or node.next? Be consistent with your representation.123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778
import heapqfrom typing import List, Optionalfrom dataclasses import dataclass, field @dataclass(order=True)class HeapEntry: """ Properly ordered heap entry that avoids comparison issues. order=True makes dataclass generate comparison methods based on field order. 'node' is excluded from comparison. """ value: int list_index: int elem_index: int node: object = field(compare=False) # Excluded from comparisons def merge_k_lists_robust(lists: List[Optional['ListNode']]) -> Optional['ListNode']: """ Robust implementation handling all edge cases. """ # Filter out empty lists upfront non_empty = [(head.val, i, head) for i, head in enumerate(lists) if head] if not non_empty: return None # Build initial heap min_heap = [] for val, idx, node in non_empty: heapq.heappush(min_heap, HeapEntry(val, idx, 0, node)) dummy = ListNode(-1) current = dummy while min_heap: entry = heapq.heappop(min_heap) node = entry.node current.next = node current = current.next # Check if source list has more elements if node.next: heapq.heappush(min_heap, HeapEntry( node.next.val, entry.list_index, entry.elem_index + 1, node.next )) return dummy.next # Alternative: Use tuple with index as tiebreakerdef merge_k_lists_tuple(lists): """Using tuples with guaranteed ordering.""" min_heap = [] for i, head in enumerate(lists): if head: # (value, list_index, node) - list_index breaks ties heapq.heappush(min_heap, (head.val, i, head)) dummy = ListNode(-1) current = dummy while min_heap: val, list_idx, node = heapq.heappop(min_heap) current.next = node current = current.next if node.next: # Same list_idx ensures proper tiebreaking heapq.heappush(min_heap, (node.next.val, list_idx, node.next)) return dummy.nextPython's heapq compares tuples element-by-element. If first elements are equal, it compares second elements. If you store (value, node), and two values are equal, Python tries to compare nodes and crashes. Always include an index: (value, index, node).
We've thoroughly explored the k-way merge pattern—a fundamental technique with broad applications. Let's consolidate the key insights:
What's Next:
The k-way merge and k-th element patterns are powerful, but heaps have even more tricks. In the next page, we'll explore median maintenance—a pattern where we use two heaps together to maintain a running median as data streams in. This technique builds directly on the heap intuition we've developed.
You now understand the k-way merge pattern deeply—from the naive approaches and why they're suboptimal, through the elegant heap solution, to real-world applications and subtle implementation details. This pattern appears constantly in systems programming and distributed computing.