In a dataset of a million user transactions, which one ranks exactly at the 99th percentile? Among billions of sensor readings, what's the 10th smallest outlier? Finding not the extreme, but a specific rank within data is one of the most fundamental and frequently encountered problems in computing.
This is the order statistics problem—and more specifically, finding the k-th largest or k-th smallest element. Unlike finding the minimum or maximum (which heaps solve trivially), finding an arbitrary order statistic requires careful algorithmic thinking.
Why not just sort the data? You could—but at O(n log n), sorting is overkill when you only need one specific rank. And what if the data is streaming in continuously? Sorting becomes impossible. Heaps offer an elegant, efficient solution that works for both static and streaming data, achieving O(n log k) time—a dramatic improvement when k is much smaller than n.
By the end of this page, you will understand why heaps are uniquely suited for k-th element problems, master the counterintuitive technique of using a min-heap to find the k-th largest (and vice versa), analyze the time and space complexity trade-offs, and recognize this pattern in disguised problem variations.
Before diving into heap-based solutions, let's precisely define the problem and understand why naive approaches fall short.
The K-th Largest Element Problem: Given an unsorted array of n elements, find the element that would be at position k if the array were sorted in descending order. Equivalently, this is the element larger than exactly (n - k) other elements.
The K-th Smallest Element Problem: Symmetrically, find the element at position k if sorted in ascending order—larger than exactly (k - 1) elements.
Example:
Given array [3, 2, 1, 5, 6, 4] and k = 2:
- 2nd largest: 5 (only 6 is larger)
- 2nd smallest: 2 (only 1 is smaller)

Different problems use different conventions. The '1st largest' typically means the maximum, while in 0-indexed terms, it would be the element at index 0 after sorting descending. Always clarify this in interviews. Throughout this page, we use 1-indexed k (k=1 means the extreme element).
Why This Problem Matters:
The k-th element problem isn't just an academic exercise—it appears constantly in real systems:
- Database queries: `SELECT ... ORDER BY ... LIMIT k`

Let's systematically analyze the obvious approaches before understanding why heaps offer a superior solution.
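As a baseline, the sort-based approach fits in one line. A minimal sketch (`kth_largest_by_sorting` is an illustrative name, not a library function):

```python
from typing import List

def kth_largest_by_sorting(nums: List[int], k: int) -> int:
    """Baseline: sort descending, index at k-1. O(n log n) time, O(n) extra space."""
    return sorted(nums, reverse=True)[k - 1]

print(kth_largest_by_sorting([3, 2, 1, 5, 6, 4], 2))  # 5
```

This is simple and correct, but it pays the full O(n log n) cost and requires all n elements in memory, which is exactly what the heap approach below avoids.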
The breakthrough comes from realizing we don't need to track all n elements—we only need to maintain the 'boundary' at position k. A heap of size k is sufficient to solve this problem, dramatically reducing space and enabling streaming solutions.
Here's where heap-based solutions become genuinely elegant—and initially confusing. The optimal approach uses what seems like the wrong heap type:
To find the k-th LARGEST, use a MIN-heap of size k. To find the k-th SMALLEST, use a MAX-heap of size k.
This counterintuitive choice is the key insight that separates those who understand heaps from those who merely know them.
This seems backward at first. If we want the k-th largest, shouldn't we use a max-heap? The crucial realization: we're maintaining the k largest elements seen so far, and we need quick access to the SMALLEST of those k—that's the boundary element that might be replaced.
The Algorithm for K-th Largest (using Min-Heap of size k):

1. Push the first k elements into a min-heap.
2. For each remaining element, compare it to the heap's root (the minimum): if it is larger, replace the root with it; otherwise skip it.
3. After all elements are processed, the heap's root is the k-th largest.
Why This Works: At any point, the heap contains exactly the k largest elements seen so far. The heap's minimum is the smallest of these k—which is, by definition, the k-th largest overall. Elements smaller than this minimum can never be in the top k, so we ignore them.
```python
import heapq
from typing import List

def find_kth_largest(nums: List[int], k: int) -> int:
    """
    Find the k-th largest element using a min-heap of size k.

    Time Complexity: O(n log k)
    Space Complexity: O(k)

    The heap maintains the k largest elements seen so far.
    Its root (minimum) is the k-th largest overall.
    """
    # Python's heapq is a min-heap by default - perfect for this problem!
    min_heap = []

    for num in nums:
        if len(min_heap) < k:
            # Phase 1: Fill the heap with first k elements
            heapq.heappush(min_heap, num)
        elif num > min_heap[0]:
            # Phase 2: Replace minimum if current is larger
            # heapreplace is more efficient than pop + push
            heapq.heapreplace(min_heap, num)
        # If num <= min_heap[0], skip it - can't be in top k

    # The root of min-heap is the k-th largest
    return min_heap[0]

# Example usage
nums = [3, 2, 1, 5, 6, 4]
k = 2
result = find_kth_largest(nums, k)
print(f"The {k}th largest element is: {result}")  # Output: 5

# Trace through the algorithm:
# Process 3: heap = [3]
# Process 2: heap = [2, 3] (k=2 elements now)
# Process 1: 1 < 2, skip
# Process 5: 5 > 2, replace: heap = [3, 5]
# Process 6: 6 > 3, replace: heap = [5, 6]
# Process 4: 4 < 5, skip
# Final heap minimum = 5 = 2nd largest ✓
```

Let's trace through the algorithm step by step to build intuition for why the min-heap approach works.
| Step | Element | Action | Heap State | Reasoning |
|---|---|---|---|---|
| 1 | 3 | Push (heap size < k) | [3] | Building initial k-element heap |
| 2 | 2 | Push (heap size < k) | [2, 3] | Heap now has k=2 elements |
| 3 | 1 | Skip (1 < heap min 2) | [2, 3] | 1 can never be in top 2 |
| 4 | 5 | Replace (5 > 2) | [3, 5] | 5 kicks out 2, joins top 2 |
| 5 | 6 | Replace (6 > 3) | [5, 6] | 6 kicks out 3, joins top 2 |
| 6 | 4 | Skip (4 < heap min 5) | [5, 6] | 4 can never be in top 2 |
Final Result: The heap contains [5, 6]—the two largest elements. The heap minimum 5 is the 2nd largest.
Key Observations:
The heap acts as a 'sliding window' of top-k elements: At any moment, it holds exactly the k largest elements seen so far.
The min-heap root is the 'entry barrier': Any new element must beat this minimum to join the top k.
Rejected elements are permanently excluded: If an element is smaller than the current heap minimum, it cannot possibly be in the final top k.
Efficiency comes from rejection: We only perform O(log k) heap operations when an element actually enters the heap. Elements that are rejected require only O(1) comparison.
Think of the min-heap as a VIP club with exactly k spots. The heap minimum is the 'bouncer' at the door. A new arrival must be more impressive than the least impressive current VIP to get in—and if they do, the former least-impressive VIP gets kicked out.
Finding the k-th smallest uses the exact same logic, but with the heap polarity reversed:
To find the k-th SMALLEST, use a MAX-heap of size k.
The max-heap now maintains the k smallest elements seen so far, with quick access to the largest of them (the k-th smallest overall).
```python
import heapq
from typing import List

def find_kth_smallest(nums: List[int], k: int) -> int:
    """
    Find the k-th smallest element using a max-heap of size k.
    Python's heapq is a min-heap, so we negate values to simulate a max-heap.

    Time Complexity: O(n log k)
    Space Complexity: O(k)
    """
    max_heap = []  # Will store negated values

    for num in nums:
        if len(max_heap) < k:
            # Phase 1: Fill the heap with first k elements (negated)
            heapq.heappush(max_heap, -num)
        elif num < -max_heap[0]:  # num < current maximum
            # Phase 2: Replace maximum if current is smaller
            heapq.heapreplace(max_heap, -num)
        # If num >= -max_heap[0], skip it - can't be in bottom k

    # The root of max-heap (negated) is the k-th smallest
    return -max_heap[0]

# Example usage
nums = [3, 2, 1, 5, 6, 4]
k = 2
result = find_kth_smallest(nums, k)
print(f"The {k}th smallest element is: {result}")  # Output: 2

# Trace:
# Process 3: heap = [-3] → represents max-heap [3]
# Process 2: heap = [-3, -2] → max-heap [3, 2]
# Process 1: 1 < 3, replace: heap = [-2, -1] → max-heap [2, 1]
# Process 5: 5 > 2, skip
# Process 6: 6 > 2, skip
# Process 4: 4 > 2, skip
# Final: -heap[0] = 2 = 2nd smallest ✓
```

| Finding | Use This Heap | Heap Size | Answer Location |
|---|---|---|---|
| K-th Largest | Min-Heap | k | Heap root (minimum) |
| K-th Smallest | Max-Heap | k | Heap root (maximum) |
| All Top K Largest | Min-Heap | k | Entire heap contents |
| All Top K Smallest | Max-Heap | k | Entire heap contents |
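For one-off use, Python's standard library already packages this table: `heapq.nlargest` and `heapq.nsmallest` maintain a bounded heap internally and return the top/bottom k in sorted order. A short sketch:

```python
import heapq

nums = [3, 2, 1, 5, 6, 4]
k = 2

top_k = heapq.nlargest(k, nums)      # [6, 5] - the k largest, sorted descending
bottom_k = heapq.nsmallest(k, nums)  # [1, 2] - the k smallest, sorted ascending

print(top_k[-1])     # 5 -> k-th largest
print(bottom_k[-1])  # 2 -> k-th smallest
```

These helpers are convenient for static data; the hand-rolled size-k heap remains necessary when you need the streaming behavior described below.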
Understanding the complexity tradeoffs is crucial for knowing when the heap approach is optimal.
| Approach | Time Complexity | Best When |
|---|---|---|
| Sort + Index | O(n log n) | Multiple queries on same data |
| Heap (size k) | O(n log k) | k << n (small k) |
| Heap (size n-k) | O(n log(n-k)) | k close to n |
| QuickSelect | O(n) average, O(n²) worst | One-time query, average case OK |
| Introselect | O(n) worst case | Guaranteed linear time needed |
Detailed Analysis of the Heap Approach:
Time Complexity: O(n log k)
When k is small (k << n), log k is nearly constant. If k = 10 and n = 1,000,000: the heap approach performs about n · log₂ k ≈ 1,000,000 × 3.3 ≈ 3.3 million heap comparisons, versus n · log₂ n ≈ 20 million for a full sort, roughly a 6× reduction in the worst case, and typically far better in practice because most elements are rejected with a single O(1) comparison and never touch the heap.
When k is close to n: The heap approach is less advantageous. Consider finding the (n-1)-th largest (the 2nd smallest). Using a min-heap of size (n-1) is essentially O(n log n).
Optimization: If k > n/2, find the (n-k+1)-th smallest instead using the opposite heap type. This ensures heap size ≤ n/2.
```python
import heapq
from typing import List

def find_kth_largest_optimized(nums: List[int], k: int) -> int:
    """
    Optimized version that uses the smaller heap.

    If k > n/2, it's more efficient to find the (n-k+1)th smallest
    using a max-heap of size (n-k+1). This ensures heap size is
    at most ceil(n/2), reducing worst-case operations.
    """
    n = len(nums)

    # Use the more efficient formulation
    if k > n // 2:
        # Find (n-k+1)th smallest instead
        k_small = n - k + 1
        max_heap = []
        for num in nums:
            if len(max_heap) < k_small:
                heapq.heappush(max_heap, -num)
            elif num < -max_heap[0]:
                heapq.heapreplace(max_heap, -num)
        return -max_heap[0]
    else:
        # Standard min-heap approach for k-th largest
        min_heap = []
        for num in nums:
            if len(min_heap) < k:
                heapq.heappush(min_heap, num)
            elif num > min_heap[0]:
                heapq.heapreplace(min_heap, num)
        return min_heap[0]

# Example: Finding the 999,999th largest in 1,000,000 elements
# Without optimization: heap of 999,999 elements
# With optimization: heap of 2 elements (finding the 2nd smallest)
nums = list(range(1_000_000))
print(find_kth_largest_optimized(nums, 999_999))  # Much faster!
```

Space Complexity: O(k)
The heap stores exactly k elements. When k is small, this is a significant advantage over sorting, whose extra space ranges from O(1) to O(n) depending on the algorithm.
More importantly, the O(k) space enables streaming: we can process elements as they arrive without storing the entire dataset. This is impossible with sorting-based approaches.
One of the most powerful applications of the heap-based k-th element algorithm is handling streaming data—where elements arrive continuously and we need to answer queries at any moment.
The heap approach naturally extends to this scenario:
```python
import heapq
from typing import Optional

class KthLargestStream:
    """
    Maintains the k-th largest element in a stream.

    Supports:
    - add(val): O(log k) - Add a new element
    - get_kth(): O(1) - Get current k-th largest

    Use case: Real-time leaderboard, percentile tracking, top-k monitoring
    """

    def __init__(self, k: int, initial_nums: Optional[list[int]] = None):
        """Initialize with k and optionally some initial numbers."""
        self.k = k
        self.min_heap = []
        if initial_nums:
            for num in initial_nums:
                self.add(num)

    def add(self, val: int) -> Optional[int]:
        """
        Add a new value to the stream.
        Returns the current k-th largest (or None if < k elements).
        """
        if len(self.min_heap) < self.k:
            heapq.heappush(self.min_heap, val)
        elif val > self.min_heap[0]:
            heapq.heapreplace(self.min_heap, val)
        return self.get_kth()

    def get_kth(self) -> Optional[int]:
        """
        Get the current k-th largest.
        Returns None if fewer than k elements have been seen.
        """
        if len(self.min_heap) < self.k:
            return None
        return self.min_heap[0]

# Example: Real-time stock price monitoring
# Track the 3rd highest price seen throughout the day
tracker = KthLargestStream(k=3)

prices = [105.2, 102.8, 108.5, 101.3, 110.0, 107.2, 112.5]
for price in prices:
    kth = tracker.add(price)
    print(f"New price: {price:>6.1f} | 3rd highest so far: {kth}")

# Output:
# New price:  105.2 | 3rd highest so far: None
# New price:  102.8 | 3rd highest so far: None
# New price:  108.5 | 3rd highest so far: 102.8
# New price:  101.3 | 3rd highest so far: 102.8
# New price:  110.0 | 3rd highest so far: 105.2
# New price:  107.2 | 3rd highest so far: 107.2
# New price:  112.5 | 3rd highest so far: 108.5
```

This exact pattern is tested in LeetCode problem 703. The streaming k-th largest class is a direct solution. Understanding this pattern unlocks an entire category of real-time monitoring problems.
While heaps provide an elegant O(n log k) solution, there exists a fundamentally different approach that achieves O(n) average time: QuickSelect.
QuickSelect is the selection analog of QuickSort, using the same partitioning strategy but only recursing into the half containing our target rank.
```python
import random
from typing import List

def quickselect(nums: List[int], k: int) -> int:
    """
    Find the k-th largest using the QuickSelect algorithm.

    Time: O(n) average, O(n²) worst case
    Space: O(1) extra (modifies input)

    Note: Modifies the input array!
    """
    # Convert to 0-indexed position from the end
    target_idx = len(nums) - k

    def partition(left: int, right: int, pivot_idx: int) -> int:
        """Partition around pivot, return final pivot position."""
        pivot = nums[pivot_idx]
        # Move pivot to end
        nums[pivot_idx], nums[right] = nums[right], nums[pivot_idx]
        store_idx = left
        for i in range(left, right):
            if nums[i] < pivot:
                nums[store_idx], nums[i] = nums[i], nums[store_idx]
                store_idx += 1
        # Move pivot to final position
        nums[right], nums[store_idx] = nums[store_idx], nums[right]
        return store_idx

    def select(left: int, right: int) -> int:
        """Recursively select until target found."""
        if left == right:
            return nums[left]

        # Random pivot for O(n) expected time
        pivot_idx = random.randint(left, right)
        pivot_idx = partition(left, right, pivot_idx)

        if target_idx == pivot_idx:
            return nums[pivot_idx]
        elif target_idx < pivot_idx:
            return select(left, pivot_idx - 1)
        else:
            return select(pivot_idx + 1, right)

    return select(0, len(nums) - 1)

# Example
nums = [3, 2, 1, 5, 6, 4]
k = 2
result = quickselect(nums.copy(), k)  # Copy to preserve original
print(f"The {k}th largest: {result}")  # 5
```

| Scenario | Best Choice | Reason |
|---|---|---|
| Single query, can modify array | QuickSelect | O(n) average beats O(n log k) |
| Streaming data | Heap | QuickSelect can't handle streams |
| Multiple queries, same k | Heap | Heap state persists across additions |
| Multiple different k queries | Sort once | O(n log n) + O(1) per query |
| Small k, large n | Heap | O(n log k) ≈ O(n) when k is tiny |
| Guaranteed worst-case bound | Introselect or Heap | QuickSelect can degrade to O(n²) |
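To build confidence that these approaches really agree, a quick randomized cross-check of the size-k min-heap technique against the sorting baseline can be useful (a sketch; `kth_largest_heap` is a local helper name, not a library function):

```python
import heapq
import random

def kth_largest_heap(nums, k):
    """Min-heap-of-size-k approach, as described on this page."""
    heap = []
    for num in nums:
        if len(heap) < k:
            heapq.heappush(heap, num)
        elif num > heap[0]:
            heapq.heapreplace(heap, num)
    return heap[0]

# Cross-check against sorting on random inputs, for every valid k
random.seed(42)
for _ in range(100):
    nums = [random.randint(-50, 50) for _ in range(random.randint(1, 30))]
    for k in range(1, len(nums) + 1):
        assert kth_largest_heap(nums, k) == sorted(nums, reverse=True)[k - 1]

print("heap approach matches sorting on all random cases")
```

This kind of randomized differential test is also a handy way to validate a QuickSelect implementation, whose partitioning logic is easy to get subtly wrong.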
The k-th element pattern appears in many disguised forms. Recognizing these variations is key to interview success.
The key skill is recognizing that a problem can be reduced to finding a k-th element. 'Top K by some metric' → 'K-th largest by that metric'. 'K closest to target' → 'K smallest by distance'. Train yourself to see through the problem's disguise.
```python
import heapq
from collections import Counter
from typing import List

def top_k_frequent(nums: List[int], k: int) -> List[int]:
    """
    LeetCode 347: Top K Frequent Elements
    Return the k most frequent elements.

    Uses min-heap of size k, keyed by frequency.
    Time: O(n log k)
    Space: O(n) for counter + O(k) for heap
    """
    # Count frequencies
    freq = Counter(nums)

    # Min-heap storing (frequency, element)
    # We want the k highest frequencies, so use a min-heap of size k
    min_heap = []
    for num, count in freq.items():
        if len(min_heap) < k:
            heapq.heappush(min_heap, (count, num))
        elif count > min_heap[0][0]:
            heapq.heapreplace(min_heap, (count, num))

    # Extract elements (not frequencies)
    return [num for count, num in min_heap]

def k_closest_points(points: List[List[int]], k: int) -> List[List[int]]:
    """
    LeetCode 973: K Closest Points to Origin
    Return k points closest to origin (0, 0).

    Uses max-heap of size k, keyed by distance.
    Time: O(n log k)
    Space: O(k)
    """
    # Max-heap storing (-distance_squared, point)
    # We want the k smallest distances, so use a max-heap of size k
    max_heap = []
    for x, y in points:
        dist_sq = x*x + y*y
        if len(max_heap) < k:
            heapq.heappush(max_heap, (-dist_sq, [x, y]))
        elif dist_sq < -max_heap[0][0]:
            heapq.heapreplace(max_heap, (-dist_sq, [x, y]))

    return [point for dist, point in max_heap]

# Examples
print(top_k_frequent([1,1,1,2,2,3], 2))     # [2, 1] (heap order; contents are the top 2)
print(k_closest_points([[1,3],[-2,2]], 1))  # [[-2, 2]]
```

We've completed a deep dive into one of the most important heap patterns. Let's consolidate the key insights:

- To find the k-th largest, maintain a min-heap of size k; its root is the answer. Symmetrically, a max-heap of size k finds the k-th smallest.
- The approach runs in O(n log k) time and O(k) space, a dramatic win over O(n log n) sorting when k << n.
- The same size-k heap handles streaming data, answering queries at any moment without storing the full dataset.
- Many problems ('top k frequent', 'k closest points') are this pattern in disguise; the skill is recognizing the reduction.
What's Next:
The k-th element pattern is just the beginning. In the next page, we'll explore another powerful heap application: merging k sorted lists. This pattern builds on the same heap intuition but applies it to a different problem structure—maintaining the frontier of k separate sequences.
You now understand the k-th element pattern deeply—not just the algorithm, but the underlying reasoning. When you encounter variations in interviews or production code, you'll recognize the pattern and apply the technique with confidence.