Consider a real-time bidding platform processing 10 million ad impressions per second. For each advertisement campaign, you need to track the top 1,000 highest-value placements. With 100,000 active campaigns, that's 100 million elements to track across all Top-K heaps—and you have exactly 16GB of RAM available.
Or imagine a distributed log aggregation system where each of 5,000 servers tracks its own Top-100 errors. The central aggregator must merge these results every second. How do you minimize network bandwidth while still maintaining accuracy?
These scenarios demand more than the basic Top-K pattern. They require space-efficient K element tracking—techniques that minimize memory footprint while maintaining the correctness guarantees we need.
In this page, we'll explore how to squeeze maximum efficiency from bounded data structures, implement memory-aware eviction policies, and design systems that gracefully handle memory constraints.
By the end of this page, you will understand: (1) why O(K) space isn't always "constant enough," (2) techniques for reducing per-element memory overhead, (3) how to implement memory-bounded priority queues, (4) strategies for handling dynamic updates efficiently, and (5) trade-offs between exactness and space efficiency.
We established that the heap-based Top-K approach uses O(K) space. This is a dramatic improvement over O(n) for storing all elements. But "O(K)" hides important constant factors that matter in practice.
Breaking Down Memory Usage:
Consider a min-heap storing K integers in a typical 64-bit environment:
Memory per element = sizeof(int) + heap overhead
= 4 bytes (int) + ~8 bytes (heap node metadata)
= ~12 bytes per element
For K = 1,000: 12,000 bytes ≈ 12 KB
For K = 1,000,000: 12,000,000 bytes ≈ 12 MB
This seems reasonable. But now consider tracking Top-K across M different categories:
Total memory = M × K × 12 bytes
M = 100,000 categories, K = 1,000 elements each:
Total = 100,000 × 1,000 × 12 = 1.2 GB
Suddenly, "O(K)" doesn't feel so small anymore.
| Categories (M) | K per Category | Per-Element Size | Total Memory |
|---|---|---|---|
| 1,000 | 100 | 12 bytes | 1.2 MB |
| 10,000 | 100 | 12 bytes | 12 MB |
| 100,000 | 100 | 12 bytes | 120 MB |
| 100,000 | 1,000 | 12 bytes | 1.2 GB |
| 1,000,000 | 1,000 | 12 bytes | 12 GB |
In multi-tenant or multi-category systems, the total memory is M × K × (element size). Even with small K, a large number of categories M can exhaust available memory. This is why per-element memory optimization matters.
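The arithmetic is worth scripting when planning capacity. A trivial sketch, using the 12-byte per-element figure assumed above:

```python
def total_topk_memory_gb(categories: int, k: int,
                         bytes_per_element: int = 12) -> float:
    """Total memory (GB) for tracking Top-K across many categories."""
    return categories * k * bytes_per_element / 1e9

print(total_topk_memory_gb(100_000, 1_000))    # 1.2
print(total_topk_memory_gb(1_000_000, 1_000))  # 12.0
```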
Object Overhead in Managed Languages:
In languages like Java or Python, the overhead is even worse:
Java Integer object:
- Object header: 12-16 bytes
- int value: 4 bytes
- Padding: 0-4 bytes
- Total: ~20 bytes for a single int!
Java heap node with references:
- Integer object: 20 bytes
- Left/right child references: 16 bytes
- Parent reference: 8 bytes
- Total: ~44 bytes per element
This "object tax" can make naive implementations 4-10x larger than the raw data would suggest.
The first line of defense against memory bloat is reducing per-element overhead. Here are proven techniques:
Technique 1: Array-Based Heap (Implicit Structure)
Instead of node objects with explicit pointers, use an array where parent-child relationships are implicit:
Parent of index i: (i - 1) // 2   (integer division)
Left child of index i: 2i + 1
Right child of index i: 2i + 2
This eliminates pointer overhead entirely:
| Implementation | Memory per Element | For K=1000 |
|---|---|---|
| Node-based (3 pointers) | ~28 bytes | 28 KB |
| Array-based | 4 bytes | 4 KB |
| Improvement | 7x | 24 KB saved |
```python
from array import array
from typing import Optional, List

class CompactMinHeap:
    """
    Memory-efficient min-heap using Python's array module.
    Uses the 'l' (signed long) typecode: 8 bytes per element on most
    64-bit platforms, versus ~36 bytes for a heapq list of boxed ints.
    """

    def __init__(self, capacity: int):
        # Pre-allocate a fixed-size array
        self._data = array('l', [0] * capacity)
        self._size = 0
        self._capacity = capacity

    @property
    def size(self) -> int:
        return self._size

    def is_full(self) -> bool:
        return self._size >= self._capacity

    def peek(self) -> Optional[int]:
        if self._size == 0:
            return None
        return self._data[0]

    def push(self, value: int) -> Optional[int]:
        """
        Push value. If full, evict the minimum only if value is larger.
        Returns the evicted value if any, None otherwise.
        """
        if self._size < self._capacity:
            self._data[self._size] = value
            self._size += 1
            self._bubble_up(self._size - 1)
            return None
        elif value > self._data[0]:
            evicted = self._data[0]
            self._data[0] = value
            self._bubble_down(0)
            return evicted
        return None

    def _bubble_up(self, index: int) -> None:
        while index > 0:
            parent = (index - 1) // 2
            if self._data[parent] <= self._data[index]:
                break
            self._data[parent], self._data[index] = \
                self._data[index], self._data[parent]
            index = parent

    def _bubble_down(self, index: int) -> None:
        while True:
            smallest = index
            left = 2 * index + 1
            right = 2 * index + 2
            if left < self._size and self._data[left] < self._data[smallest]:
                smallest = left
            if right < self._size and self._data[right] < self._data[smallest]:
                smallest = right
            if smallest == index:
                break
            self._data[index], self._data[smallest] = \
                self._data[smallest], self._data[index]
            index = smallest

    def to_list(self) -> List[int]:
        return list(self._data[:self._size])


# Memory comparison
import sys

def compare_memory():
    K = 10000

    # Standard heapq-style list: count the pointer array
    # plus each boxed int object it references
    standard_heap = list(range(K))
    standard_total = sys.getsizeof(standard_heap) + \
        sum(sys.getsizeof(x) for x in standard_heap)

    # Compact array-based heap: one flat buffer
    compact_heap = CompactMinHeap(K)
    for i in range(K):
        compact_heap.push(i)
    buf = compact_heap._data
    compact_total = buf.buffer_info()[1] * buf.itemsize

    print(f"Standard heap size: {standard_total} bytes")
    print(f"Compact heap size:  {compact_total} bytes")
    # Typically a 4-9x reduction, depending on the platform's long size

compare_memory()
```

Technique 2: Value Encoding
If your values have limited range, encode them more compactly:
```python
from array import array

# Standard: 8 bytes per element
standard = array('q', range(1000))  # 8000 bytes

# Compact: for values 0-65535, use uint16 ('H') instead of int64
compact = array('H', range(1000))   # 2000 bytes (4x smaller)
```
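The same trick extends to fractional scores. If two decimal places suffice, one option is fixed-point encoding into small integers before they enter the compact array (a minimal sketch; `SCALE` and the helpers are illustrative):

```python
from array import array

SCALE = 100  # keep two decimal places: 0.85 -> 85

def encode(score: float) -> int:
    return round(score * SCALE)

def decode(raw: int) -> float:
    return raw / SCALE

# 'H' (uint16) caps encoded values at 65535, i.e. scores up to 655.35
scores = array('H', (encode(s) for s in [0.85, 0.45, 0.72]))  # 2 bytes each
print([decode(raw) for raw in scores])  # [0.85, 0.45, 0.72]
```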
Technique 3: Index Indirection
When storing complex objects, store only indices in the heap:
```python
import heapq
from dataclasses import dataclass
from typing import List

@dataclass
class Product:  # minimal illustrative record
    name: str
    score: float

class TopKProducts:
    def __init__(self, products: List[Product], k: int):
        self.products = products  # Full product data, stored once
        self.heap = []            # Heap holds only (score, index) pairs
        for i, p in enumerate(products):
            if len(self.heap) < k:
                heapq.heappush(self.heap, (p.score, i))
            elif p.score > self.heap[0][0]:
                heapq.heapreplace(self.heap, (p.score, i))

    def get_top_k(self) -> List[Product]:
        # Dereference indices to get actual products
        return [self.products[i] for _, i in self.heap]
```
This stores only about 16 bytes of raw data per heap entry (a float score plus an int index) instead of a full Product object.
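A quick usage example (Product here is just the minimal record sketched above):

```python
catalog = [Product("Mouse", 0.85), Product("Cable", 0.45),
           Product("Stand", 0.72), Product("Keyboard", 0.91)]

top = TopKProducts(catalog, k=2)
print([p.name for p in top.get_top_k()])  # ['Mouse', 'Keyboard'] (unordered)
```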
The core abstraction for space-efficient Top-K tracking is the Bounded Priority Queue (BPQ)—a priority queue with a maximum capacity that automatically evicts the lowest-priority element when full.
Interface Definition:
```
BoundedPriorityQueue<T>:
    constructor(capacity: int, comparator: (T, T) -> int)
    insert(element: T) -> Optional<T>   // Returns evicted element, if any
    peek() -> Optional<T>               // View threshold element
    extractMin() -> Optional<T>         // Remove and return minimum
    size() -> int
    isFull() -> bool
    toList() -> List<T>
```
Key Properties:
- Capacity is fixed at construction, so space stays O(K) no matter how many elements stream through.
- insert is O(log K); peeking at the threshold element is O(1).
- Once full, the queue always contains exactly the top K elements seen so far.
- The minimum element doubles as the admission threshold for new arrivals.
```python
from heapq import heappush, heapreplace
from typing import TypeVar, Generic, Optional, List, Callable
from dataclasses import dataclass

T = TypeVar('T')


@dataclass
class HeapEntry(Generic[T]):
    """Wrapper to make any type comparable by a key function."""
    key: float
    value: T

    def __lt__(self, other: 'HeapEntry') -> bool:
        return self.key < other.key


class BoundedPriorityQueue(Generic[T]):
    """
    A priority queue with bounded capacity.

    Maintains the top K elements by automatically evicting
    the lowest-priority element when capacity is exceeded.

    Use Cases:
    - Real-time Top-K tracking
    - Memory-bounded result aggregation
    - Streaming analytics

    Time Complexity:
    - insert: O(log K)
    - peek: O(1)

    Space Complexity: O(K)
    """

    def __init__(
        self,
        capacity: int,
        key_func: Callable[[T], float] = lambda x: x
    ):
        """
        Initialize bounded priority queue.

        Args:
            capacity: Maximum number of elements to retain
            key_func: Function to extract comparison key from elements
        """
        if capacity <= 0:
            raise ValueError("Capacity must be positive")
        self._capacity = capacity
        self._key_func = key_func
        self._heap: List[HeapEntry[T]] = []

    @property
    def capacity(self) -> int:
        return self._capacity

    @property
    def size(self) -> int:
        return len(self._heap)

    def is_full(self) -> bool:
        return len(self._heap) >= self._capacity

    def is_empty(self) -> bool:
        return len(self._heap) == 0

    def peek(self) -> Optional[T]:
        """Return the minimum element without removing it."""
        if self._heap:
            return self._heap[0].value
        return None

    def peek_key(self) -> Optional[float]:
        """Return the minimum key (threshold for admission)."""
        if self._heap:
            return self._heap[0].key
        return None

    def insert(self, element: T) -> Optional[T]:
        """
        Insert element into the queue.

        If the queue is full and the element has higher priority than
        the minimum, evicts the minimum and inserts the new element.

        Returns:
            Evicted element if eviction occurred, None otherwise
        """
        key = self._key_func(element)
        entry = HeapEntry(key, element)

        if len(self._heap) < self._capacity:
            heappush(self._heap, entry)
            return None
        elif key > self._heap[0].key:
            evicted = heapreplace(self._heap, entry)
            return evicted.value
        return None  # Element didn't qualify

    def would_qualify(self, element: T) -> bool:
        """Check if element would be admitted, without modifying the queue."""
        if not self.is_full():
            return True
        return self._key_func(element) > self._heap[0].key

    def to_list(self) -> List[T]:
        """Return all elements (unordered)."""
        return [entry.value for entry in self._heap]

    def to_sorted_list(self, reverse: bool = True) -> List[T]:
        """Return all elements in sorted order."""
        sorted_entries = sorted(self._heap, reverse=reverse)
        return [entry.value for entry in sorted_entries]

    def clear(self) -> None:
        """Remove all elements from the queue."""
        self._heap.clear()

    def threshold(self) -> Optional[float]:
        """
        Return the current admission threshold.

        Elements with key > threshold will be admitted.
        Returns None if the queue is not full.
        """
        if self.is_full():
            return self._heap[0].key
        return None


# Example: Product recommendation system
@dataclass
class Product:
    id: str
    name: str
    relevance_score: float


def demo_product_recommendations():
    # Track top 3 products by relevance
    top_products = BoundedPriorityQueue[Product](
        capacity=3,
        key_func=lambda p: p.relevance_score
    )

    products = [
        Product("p1", "Wireless Mouse", 0.85),
        Product("p2", "USB Cable", 0.45),
        Product("p3", "Monitor Stand", 0.72),
        Product("p4", "Keyboard", 0.91),
        Product("p5", "Webcam", 0.68),
        Product("p6", "Headphones", 0.88),
    ]

    for product in products:
        # insert() returns None for both "admitted without eviction" and
        # "rejected", so check qualification first to report correctly
        if not top_products.would_qualify(product):
            print(f"Rejected {product.name} (score={product.relevance_score})")
            continue
        evicted = top_products.insert(product)
        if evicted:
            print(f"Admitted {product.name} (score={product.relevance_score}), "
                  f"evicted {evicted.name} (score={evicted.relevance_score})")
        else:
            print(f"Admitted {product.name} (score={product.relevance_score})")

    print("\nTop 3 products:")
    for product in top_products.to_sorted_list():
        print(f"  {product.name}: {product.relevance_score}")

demo_product_recommendations()
```

The BPQ's threshold() and peek_key() methods expose the current admission threshold: the minimum key an element must exceed to enter the Top-K. This is useful for early rejection: if you can cheaply compute an upper bound on an element's score and that bound doesn't exceed the threshold, skip the full computation.
Real-world Top-K systems often need to handle updates to existing elements, not just insertions of new ones. Consider:
- A game leaderboard where a player's score changes after every match.
- Server monitoring where each new CPU reading replaces the previous one.
- Recommendation rankings where relevance scores are periodically recomputed.
Standard heaps don't support efficient updates—changing an element's priority requires O(K) to find it, then O(log K) to restore the heap property. We need smarter approaches.
Approach 1: Indexed Priority Queue
Maintain a hash map from element ID to heap index. This enables O(1) lookup followed by O(log K) update:
```
IndexedPriorityQueue:
    heap: array of (priority, element_id)
    index_map: dict of element_id -> heap_index

    update_priority(element_id, new_priority):
        idx = index_map[element_id]          // O(1)
        old_priority = heap[idx].priority
        heap[idx].priority = new_priority
        if new_priority > old_priority:
            bubble_down(idx)                 // O(log K)
        else:
            bubble_up(idx)                   // O(log K)
```
```python
from typing import Dict, List, Tuple, Optional, TypeVar, Generic

T = TypeVar('T')

class IndexedMinPQ(Generic[T]):
    """
    Indexed Min Priority Queue for efficient updates.

    Supports:
    - insert(key, priority): O(log n)
    - update_priority(key, new_priority): O(log n)
    - delete(key): O(log n)
    - peek_min(): O(1)
    - extract_min(): O(log n)
    - contains(key): O(1)

    Perfect for:
    - Leaderboards with score updates
    - Dijkstra's algorithm
    - Dynamic Top-K tracking
    """

    def __init__(self, capacity: Optional[int] = None):
        self._heap: List[Tuple[float, T]] = []  # (priority, key)
        self._index: Dict[T, int] = {}          # key -> heap index
        self._capacity = capacity

    @property
    def size(self) -> int:
        return len(self._heap)

    def is_empty(self) -> bool:
        return len(self._heap) == 0

    def is_full(self) -> bool:
        return self._capacity is not None and len(self._heap) >= self._capacity

    def contains(self, key: T) -> bool:
        return key in self._index

    def get_priority(self, key: T) -> Optional[float]:
        if key not in self._index:
            return None
        return self._heap[self._index[key]][0]

    def peek_min(self) -> Optional[Tuple[T, float]]:
        if self._heap:
            priority, key = self._heap[0]
            return (key, priority)
        return None

    def insert(self, key: T, priority: float) -> Optional[Tuple[T, float]]:
        """
        Insert element with the given priority.
        If at capacity, evicts the minimum if the new priority is higher.
        Returns the evicted (key, priority) or None.
        """
        if key in self._index:
            # Key exists, update instead
            self.update_priority(key, priority)
            return None

        if self._capacity is not None and len(self._heap) >= self._capacity:
            # At capacity - check if the new element qualifies
            if priority > self._heap[0][0]:
                evicted = self.extract_min()
                self._insert_new(key, priority)
                return evicted
            return None

        self._insert_new(key, priority)
        return None

    def _insert_new(self, key: T, priority: float) -> None:
        idx = len(self._heap)
        self._heap.append((priority, key))
        self._index[key] = idx
        self._bubble_up(idx)

    def update_priority(self, key: T, new_priority: float) -> bool:
        """Update priority for an existing key. Returns True if it existed."""
        if key not in self._index:
            return False
        idx = self._index[key]
        old_priority = self._heap[idx][0]
        self._heap[idx] = (new_priority, key)
        if new_priority < old_priority:
            self._bubble_up(idx)
        else:
            self._bubble_down(idx)
        return True

    def extract_min(self) -> Optional[Tuple[T, float]]:
        if not self._heap:
            return None
        min_priority, min_key = self._heap[0]
        del self._index[min_key]
        if len(self._heap) == 1:
            self._heap.pop()
        else:
            # Move the last element to the root and bubble down
            last_priority, last_key = self._heap.pop()
            self._heap[0] = (last_priority, last_key)
            self._index[last_key] = 0
            self._bubble_down(0)
        return (min_key, min_priority)

    def delete(self, key: T) -> bool:
        """Delete element by key. Returns True if the element existed."""
        if key not in self._index:
            return False
        idx = self._index[key]
        del self._index[key]
        if idx == len(self._heap) - 1:
            self._heap.pop()
        else:
            # Replace the hole with the last element, then restore heap order
            last_priority, last_key = self._heap.pop()
            old_priority = self._heap[idx][0]
            self._heap[idx] = (last_priority, last_key)
            self._index[last_key] = idx
            if last_priority < old_priority:
                self._bubble_up(idx)
            else:
                self._bubble_down(idx)
        return True

    def _bubble_up(self, idx: int) -> None:
        while idx > 0:
            parent = (idx - 1) // 2
            if self._heap[parent][0] <= self._heap[idx][0]:
                break
            self._swap(idx, parent)
            idx = parent

    def _bubble_down(self, idx: int) -> None:
        while True:
            smallest = idx
            left = 2 * idx + 1
            right = 2 * idx + 2
            if left < len(self._heap) and \
                    self._heap[left][0] < self._heap[smallest][0]:
                smallest = left
            if right < len(self._heap) and \
                    self._heap[right][0] < self._heap[smallest][0]:
                smallest = right
            if smallest == idx:
                break
            self._swap(idx, smallest)
            idx = smallest

    def _swap(self, i: int, j: int) -> None:
        # Keep the index map in sync with heap positions
        _, key_i = self._heap[i]
        _, key_j = self._heap[j]
        self._index[key_i] = j
        self._index[key_j] = i
        self._heap[i], self._heap[j] = self._heap[j], self._heap[i]

    def to_list(self) -> List[Tuple[T, float]]:
        return [(key, priority) for priority, key in self._heap]


# Demo: Real-time game leaderboard
def demo_leaderboard():
    # Track top 5 players
    leaderboard = IndexedMinPQ[str](capacity=5)

    # Initial scores
    scores = [
        ("Alice", 1200), ("Bob", 1450), ("Charlie", 980),
        ("Diana", 1100), ("Eve", 1380), ("Frank", 1520)
    ]

    print("Adding initial players...")
    for name, score in scores:
        evicted = leaderboard.insert(name, score)
        if evicted:
            print(f"  {name} (score={score}) entered, "
                  f"{evicted[0]} (score={evicted[1]}) evicted")

    print("\nCurrent Top 5:")
    for name, score in sorted(leaderboard.to_list(), key=lambda x: -x[1]):
        print(f"  {name}: {score}")

    # Score updates
    print("\nUpdating scores...")
    leaderboard.update_priority("Alice", 1550)  # Alice improves!
    print("  Alice's new score: 1550")

    print("\nUpdated Top 5:")
    for name, score in sorted(leaderboard.to_list(), key=lambda x: -x[1]):
        print(f"  {name}: {score}")

demo_leaderboard()
```

The indexed priority queue adds O(K) space for the index map, effectively doubling memory usage. This is worthwhile when updates are frequent. For insert-only workloads, the simpler non-indexed heap is more memory-efficient.
When deletions are infrequent and update complexity is critical, the lazy deletion pattern offers an alternative to indexed heaps.
The Idea: Instead of immediately removing deleted elements, mark them as deleted. Clean up lazily when they reach the heap root.
```
LazyDeletionHeap:
    heap: standard min-heap
    deleted: set of element IDs

    delete(element_id):
        deleted.add(element_id)             // O(1)

    extract_min():
        while heap is not empty:
            min_element = heap.extract_min()
            if min_element.id not in deleted:
                return min_element
            deleted.remove(min_element.id)  // Cleanup ghost entry
        return null
```
Advantages:
- delete is O(1): no need to locate the element inside the heap.
- No index map to maintain, so per-element memory stays low.
- Works on top of any off-the-shelf heap (e.g. Python's heapq).
Disadvantages:
- Ghost entries occupy heap space until they surface at the root or a compaction runs.
- Without periodic compaction, memory can grow unbounded (the D term in the table below).
- Size accounting is trickier: heap length no longer equals the number of live elements.
```python
import heapq
from typing import List, Dict, Optional, Tuple, TypeVar, Generic
from dataclasses import dataclass, field

T = TypeVar('T')


@dataclass(order=True)
class LazyEntry(Generic[T]):
    """Entry for lazy deletion heap. Only `priority` participates in ordering."""
    priority: float
    seq: int = field(compare=False)        # insertion sequence number
    element_id: str = field(compare=False)
    data: T = field(compare=False)


class LazyDeletionTopK(Generic[T]):
    """
    Top-K tracker with O(1) deletions using lazy deletion.

    Rather than a bare deleted-ID set, we record the sequence number of
    the current entry for each live ID. An entry is a "ghost" if its
    sequence number is stale, which correctly covers both deletions and
    updates, even when the same ID appears several times in the heap.

    Ideal when:
    - Deletions are less frequent than insertions
    - Deletion speed is more important than memory
    - You can tolerate ghost entries lingering until cleanup

    Time Complexity:
    - insert/update: O(log n), where n is heap size including ghosts
    - delete: O(1)
    - peek_min: O(log n) amortized (may pop ghosts)
    - get_top_k: O(n log n)

    Space Complexity: O(n), where n includes ghost entries
    """

    def __init__(self, k: int):
        self.k = k
        self._heap: List[LazyEntry[T]] = []
        self._live: Dict[str, int] = {}  # element_id -> seq of current entry
        self._next_seq = 0

    @property
    def size(self) -> int:
        """Number of active (non-ghost) entries."""
        return len(self._live)

    def _is_ghost(self, entry: LazyEntry[T]) -> bool:
        return self._live.get(entry.element_id) != entry.seq

    def insert(self, element_id: str, priority: float, data: T) -> None:
        """Insert a new element, or update an existing one."""
        self._next_seq += 1
        self._live[element_id] = self._next_seq
        heapq.heappush(
            self._heap, LazyEntry(priority, self._next_seq, element_id, data)
        )
        self._enforce_capacity()

    def delete(self, element_id: str) -> bool:
        """Mark element as deleted. O(1). Returns True if it was active."""
        if element_id in self._live:
            del self._live[element_id]  # its heap entry becomes a ghost
            return True
        return False

    def update(self, element_id: str, new_priority: float, data: T) -> None:
        """Update priority by pushing a fresh entry; the old one becomes a ghost."""
        self.insert(element_id, new_priority, data)

    def peek_min(self) -> Optional[Tuple[str, float, T]]:
        """Return the minimum active element without removing it."""
        self._cleanup_ghosts()
        if self._heap:
            entry = self._heap[0]
            return (entry.element_id, entry.priority, entry.data)
        return None

    def _cleanup_ghosts(self) -> None:
        """Pop ghost entries off the heap root."""
        while self._heap and self._is_ghost(self._heap[0]):
            heapq.heappop(self._heap)

    def _enforce_capacity(self) -> None:
        """Evict minimum active entries until at most K remain."""
        while len(self._live) > self.k:
            self._cleanup_ghosts()
            evicted = heapq.heappop(self._heap)  # root is live after cleanup
            del self._live[evicted.element_id]

    def get_top_k(self) -> List[Tuple[str, float, T]]:
        """Return all active elements, highest priority first."""
        results = [
            (e.element_id, e.priority, e.data)
            for e in self._heap if not self._is_ghost(e)
        ]
        return sorted(results, key=lambda x: -x[1])[:self.k]

    def compact(self) -> int:
        """Rebuild the heap without ghosts. Returns number of ghosts removed."""
        active = [e for e in self._heap if not self._is_ghost(e)]
        ghosts_removed = len(self._heap) - len(active)
        self._heap = active
        heapq.heapify(self._heap)
        return ghosts_removed


# Demo: Server monitoring with updates
def demo_server_monitoring():
    # Track top 3 servers by CPU usage
    top_cpu = LazyDeletionTopK[dict](k=3)

    # Initial readings
    servers = [
        ("server-1", 45.2, {"region": "us-east"}),
        ("server-2", 78.5, {"region": "us-west"}),
        ("server-3", 23.1, {"region": "eu-west"}),
        ("server-4", 91.0, {"region": "ap-south"}),
        ("server-5", 67.3, {"region": "us-east"}),
    ]

    print("Initial readings:")
    for sid, cpu, meta in servers:
        top_cpu.insert(sid, cpu, meta)
        print(f"  {sid}: {cpu}%")

    print("\nTop 3 by CPU:")
    for sid, cpu, meta in top_cpu.get_top_k():
        print(f"  {sid}: {cpu}% ({meta['region']})")

    # Server goes offline
    print("\nserver-4 goes offline (deleted)...")
    top_cpu.delete("server-4")

    # CPU updates
    print("server-2 CPU drops to 35%...")
    top_cpu.update("server-2", 35.0, {"region": "us-west"})

    print("\nUpdated Top 3:")
    for sid, cpu, meta in top_cpu.get_top_k():
        print(f"  {sid}: {cpu}% ({meta['region']})")

demo_server_monitoring()
```

When operating under strict memory budgets, several strategic approaches help maintain efficiency:
Strategy 1: Periodic Compaction
For lazy deletion heaps, schedule periodic cleanup:
```python
class CompactingTopK(LazyDeletionTopK):
    GHOST_RATIO_THRESHOLD = 0.3  # Compact when 30%+ of heap entries are ghosts

    def maybe_compact(self) -> None:
        ghosts = len(self._heap) - self.size  # heap entries minus live entries
        if ghosts / max(len(self._heap), 1) > self.GHOST_RATIO_THRESHOLD:
            self.compact()
```
Strategy 2: Approximate Top-K
When exact Top-K isn't required, use probabilistic structures:
- A Count-Min Sketch paired with a small heap: the sketch estimates element frequencies in fixed space, and the heap keeps the current heavy hitters.
- Space-Saving (Misra-Gries family) counters: exactly K counters with provable overestimation bounds (a sketch follows below).
These trade accuracy for guaranteed space bounds.
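As a concrete example, here is a minimal sketch of the Space-Saving algorithm, which keeps exactly K counters; each reported count overestimates the true count by at most the recorded per-item error:

```python
class SpaceSaving:
    """Approximate heavy hitters with exactly k counters."""

    def __init__(self, k: int):
        self.k = k
        self.counts = {}   # item -> estimated count
        self.errors = {}   # item -> max overestimation

    def offer(self, item) -> None:
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.k:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Replace the minimum counter; inherit its count as the error bound
            victim = min(self.counts, key=self.counts.get)
            min_count = self.counts.pop(victim)
            self.errors.pop(victim)
            self.counts[item] = min_count + 1
            self.errors[item] = min_count

    def top(self, n: int):
        return sorted(self.counts.items(), key=lambda kv: -kv[1])[:n]
```

The min-scan eviction here is O(K) per replacement for clarity; production implementations use the "stream-summary" structure to make it O(1), with the same guarantees.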
| Strategy | Space | Accuracy | Updates | Best For |
|---|---|---|---|---|
| Exact BPQ | O(K) | Perfect | O(log K) | Small K, accuracy critical |
| Indexed BPQ | O(2K) | Perfect | O(log K) | Frequent updates |
| Lazy Deletion | O(K+D)* | Perfect | O(1) delete | Rare deletes, fast updates |
| Approximate | O(K) | ~90-99% | O(1) | Massive scale, approximate OK |
*D = number of pending deletions; can grow unbounded without compaction
Strategy 3: Hierarchical Aggregation
For distributed systems, use a two-level approach:
Level 1: Each node maintains local Top-K₁
Level 2: Aggregator merges Top-K₁ from all nodes → Global Top-K₂
Requirements:
- K₁ ≥ K₂ (local must be at least as large as global)
- Higher K₁ improves accuracy but increases network traffic
- K₁ = K₂ × 2 is often a good balance
This bounds memory at each node while preserving global accuracy.
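The merge step at the aggregator is then a K₂-selection over the local lists. A sketch, assuming scores are additive counts (like error frequencies) shipped as (item, score) pairs:

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def merge_local_topk(
    local_lists: Iterable[List[Tuple[str, float]]],
    k2: int,
) -> List[Tuple[str, float]]:
    """Merge per-node Top-K1 lists into a global Top-K2,
    summing scores for items that appear on several nodes."""
    totals: Dict[str, float] = defaultdict(float)
    for local in local_lists:
        for item, score in local:
            totals[item] += score
    return heapq.nlargest(k2, totals.items(), key=lambda kv: kv[1])

nodes = [
    [("err-timeout", 910.0), ("err-dns", 740.0)],   # node A's local Top-2
    [("err-timeout", 880.0), ("err-disk", 600.0)],  # node B's local Top-2
]
print(merge_local_topk(nodes, k2=3))
# [('err-timeout', 1790.0), ('err-dns', 740.0), ('err-disk', 600.0)]
```

Summing handles items that appear on multiple nodes; the approximation error comes from counts that fell outside a node's local Top-K₁, which is why a larger K₁ improves accuracy.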
When designing a system, work backwards from memory budget:
```
Available Memory       = B bytes
Per-element overhead   = E bytes
Number of categories   = C

Max K per category = B / (C × E)
```
If this K is too small, consider hierarchical aggregation or approximate algorithms.
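In code, the back-of-envelope check might look like this sketch:

```python
def max_k_per_category(budget_bytes: int, categories: int,
                       bytes_per_element: int) -> int:
    """Largest K each category can afford: B // (C * E)."""
    return budget_bytes // (categories * bytes_per_element)

# The opening scenario: 16 GB budget, 100,000 campaigns, ~12 bytes/element
print(max_k_per_category(16 * 10**9, 100_000, 12))  # 13333, so K=1,000 fits
```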
We've explored techniques for minimizing memory usage while maintaining Top-K tracking capabilities. Let's consolidate:
- O(K) hides constant factors: per-element overhead, object headers, and the number of categories all multiply real memory usage.
- Array-based (implicit) heaps, compact value encoding, and index indirection cut per-element cost by 4-10x.
- The Bounded Priority Queue is the core abstraction: fixed capacity, automatic eviction, and an O(1) admission threshold for early rejection.
- Indexed priority queues buy O(log K) updates at roughly double the memory; lazy deletion buys O(1) deletes at the cost of ghost entries.
- Under strict budgets, combine periodic compaction, approximate structures, and hierarchical aggregation, and derive K from the memory budget rather than the other way around.
What's Next:
The next page explores stream processing with heaps — how to maintain Top-K over infinite data streams where you can't store all elements, including sliding windows and time-decayed rankings.
You now understand how to optimize memory usage for Top-K tracking in memory-constrained environments. These techniques are essential for building production systems that handle millions of elements across thousands of categories.