Imagine a hospital emergency room where patients arrive continuously, each with a different severity level. Every time a new patient walks in, the system must instantly integrate them into the queue while maintaining the invariant that the most critical patient is always accessible for immediate treatment. The challenge isn't just recording the new patient—it's positioning them correctly among potentially thousands of others, and doing so in a fraction of a second.
This is the insert operation in a heap: the ability to add a new element while preserving the heap's structural and ordering properties. What makes heaps remarkable is that this seemingly complex reorganization happens in O(log n) time. A heap with 10 elements needs at most about 3 comparisons, and a heap with 10 million elements needs at most about 23, because the work is bounded by the tree's height rather than its size.
In this page, we'll dissect the insertion algorithm completely: the intuition behind it, the precise mechanics, the proof of correctness, the complexity analysis, and the subtle implementation details that distinguish robust code from buggy code.
By the end of this page, you will deeply understand: (1) Why elements are always inserted at the end of the heap array; (2) How the bubble-up (heapify-up) process restores the heap property; (3) The invariant maintained at each step of the algorithm; (4) Why insertion is O(log n) in the worst case; (5) Common implementation pitfalls and how to avoid them.
Before we examine the heap's insertion strategy, let's understand why insertion is non-trivial by considering what doesn't work.
Attempted Strategy 1: Insert at Root
Suppose we tried inserting a new element at the root (index 0). This would immediately violate the complete binary tree property. The root already has children, and we'd need to "push" the entire tree down, displacing every element. This is an O(n) operation—completely unacceptable for a data structure designed for efficiency.
Attempted Strategy 2: Insert at Random Position
What if we found a random empty spot? But heaps don't have random empty spots. The array is densely packed because heaps are complete binary trees—every index from 0 to n-1 is occupied. We can't just place an element anywhere.
Attempted Strategy 3: Find Correct Position First
We might try scanning the heap to find where the new element "should" go based on its value. But this would require O(n) comparisons in the worst case, and after insertion, we'd still need to shift elements. This is no better than a sorted array.
The heap's solution is elegant: insert at the end, then fix upward. By adding the element at the first available position (maintaining the complete tree structure) and then "bubbling" it up to its correct location, we achieve O(log n) insertion with minimal disruption.
The heap separates two concerns: (1) Maintaining the complete binary tree shape (solved by always inserting at the end); (2) Maintaining the heap ordering property (solved by bubbling up). This separation is what makes heaps efficient—shape maintenance is O(1), and order restoration is O(log n).
The bubble-up algorithm (also called heapify-up, sift-up, up-heap, or percolate-up) is the core mechanism for restoring the heap property after insertion. Let's define it precisely.
Algorithm: Bubble-Up for Max-Heap
BUBBLE-UP(heap, index):
    while index > 0:
        parent_index = (index - 1) // 2
        if heap[index] > heap[parent_index]:
            swap heap[index] and heap[parent_index]
            index = parent_index
        else:
            break  // Heap property satisfied
Step-by-Step Mechanics:
1. Start at the index of the newly inserted element (the last position in the array).
2. Compute the parent's index with (index - 1) // 2.
3. If the child is larger than its parent (max-heap), swap them and continue from the parent's index.
4. Stop when the child is no larger than its parent, or when the element reaches the root.

| Relationship | Formula | Example (index = 5) |
|---|---|---|
| Parent of index i | (i - 1) // 2 | (5 - 1) // 2 = 2 |
| Left child of index i | 2 * i + 1 | 2 * 2 + 1 = 5 |
| Right child of index i | 2 * i + 2 | 2 * 2 + 2 = 6 |
Why These Formulas Work:
The formulas derive from the level-order indexing of a complete binary tree:
The parent formula (i - 1) // 2 works because within each level, nodes are paired with their parent from the previous level. The integer division naturally handles both left and right children: left child 2i + 1 and right child 2i + 2 both map back to parent i when the formula is applied.
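A quick sketch that checks this round-trip property directly (plain Python; the helper names `parent`, `left`, and `right` are mine):

```python
def parent(i):
    return (i - 1) // 2

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

# Both child formulas map back to their parent for every index.
for i in range(1000):
    assert parent(left(i)) == i
    assert parent(right(i)) == i

print(parent(5), left(2), right(2))  # matches the table: 2 5 6
```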
Let's trace through a complete insertion example with a max-heap. We'll start with a valid max-heap and insert a new element that needs to bubble all the way to the root.
Initial Max-Heap (array representation):
Index: 0 1 2 3 4 5 6
Value: [90, 70, 80, 30, 40, 50, 60]
Tree visualization:

            90
          /    \
        70      80
       /  \    /  \
      30   40 50   60
Operation: Insert value 95
Since 95 is larger than the current root (90), it should become the new root. Let's watch the algorithm discover this.
Step 0: Append 95 at index 7, the first free slot.
[90, 70, 80, 30, 40, 50, 60, 95]
95 is now the left child of node 30 (at index 3).

Step 1: Compare with the parent at index 3 (value 30). 95 > 30, so swap.
[90, 70, 80, 95, 40, 50, 60, 30]
Now 95 is at index 3.

Step 2: Compare with the parent at index 1 (value 70). 95 > 70, so swap.
[90, 95, 80, 70, 40, 50, 60, 30]
Now 95 is at index 1.

Step 3: Compare with the parent at index 0 (value 90). 95 > 90, so swap.
[95, 90, 80, 70, 40, 50, 60, 30]
Now 95 is at index 0 (the root), and the loop ends because the root has no parent.

Final Max-Heap (array representation):
Index: 0 1 2 3 4 5 6 7
Value: [95, 90, 80, 70, 40, 50, 60, 30]
Final tree visualization:

            95
          /    \
        90      80
       /  \    /  \
     70    40 50   60
    /
   30
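The entire trace can be reproduced with a few lines of Python (a throwaway sketch of the bubble-up logic, not the full class shown later in the page):

```python
def insert(heap, value):
    """Append value, then bubble it up (max-heap)."""
    heap.append(value)
    i = len(heap) - 1
    while i > 0:
        p = (i - 1) // 2
        if heap[i] > heap[p]:          # child larger than parent -> swap
            heap[i], heap[p] = heap[p], heap[i]
            i = p
        else:
            break

h = [90, 70, 80, 30, 40, 50, 60]
insert(h, 95)
print(h)  # [95, 90, 80, 70, 40, 50, 60, 30]
```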
Observations: 95 traveled the full path from leaf to root in exactly 3 swaps, matching the height ⌊log₂(8)⌋ = 3 of an 8-element heap. Only the elements on that root-to-leaf path moved (30, 70, and 90); every other element stayed in place.
To rigorously prove that bubble-up correctly restores the heap property, we use loop invariant analysis. This is the gold standard for algorithm correctness proofs.
Definition: Max-Heap Property
A max-heap satisfies: for every node i (except the root), heap[parent(i)] ≥ heap[i].
Loop Invariant for Bubble-Up:
At the start of each iteration of the while loop, the array heap[0..n-1] satisfies the max-heap property everywhere EXCEPT possibly between node index and its parent parent(index). All other parent-child relationships satisfy the heap property.
Initialization (before first iteration):
After inserting the new element at position n-1 (the end), the only possible violation is between this new node and its parent. All other relationships were valid before insertion and remain valid because we only added a leaf—no existing parent-child relationships were affected.
✓ Invariant holds initially.
Maintenance (each iteration preserves the invariant):
Suppose the invariant holds at the start of an iteration. Let i = index be the current position of the element being bubbled, and p = parent(i).
Case 1: heap[i] ≤ heap[p] (no violation)
The heap property is satisfied at (p, i). Combined with the invariant (all other pairs are valid), the entire heap is valid. The loop terminates. ✓
Case 2: heap[i] > heap[p] (violation exists)
We swap heap[i] and heap[p]. After the swap:
The pair (p, i) is now valid — The larger value is at p (the parent), and the smaller value is at i (the child).
The pair (grandparent, p) might be invalid — The value now at p came from i. If this value is larger than the grandparent, we have a new violation.
The pair (p, sibling of i) is valid — The value at p increased (we swapped in a larger value). Since new_heap[p] ≥ old_heap[p] ≥ heap[sibling(i)] (by the old heap property), the sibling relationship holds.
The subtree rooted at i is valid — The value now at i was previously at p. This value was larger than all descendants before the swap. The children of i (which were children of i before, not p) still satisfy: they were less than old_heap[i] ≤ old_heap[p] = new_heap[i]. ✓
We update index = p. The only possible violation is now between the new position and its parent.
✓ Invariant maintained.
Termination:
The loop terminates when either:
- index = 0 (we've reached the root, so there is no parent to violate), or
- heap[index] ≤ heap[parent(index)] (no violation with the parent).

In both cases, since the invariant held and there's no remaining violation, the entire heap satisfies the max-heap property.
✓ Algorithm is correct.
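The proof can also be sanity-checked empirically: insert random values and verify that every parent-child pair satisfies the max-heap property after each insertion. This is a test sketch of my own (the helpers `insert` and `is_max_heap` restate the pseudocode above), not part of the proof:

```python
import random

def insert(heap, value):
    heap.append(value)
    i = len(heap) - 1
    while i > 0:
        p = (i - 1) // 2
        if heap[i] > heap[p]:
            heap[i], heap[p] = heap[p], heap[i]
            i = p
        else:
            break

def is_max_heap(heap):
    # heap[parent(i)] >= heap[i] must hold for every non-root node.
    return all(heap[(i - 1) // 2] >= heap[i] for i in range(1, len(heap)))

random.seed(42)
heap = []
for _ in range(1000):
    insert(heap, random.randint(0, 10**6))
    assert is_max_heap(heap)
print("heap property held after every insertion")
```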
Loop invariant proofs aren't just academic exercises. They force you to understand exactly what the algorithm maintains at each step. When you debug heap code that isn't working, the invariant tells you what to check: is there a unique violation point? Are all other relationships intact? This structured thinking prevents hours of confused debugging.
The time complexity of heap insertion is O(log n) in the worst case. Let's analyze this thoroughly.
Step 1: Adding to the End — O(1)
Appending an element to the end of a dynamic array is amortized O(1). If the array needs resizing, the amortized cost is still O(1) due to the geometric growth strategy (typically doubling capacity when full).
Step 2: Bubble-Up — O(log n)
The key question: how many times can the while loop execute?
Each iteration moves the element from one level of the tree to its parent level. The maximum number of levels in a complete binary tree with n elements is ⌊log₂(n)⌋ + 1, which means the maximum number of ancestor nodes (and thus potential swaps) from any leaf is ⌊log₂(n)⌋.
Formal Height Analysis:
For a complete binary tree with n nodes and height h, every level above the last is completely full, so the node count satisfies 2^h ≤ n ≤ 2^(h+1) - 1 < 2^(h+1).
Taking logarithms: h ≤ log₂(n) < h + 1, confirming h = ⌊log₂(n)⌋.
| Heap Size (n) | Height (⌊log₂n⌋) | Max Swaps | Max Comparisons |
|---|---|---|---|
| 7 | 2 | 2 | 3 |
| 15 | 3 | 3 | 4 |
| 1,023 | 9 | 9 | 10 |
| 1,048,575 (~1M) | 19 | 19 | 20 |
| 1,073,741,823 (~1B) | 29 | 29 | 30 |
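The "Max Swaps" column can be checked directly: insert a value larger than everything into a heap of size n - 1 and count the swaps; the element must travel the full height ⌊log₂ n⌋. An instrumented sketch (`insert_counting` is my name for the counting variant):

```python
import math

def insert_counting(heap, value):
    """Insert into a max-heap and return the number of swaps performed."""
    heap.append(value)
    i, swaps = len(heap) - 1, 0
    while i > 0:
        p = (i - 1) // 2
        if heap[i] > heap[p]:
            heap[i], heap[p] = heap[p], heap[i]
            i = p
            swaps += 1
        else:
            break
    return swaps

for n in (7, 15, 1023):
    # Any strictly decreasing array is a valid max-heap.
    heap = list(range(n - 1, 0, -1))
    swaps = insert_counting(heap, n)  # n is the new maximum
    assert swaps == math.floor(math.log2(n))
print("worst-case swaps match floor(log2 n)")
```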
Work Per Iteration:
Each iteration of bubble-up performs: one parent-index computation, one comparison, at most one swap, and one index update.
Total work per iteration: O(1)
Combined Complexity:
Total time = O(1) for adding + O(log n) × O(1) for bubble-up = O(log n)
Worst-Case vs. Average-Case:
The worst case occurs when the inserted element must bubble all the way to the root (e.g., inserting the new maximum in a max-heap). However, the average case can be significantly better.
If elements are inserted in random order, the expected number of swaps is approximately O(1) to O(log log n), because a random element is more likely to stop early in the bubble-up process. However, we typically state the worst-case O(log n) because adversarial inputs can always trigger it.
Heap insertion is an in-place operation. Beyond the O(n) space for the heap array itself, insertion requires only O(1) auxiliary space: a few index variables and a temporary for swapping. No additional data structures are needed.
Let's examine production-quality implementations of heap insertion in multiple languages, highlighting important details and common pitfalls.
Python Implementation:
class MaxHeap:
    def __init__(self):
        self.heap = []

    def insert(self, value):
        """Insert a value into the max-heap in O(log n) time."""
        # Step 1: Add to end (O(1) amortized)
        self.heap.append(value)
        # Step 2: Bubble up to restore heap property
        self._bubble_up(len(self.heap) - 1)

    def _bubble_up(self, index):
        """Restore heap property by bubbling element up."""
        while index > 0:
            parent_index = (index - 1) // 2
            if self.heap[index] > self.heap[parent_index]:
                # Swap child and parent
                self.heap[index], self.heap[parent_index] = (
                    self.heap[parent_index], self.heap[index]
                )
                index = parent_index
            else:
                break  # Heap property satisfied
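A short usage sketch exercising the class (repeated here in condensed form so the snippet runs standalone; the input values are arbitrary):

```python
class MaxHeap:
    def __init__(self):
        self.heap = []

    def insert(self, value):
        self.heap.append(value)
        i = len(self.heap) - 1
        while i > 0:
            p = (i - 1) // 2
            if self.heap[i] > self.heap[p]:
                self.heap[i], self.heap[p] = self.heap[p], self.heap[i]
                i = p
            else:
                break

h = MaxHeap()
for v in [30, 10, 50, 20, 40]:
    h.insert(v)
print(h.heap[0])  # 50 -- the maximum is always at the root
```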
JavaScript Implementation:
class MaxHeap {
  constructor() {
    this.heap = [];
  }

  insert(value) {
    this.heap.push(value); // Add to end
    this._bubbleUp(this.heap.length - 1);
  }

  _bubbleUp(index) {
    while (index > 0) {
      const parentIndex = Math.floor((index - 1) / 2);
      if (this.heap[index] > this.heap[parentIndex]) {
        // Swap
        [this.heap[index], this.heap[parentIndex]] =
          [this.heap[parentIndex], this.heap[index]];
        index = parentIndex;
      } else {
        break;
      }
    }
  }
}
Java Implementation (with generics):
import java.util.ArrayList;
import java.util.List;

public class MaxHeap<T extends Comparable<T>> {
    private List<T> heap;

    public MaxHeap() {
        this.heap = new ArrayList<>();
    }

    public void insert(T value) {
        heap.add(value); // Add to end
        bubbleUp(heap.size() - 1);
    }

    private void bubbleUp(int index) {
        while (index > 0) {
            int parentIndex = (index - 1) / 2;
            if (heap.get(index).compareTo(heap.get(parentIndex)) > 0) {
                // Swap
                T temp = heap.get(index);
                heap.set(index, heap.get(parentIndex));
                heap.set(parentIndex, temp);
                index = parentIndex;
            } else {
                break;
            }
        }
    }
}
TypeScript Implementation (with full type safety):
class MaxHeap<T> {
  private heap: T[] = [];
  private compare: (a: T, b: T) => number;

  constructor(compareFn?: (a: T, b: T) => number) {
    // Default comparison for numbers/strings
    this.compare = compareFn ?? ((a, b) => (a > b ? 1 : a < b ? -1 : 0));
  }

  insert(value: T): void {
    this.heap.push(value);
    this.bubbleUp(this.heap.length - 1);
  }

  private bubbleUp(index: number): void {
    while (index > 0) {
      const parentIndex = Math.floor((index - 1) / 2);
      if (this.compare(this.heap[index], this.heap[parentIndex]) > 0) {
        [this.heap[index], this.heap[parentIndex]] =
          [this.heap[parentIndex], this.heap[index]];
        index = parentIndex;
      } else {
        break;
      }
    }
  }
}
Heap insertion is conceptually simple, but implementation bugs are common. Let's examine the most frequent mistakes and how to avoid them.
Pitfall 1: Wrong parent formula. Using index / 2 instead of (index - 1) / 2 places the parent incorrectly for all non-root indices. Remember: with 0-based indexing, the formula is (i - 1) // 2.

Pitfall 2: Reversed comparison. For a max-heap, child > parent means swap; for a min-heap, child < parent means swap. Mixing these up creates an invalid heap that passes basic tests but fails edge cases.

Pitfall 3: Missing break. Without the break (or a while condition update), the loop becomes infinite when the element reaches its correct position mid-tree.

Pitfall 4: Float division in JavaScript. The / operator returns floats. Use Math.floor((index - 1) / 2) or bitwise (index - 1) >> 1 for correct integer division. Failing to do so causes index errors.

Pitfall 5: Forgetting to move up. The update index = parent_index is essential. Without it, the algorithm checks the same position repeatedly instead of progressing up the tree.

Basic tests like 'insert one element' or 'insert in sorted order' often pass even with buggy code. Robust testing requires: (1) Insert elements that must bubble to the root; (2) Insert elements that stay at a leaf; (3) Insert elements that stop mid-tree; (4) Insert sequences that create all tree shapes (left-heavy, right-heavy, balanced); (5) Verify the heap property after every operation.
The only difference between max-heap and min-heap insertion is the comparison direction. Everything else—the structure, the algorithm, the complexity—remains identical.
Max-Heap Comparison:
if self.heap[index] > self.heap[parent_index]:  # Child larger → swap
    swap()
Min-Heap Comparison:
if self.heap[index] < self.heap[parent_index]:  # Child smaller → swap
    swap()
Generic Approach Using Comparators:
The cleanest implementation uses a comparator function, allowing a single codebase for both heap types:
class Heap:
    def __init__(self, comparator=lambda a, b: a > b):  # Default: max-heap
        self.heap = []
        self.compare = comparator  # Returns True if a should be higher than b

    def _should_swap(self, child_idx, parent_idx):
        return self.compare(self.heap[child_idx], self.heap[parent_idx])

    # ... rest of implementation uses self._should_swap() ...

# Usage:
max_heap = Heap()                               # Default max-heap
min_heap = Heap(comparator=lambda a, b: a < b)  # Min-heap
This pattern is how Python's heapq module works (though it only provides min-heap; for max-heap, you negate values) and how Java's PriorityQueue uses Comparator objects.
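The negation trick mentioned above, shown concretely with heapq (the variable names are mine):

```python
import heapq

values = [30, 10, 50, 20, 40]

# heapq is a min-heap; storing negated values simulates a max-heap.
max_heap = []
for v in values:
    heapq.heappush(max_heap, -v)   # O(log n) insert, as analyzed above

largest = -max_heap[0]             # peek at the maximum: negate back
print(largest)  # 50
```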
While the basic bubble-up algorithm is efficient, real-world implementations often incorporate additional optimizations and considerations.
Optimization 1: Shift Instead of Swap
Instead of swapping at each level, we can "bubble" the value by shifting parents down and only placing the new value once:
def _bubble_up_optimized(self, index):
    value = self.heap[index]  # Save the new value
    while index > 0:
        parent_index = (index - 1) // 2
        if value > self.heap[parent_index]:  # max-heap
            self.heap[index] = self.heap[parent_index]  # Shift parent down
            index = parent_index
        else:
            break
    self.heap[index] = value  # Place value in final position
This reduces memory writes from 3 per swap to 1 per level, plus 1 final write. For large objects, this can significantly reduce overhead.
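The shift-based version should produce exactly the same arrays as the swap-based version. A quick randomized cross-check of my own (both helpers are restated as standalone functions so the sketch runs on its own):

```python
import random

def bubble_up_swap(heap, index):
    while index > 0:
        p = (index - 1) // 2
        if heap[index] > heap[p]:
            heap[index], heap[p] = heap[p], heap[index]
            index = p
        else:
            break

def bubble_up_shift(heap, index):
    value = heap[index]
    while index > 0:
        p = (index - 1) // 2
        if value > heap[p]:
            heap[index] = heap[p]   # shift parent down instead of swapping
            index = p
        else:
            break
    heap[index] = value

random.seed(0)
a, b = [], []
for _ in range(500):
    v = random.randint(0, 999)
    a.append(v); bubble_up_swap(a, len(a) - 1)
    b.append(v); bubble_up_shift(b, len(b) - 1)
assert a == b
print("shift and swap versions agree")
```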
Optimization 2: Early Termination Hints
If you know the distribution of inserted values, you may be able to optimize further; for instance, values that usually belong near the leaves exit the loop after a single comparison.
Consideration 3: Memory Allocation Strategy
For high-frequency insertions, pre-allocating array capacity avoids repeated resizing:
class HeapWithCapacity:
    def __init__(self, initial_capacity=16):
        self.heap = [None] * initial_capacity
        self.size = 0

    def insert(self, value):
        if self.size >= len(self.heap):
            self._resize(len(self.heap) * 2)  # Double capacity
        self.heap[self.size] = value
        self.size += 1
        self._bubble_up(self.size - 1)

    # _resize and _bubble_up omitted for brevity
Consideration 4: Thread Safety
For concurrent priority queues, insertion must be atomic or properly synchronized; Java's PriorityBlockingQueue, for example, handles this internally.

Consideration 5: Duplicate Handling
Heaps naturally support duplicates—the comparison determines relative position, and equal elements can exist anywhere relative to each other. No special handling needed.
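Duplicates in action, shown via heapq for brevity (an illustrative sketch; the input list is arbitrary):

```python
import heapq

h = []
values = [5, 5, 3, 5, 1]
for v in values:
    heapq.heappush(h, v)   # equal elements coexist with no special handling

# Popping yields the values in sorted order, duplicates included.
popped = [heapq.heappop(h) for _ in range(len(values))]
print(popped)  # [1, 3, 5, 5, 5]
```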
In production code, prefer battle-tested library implementations: Python's heapq, Java's PriorityQueue, C++'s std::priority_queue. Write custom heaps only when: (1) You need operations not provided (like decreaseKey); (2) You're learning; (3) You have specific performance requirements the library doesn't meet.
We've thoroughly examined heap insertion, from intuition through implementation to optimization. Key takeaways: (1) Insert at the end to preserve the complete-tree shape, then bubble up to restore order; (2) With 0-based indexing, the parent of index i is (i - 1) // 2; (3) Bubble-up performs at most ⌊log₂ n⌋ swaps, so insertion is O(log n) worst case with O(1) auxiliary space; (4) Correctness follows from a loop invariant: at most one parent-child pair can violate the heap property at any time; (5) Max-heap and min-heap insertion differ only in the comparison direction.
What's Next:
Now that we understand how to add elements to a heap, we need to understand how to remove them. The next page covers the extract operation—removing the root element (the maximum in a max-heap or minimum in a min-heap) and restoring the heap property using the bubble-down (heapify-down) algorithm. This is the complementary operation that makes heaps useful as priority queues.
You now have a complete understanding of heap insertion: the algorithm, its correctness, its complexity, and its implementation. You can implement insert for both max-heaps and min-heaps, trace through examples, and avoid common pitfalls. Next, we'll tackle the equally important extract operation.