In 1964, J.W.J. Williams invented the heap data structure and HeapSort algorithm. Shortly after, Robert W. Floyd discovered the elegant bottom-up heap construction algorithm that achieves O(n) time complexity—a significant improvement over the O(n log n) naive approach.
Floyd's algorithm, often called "bottom-up heapify" or simply "heapify," is one of those rare algorithms where the improvement isn't just a constant factor—it's logarithmic. Going from O(n log n) to O(n) is like going from reading a book word-by-word to skimming each chapter summary. The counterintuitive insight is that by processing nodes in the reverse order (bottom-to-top instead of top-to-bottom), and moving values down instead of up, we achieve dramatically better performance.
In this page, we'll present the algorithm in complete detail: the intuition, the precise mechanics, a thorough correctness proof, and production-ready implementations. By the end, you'll not only understand the algorithm—you'll understand why it works and appreciate its elegance.
By the end of this page, you will: (1) Implement Floyd's bottom-up heapify algorithm; (2) Trace through complete examples step-by-step; (3) Prove its correctness using loop invariants; (4) Understand the heapify-down subroutine deeply; (5) See how the algorithm exploits tree structure for efficiency.
Recall from the previous page why the naive approach is slow: it makes the most numerous nodes (leaves) do the most work (bubble to root). The bottom-up approach inverts this entirely.
Key Observations:
Half the nodes are leaves — Leaves have no children, so they're already trivially valid heaps. We don't need to do any work on them.
Nodes near the bottom have short downward paths — A node at level k can bubble down at most (h - k) levels, where h is the tree height.
Few nodes are near the top — Only 1 node (the root) can bubble down h levels; only 2 nodes can bubble down h-1 levels; etc.
The Inversion:
Naive (bubble UP): Work = Σ 2^k × k (many nodes × long up-distance)
Optimal (bubble DOWN): Work = Σ 2^k × (h-k) (many nodes × short down-distance)
The second sum converges to O(n), not O(n log n). We'll prove this mathematically in the next page.
For n = 15 (a perfect tree of height h = 3), the naive sum is 0·1 + 1·2 + 2·4 + 3·8 = 34 level-moves, while the bottom-up sum is 3·1 + 2·2 + 1·4 + 0·8 = 11.
As n grows, this gap widens dramatically. For n = 1 million (h = 19), the naive sum is roughly 19 million level-moves, while the bottom-up sum stays below n, about 1 million.
The algorithm's brilliance is recognizing that the direction of repair matters. Both directions achieve correctness, but bubbling down from bottom-to-top distributes work optimally. Many nodes do little work (short descent), few nodes do much work (long descent).
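The inversion can be made concrete with a short back-of-the-envelope sketch (not part of the algorithm itself) that evaluates both sums for a perfect binary tree of height h:

```python
# Level k of a perfect tree holds 2^k nodes. Bubbling UP costs k levels per
# node at level k; bubbling DOWN costs (h - k) levels per node at level k.

def naive_work(h: int) -> int:
    """Total level-moves if every node bubbles up to the root."""
    return sum(2**k * k for k in range(h + 1))

def bottom_up_work(h: int) -> int:
    """Total level-moves if every node sinks down toward the leaves."""
    return sum(2**k * (h - k) for k in range(h + 1))

for h in (3, 10, 19):
    n = 2**(h + 1) - 1  # nodes in a perfect tree of height h
    print(f"h={h:2}  n={n:>9}  up={naive_work(h):>10}  down={bottom_up_work(h):>9}")
```

For h = 3 this prints up = 34 versus down = 11; as h grows, the down column stays below n while the up column grows like n log n.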
Floyd's algorithm has two components: the main BUILD-MAX-HEAP procedure and the HEAPIFY-DOWN subroutine (also called MAX-HEAPIFY, sift-down, or bubble-down).
Algorithm: BUILD-MAX-HEAP
BUILD-MAX-HEAP(A):
n = length(A)
// Start from the last internal node and work backwards to the root
for i from ⌊n/2⌋ - 1 down to 0:
HEAPIFY-DOWN(A, n, i)
return A
Why Start at ⌊n/2⌋ - 1?
Indices from ⌊n/2⌋ to n-1 are leaf nodes. Leaves have no children, so they trivially satisfy the heap property. We only need to process internal nodes, which are at indices 0 to ⌊n/2⌋ - 1.
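The boundary is easy to verify; a tiny sketch confirming that every index from ⌊n/2⌋ onward has no children:

```python
def is_leaf(i: int, n: int) -> bool:
    # A node is a leaf when its left-child index 2i + 1 falls outside the
    # array; with no left child there can be no right child either.
    return 2 * i + 1 >= n

n = 10
print([i for i in range(n) if is_leaf(i, n)])  # [5, 6, 7, 8, 9]
```

The leaves are exactly indices n//2 through n-1, so looping from ⌊n/2⌋ - 1 down to 0 covers every internal node and nothing else.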
Algorithm: HEAPIFY-DOWN (Max-Heap)
HEAPIFY-DOWN(A, heap_size, index):
while true:
left = 2 * index + 1
right = 2 * index + 2
largest = index
// Find the largest among node and its children
if left < heap_size and A[left] > A[largest]:
largest = left
if right < heap_size and A[right] > A[largest]:
largest = right
// If largest is not the current node, swap and continue
if largest != index:
swap A[index] and A[largest]
index = largest
else:
break // Heap property satisfied at this subtree
HEAPIFY-DOWN Explained:
Given a node at index whose subtrees are both valid heaps, HEAPIFY-DOWN sinks the node's value to its correct position relative to its children, then repairs any violation this creates further down the affected subtree.
Key Precondition:
Before calling HEAPIFY-DOWN(A, n, i), the subtrees rooted at left(i) and right(i) must already be valid max-heaps.
This precondition is why we process nodes bottom-to-top: when we process node i, its children at 2i+1 and 2i+2 have already been processed (or are leaves, which are trivially heaps).
Processing Order Example (n = 10):
            0
          /   \
        1       2
       / \     / \
      3   4   5   6
     / \   /
    7   8 9
Last internal node = ⌊10/2⌋ - 1 = 4
Processing order: 4 → 3 → 2 → 1 → 0
- Node 4's children: 9 (leaf)
- Node 3's children: 7, 8 (leaves)
- Node 2's children: 5, 6 (leaves)
- Node 1's children: 3, 4 (already heapified)
- Node 0's children: 1, 2 (already heapified)
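The processing order and the child lists above can be generated mechanically; a small sketch reproducing them for n = 10:

```python
n = 10
for i in range(n // 2 - 1, -1, -1):  # 4, 3, 2, 1, 0
    children = [c for c in (2 * i + 1, 2 * i + 2) if c < n]
    print(f"Node {i}'s children: {children}")
```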
Let's trace through the algorithm on a concrete example, showing every step.
Input Array: A = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7] (n = 10)
Initial Tree Structure:
            4
          /   \
        1       3
       / \     / \
      2   16  9   10
     / \   /
   14   8 7
Last internal node: ⌊10/2⌋ - 1 = 4
We process indices 4, 3, 2, 1, 0 in that order:
- i = 4 (value 16): its only child is 7 (index 9); 16 ≥ 7, no swap. Array unchanged.
- i = 3 (value 2): children 14 (index 7) and 8 (index 8); 14 is largest. Swap → [4, 1, 3, 14, 16, 9, 10, 2, 8, 7]. Index 7 is a leaf; done.
- i = 2 (value 3): children 9 (index 5) and 10 (index 6); 10 is largest. Swap → [4, 1, 10, 14, 16, 9, 3, 2, 8, 7]. Index 6 is a leaf; done.
- i = 1 (value 1): children 14 (index 3) and 16 (index 4); 16 is largest. Swap → [4, 16, 10, 14, 1, 9, 3, 2, 8, 7]. At index 4, child 7 (index 9) beats 1; swap → [4, 16, 10, 14, 7, 9, 3, 2, 8, 1]. Index 9 is a leaf; done.
- i = 0 (value 4): children 16 and 10; swap with 16 → [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]. At index 1, children 14 and 7; swap with 14 → [16, 14, 10, 4, 7, 9, 3, 2, 8, 1]. At index 3, children 2 and 8; swap with 8 → [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]. Index 8 is a leaf; done.
Final Array: [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
Final Tree Structure:
            16
          /    \
        14      10
       /  \    /  \
      8    7  9    3
     / \   /
    2   4 1
Verification (Max-Heap Property):
16 ≥ 14, 16 ≥ 10 ✓
14 ≥ 8, 14 ≥ 7 ✓
10 ≥ 9, 10 ≥ 3 ✓
8 ≥ 2, 8 ≥ 4 ✓
7 ≥ 1 ✓
✓ Valid max-heap!
Work Count: 7 swaps in total (0 for node 4, 1 each for nodes 3 and 2, 2 for node 1, and 3 for node 0).
Compare this to the naive approach, which required 9 swaps on the same input!
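The verification above can be automated. A sketch of a checker (a hypothetical helper, not part of the algorithm) that tests every parent against its children:

```python
def is_max_heap(arr) -> bool:
    """True iff arr[i] >= both of its children for every internal node i."""
    n = len(arr)
    return all(arr[i] >= arr[c]
               for i in range(n // 2)
               for c in (2 * i + 1, 2 * i + 2)
               if c < n)

print(is_max_heap([16, 14, 10, 8, 7, 9, 3, 2, 4, 1]))  # True  (final array)
print(is_max_heap([4, 1, 3, 2, 16, 9, 10, 14, 8, 7]))  # False (input array)
```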
We prove correctness using loop invariant analysis for both the main algorithm and the HEAPIFY-DOWN subroutine.
Loop Invariant for BUILD-MAX-HEAP:
At the start of each iteration of the for loop with index i, each node j where j > i is the root of a valid max-heap.
Initialization (before first iteration, i = ⌊n/2⌋ - 1):
Nodes ⌊n/2⌋ to n-1 are all leaves. A leaf node is trivially a valid max-heap (no children to violate the property).
✓ Invariant holds initially.
Maintenance:
Assume the invariant holds at the start of iteration i. That is, all nodes j > i are roots of valid max-heaps.
During iteration i, we call HEAPIFY-DOWN(A, n, i). The children of node i are at indices 2i+1 and 2i+2. Since 2i+1 > i and 2i+2 > i, by the invariant, both children are roots of valid max-heaps.
HEAPIFY-DOWN has a precondition that both subtrees are valid heaps—this precondition is satisfied. After HEAPIFY-DOWN completes, node i is the root of a valid max-heap.
Now all nodes j ≥ i are roots of valid max-heaps.
After decrementing i, all nodes j > (new i) are roots of valid max-heaps.
✓ Invariant maintained.
Termination:
The loop terminates when i < 0. At this point, by the invariant (applied when i was 0), node 0 is the root of a valid max-heap. Since node 0 is the root of the entire tree, the entire array is a valid max-heap.
✓ Algorithm is correct.
Correctness of HEAPIFY-DOWN:
We now prove that HEAPIFY-DOWN correctly creates a valid max-heap rooted at index, given that both subtrees are valid max-heaps.
Loop Invariant for HEAPIFY-DOWN:
At the start of each iteration, the subtree rooted at index would be a valid max-heap if the value currently at index were at least as large as both of its children.
Initialization:
Before the first iteration, both subtrees are valid max-heaps (by precondition). The only potential violation is between index and its children.
Maintenance:
If the value at index is already largest, we're done—the subtree is a valid heap (no violation).
If a child is larger, we swap the node's value with that of its larger child. After the swap: the value now at index is the largest of the three, so there is no violation at this level; the displaced value may violate the heap property within the child's subtree; the other subtree is untouched and remains a valid heap. We then set index to the child and continue, so the invariant holds for the next iteration.
Termination:
The loop terminates when either: (a) largest == index, meaning the node's value is at least as large as both children, or (b) the node reaches a position with no children inside heap_size (a leaf), where (a) holds vacuously. In either case, the subtree rooted at the original index is now a valid max-heap.
✓ HEAPIFY-DOWN is correct.
The correctness proof reveals why bottom-up processing is essential: we must process children before parents so that HEAPIFY-DOWN's precondition is satisfied. Processing top-down would violate this precondition—the subtrees wouldn't be heaps when we try to fix the parent.
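This claim can be demonstrated directly. The sketch below (helpers condensed from the pseudocode above) runs heapify-down over the internal nodes of [1..7] in both orders; only the bottom-up order produces a heap:

```python
def heapify_down(a, n, i):
    # Iterative sift-down, as in the pseudocode above.
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            break
        a[i], a[largest] = a[largest], a[i]
        i = largest

def is_max_heap(a):
    return all(a[i] >= a[c] for i in range(len(a) // 2)
               for c in (2 * i + 1, 2 * i + 2) if c < len(a))

up = [1, 2, 3, 4, 5, 6, 7]
for i in range(len(up) // 2 - 1, -1, -1):   # bottom-up: children first
    heapify_down(up, len(up), i)

down = [1, 2, 3, 4, 5, 6, 7]
for i in range(len(down) // 2):             # top-down: precondition violated
    heapify_down(down, len(down), i)

print(is_max_heap(up), is_max_heap(down))   # True False
```

Top-down processing leaves [3, 5, 7, ...]: the small values pushed down from the root never get a second chance to sink further once their new parent is processed.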
Here are carefully crafted implementations in multiple languages, with attention to edge cases and clarity.
Python Implementation:
from typing import List, Callable, TypeVar
T = TypeVar('T')
def build_max_heap(arr: List[T]) -> List[T]:
"""
Build a max-heap from an array using Floyd's bottom-up algorithm.
Time: O(n), Space: O(1) auxiliary
Modifies the array in-place and returns it.
"""
n = len(arr)
# Start from last internal node, work up to root
# Last internal node is at index (n // 2) - 1
for i in range((n // 2) - 1, -1, -1):
_heapify_down(arr, n, i)
return arr
def _heapify_down(arr: List[T], heap_size: int, index: int) -> None:
"""
Restore max-heap property for subtree rooted at index.
Assumes subtrees of index are already valid max-heaps.
"""
while True:
left = 2 * index + 1
right = 2 * index + 2
largest = index
if left < heap_size and arr[left] > arr[largest]:
largest = left
if right < heap_size and arr[right] > arr[largest]:
largest = right
if largest != index:
arr[index], arr[largest] = arr[largest], arr[index]
index = largest
else:
break
def build_min_heap(arr: List[T]) -> List[T]:
"""Build a min-heap (same algorithm, opposite comparison)."""
n = len(arr)
for i in range((n // 2) - 1, -1, -1):
_heapify_down_min(arr, n, i)
return arr
def _heapify_down_min(arr: List[T], heap_size: int, index: int) -> None:
while True:
left = 2 * index + 1
right = 2 * index + 2
smallest = index
if left < heap_size and arr[left] < arr[smallest]:
smallest = left
if right < heap_size and arr[right] < arr[smallest]:
smallest = right
if smallest != index:
arr[index], arr[smallest] = arr[smallest], arr[index]
index = smallest
else:
break
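As a quick end-to-end check, the build can be run on the worked example from earlier. The definitions are condensed copies of the code above so the snippet runs on its own:

```python
# Condensed copy of build_max_heap/_heapify_down from above, plus a usage check.
def build_max_heap(arr):
    for i in range(len(arr) // 2 - 1, -1, -1):
        _heapify_down(arr, len(arr), i)
    return arr

def _heapify_down(arr, heap_size, index):
    while True:
        left, right, largest = 2 * index + 1, 2 * index + 2, index
        if left < heap_size and arr[left] > arr[largest]:
            largest = left
        if right < heap_size and arr[right] > arr[largest]:
            largest = right
        if largest == index:
            break
        arr[index], arr[largest] = arr[largest], arr[index]
        index = largest

heap = build_max_heap([4, 1, 3, 2, 16, 9, 10, 14, 8, 7])
print(heap)  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```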
TypeScript Implementation:
type Comparator<T> = (a: T, b: T) => number;
function buildHeap<T>(
arr: T[],
compare: Comparator<T> = (a, b) => (a > b ? 1 : a < b ? -1 : 0)
): T[] {
const n = arr.length;
// Start from last internal node
for (let i = Math.floor(n / 2) - 1; i >= 0; i--) {
heapifyDown(arr, n, i, compare);
}
return arr;
}
function heapifyDown<T>(
arr: T[],
heapSize: number,
index: number,
compare: Comparator<T>
): void {
while (true) {
const left = 2 * index + 1;
const right = 2 * index + 2;
let largest = index;
if (left < heapSize && compare(arr[left], arr[largest]) > 0) {
largest = left;
}
if (right < heapSize && compare(arr[right], arr[largest]) > 0) {
largest = right;
}
if (largest !== index) {
[arr[index], arr[largest]] = [arr[largest], arr[index]];
index = largest;
} else {
break;
}
}
}
// Usage
const arr = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7];
buildHeap(arr); // Max-heap
console.log(arr); // [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
const minArr = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7];
buildHeap(minArr, (a, b) => b - a); // Min-heap (inverted comparator)
Java Implementation:
public class HeapBuilder {
public static void buildMaxHeap(int[] arr) {
int n = arr.length;
// Start from last internal node
for (int i = n / 2 - 1; i >= 0; i--) {
heapifyDown(arr, n, i);
}
}
private static void heapifyDown(int[] arr, int heapSize, int index) {
while (true) {
int left = 2 * index + 1;
int right = 2 * index + 2;
int largest = index;
if (left < heapSize && arr[left] > arr[largest]) {
largest = left;
}
if (right < heapSize && arr[right] > arr[largest]) {
largest = right;
}
if (largest != index) {
int temp = arr[index];
arr[index] = arr[largest];
arr[largest] = temp;
index = largest;
} else {
break;
}
}
}
}
A production-quality implementation must handle edge cases correctly. Let's examine each.
Edge Case 1: Empty Array
arr = []
build_max_heap(arr)
# (n // 2) - 1 = -1, loop range(-1, -1, -1) is empty
# Result: [] (unchanged, correct)
✓ Works correctly—no iterations, array unchanged.
Edge Case 2: Single Element
arr = [42]
build_max_heap(arr)
# n = 1, so (1 // 2) - 1 = -1; loop range(-1, -1, -1) is empty
# No iterations occur
# Result: [42] (unchanged, correct: a single element is a valid heap)
✓ Works correctly.
Edge Case 3: Two Elements
arr = [1, 5]
build_max_heap(arr)
# n = 2
# Last internal node = (2 // 2) - 1 = 1 - 1 = 0
# Process node 0:
# left = 1, right = 2 (out of bounds)
# arr[1] = 5 > arr[0] = 1, so largest = 1
# Swap: [5, 1]
# Result: [5, 1] (valid max-heap)
✓ Works correctly.
Edge Case 4: Already a Valid Heap
arr = [10, 5, 8, 3, 4]
build_max_heap(arr)
# This is already a valid max-heap
# Algorithm will check each internal node but find no swaps needed
# Result: [10, 5, 8, 3, 4] (unchanged)
✓ Works correctly—no unnecessary modifications.
Edge Case 5: All Equal Elements
arr = [7, 7, 7, 7, 7]
build_max_heap(arr)
# Any arrangement of equal elements is a valid heap
# Algorithm will find largest == index at each step (no swaps)
# Result: [7, 7, 7, 7, 7] (unchanged)
✓ Works correctly.
Edge Case 6: Strictly Decreasing Array
arr = [5, 4, 3, 2, 1]
build_max_heap(arr)
# This is already a valid max-heap (parent always > children)
# No swaps needed
✓ Works correctly—best case for build-heap.
Edge Case 7: Strictly Increasing Array (Worst Case)
arr = [1, 2, 3, 4, 5]
# This is the opposite of a max-heap
# Each internal node will need to bubble down
# But still O(n) total work
✓ Works correctly—but exercises maximum work distribution.
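The edge cases above can be bundled into a self-check. This sketch condenses the earlier Python implementation so it runs standalone:

```python
def build_max_heap(arr):
    """Floyd's bottom-up construction, condensed from the code above."""
    n = len(arr)
    for i in range(n // 2 - 1, -1, -1):
        index = i
        while True:
            left, right, largest = 2 * index + 1, 2 * index + 2, index
            if left < n and arr[left] > arr[largest]:
                largest = left
            if right < n and arr[right] > arr[largest]:
                largest = right
            if largest == index:
                break
            arr[index], arr[largest] = arr[largest], arr[index]
            index = largest
    return arr

assert build_max_heap([]) == []                              # Case 1: empty
assert build_max_heap([42]) == [42]                          # Case 2: single element
assert build_max_heap([1, 5]) == [5, 1]                      # Case 3: two elements
assert build_max_heap([10, 5, 8, 3, 4]) == [10, 5, 8, 3, 4]  # Case 4: already a heap
assert build_max_heap([7] * 5) == [7] * 5                    # Case 5: all equal
assert build_max_heap([5, 4, 3, 2, 1]) == [5, 4, 3, 2, 1]    # Case 6: decreasing
print("all edge cases pass")
```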
Watch out for: (1) Using n/2 instead of (n/2) - 1 as the starting index (off-by-one); (2) Using 'greater than or equal' instead of 'greater than' in comparisons (causes pointless swaps of equal elements, though the result is still a valid heap); (3) Forgetting to check that the left/right indices are within heap_size bounds; (4) Assuming recursion depth is dangerous here: heapify-down recurses only O(log n) deep, so it won't overflow the stack, but the calls still add overhead.
HEAPIFY-DOWN can be implemented recursively or iteratively. Both are correct, but they have different trade-offs.
Recursive Implementation:
def heapify_down_recursive(arr, heap_size, index):
left = 2 * index + 1
right = 2 * index + 2
largest = index
if left < heap_size and arr[left] > arr[largest]:
largest = left
if right < heap_size and arr[right] > arr[largest]:
largest = right
if largest != index:
arr[index], arr[largest] = arr[largest], arr[index]
heapify_down_recursive(arr, heap_size, largest) # Recursive call
Iterative Implementation (from earlier):
def heapify_down_iterative(arr, heap_size, index):
while True:
left = 2 * index + 1
right = 2 * index + 2
largest = index
if left < heap_size and arr[left] > arr[largest]:
largest = left
if right < heap_size and arr[right] > arr[largest]:
largest = right
if largest != index:
arr[index], arr[largest] = arr[largest], arr[index]
index = largest
else:
break
| Aspect | Recursive | Iterative |
|---|---|---|
| Time Complexity | O(log n) | O(log n) |
| Space Complexity | O(log n) stack space | O(1) auxiliary |
| Code Clarity | Often cleaner | Slightly more verbose |
| Stack Overflow Risk | Negligible (depth ≤ log₂ n) | None |
| Tail-Call Optimization | Works if supported | N/A |
| Performance | Slightly slower (call overhead) | Slightly faster |
| Recommendation | Learning/small heaps | Production code |
Tail-Call Consideration:
The recursive call sits in tail position (it is the last operation before returning). Languages with guaranteed tail-call optimization (Scheme, some functional languages) convert it to iteration automatically. Most mainstream languages (Python, Java, JavaScript) don't guarantee TCO, so the recursive version pays call overhead and O(log n) stack space; these costs are modest, but the iterative version avoids them entirely.
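The stack-space figure is worth quantifying: recursion depth is bounded by the tree height, about log₂ n. A sketch measuring the deepest call chain when sinking the minimum element from the root of a large heap:

```python
import math

def heapify_down_recursive(arr, heap_size, index, depth=1):
    """Recursive sift-down that also reports its call depth."""
    left, right, largest = 2 * index + 1, 2 * index + 2, index
    if left < heap_size and arr[left] > arr[largest]:
        largest = left
    if right < heap_size and arr[right] > arr[largest]:
        largest = right
    if largest != index:
        arr[index], arr[largest] = arr[largest], arr[index]
        return heapify_down_recursive(arr, heap_size, largest, depth + 1)
    return depth

n = 1 << 17                        # 131,072 elements
arr = list(range(1, n + 1))        # the root holds the minimum: worst case
for i in range(n // 2 - 1, 0, -1):
    heapify_down_recursive(arr, n, i)          # heapify both subtrees first
depth = heapify_down_recursive(arr, n, 0)      # now sink the root
print(depth, "<=", math.floor(math.log2(n)) + 1)
```

Even for astronomically large n, the depth stays in the tens, far below typical recursion limits, so the iterative version's real advantages are O(1) space and avoided call overhead.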
Recommendation:
Use the iterative version in production code. It uses O(1) auxiliary space, avoids function-call overhead, and doesn't depend on the language's recursion behavior.
The recursive version remains elegant for teaching and for languages that guarantee TCO.
Most standard library heap implementations use the iterative approach. Python's heapq module, Java's PriorityQueue, and C++'s make_heap all use iterative heapify. This is the battle-tested approach for production systems.
Converting the algorithm from max-heap to min-heap requires only changing the comparison operators.
Max-Heap HEAPIFY-DOWN:
if arr[left] > arr[largest]: # Find LARGEST
largest = left
if arr[right] > arr[largest]:
largest = right
Min-Heap HEAPIFY-DOWN:
if arr[left] < arr[smallest]: # Find SMALLEST
smallest = left
if arr[right] < arr[smallest]:
smallest = right
Generic Implementation with Comparator:
The cleanest approach uses a comparator function:
def build_heap(arr, compare):
"""Build heap using custom comparator.
For max-heap: compare = lambda a, b: a > b
For min-heap: compare = lambda a, b: a < b
"""
n = len(arr)
for i in range((n // 2) - 1, -1, -1):
_heapify_down(arr, n, i, compare)
return arr
def _heapify_down(arr, heap_size, index, compare):
while True:
left = 2 * index + 1
right = 2 * index + 2
extreme = index # 'largest' for max-heap, 'smallest' for min-heap
if left < heap_size and compare(arr[left], arr[extreme]):
extreme = left
if right < heap_size and compare(arr[right], arr[extreme]):
extreme = right
if extreme != index:
arr[index], arr[extreme] = arr[extreme], arr[index]
index = extreme
else:
break
# Usage
max_heap = build_heap([4, 1, 3, 2, 16], lambda a, b: a > b)
min_heap = build_heap([4, 1, 3, 2, 16], lambda a, b: a < b)
Python's heapq module only provides min-heap operations. To use it as a max-heap, the common idiom is to negate values: insert -x, and when extracting, negate again to get x. Alternatively, wrap values in a class with inverted comparison. The generic comparator approach above is cleaner when you control the implementation.
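The negation idiom in action, using only the standard library:

```python
import heapq

nums = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
neg = [-x for x in nums]
heapq.heapify(neg)            # min-heap of negations == max-heap of originals
print(-heapq.heappop(neg))    # 16 (the maximum)
print(-heapq.heappop(neg))    # 14 (the second largest)
```

Note that heapq.heapify is itself Floyd's O(n) bottom-up construction, just with min-heap comparisons.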
We've thoroughly explored Floyd's bottom-up heapify algorithm. Let's consolidate the essential knowledge.
What's Next:
We claimed that bottom-up heapify is O(n), but we haven't proven it yet. The next page provides the rigorous mathematical analysis showing exactly why the sum Σ 2^k × (h-k) converges to O(n). This analysis involves a beautiful application of infinite series and reveals deep insights about tree structure.
Understanding why heapify is O(n)—not just that it is—distinguishes engineers who truly understand algorithms from those who merely memorize them.
You can now implement Floyd's bottom-up heapify algorithm, trace through its execution, and prove its correctness. You understand the core insight (invert work distribution) and the mechanics (bottom-up processing, heapify-down). Next, we'll complete the picture with the mathematical proof that this achieves O(n) complexity.