In 1964, J.W.J. Williams invented the heap data structure and HeapSort algorithm. Shortly after, Robert W. Floyd discovered the elegant bottom-up heap construction algorithm that achieves O(n) time complexity—a significant improvement over the O(n log n) naive approach.
Floyd's algorithm, often called "bottom-up heapify" or simply "heapify," is one of those rare algorithms where the improvement isn't just a constant factor—it's logarithmic. Going from O(n log n) to O(n) is like going from reading a book word-by-word to skimming each chapter summary. The counterintuitive insight is that by processing nodes in the reverse order (bottom-to-top instead of top-to-bottom), and moving values down instead of up, we achieve dramatically better performance.
In this page, we'll present the algorithm in complete detail: the intuition, the precise mechanics, a thorough correctness proof, and production-ready implementations. By the end, you'll not only understand the algorithm—you'll understand why it works and appreciate its elegance.
By the end of this page, you will: (1) Implement Floyd's bottom-up heapify algorithm; (2) Trace through complete examples step-by-step; (3) Prove its correctness using loop invariants; (4) Understand the heapify-down subroutine deeply; (5) See how the algorithm exploits tree structure for efficiency.
Recall from the previous page why the naive approach is slow: it makes the most numerous nodes (leaves) do the most work (bubble to root). The bottom-up approach inverts this entirely.
Key Observations:
Half the nodes are leaves — Leaves have no children, so they're already trivially valid heaps. We don't need to do any work on them.
Nodes near the bottom have short downward paths — A node at level k can bubble down at most (h - k) levels, where h is the tree height.
Few nodes are near the top — Only 1 node (the root) can bubble down h levels; only 2 nodes can bubble down h-1 levels; etc.
The Inversion:
Naive (bubble UP): Work = Σ 2^k × k (many nodes × long up-distance)
Optimal (bubble DOWN): Work = Σ 2^k × (h-k) (many nodes × short down-distance)
The second sum converges to O(n), not O(n log n). We'll prove this mathematically in the next page.
For n = 15 (a perfect tree of height h = 3), the naive sum is 0·1 + 1·2 + 2·4 + 3·8 = 34 level-moves, while the bottom-up sum is 3·1 + 2·2 + 1·4 + 0·8 = 11.
As n grows, this gap widens dramatically. For n = 1 million (h = 19), the naive sum is roughly 19 million level-moves, while the bottom-up sum stays below n, about 1 million.
The algorithm's brilliance is recognizing that the direction of repair matters. Both directions achieve correctness, but bubbling down from bottom-to-top distributes work optimally. Many nodes do little work (short descent), few nodes do much work (long descent).
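The inversion can be made concrete with a short back-of-the-envelope sketch (not part of the algorithm itself) that evaluates both sums for a perfect binary tree of height h:

```python
# Level k of a perfect tree holds 2^k nodes. Bubbling UP costs k levels per
# node at level k; bubbling DOWN costs (h - k) levels per node at level k.

def naive_work(h: int) -> int:
    """Total level-moves if every node bubbles up to the root."""
    return sum(2**k * k for k in range(h + 1))

def bottom_up_work(h: int) -> int:
    """Total level-moves if every node sinks down toward the leaves."""
    return sum(2**k * (h - k) for k in range(h + 1))

for h in (3, 10, 19):
    n = 2**(h + 1) - 1  # nodes in a perfect tree of height h
    print(f"h={h:2}  n={n:>9}  up={naive_work(h):>10}  down={bottom_up_work(h):>9}")
```

For h = 3 this prints up = 34 versus down = 11; as h grows, the down column stays below n while the up column grows like n log n.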
Floyd's algorithm has two components: the main BUILD-MAX-HEAP procedure and the HEAPIFY-DOWN subroutine (also called MAX-HEAPIFY, sift-down, or bubble-down).
Algorithm: BUILD-MAX-HEAP
BUILD-MAX-HEAP(A):
n = length(A)
// Start from the last internal node and work backwards to the root
for i from ⌊n/2⌋ - 1 down to 0:
HEAPIFY-DOWN(A, n, i)
return A
Why Start at ⌊n/2⌋ - 1?
Indices from ⌊n/2⌋ to n-1 are leaf nodes. Leaves have no children, so they trivially satisfy the heap property. We only need to process internal nodes, which are at indices 0 to ⌊n/2⌋ - 1.
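The boundary is easy to verify; a tiny sketch confirming that every index from ⌊n/2⌋ onward has no children:

```python
def is_leaf(i: int, n: int) -> bool:
    # A node is a leaf when its left-child index 2i + 1 falls outside the
    # array; with no left child there can be no right child either.
    return 2 * i + 1 >= n

n = 10
print([i for i in range(n) if is_leaf(i, n)])  # [5, 6, 7, 8, 9]
```

The leaves are exactly indices n//2 through n-1, so looping from ⌊n/2⌋ - 1 down to 0 covers every internal node and nothing else.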
Algorithm: HEAPIFY-DOWN (Max-Heap)
HEAPIFY-DOWN(A, heap_size, index):
while true:
left = 2 * index + 1
right = 2 * index + 2
largest = index
// Find the largest among node and its children
if left < heap_size and A[left] > A[largest]:
largest = left
if right < heap_size and A[right] > A[largest]:
largest = right
// If largest is not the current node, swap and continue
if largest != index:
swap A[index] and A[largest]
index = largest
else:
break // Heap property satisfied at this subtree
HEAPIFY-DOWN Explained:
Given a node at index whose subtrees are both valid heaps, HEAPIFY-DOWN sinks the node's value to its correct position relative to its children, then repairs any violation this creates further down the affected subtree.
Key Precondition:
Before calling HEAPIFY-DOWN(A, n, i), the subtrees rooted at left(i) and right(i) must already be valid max-heaps.
This precondition is why we process nodes bottom-to-top: when we process node i, its children at 2i+1 and 2i+2 have already been processed (or are leaves, which are trivially heaps).
Processing Order Example (n = 10):
            0
          /   \
        1       2
       / \     / \
      3   4   5   6
     / \   /
    7   8 9
Last internal node = ⌊10/2⌋ - 1 = 4
Processing order: 4 → 3 → 2 → 1 → 0
- Node 4's children: 9 (leaf)
- Node 3's children: 7, 8 (leaves)
- Node 2's children: 5, 6 (leaves)
- Node 1's children: 3, 4 (already heapified)
- Node 0's children: 1, 2 (already heapified)
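The processing order and the child lists above can be generated mechanically; a small sketch reproducing them for n = 10:

```python
n = 10
for i in range(n // 2 - 1, -1, -1):  # 4, 3, 2, 1, 0
    children = [c for c in (2 * i + 1, 2 * i + 2) if c < n]
    print(f"Node {i}'s children: {children}")
```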
Let's trace through the algorithm on a concrete example, showing every step.
Input Array: A = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7] (n = 10)
Initial Tree Structure:
            4
          /   \
        1       3
       / \     / \
      2   16  9   10
     / \   /
   14   8 7
Last internal node: ⌊10/2⌋ - 1 = 4
We process indices 4, 3, 2, 1, 0 in that order:
- i = 4 (value 16): its only child is 7 (index 9); 16 ≥ 7, no swap. Array unchanged.
- i = 3 (value 2): children 14 (index 7) and 8 (index 8); 14 is largest. Swap → [4, 1, 3, 14, 16, 9, 10, 2, 8, 7]. Index 7 is a leaf; done.
- i = 2 (value 3): children 9 (index 5) and 10 (index 6); 10 is largest. Swap → [4, 1, 10, 14, 16, 9, 3, 2, 8, 7]. Index 6 is a leaf; done.
- i = 1 (value 1): children 14 (index 3) and 16 (index 4); 16 is largest. Swap → [4, 16, 10, 14, 1, 9, 3, 2, 8, 7]. At index 4, child 7 (index 9) beats 1; swap → [4, 16, 10, 14, 7, 9, 3, 2, 8, 1]. Index 9 is a leaf; done.
- i = 0 (value 4): children 16 and 10; swap with 16 → [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]. At index 1, children 14 and 7; swap with 14 → [16, 14, 10, 4, 7, 9, 3, 2, 8, 1]. At index 3, children 2 and 8; swap with 8 → [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]. Index 8 is a leaf; done.
Final Array: [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
Final Tree Structure:
            16
          /    \
        14      10
       /  \    /  \
      8    7  9    3
     / \   /
    2   4 1
Verification (Max-Heap Property):
16 ≥ 14, 16 ≥ 10 ✓
14 ≥ 8, 14 ≥ 7 ✓
10 ≥ 9, 10 ≥ 3 ✓
8 ≥ 2, 8 ≥ 4 ✓
7 ≥ 1 ✓
✓ Valid max-heap!
Work Count: 7 swaps in total (0 for node 4, 1 each for nodes 3 and 2, 2 for node 1, and 3 for node 0).
Compare this to the naive approach, which required 9 swaps on the same input!
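The verification above can be automated. A sketch of a checker (a hypothetical helper, not part of the algorithm) that tests every parent against its children:

```python
def is_max_heap(arr) -> bool:
    """True iff arr[i] >= both of its children for every internal node i."""
    n = len(arr)
    return all(arr[i] >= arr[c]
               for i in range(n // 2)
               for c in (2 * i + 1, 2 * i + 2)
               if c < n)

print(is_max_heap([16, 14, 10, 8, 7, 9, 3, 2, 4, 1]))  # True  (final array)
print(is_max_heap([4, 1, 3, 2, 16, 9, 10, 14, 8, 7]))  # False (input array)
```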
We prove correctness using loop invariant analysis for both the main algorithm and the HEAPIFY-DOWN subroutine.
Loop Invariant for BUILD-MAX-HEAP:
At the start of each iteration of the for loop with index i, each node j where j > i is the root of a valid max-heap.
Initialization (before first iteration, i = ⌊n/2⌋ - 1):
Nodes ⌊n/2⌋ to n-1 are all leaves. A leaf node is trivially a valid max-heap (no children to violate the property).
✓ Invariant holds initially.
Maintenance:
Assume the invariant holds at the start of iteration i. That is, all nodes j > i are roots of valid max-heaps.
During iteration i, we call HEAPIFY-DOWN(A, n, i). The children of node i are at indices 2i+1 and 2i+2. Since 2i+1 > i and 2i+2 > i, by the invariant, both children are roots of valid max-heaps.
HEAPIFY-DOWN has a precondition that both subtrees are valid heaps—this precondition is satisfied. After HEAPIFY-DOWN completes, node i is the root of a valid max-heap.
Now all nodes j ≥ i are roots of valid max-heaps.
After decrementing i, all nodes j > (new i) are roots of valid max-heaps.
✓ Invariant maintained.
Termination:
The loop terminates when i < 0. At this point, by the invariant (applied when i was 0), node 0 is the root of a valid max-heap. Since node 0 is the root of the entire tree, the entire array is a valid max-heap.
✓ Algorithm is correct.
Correctness of HEAPIFY-DOWN:
We now prove that HEAPIFY-DOWN correctly creates a valid max-heap rooted at index, given that both subtrees are valid max-heaps.
Loop Invariant for HEAPIFY-DOWN:
At the start of each iteration, the subtree rooted at index would be a valid max-heap if the value currently at index were at least as large as both of its children.
Initialization:
Before the first iteration, both subtrees are valid max-heaps (by precondition). The only potential violation is between index and its children.
Maintenance:
If the value at index is already largest, we're done—the subtree is a valid heap (no violation).
If a child is larger, we swap the node's value with that of its larger child. After the swap: the value now at index is the largest of the three, so there is no violation at this level; the displaced value may violate the heap property within the child's subtree; the other subtree is untouched and remains a valid heap. We then set index to the child and continue, so the invariant holds for the next iteration.
Termination:
The loop terminates when either: (a) largest == index, meaning the node's value is at least as large as both children, or (b) the node reaches a position with no children inside heap_size (a leaf), where (a) holds vacuously. In either case, the subtree rooted at the original index is now a valid max-heap.
✓ HEAPIFY-DOWN is correct.
The correctness proof reveals why bottom-up processing is essential: we must process children before parents so that HEAPIFY-DOWN's precondition is satisfied. Processing top-down would violate this precondition—the subtrees wouldn't be heaps when we try to fix the parent.
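This claim can be demonstrated directly. The sketch below (helpers condensed from the pseudocode above) runs heapify-down over the internal nodes of [1..7] in both orders; only the bottom-up order produces a heap:

```python
def heapify_down(a, n, i):
    # Iterative sift-down, as in the pseudocode above.
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            break
        a[i], a[largest] = a[largest], a[i]
        i = largest

def is_max_heap(a):
    return all(a[i] >= a[c] for i in range(len(a) // 2)
               for c in (2 * i + 1, 2 * i + 2) if c < len(a))

up = [1, 2, 3, 4, 5, 6, 7]
for i in range(len(up) // 2 - 1, -1, -1):   # bottom-up: children first
    heapify_down(up, len(up), i)

down = [1, 2, 3, 4, 5, 6, 7]
for i in range(len(down) // 2):             # top-down: precondition violated
    heapify_down(down, len(down), i)

print(is_max_heap(up), is_max_heap(down))   # True False
```

Top-down processing leaves [3, 5, 7, ...]: the small values pushed down from the root never get a second chance to sink further once their new parent is processed.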
Here are carefully crafted implementations in multiple languages, with attention to edge cases and clarity.
Python Implementation:
from typing import List, Callable, TypeVar
T = TypeVar('T')
def build_max_heap(arr: List[T]) -> List[T]:
"""
Build a max-heap from an array using Floyd's bottom-up algorithm.
Time: O(n), Space: O(1) auxiliary
Modifies the array in-place and returns it.
"""
n = len(arr)
# Start from last internal node, work up to root
# Last internal node is at index (n // 2) - 1
for i in range((n // 2) - 1, -1, -1):
_heapify_down(arr, n, i)
return arr
def _heapify_down(arr: List[T], heap_size: int, index: int) -> None:
"""
Restore max-heap property for subtree rooted at index.
Assumes subtrees of index are already valid max-heaps.
"""
while True:
left = 2 * index + 1
right = 2 * index + 2
largest = index
if left < heap_size and arr[left] > arr[largest]:
largest = left
if right < heap_size and arr[right] > arr[largest]:
largest = right
if largest != index:
arr[index], arr[largest] = arr[largest], arr[index]
index = largest
else:
break
def build_min_heap(arr: List[T]) -> List[T]:
"""Build a min-heap (same algorithm, opposite comparison)."""
n = len(arr)
for i in range((n // 2) - 1, -1, -1):
_heapify_down_min(arr, n, i)
return arr
def _heapify_down_min(arr: List[T], heap_size: int, index: int) -> None:
while True:
left = 2 * index + 1
right = 2 * index + 2
smallest = index
if left < heap_size and arr[left] < arr[smallest]:
smallest = left
if right < heap_size and arr[right] < arr[smallest]:
smallest = right
if smallest != index:
arr[index], arr[smallest] = arr[smallest], arr[index]
index = smallest
else:
break
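As a quick end-to-end check, the build can be run on the worked example from earlier. The definitions are condensed copies of the code above so the snippet runs on its own:

```python
# Condensed copy of build_max_heap/_heapify_down from above, plus a usage check.
def build_max_heap(arr):
    for i in range(len(arr) // 2 - 1, -1, -1):
        _heapify_down(arr, len(arr), i)
    return arr

def _heapify_down(arr, heap_size, index):
    while True:
        left, right, largest = 2 * index + 1, 2 * index + 2, index
        if left < heap_size and arr[left] > arr[largest]:
            largest = left
        if right < heap_size and arr[right] > arr[largest]:
            largest = right
        if largest == index:
            break
        arr[index], arr[largest] = arr[largest], arr[index]
        index = largest

heap = build_max_heap([4, 1, 3, 2, 16, 9, 10, 14, 8, 7])
print(heap)  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```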
TypeScript Implementation:
type Comparator<T> = (a: T, b: T) => number;
function buildHeap<T>(
arr: T[],
compare: Comparator<T> = (a, b) => (a > b ? 1 : a < b ? -1 : 0)
): T[] {
const n = arr.length;
// Start from last internal node
for (let i = Math.floor(n / 2) - 1; i >= 0; i--) {
heapifyDown(arr, n, i, compare);
}
return arr;
}
function heapifyDown<T>(
arr: T[],
heapSize: number,
index: number,
compare: Comparator<T>
): void {
while (true) {
const left = 2 * index + 1;
const right = 2 * index + 2;
let largest = index;
if (left < heapSize && compare(arr[left], arr[largest]) > 0) {
largest = left;
}
if (right < heapSize && compare(arr[right], arr[largest]) > 0) {
largest = right;
}
if (largest !== index) {
[arr[index], arr[largest]] = [arr[largest], arr[index]];
index = largest;
} else {
break;
}
}
}
// Usage
const arr = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7];
buildHeap(arr); // Max-heap
console.log(arr); // [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
const minArr = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7];
buildHeap(minArr, (a, b) => b - a); // Min-heap (inverted comparator)
Java Implementation:
public class HeapBuilder {
public static void buildMaxHeap(int[] arr) {
int n = arr.length;
// Start from last internal node
for (int i = n / 2 - 1; i >= 0; i--) {
heapifyDown(arr, n, i);
}
}
private static void heapifyDown(int[] arr, int heapSize, int index) {
while (true) {
int left = 2 * index + 1;
int right = 2 * index + 2;
int largest = index;
if (left < heapSize && arr[left] > arr[largest]) {
largest = left;
}
if (right < heapSize && arr[right] > arr[largest]) {
largest = right;
}
if (largest != index) {
int temp = arr[index];
arr[index] = arr[largest];
arr[largest] = temp;
index = largest;
} else {
break;
}
}
}
}
A production-quality implementation must handle edge cases correctly. Let's examine each.
Edge Case 1: Empty Array
arr = []
build_max_heap(arr)
# (n // 2) - 1 = -1, loop range(-1, -1, -1) is empty
# Result: [] (unchanged, correct)
✓ Works correctly—no iterations, array unchanged.
Edge Case 2: Single Element
arr = [42]
build_max_heap(arr)
# n = 1, so (1 // 2) - 1 = -1; loop range(-1, -1, -1) is empty
# No iterations occur
# Result: [42] (unchanged, correct: a single element is a valid heap)
✓ Works correctly.
Edge Case 3: Two Elements
arr = [1, 5]
build_max_heap(arr)
# n = 2
# Last internal node = (2 // 2) - 1 = 1 - 1 = 0
# Process node 0:
# left = 1, right = 2 (out of bounds)
# arr[1] = 5 > arr[0] = 1, so largest = 1
# Swap: [5, 1]
# Result: [5, 1] (valid max-heap)
✓ Works correctly.
Edge Case 4: Already a Valid Heap
arr = [10, 5, 8, 3, 4]
build_max_heap(arr)
# This is already a valid max-heap
# Algorithm will check each internal node but find no swaps needed
# Result: [10, 5, 8, 3, 4] (unchanged)
✓ Works correctly—no unnecessary modifications.
Edge Case 5: All Equal Elements
arr = [7, 7, 7, 7, 7]
build_max_heap(arr)
# Any arrangement of equal elements is a valid heap
# Algorithm will find largest == index at each step (no swaps)
# Result: [7, 7, 7, 7, 7] (unchanged)
✓ Works correctly.
Edge Case 6: Strictly Decreasing Array
arr = [5, 4, 3, 2, 1]
build_max_heap(arr)
# This is already a valid max-heap (parent always > children)
# No swaps needed
✓ Works correctly—best case for build-heap.
Edge Case 7: Strictly Increasing Array (Worst Case)
arr = [1, 2, 3, 4, 5]
# This is the opposite of a max-heap
# Each internal node will need to bubble down
# But still O(n) total work
✓ Works correctly—but exercises maximum work distribution.
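The edge cases above can be bundled into a self-check. This sketch condenses the earlier Python implementation so it runs standalone:

```python
def build_max_heap(arr):
    """Floyd's bottom-up construction, condensed from the code above."""
    n = len(arr)
    for i in range(n // 2 - 1, -1, -1):
        index = i
        while True:
            left, right, largest = 2 * index + 1, 2 * index + 2, index
            if left < n and arr[left] > arr[largest]:
                largest = left
            if right < n and arr[right] > arr[largest]:
                largest = right
            if largest == index:
                break
            arr[index], arr[largest] = arr[largest], arr[index]
            index = largest
    return arr

assert build_max_heap([]) == []                              # Case 1: empty
assert build_max_heap([42]) == [42]                          # Case 2: single element
assert build_max_heap([1, 5]) == [5, 1]                      # Case 3: two elements
assert build_max_heap([10, 5, 8, 3, 4]) == [10, 5, 8, 3, 4]  # Case 4: already a heap
assert build_max_heap([7] * 5) == [7] * 5                    # Case 5: all equal
assert build_max_heap([5, 4, 3, 2, 1]) == [5, 4, 3, 2, 1]    # Case 6: decreasing
print("all edge cases pass")
```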
Watch out for: (1) Using n/2 instead of (n/2) - 1 as the starting index (off-by-one); (2) Using 'greater than or equal' instead of 'greater than' in comparisons (causes pointless swaps of equal elements, though the result is still a valid heap); (3) Forgetting to check that the left/right indices are within heap_size bounds; (4) Assuming recursion depth is dangerous here: heapify-down recurses only O(log n) deep, so it won't overflow the stack, but the calls still add overhead.
HEAPIFY-DOWN can be implemented recursively or iteratively. Both are correct, but they have different trade-offs.
Recursive Implementation:
def heapify_down_recursive(arr, heap_size, index):
left = 2 * index + 1
right = 2 * index + 2
largest = index
if left < heap_size and arr[left] > arr[largest]:
largest = left
if right < heap_size and arr[right] > arr[largest]:
largest = right
if largest != index:
arr[index], arr[largest] = arr[largest], arr[index]
heapify_down_recursive(arr, heap_size, largest) # Recursive call
Iterative Implementation (from earlier):
def heapify_down_iterative(arr, heap_size, index):
while True:
left = 2 * index + 1
right = 2 * index + 2
largest = index
if left < heap_size and arr[left] > arr[largest]:
largest = left
if right < heap_size and arr[right] > arr[largest]:
largest = right
if largest != index:
arr[index], arr[largest] = arr[largest], arr[index]
index = largest
else:
break
| Aspect | Recursive | Iterative |
|---|---|---|
| Time Complexity | O(log n) | O(log n) |
| Space Complexity | O(log n) stack space | O(1) auxiliary |
| Code Clarity | Often cleaner | Slightly more verbose |
| Stack Overflow Risk | Negligible (depth ≤ log₂ n) | None |
| Tail-Call Optimization | Works if supported | N/A |
| Performance | Slightly slower (call overhead) | Slightly faster |
| Recommendation | Learning/small heaps | Production code |
Tail-Call Consideration:
The recursive call sits in tail position (it is the last operation before returning). Languages with guaranteed tail-call optimization (Scheme, some functional languages) convert it to iteration automatically. Most mainstream languages (Python, Java, JavaScript) don't guarantee TCO, so the recursive version pays call overhead and O(log n) stack space; these costs are modest, but the iterative version avoids them entirely.
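The stack-space figure is worth quantifying: recursion depth is bounded by the tree height, about log₂ n. A sketch measuring the deepest call chain when sinking the minimum element from the root of a large heap:

```python
import math

def heapify_down_recursive(arr, heap_size, index, depth=1):
    """Recursive sift-down that also reports its call depth."""
    left, right, largest = 2 * index + 1, 2 * index + 2, index
    if left < heap_size and arr[left] > arr[largest]:
        largest = left
    if right < heap_size and arr[right] > arr[largest]:
        largest = right
    if largest != index:
        arr[index], arr[largest] = arr[largest], arr[index]
        return heapify_down_recursive(arr, heap_size, largest, depth + 1)
    return depth

n = 1 << 17                        # 131,072 elements
arr = list(range(1, n + 1))        # the root holds the minimum: worst case
for i in range(n // 2 - 1, 0, -1):
    heapify_down_recursive(arr, n, i)          # heapify both subtrees first
depth = heapify_down_recursive(arr, n, 0)      # now sink the root
print(depth, "<=", math.floor(math.log2(n)) + 1)
```

Even for astronomically large n, the depth stays in the tens, far below typical recursion limits, so the iterative version's real advantages are O(1) space and avoided call overhead.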
Recommendation:
Use the iterative version in production code. It uses O(1) auxiliary space, avoids function-call overhead, and doesn't depend on the language's recursion behavior.
The recursive version remains elegant for teaching and for languages that guarantee TCO.
Most standard library heap implementations use the iterative approach. Python's heapq module, Java's PriorityQueue, and C++'s make_heap all use iterative heapify. This is the battle-tested approach for production systems.
Converting the algorithm from max-heap to min-heap requires only changing the comparison operators.
Max-Heap HEAPIFY-DOWN:
if arr[left] > arr[largest]: # Find LARGEST
largest = left
if arr[right] > arr[largest]:
largest = right
Min-Heap HEAPIFY-DOWN:
if arr[left] < arr[smallest]: # Find SMALLEST
smallest = left
if arr[right] < arr[smallest]:
smallest = right
Generic Implementation with Comparator:
The cleanest approach uses a comparator function:
def build_heap(arr, compare):
"""Build heap using custom comparator.
For max-heap: compare = lambda a, b: a > b
For min-heap: compare = lambda a, b: a < b
"""
n = len(arr)
for i in range((n // 2) - 1, -1, -1):
_heapify_down(arr, n, i, compare)
return arr
def _heapify_down(arr, heap_size, index, compare):
while True:
left = 2 * index + 1
right = 2 * index + 2
extreme = index # 'largest' for max-heap, 'smallest' for min-heap
if left < heap_size and compare(arr[left], arr[extreme]):
extreme = left
if right < heap_size and compare(arr[right], arr[extreme]):
extreme = right
if extreme != index:
arr[index], arr[extreme] = arr[extreme], arr[index]
index = extreme
else:
break
# Usage
max_heap = build_heap([4, 1, 3, 2, 16], lambda a, b: a > b)
min_heap = build_heap([4, 1, 3, 2, 16], lambda a, b: a < b)
Python's heapq module only provides min-heap operations. To use it as a max-heap, the common idiom is to negate values: insert -x, and when extracting, negate again to get x. Alternatively, wrap values in a class with inverted comparison. The generic comparator approach above is cleaner when you control the implementation.
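The negation idiom in action, using only the standard library:

```python
import heapq

nums = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
neg = [-x for x in nums]
heapq.heapify(neg)            # min-heap of negations == max-heap of originals
print(-heapq.heappop(neg))    # 16 (the maximum)
print(-heapq.heappop(neg))    # 14 (the second largest)
```

Note that heapq.heapify is itself Floyd's O(n) bottom-up construction, just with min-heap comparisons.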
We've thoroughly explored Floyd's bottom-up heapify algorithm. Let's consolidate the essential knowledge.
What's Next:
We claimed that bottom-up heapify is O(n), but we haven't proven it yet. The next page provides the rigorous mathematical analysis showing exactly why the sum Σ 2^k × (h-k) converges to O(n). This analysis involves a beautiful application of infinite series and reveals deep insights about tree structure.
Understanding why heapify is O(n)—not just that it is—distinguishes engineers who truly understand algorithms from those who merely memorize them.
You can now implement Floyd's bottom-up heapify algorithm, trace through its execution, and prove its correctness. You understand the core insight (invert work distribution) and the mechanics (bottom-up processing, heapify-down). Next, we'll complete the picture with the mathematical proof that this achieves O(n) complexity.