When faced with the problem of building a heap from an unsorted array, the most natural solution draws on what we already know: heap insertion. We've mastered inserting a single element into a heap in O(log n) time, so why not simply start with an empty heap and insert each element?
This approach is intuitive, easy to implement, and correct—it absolutely produces a valid heap. However, it's not optimal. Understanding precisely why it's suboptimal is crucial, because this understanding illuminates why the optimal bottom-up algorithm works and why it's faster.
In this page, we'll dissect the naive approach completely: its algorithm, its correctness, its exact complexity, and the distribution of work that makes it inefficient. By the end, you'll have a crystal-clear understanding of the O(n log n) baseline that the optimal algorithm improves upon.
By the end of this page, you will: (1) Implement the naive heap construction algorithm; (2) Prove its correctness using loop invariants; (3) Derive its O(n log n) time complexity precisely; (4) Understand the distribution of work across tree levels; (5) See why this work distribution is inherently inefficient.
The naive heap construction algorithm mimics what you would do if you were building a heap incrementally: start empty, add one element at a time.
Algorithm: BUILD-HEAP-NAIVE (for max-heap)
```
BUILD-HEAP-NAIVE(A):
    n = length(A)
    // Conceptually, A[0] alone is already a valid heap of size 1
    // We "insert" elements A[1], A[2], ..., A[n-1] into this growing heap
    for i from 1 to n - 1:
        BUBBLE-UP(A, i)
    return A  // A is now a valid heap

BUBBLE-UP(A, index):
    while index > 0:
        parent = (index - 1) / 2  // integer division
        if A[index] > A[parent]:
            swap A[index] and A[parent]
            index = parent
        else:
            break
```
Key Observation:
This algorithm works in-place. We don't create a new array; we reinterpret the input array as a growing heap:
- After processing index i, the subarray A[0..i] is a valid max-heap
- A[0..0] (just the first element) is trivially a valid heap

In-Place Visualization:
Original Array: [3, 9, 2, 1, 4, 5]
Step 0: Heap is A[0..0] = [3]
✓ A single element is a valid heap
Step 1: Insert A[1]=9 into heap A[0..0]
Compare 9 with parent 3: 9 > 3, swap
Heap is now [9, 3]
Array: [9, 3, 2, 1, 4, 5]
Step 2: Insert A[2]=2 into heap A[0..1]
Compare 2 with parent 9: 2 < 9, done
Heap is now [9, 3, 2]
Array: [9, 3, 2, 1, 4, 5]
Step 3: Insert A[3]=1 into heap A[0..2]
Compare 1 with parent 3: 1 < 3, done
Heap is now [9, 3, 2, 1]
Array: [9, 3, 2, 1, 4, 5]
Step 4: Insert A[4]=4 into heap A[0..3]
Compare 4 with parent 3: 4 > 3, swap
Heap is now [9, 4, 2, 1, 3]
Compare 4 with parent 9: 4 < 9, done
Array: [9, 4, 2, 1, 3, 5]
Step 5: Insert A[5]=5 into heap A[0..4]
Compare 5 with parent 2: 5 > 2, swap
Heap is now [9, 4, 5, 1, 3, 2]
Compare 5 with parent 9: 5 < 9, done
Array: [9, 4, 5, 1, 3, 2]
Final Heap: [9, 4, 5, 1, 3, 2] ✓
To rigorously prove that the naive algorithm produces a valid heap, we employ loop invariant analysis—the gold standard for algorithm correctness proofs.
Loop Invariant:
At the start of each iteration of the for loop,
A[0..i-1] is a valid max-heap containing the first i elements of the original array (in some order).
Initialization (before first iteration, i = 1):
A[0..0] contains a single element. A single-element array trivially satisfies the max-heap property—there are no parent-child pairs to violate it.
✓ Invariant holds initially.
Maintenance (if invariant holds at start of iteration i, it holds at start of iteration i+1):
Assume A[0..i-1] is a valid max-heap before iteration i.
During iteration i, we call BUBBLE-UP(A, i). This operation:
- Takes the element A[i] (which is just outside the current heap)
- Bubbles it up until A[i] reaches a position where it doesn't violate the heap property

We proved in the previous module that BUBBLE-UP correctly restores the heap property. After BUBBLE-UP completes, A[0..i] is a valid max-heap.
✓ Invariant maintained.
Termination (after loop completes, i = n):
When the loop terminates, i = n, so by the invariant, A[0..n-1] is a valid max-heap. This is the entire array.
✓ Algorithm is correct.
This proof demonstrates how we build correctness incrementally. Each iteration extends the heap by one element while preserving the heap property. The invariant precisely captures what we've achieved after each step, making the proof almost mechanical.
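To make the invariant concrete, here's a minimal Python sketch (helper names like `is_max_heap` and `build_heap_naive_checked` are ours, not part of the formal proof) that asserts the invariant at the top of every iteration:

```python
def is_max_heap(arr: list, size: int) -> bool:
    """Check the max-heap property on the prefix arr[0..size-1]."""
    return all(arr[(i - 1) // 2] >= arr[i] for i in range(1, size))

def build_heap_naive_checked(arr: list) -> list:
    """Naive construction with the loop invariant asserted each iteration."""
    for i in range(1, len(arr)):
        # Invariant: A[0..i-1] is a valid max-heap
        assert is_max_heap(arr, i), f"invariant broken before iteration {i}"
        index = i  # BUBBLE-UP(A, i), inlined
        while index > 0:
            parent = (index - 1) // 2
            if arr[index] > arr[parent]:
                arr[index], arr[parent] = arr[parent], arr[index]
                index = parent
            else:
                break
    assert is_max_heap(arr, len(arr))  # termination: the whole array is a heap
    return arr

print(build_heap_naive_checked([3, 9, 2, 1, 4, 5]))  # [9, 4, 5, 1, 3, 2]
```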
Now we establish that the naive algorithm is O(n log n). This analysis is critical because it reveals the exact nature of the inefficiency.
Per-Insertion Cost:
When we insert the element at index i (for i ≥ 1):
- The heap already contains i elements (indices 0 to i-1)
- The height of that heap is ⌊log₂(i)⌋
- In the worst case, the new element bubbles all the way to the root: ⌊log₂(i)⌋ swaps

Total Cost:
The total work is the sum of worst-case costs for each insertion:
T(n) = Σ (i=1 to n-1) ⌊log₂(i)⌋
Let's compute this sum.
Lower Bound:
T(n) ≥ Σ (i=1 to n-1) (log₂(i) - 1)
= Σ (i=1 to n-1) log₂(i) - (n-1)
= log₂((n-1)!) - (n-1)
Upper Bound:
T(n) ≤ Σ (i=1 to n-1) log₂(i)
= log₂(1) + log₂(2) + ... + log₂(n-1)
= log₂(1 × 2 × 3 × ... × (n-1))
= log₂((n-1)!)
Using Stirling's Approximation:
Stirling's approximation states that for large n:
log₂(n!) ≈ n log₂(n) - n log₂(e) + O(log n)
= n log₂(n) - 1.443n + O(log n)
Applying this:
log₂((n-1)!) ≈ (n-1) log₂(n-1) - 1.443(n-1)
= Θ(n log n)
Therefore:
T(n) = Θ(n log n)
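As a numeric sanity check (a sketch of ours, not part of the derivation), the exact sum Σ⌊log₂(i)⌋, the Stirling estimate log₂((n-1)!), and n log₂(n) can be compared directly:

```python
import math

def worst_case_swaps(n: int) -> int:
    """Exact worst-case total: sum of floor(log2(i)) for i = 1..n-1."""
    return sum(i.bit_length() - 1 for i in range(1, n))  # bit_length() - 1 == floor(log2)

for n in (10, 100, 1000, 10000):
    exact = worst_case_swaps(n)
    log2_factorial = math.lgamma(n) / math.log(2)  # log2((n-1)!), since Gamma(n) = (n-1)!
    print(n, exact, round(log2_factorial), round(n * math.log2(n)))
```

All three columns grow at the same Θ(n log n) rate, as the bounds above predict.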
Alternative Direct Analysis:
We can also analyze by counting nodes at each level and their potential work:
Level 0: 1 node, can bubble up 0 levels
Level 1: 2 nodes, can bubble up 1 level each → 2 × 1 = 2
Level 2: 4 nodes, can bubble up 2 levels each → 4 × 2 = 8
Level 3: 8 nodes, can bubble up 3 levels each → 8 × 3 = 24
...
Level k: 2^k nodes, can bubble up k levels each → k × 2^k
If the tree has height h = log₂(n), total worst-case work:
T(n) = Σ (k=0 to h) k × 2^k
This sum evaluates to:
Σ (k=0 to h) k × 2^k = (h - 1) × 2^(h+1) + 2 = Θ(h × 2^h) = Θ(n log n)
Confirming T(n) = O(n log n).
The sum Σ(k=0 to h) k × 2^k can be derived using the formula for geometric series and differentiation. The result is 2 + (h-1) × 2^(h+1). This is a useful formula to remember for analyzing tree-based algorithms.
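A brute-force check of that closed form (our own verification sketch):

```python
# Verify: sum_{k=0}^{h} k * 2^k == (h - 1) * 2^(h+1) + 2
for h in range(11):
    brute = sum(k * 2**k for k in range(h + 1))
    closed = (h - 1) * 2**(h + 1) + 2
    assert brute == closed, (h, brute, closed)
print("verified; h = 3 gives", 2 * 2**4 + 2)  # 34, matching the n = 15 example below
```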
The key to understanding why the naive approach is inefficient lies in where the work is concentrated. Let's visualize this.
For a heap of n = 15 elements (height 3):
Level 0 (root): 1 node × max 0 swaps = 0 work
Level 1: 2 nodes × max 1 swap = 2 work
Level 2: 4 nodes × max 2 swaps = 8 work
Level 3 (leaves): 8 nodes × max 3 swaps = 24 work
-------
Total worst-case work: 34 operations
Observation: Level 3 (the leaves) contributes 24 out of 34 operations—over 70% of the work!
This pattern holds generally. The deepest level contains approximately half of all nodes, and each of those nodes can bubble all the way to the root.
| Level | Nodes at Level | Max Bubble-Up Distance | Max Work at Level | Percentage of Total |
|---|---|---|---|---|
| 0 (root) | 1 | 0 | 0 | 0% |
| 1 | 2 | 1 | 2 | ~5.9% |
| 2 | 4 | 2 | 8 | ~23.5% |
| 3 (leaves) | 8 | 3 | 24 | ~70.6% |
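The table is easy to reproduce for any height; this small sketch (the function name `level_work_table` is ours) prints the same breakdown:

```python
def level_work_table(h: int) -> None:
    """Print worst-case bubble-up work per level of a perfect tree of height h."""
    rows = [(k, 2**k, k * 2**k) for k in range(h + 1)]
    total = sum(work for _, _, work in rows)
    for level, nodes, work in rows:
        pct = 100 * work / total if total else 0.0
        print(f"level {level}: {nodes:>4} nodes x {level} swaps = {work:>4}  ({pct:.1f}%)")

level_work_table(3)  # reproduces the n = 15 table: 0, 2, 8, 24 -> 34 total
```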
The Fundamental Inefficiency:
The problem is structural. In the naive approach:
- The deepest level holds the most nodes (roughly half of all nodes are leaves)
- Those same nodes have the longest potential bubble-up path (all the way to the root)
- The few nodes near the root are the only ones with short paths
This is the worst possible distribution! We're making the most numerous nodes do the most work.
Mathematical Formulation:
Total Work = Σ (level k) (nodes at k) × (distance from k to root)
= Σ (k=0 to h) 2^k × k
= Θ(n log n)
If we could somehow invert this—make the numerous nodes do little work and the few nodes do more work—we could potentially achieve better complexity. This is exactly what the bottom-up heapify algorithm does.
The bottom-up approach inverts the work distribution: nodes bubble DOWN instead of UP. Since lower nodes can only bubble down a short distance (they're already near the leaves), and only the few upper nodes must bubble down far, the total work becomes O(n). We'll see this in detail in the next page.
Let's trace through the naive algorithm on a larger example to fully internalize the process and see the work distribution in action.
Input Array: A = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
Expected: A valid max-heap containing these 10 elements
Step-by-Step Trace:

i=1: Insert 1. Parent A[0]=4; 1 < 4, stop. (0 swaps)
     Array: [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
i=2: Insert 3. Parent A[0]=4; 3 < 4, stop. (0 swaps)
     Array: [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
i=3: Insert 2. Parent A[1]=1; 2 > 1, swap. Parent A[0]=4; 2 < 4, stop. (1 swap)
     Array: [4, 2, 3, 1, 16, 9, 10, 14, 8, 7]
i=4: Insert 16. Parent A[1]=2; swap. Parent A[0]=4; swap. Reached root. (2 swaps)
     Array: [16, 4, 3, 1, 2, 9, 10, 14, 8, 7]
i=5: Insert 9. Parent A[2]=3; swap. Parent A[0]=16; stop. (1 swap)
     Array: [16, 4, 9, 1, 2, 3, 10, 14, 8, 7]
i=6: Insert 10. Parent A[2]=9; swap. Parent A[0]=16; stop. (1 swap)
     Array: [16, 4, 10, 1, 2, 3, 9, 14, 8, 7]
i=7: Insert 14. Parent A[3]=1; swap. Parent A[1]=4; swap. Parent A[0]=16; stop. (2 swaps)
     Array: [16, 14, 10, 4, 2, 3, 9, 1, 8, 7]
i=8: Insert 8. Parent A[3]=4; swap. Parent A[1]=14; stop. (1 swap)
     Array: [16, 14, 10, 8, 2, 3, 9, 1, 4, 7]
i=9: Insert 7. Parent A[4]=2; swap. Parent A[1]=14; stop. (1 swap)
     Array: [16, 14, 10, 8, 7, 3, 9, 1, 4, 2]

Final Array: [16, 14, 10, 8, 7, 3, 9, 1, 4, 2]

Verification (Max-Heap Property):

            16
          /    \
        14      10
       /  \    /  \
      8    7  3    9
     / \  /
    1   4 2

Parent-Child Checks:
- 16 ≥ 14, 16 ≥ 10 ✓
- 14 ≥ 8, 14 ≥ 7 ✓
- 10 ≥ 3, 10 ≥ 9 ✓
- 8 ≥ 1, 8 ≥ 4 ✓
- 7 ≥ 2 ✓

Total Swaps: 0 + 0 + 1 + 2 + 1 + 1 + 2 + 1 + 1 = 9 swaps
For n = 10 elements, log₂(10) ≈ 3.3, so the loose worst-case bound is about 10 × 3 = 30 swaps (the exact worst case, Σ⌊log₂(i)⌋ for i = 1..9, is 16). Our actual count of 9 is well under both because not every element needed to bubble all the way up.
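To confirm the count mechanically, here's an instrumented sketch of ours that tallies swaps during the build:

```python
def build_heap_counting(arr: list) -> int:
    """Naive build; returns the total number of swaps performed."""
    swaps = 0
    for i in range(1, len(arr)):
        index = i
        while index > 0:
            parent = (index - 1) // 2
            if arr[index] <= arr[parent]:
                break
            arr[index], arr[parent] = arr[parent], arr[index]
            index = parent
            swaps += 1
    return swaps

arr = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
print(build_heap_counting(arr), arr)
# 9 [16, 14, 10, 8, 7, 3, 9, 1, 4, 2]
```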
Here are production-quality implementations of the naive heap construction algorithm in several languages.
Python Implementation:
```python
def build_heap_naive(arr: list) -> list:
    """
    Build a max-heap from an array using repeated insertions.
    Time: O(n log n), Space: O(1) auxiliary
    """
    n = len(arr)
    # Process each element starting from index 1
    # (index 0 is trivially a valid heap of size 1)
    for i in range(1, n):
        _bubble_up(arr, i)
    return arr

def _bubble_up(arr: list, index: int) -> None:
    """Bubble element at index up to restore heap property."""
    while index > 0:
        parent = (index - 1) // 2
        if arr[index] > arr[parent]:
            arr[index], arr[parent] = arr[parent], arr[index]
            index = parent
        else:
            break

# Example usage
arr = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_heap_naive(arr)
print(arr)  # [16, 14, 10, 8, 7, 3, 9, 1, 4, 2]
```
TypeScript Implementation:
```typescript
function buildHeapNaive<T>(
  arr: T[],
  compare: (a: T, b: T) => number = (a, b) => (a > b ? 1 : a < b ? -1 : 0)
): T[] {
  const n = arr.length;
  for (let i = 1; i < n; i++) {
    bubbleUp(arr, i, compare);
  }
  return arr;
}

function bubbleUp<T>(arr: T[], index: number, compare: (a: T, b: T) => number): void {
  while (index > 0) {
    const parent = Math.floor((index - 1) / 2);
    if (compare(arr[index], arr[parent]) > 0) {
      [arr[index], arr[parent]] = [arr[parent], arr[index]];
      index = parent;
    } else {
      break;
    }
  }
}

// Example usage
const arr = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7];
buildHeapNaive(arr);
console.log(arr); // [16, 14, 10, 8, 7, 3, 9, 1, 4, 2]
```
Java Implementation:
```java
public class NaiveHeapBuilder {

    public static void buildMaxHeap(int[] arr) {
        int n = arr.length;
        for (int i = 1; i < n; i++) {
            bubbleUp(arr, i);
        }
    }

    private static void bubbleUp(int[] arr, int index) {
        while (index > 0) {
            int parent = (index - 1) / 2;
            if (arr[index] > arr[parent]) {
                // Swap child and parent
                int temp = arr[index];
                arr[index] = arr[parent];
                arr[parent] = temp;
                index = parent;
            } else {
                break;
            }
        }
    }
}
```
The naive algorithm's space complexity is excellent—it's one of its few advantages.
Auxiliary Space: O(1)
The algorithm uses only:
- A constant number of index variables (i, index, parent)
- No additional arrays, no recursion stack (the while loop is iterative), no extra data structures
In-Place Property:
The algorithm modifies the input array directly. After execution:
- The array contains exactly the same elements, rearranged to satisfy the max-heap property
- No copy of the data was ever made
Comparison with "Create New Heap" Approach:
An alternative naive approach creates a new heap and copies elements:
```python
def build_heap_copy(arr):
    heap = []
    for x in arr:
        heap.append(x)
        _bubble_up(heap, len(heap) - 1)  # reuses the helper defined above
    return heap
```
This uses O(n) extra space for the new array. Our in-place version avoids this.
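One way to see the difference concretely (assuming the `build_heap_naive` and `build_heap_copy` definitions above): the in-place version hands back the very same list object, while the copy-based version allocates a new one.

```python
data = [4, 1, 3, 2, 16]
print(build_heap_naive(data) is data)   # True: same object, modified in place

data = [4, 1, 3, 2, 16]
print(build_heap_copy(data) is data)    # False: a new O(n) list was allocated
print(data)                             # original list is left untouched
```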
| Approach | Auxiliary Space | In-Place? | Notes |
|---|---|---|---|
| In-place naive (our approach) | O(1) | Yes | Best space efficiency |
| Copy-based naive | O(n) | No | Creates new array |
| Bottom-up heapify (next page) | O(1) | Yes | Same space, better time |
In memory-constrained environments (embedded systems, mobile devices, very large datasets), O(1) auxiliary space is a significant advantage. This is why in-place algorithms are preferred when available. Fortunately, the optimal bottom-up heapify is also O(1) space, so we don't sacrifice space efficiency when we improve time efficiency.
Despite being suboptimal, the naive approach is sometimes perfectly reasonable. Understanding when can save you from premature optimization.
Use Naive When:
- Elements arrive one at a time (streaming input), so incremental insertion is required anyway
- n is small and construction time is negligible
- You already have a tested insert/bubble-up routine and simplicity matters more than speed

Avoid Naive When:
- All n elements are available up front and n is large
- Heap construction sits on a hot path (for example, the first phase of heapsort)
- You are rebuilding heaps repeatedly over large datasets

The Practical Threshold:

As a rough guideline:
- For n up to a few thousand, the roughly 10× gap rarely matters in absolute terms
- For n in the millions, the roughly 20× gap is worth eliminating with bottom-up heapify
Remember that O(n log n) vs O(n) is an asymptotic comparison. For n = 1000, log n ≈ 10, so the difference is about 10×. For n = 1,000,000, log n ≈ 20, so about 20×. The constant factors in both algorithms are similar, so these multipliers reflect real-world differences accurately.
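You can watch the log factor grow empirically on a worst-case input: ascending order forces every insertion to bubble all the way to the root. A measurement sketch of ours, reusing the `build_heap_counting` helper from the trace section (random inputs would look much better, since most elements then stop after a swap or two):

```python
import math

for n in (1_000, 10_000, 100_000):
    s = build_heap_counting(list(range(n)))  # sorted ascending: worst case for bubble-up
    print(n, s, f"swaps/element = {s / n:.1f}", f"log2(n) = {math.log2(n):.1f}")
# swaps per element climbs with log2(n), as the Θ(n log n) analysis predicts
```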
We've thoroughly examined the naive heap construction algorithm. Let's consolidate the key insights.
The Core Insight for Improvement:
The naive approach is inefficient because:
Work = Σ (level k) [nodes at k] × [distance to root]
= Σ (k=0 to h) 2^k × k
= O(n log n)
If we could instead arrange the work as:
Work = Σ (level k) [nodes at k] × [distance to leaves]
= Σ (k=0 to h) 2^k × (h - k)
This would give us a much smaller total—actually O(n). This is the bottom-up heapify approach, which we'll cover in the next page.
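A side-by-side sketch of ours comparing the two sums (h is the tree height, n the node count):

```python
for h in (3, 10, 20):
    n = 2**(h + 1) - 1                                # nodes in a perfect tree of height h
    up = sum(2**k * k for k in range(h + 1))          # naive: bubble UP, distance to root
    down = sum(2**k * (h - k) for k in range(h + 1))  # bottom-up: sift DOWN, distance to leaves
    print(f"h={h}: n={n}, up={up}, down={down}")      # up grows like n * h; down stays below 2n
```

For h = 20 (about two million nodes), the "up" total is roughly 19× the "down" total: the same nodes, counted against the opposite distance.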
What's Next:
Now that we understand why the naive approach is O(n log n), we're ready to learn the optimal algorithm. The next page presents bottom-up heapify: the elegant O(n) algorithm that builds a heap by processing internal nodes from bottom to top, exploiting the favorable work distribution we identified.
You now have a complete understanding of the naive heap construction algorithm: how it works, why it's correct, exactly why it achieves O(n log n) complexity, and when it's acceptable to use. Most importantly, you understand the work distribution problem that the optimal algorithm solves. Next, we'll see the elegant bottom-up heapify algorithm in action.