Consider this scenario: you're building a task scheduling system for a data center with thousands of servers. Each server submits tasks with priority levels, and your system must process them in priority order. The catch? At system initialization, you receive a massive dump of 10 million pending tasks from a crash recovery log—all at once, in arbitrary order.
You know heaps are ideal for priority processing. But how do you convert those 10 million unsorted task records into a valid heap? Do you insert them one by one? Is there a faster way?
This question—how to efficiently build a heap from an existing collection—is one of the most important problems in heap algorithms. The naive approach seems obvious, but as we'll discover, it's not optimal. The optimal approach, called heapify (or build-heap), achieves something remarkable: it constructs a valid heap in O(n) time, not O(n log n). Understanding why this works requires grasping some beautiful mathematical insights about tree structure.
By the end of this page, you will understand: (1) The precise problem statement for heap construction; (2) Why this problem is fundamentally different from repeated insertions; (3) The interface and contract of the build-heap operation; (4) Real-world scenarios where efficient heap construction matters; (5) The structural properties of heaps that enable efficient algorithms.
Let's define the heap construction problem with mathematical precision. Ambiguity in problem statements leads to ambiguity in solutions, so we'll be rigorous.
Input:
An array A of n comparable elements in arbitrary (unsorted) order:
A = [a₀, a₁, a₂, ..., aₙ₋₁]
Output:
Rearrange the elements of A in-place such that A satisfies the heap property.
The Max-Heap Property (formal):
For all indices i where i > 0:
A[parent(i)] ≥ A[i]
where parent(i) = ⌊(i - 1) / 2⌋.
Equivalently: every node's value is at least as large as the values of its children.
The Min-Heap Property (formal):
For all indices i where i > 0:
A[parent(i)] ≤ A[i]
Equivalently: every node's value is no larger than the values of its children.
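As a quick sketch of what "satisfies the heap property" means operationally, here is a minimal Python validity check (the helper name is_max_heap is illustrative, not from a particular library):

```python
def is_max_heap(A):
    """Check the max-heap property: A[parent(i)] >= A[i] for all i > 0."""
    return all(A[(i - 1) // 2] >= A[i] for i in range(1, len(A)))

# Using the worked example from later on this page:
assert is_max_heap([9, 4, 5, 1, 3, 2])      # a valid max-heap
assert not is_max_heap([3, 9, 2, 1, 4, 5])  # arbitrary order violates it
```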
The 'in-place' constraint is important. We want to transform the existing array without allocating a new array of size n. This means O(1) auxiliary space (not counting the input array itself). This constraint rules out solutions that copy elements to a new heap structure.
What We're NOT Doing:
It's important to distinguish heap construction from related operations:
Not sorting — We're not producing a sorted array. A heap is only partially ordered. The heap property is weaker than the sorted property.
Not creating a new heap and inserting — While this works, it's not in-place (requires new allocation) and, as we'll see, it's slower than necessary.
Not building from a data stream — We have all elements available upfront. This batch availability is what enables the O(n) algorithm.
Example:
Input Array: [3, 9, 2, 1, 4, 5]
Valid Max-Heap Output: [9, 4, 5, 1, 3, 2]
Verification:
- parent(1)=0: A[0]=9 ≥ A[1]=4 ✓
- parent(2)=0: A[0]=9 ≥ A[2]=5 ✓
- parent(3)=1: A[1]=4 ≥ A[3]=1 ✓
- parent(4)=1: A[1]=4 ≥ A[4]=3 ✓
- parent(5)=2: A[2]=5 ≥ A[5]=2 ✓
Note that there are multiple valid heap configurations for the same input. Our algorithm needs to produce any valid heap, not a specific one.
The first solution most developers think of is straightforward: create an empty heap, then insert elements one by one.
Naive Algorithm:
BUILD-HEAP-NAIVE(A):
H = new empty heap
for each element x in A:
H.insert(x)
return H
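As a concrete illustration, here is the naive build in Python using the standard heapq module (heapq implements a min-heap, so the result satisfies the min-heap property):

```python
import heapq

def build_heap_naive(items):
    """Naive O(n log n) build: push elements one at a time."""
    heap = []
    for x in items:
        heapq.heappush(heap, x)  # each push bubbles up in O(log k)
    return heap

print(build_heap_naive([3, 9, 2, 1, 4, 5]))  # [1, 2, 3, 9, 4, 5]
```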
This works correctly—it definitely produces a valid heap. But let's analyze its complexity.
Time Complexity Analysis:
Each insert operation is O(log k) where k is the current heap size. For the entire sequence:
| Insertion # | Heap Size Before | Max Bubble-Up Cost |
|---|---|---|
| 1 | 0 | O(1) |
| 2 | 1 | O(log 2) |
| 3 | 2 | O(log 3) |
| ... | ... | ... |
| k | k - 1 | O(log k) |
| ... | ... | ... |
| n | n - 1 | O(log n) |
Total Work:
T(n) = Σ(i=1 to n) log i = log(1) + log(2) + log(3) + ... + log(n)
= log(1 × 2 × 3 × ... × n)
= log(n!)
Using Stirling's approximation: log(n!) ≈ n log(n) - n log(e) = Θ(n log n)
Therefore, the naive approach is O(n log n).
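A quick numerical sanity check of Stirling's bound, using Python's math.lgamma (where lgamma(n + 1) = ln(n!)); the ratio log(n!) / (n log n) slowly approaches 1 as n grows:

```python
import math

for n in (10, 1_000, 1_000_000):
    ratio = math.lgamma(n + 1) / (n * math.log(n))
    print(f"n={n}: log(n!) / (n log n) = {ratio:.3f}")
```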
Additional Overhead:
Beyond its worse asymptotic complexity, the naive approach has practical drawbacks:
Memory allocation — Creating a new heap requires allocating n elements of space, violating the in-place constraint.
Cache inefficiency — Growing arrays cause cache misses; the final elements are far from the first in memory.
More swaps — Each insertion potentially swaps all the way to the root, performing ~n log n swaps total.
Many developers assume n log n is optimal for heap construction because 'heap operations are O(log n).' But this reasoning is flawed! The question isn't what each operation costs—it's whether we can do better by taking a different approach entirely. Spoiler: we can achieve O(n).
Before diving into the optimal algorithm, let's understand why the difference between O(n log n) and O(n) actually matters in practice. These scenarios demonstrate that heap construction speed is not merely theoretical.
Scenario 1: HeapSort Algorithm
HeapSort is a comparison-based sorting algorithm that:

1. Builds a heap from the unsorted input array (cost T_build).
2. Repeatedly extracts the root n times to produce sorted output (n extractions at O(log n) each, O(n log n) total).

The overall complexity is:

T(n) = T_build + O(n log n)

If T_build = O(n log n), total is O(n log n) — optimal for comparison sorts. If T_build = O(n), total is still O(n log n) — but the constant factor improves!
For HeapSort to be competitive with QuickSort and MergeSort, the O(n) build is essential. With O(n log n) build, the constant factor makes HeapSort noticeably slower.
For concreteness, with n ≈ 1 million elements (so log₂ n ≈ 20), the operation counts look roughly like this:

| Phase | O(n log n) Build | O(n) Build | Savings |
|---|---|---|---|
| Build Phase | ~20 million ops | ~2 million ops | ~18 million ops |
| Extract Phase | ~20 million ops | ~20 million ops | Same |
| Total | ~40 million ops | ~22 million ops | ~45% reduction |
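These counts are back-of-the-envelope, but the effect is easy to observe. Here is a rough timing sketch using Python's heapq (whose heapify is documented to run in linear time); absolute numbers will vary by machine:

```python
import heapq
import random
import time

n = 1_000_000
data = [random.random() for _ in range(n)]

t0 = time.perf_counter()
h1 = []
for x in data:
    heapq.heappush(h1, x)      # naive build: n pushes, O(n log n)
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
h2 = list(data)
heapq.heapify(h2)              # bottom-up build, O(n)
t_fast = time.perf_counter() - t0

print(f"naive: {t_naive:.3f}s   heapify: {t_fast:.3f}s")
```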
Scenario 2: Finding K Largest Elements
To find the K largest elements in an array of n elements:
Approach A: Build a max-heap of all n elements, extract K times.
Approach B: Maintain a min-heap of K elements, scanning through n.
For small K, Approach A is better—but only if build-heap is O(n). With O(n log n) build, the advantage disappears.
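A sketch of Approach B in Python (the helper name k_largest is illustrative); heapq.heappushpop evicts the smallest of the k kept elements in one O(log k) step:

```python
import heapq

def k_largest(items, k):
    """Keep a min-heap of the k largest elements seen so far: O(n log k)."""
    heap = []
    for x in items:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:              # beats the smallest of the current top k
            heapq.heappushpop(heap, x)
    return sorted(heap, reverse=True)

print(k_largest([3, 9, 2, 1, 4, 5], 3))  # [9, 5, 4]
```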
Scenario 3: Batch Priority Queue Initialization
Database query planners, operating system schedulers, and network routers often initialize their priority queues with a batch of existing work items rather than starting empty.
With millions of items, the difference between 20 million operations and 2 million operations means the difference between seconds of initialization and sub-second—critical for high-availability systems.
Even when the overall asymptotic complexity is unchanged (e.g., HeapSort is O(n log n) with either build), switching the build phase from O(n log n) to O(n) translates to real seconds saved at scale. In competitive programming, this can be the difference between 'Accepted' and 'Time Limit Exceeded.'
The optimal heap construction algorithm is based on a beautiful observation about tree structure. Let's build up to this insight.
Observation 1: A Single Node Is a Valid Heap
A heap with one element trivially satisfies the heap property—there are no parent-child pairs to violate it.
Observation 2: Leaves Have No Children
In a complete binary tree represented as an array:
- Indices 0 to ⌊n/2⌋ - 1 are internal nodes (they have at least one child).
- Indices ⌊n/2⌋ to n - 1 are leaves (they have no children).

This can be derived from the child index formulas: node i's left child is at index 2i + 1, so node i is internal exactly when 2i + 1 < n, i.e., i < (n - 1)/2, meaning i ≤ ⌊(n - 2)/2⌋ = ⌊n/2⌋ - 1.

Observation 3: Every Leaf Is a Valid Heap
Since leaves have no children, there are no heap property constraints to satisfy. Every leaf, considered as a subtree of one node, is already a valid heap.
The Critical Insight:
In a complete binary tree with n nodes, approximately half the nodes are leaves. These leaves are already valid heaps without any work.
For n = 15 (a complete tree with 4 levels):
Level 0: 1 node (root) — internal
Level 1: 2 nodes — internal
Level 2: 4 nodes — internal
Level 3: 8 nodes — LEAVES
Total: 15 nodes
Leaves: 8 nodes (53%)
Internal: 7 nodes (47%)
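A two-line sketch confirming these counts for several sizes (with 0-based indexing, the leaves are exactly indices n//2 through n - 1):

```python
for n in (7, 10, 13, 15):
    internal, leaves = n // 2, n - n // 2   # n - n//2 equals ceil(n/2)
    print(f"n={n}: internal={internal}, leaves={leaves}")
# n=15 matches the picture above: internal=7, leaves=8
```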
For a general complete binary tree with n nodes:

- Number of leaves: ⌈n/2⌉
- Number of internal nodes: ⌊n/2⌋

What This Means:
We don't need to do any work on half the array! The naive approach (repeated insertion) doesn't exploit this—it processes every element. But if we could somehow "merge" valid sub-heaps into larger valid heaps, we could skip the leaves entirely and work only on internal nodes.
This is exactly what the bottom-up heapify algorithm does.
The insight that 'half the nodes need no processing' is just the beginning. As we'll see, the nodes that DO need processing require less work on average because deeper nodes (which are more numerous) have shorter distances to bubble down. This distribution of work is what achieves O(n) complexity.
To understand bottom-up heapify, we need to see how small heaps can combine into larger heaps. This compositional view is fundamental.
Composition Theorem:
Given:

- a node v at some position, holding an arbitrary value, and
- left and right subtrees below v that are already valid max-heaps,

we can create a valid max-heap rooted at v by "heapifying" (bubbling down) v to its correct position.
Why This Works:
When we bubble-down v:

1. If v is already the largest among itself and its children → the heap is valid, we're done.
2. Otherwise, swap v with its larger child.
3. The subtree where v lands still has valid heaps below it (by assumption), so we recurse.

The recursion terminates because either:

- v becomes larger than its children, or
- v reaches a leaf position (no children to compare).

At each step, the heap property is restored for the current level, and the subtrees remain valid heaps.
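Here is a minimal Python sketch of this repair step for a max-heap stored in an array (the name sift_down is a common convention, not code from the lesson):

```python
def sift_down(A, i, n):
    """Restore the max-heap property at index i, assuming both
    subtrees below i are already valid max-heaps."""
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and A[left] > A[largest]:
        largest = left
    if right < n and A[right] > A[largest]:
        largest = right
    if largest != i:
        A[i], A[largest] = A[largest], A[i]  # swap with the larger child
        sift_down(A, largest, n)             # recurse into the disturbed subtree
```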
Visualizing Composition:
BEFORE (two valid heaps + arbitrary root):
[3] ← arbitrary value
/ \
[9] [7] ← roots of valid sub-heaps
/ \ / \
[5][4][2][1] ← these subtrees are valid heaps
AFTER (heapify root):
[9] ← 3 bubbled down
/ \
[5] [7]
/ \ / \
[3][4][2][1] ← 3 found its position
Trace of the heapify-down process:

1. Compare 3 with its children 9 and 7; the larger child is 9, so swap 3 and 9.
2. Compare 3 with its new children 5 and 4; the larger child is 5, so swap 3 and 5.
3. 3 is now at a leaf position with no children to compare.

Result: Valid max-heap!
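Running the sift_down sketch from above reproduces this trace:

```python
A = [3, 9, 7, 5, 4, 2, 1]   # the BEFORE picture, level by level
sift_down(A, 0, len(A))     # sift_down as defined in the sketch above
print(A)                    # [9, 5, 7, 3, 4, 2, 1], the AFTER picture
```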
This composition principle—that we can build a larger heap from smaller heaps by fixing just the root—is the foundation of both the build-heap algorithm and the extract-max operation we studied earlier. It's a powerful pattern: local repair with guaranteed global correctness.
Before we dive into the algorithm, let's solidify our understanding of how complete binary trees map to arrays. This mapping is what makes efficient heap operations possible.
The Level-Order Mapping:
A complete binary tree is stored in an array using level-order (breadth-first) indexing:
Tree Structure (n = 13):

                0
          /           \
         1             2
       /   \         /   \
      3     4       5     6
     / \   / \     / \
    7   8 9  10  11  12

Array Representation:

    Index:  0  1  2  3  4  5 | 6  7  8  9  10  11  12
            internal nodes   | leaf nodes

For n = 13:
- Internal nodes: indices 0 to 5 (⌊13/2⌋ - 1 = 5)
- Leaf nodes: indices 6 to 12 (⌊13/2⌋ = 6 onward)
Index Formulas (0-based indexing):
| Relationship | Formula |
|---|---|
| Parent of node i | ⌊(i - 1) / 2⌋ (integer division) |
| Left child of node i | 2i + 1 |
| Right child of node i | 2i + 2 |
| Left child exists? | 2i + 1 < n |
| Right child exists? | 2i + 2 < n |
| Is leaf? | 2i + 1 ≥ n (equivalently: i ≥ ⌊n/2⌋) |
| Last internal node | ⌊n/2⌋ - 1 |
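These formulas translate directly into small helpers; a sketch (hypothetical names), checked against Example 1 below:

```python
def parent(i): return (i - 1) // 2
def left(i): return 2 * i + 1
def right(i): return 2 * i + 2
def is_leaf(i, n): return left(i) >= n

n = 10
assert [i for i in range(n) if not is_leaf(i, n)] == [0, 1, 2, 3, 4]
assert parent(9) == 4 and left(4) == 9 and right(4) == 10  # 10 is out of bounds
```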
Why These Formulas Matter for Heapify:
The build-heap algorithm will iterate through internal nodes from the last one to the root:
for i from ⌊n/2⌋ - 1 down to 0:
heapify-down(i)
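As a preview of Page 3, here is one runnable sketch of that loop, with the bubble-down written inline and iteratively (an illustration, not the final implementation we'll develop):

```python
def build_heap(A):
    """Bottom-up O(n) construction: sift down every internal node,
    from the last one (n//2 - 1) back to the root."""
    n = len(A)
    for i in range(n // 2 - 1, -1, -1):
        while True:                      # iterative bubble-down of A[i]
            largest, left, right = i, 2 * i + 1, 2 * i + 2
            if left < n and A[left] > A[largest]:
                largest = left
            if right < n and A[right] > A[largest]:
                largest = right
            if largest == i:
                break
            A[i], A[largest] = A[largest], A[i]
            i = largest

A = [3, 9, 2, 1, 4, 5]
build_heap(A)
print(A)  # [9, 4, 5, 1, 3, 2], the valid max-heap from the earlier example
```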
Let's verify the formula with examples:
Example 1: n = 10
⌊10/2⌋ - 1 = 5 - 1 = 4
Internal nodes: 0, 1, 2, 3, 4 (5 nodes)
Leaf nodes: 5, 6, 7, 8, 9 (5 nodes)
Tree:
0
/ \
1 2
/ \ / \
3 4 5 6
/ \ /
7 8 9
Node 4's children: 2*4+1=9, 2*4+2=10 (10 is out of bounds)
→ Node 4 has one child (9) ✓
Node 5's children: 2*5+1=11 (out of bounds)
→ Node 5 is a leaf ✓
Example 2: n = 7 (complete, full tree)
⌊7/2⌋ - 1 = 3 - 1 = 2
Internal nodes: 0, 1, 2 (3 nodes)
Leaf nodes: 3, 4, 5, 6 (4 nodes)
This structural understanding—where leaves are, where internal nodes are, and how they relate—is the foundation for understanding why bottom-up heapify is so efficient.
We're now equipped to understand the two main approaches to heap construction. In the following pages, we'll analyze each in detail.
Approach 1: Naive (Top-Down) — O(n log n)
Insert elements one by one, each bubbling UP through potentially log(n) levels.
for i from 1 to n-1:
    // A[0..i-1] is already a valid heap; append A[i]
    bubble-up(i)
Each element starts at the bottom and may travel to the root.
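A runnable in-place sketch of this top-down approach in Python (hypothetical names, for contrast with the bottom-up loop below):

```python
def build_heap_naive_inplace(A):
    """Top-down O(n log n) build: grow a max-heap prefix A[0..i],
    bubbling each new element up toward the root."""
    for i in range(1, len(A)):
        child = i
        while child > 0 and A[(child - 1) // 2] < A[child]:
            parent = (child - 1) // 2
            A[parent], A[child] = A[child], A[parent]  # bubble up one level
            child = parent

A = [3, 9, 2, 1, 4, 5]
build_heap_naive_inplace(A)
print(A)  # [9, 4, 5, 1, 3, 2], also a valid max-heap
```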
Approach 2: Optimal (Bottom-Up) — O(n)
Process internal nodes from bottom to top, each bubbling DOWN through limited levels.
for i from ⌊n/2⌋ - 1 down to 0:
bubble-down(i)
Each element starts at its position and may travel to a leaf.
The Crucial Difference:
In the naive approach, elements near the bottom (which are numerous) may bubble all the way to the top (long distance).
In the optimal approach, elements near the bottom (which are numerous) only bubble down a short distance, while elements near the top (which are few) may bubble down a long distance.
This inversion of work distribution is what achieves O(n) instead of O(n log n).
The difference between O(n) and O(n log n) comes entirely from choosing the right direction! Bubbling up makes deep nodes (many) work hard. Bubbling down makes deep nodes (many) work little. Same total nodes, vastly different total work.
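A quick computational sketch of this claim: in a perfect tree of height h, charge each node its worst-case travel distance (its depth for bubble-up, its height for bubble-down) and compare totals:

```python
h = 20
n = 2 ** (h + 1) - 1                                  # nodes in a perfect tree
up = sum(2 ** d * d for d in range(h + 1))            # bubble-up: depth per node
down = sum(2 ** d * (h - d) for d in range(h + 1))    # bubble-down: height per node
print(n, up, down)   # up grows like n*h, down stays near n
```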
We've established the foundation for understanding heap construction algorithms. Let's consolidate the key concepts before diving into the algorithms themselves.
What's Next:
In the following pages, we'll:
Page 2: Naive Approach — Analyze the repeated-insertion algorithm in detail, proving its O(n log n) complexity and understanding exactly why it's suboptimal.
Page 3: Bottom-Up Heapify — Present the optimal O(n) algorithm with full implementation, correctness proof, and step-by-step examples.
Page 4: Why Bottom-Up is Faster — Provide the mathematical analysis that proves O(n) complexity, with intuition for why the summation converges.
With the foundational understanding from this page, you're ready to appreciate the elegance and efficiency of the heapify algorithm.
You now understand the heap construction problem precisely: the input, output, constraints, and real-world importance. You've seen why the naive approach is O(n log n) and have a preview of why the optimal approach can achieve O(n). Next, we'll examine the naive approach in detail to fully understand what we're improving upon.