When you first learn about trees in computer science, the natural implementation uses nodes with explicit pointers—each node stores references to its children, and perhaps to its parent. This approach is intuitive: it mirrors how we draw trees on paper, with arrows connecting boxes. Yet for one specific and critically important type of tree, this pointer-based approach is not just unnecessary—it's suboptimal.
The binary heap, built on the foundation of a complete binary tree, possesses a structural regularity so profound that we can eliminate pointers entirely. Instead of storing a tree as a collection of nodes linked by references, we can store it as a simple, contiguous array. The tree structure becomes implicit—encoded not in explicit connections, but in the mathematical relationship between array indices.
This insight transforms heap implementation from a graph-like structure into something far more elegant and efficient. Understanding this implicit representation is fundamental to mastering heaps and unlocks performance characteristics that make heaps indispensable in systems ranging from operating system schedulers to real-time event processing.
By the end of this page, you will understand: (1) Why complete binary trees have a unique property that enables array storage, (2) How tree structure is implicitly encoded in array indices, (3) The level-order mapping between tree nodes and array positions, (4) Why this representation is not just clever but practically superior, and (5) How to mentally visualize the tree within the array.
Before diving into array representation, we must firmly establish why this technique works specifically for complete binary trees and not for arbitrary trees.
Recall the definition of a complete binary tree:
A complete binary tree is a binary tree where:

- Every level, except possibly the last, is completely filled, and
- All nodes in the last level are packed as far to the left as possible.
This definition might seem like an arbitrary constraint, but it carries profound implications. A complete binary tree has no gaps. When you traverse the tree level by level, from left to right (level-order traversal), you encounter a continuous sequence of nodes with no holes until you reach the final, possibly incomplete level—and even there, nodes are packed to the left.
The completeness property means that if we number nodes in level-order (starting from 0 or 1), every node from 0 to n-1 exists. There are no 'missing' indices. This one-to-one correspondence between node position and sequential integer is what makes array storage possible.
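This correspondence can be made concrete with a short JavaScript sketch: a breadth-first traversal of a linked tree emits nodes in exactly the order the array stores them. The `{ value, left, right }` node shape here is an illustrative assumption, not a structure defined elsewhere on this page.

```javascript
// Level-order (BFS) traversal of a linked complete tree produces
// exactly the array layout: node k in visit order lands at index k.
function toArray(root) {
  const out = [];
  const queue = [root];
  while (queue.length > 0) {
    const node = queue.shift(); // visit nodes level by level, left to right
    out.push(node.value);
    if (node.left) queue.push(node.left);
    if (node.right) queue.push(node.right);
  }
  return out;
}

// A tiny complete tree: 1 with children 2 and 3.
const tree = {
  value: 1,
  left:  { value: 2, left: null, right: null },
  right: { value: 3, left: null, right: null },
};
console.log(toArray(tree)); // [1, 2, 3]
```

Because the tree is complete, the output array has no holes; that is the property the rest of this page builds on.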
Why arbitrary trees don't work:
Consider a binary tree that is not complete—perhaps a tree where the root has only a left child, which has only a right child, which has only a left child. If we tried to use level-order numbering, the four nodes would land at indices 0, 1, 4, and 9.

We'd have gaps—positions 2, 3, 5, 6, 7, and 8 are empty. To represent this tree in an array, we'd need to either:

- Store null values in the gaps, wasting space and complicating logic, or
- Fall back to explicit pointers or markers, abandoning direct indexing.

Neither option preserves the elegant simplicity we're seeking. Completeness eliminates gaps, and eliminating gaps enables direct indexing.
| Property | Complete Binary Tree | Non-Complete Binary Tree |
|---|---|---|
| Gap-free level-order | ✓ Yes—every index 0 to n-1 maps to a node | ✗ No—holes require null markers or indirection |
| Dense array storage | ✓ O(n) space for n nodes, no waste | ✗ May require O(2^h) space for h-level tree |
| Index-based navigation | ✓ Simple formulas find parent/children | ✗ Must store explicit pointers or markers |
| Cache performance | ✓ Excellent—contiguous memory access | ✗ Poor—scattered null values break patterns |
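To make the waste concrete, here is a small JavaScript sketch storing the path-shaped tree from above in a level-order array. The node names A–D are illustrative, and the 0-based child formulas `2i+1` / `2i+2` used in the comments are a preview of the next page, not something derived yet.

```javascript
// A 4-node path tree (root → left → right → left) stored in
// level-order. Only 4 of the 10 required slots hold a node.
const sparse = new Array(10).fill(null);
sparse[0] = 'A'; // root
sparse[1] = 'B'; // left child of A  (index 2*0 + 1)
sparse[4] = 'C'; // right child of B (index 2*1 + 2)
sparse[9] = 'D'; // left child of C  (index 2*4 + 1)

const wasted = sparse.filter((slot) => slot === null).length;
console.log(wasted); // 6: more than half the array holds null markers
```

Four nodes force a ten-slot array; a deeper path-shaped tree would waste exponentially more, which is the O(2^h) blow-up noted in the table.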
The array representation of a complete binary tree is based on level-order numbering—also known as breadth-first numbering. Starting from the root, we assign each node an index based on its position in a breadth-first traversal.
Let's establish this concretely with an example. Consider a complete binary tree with 10 nodes:
```
               [0]
             /     \
          [1]       [2]
         /   \     /   \
       [3]   [4] [5]   [6]
      /  \   /
    [7] [8] [9]
```

- Level 0: Node at index 0 (root)
- Level 1: Nodes at indices 1, 2
- Level 2: Nodes at indices 3, 4, 5, 6
- Level 3: Nodes at indices 7, 8, 9 (last level, filled left-to-right)

The level-order principle:
At each level k, the first node has index 2^k - 1 (0-based), and there are at most 2^k nodes. The levels fill sequentially:

- Level 0 holds 1 node (the root)
- Level 1 holds up to 2 nodes
- Level 2 holds up to 4 nodes
- Level k holds up to 2^k nodes
This exponential growth of level capacity is intrinsic to binary trees, and the level-order numbering captures this structure perfectly.
Array representation works with both 0-based and 1-based indexing. 1-based indexing (where the root is at index 1) produces slightly simpler parent-child formulas. 0-based indexing (root at index 0) aligns with most programming language arrays. We'll explore both, as you'll encounter both in practice.
The mapping to array:
Given the level-order numbering, the array representation is trivial. We simply place each node's value at its corresponding index:
```
Tree nodes (with values):

               50
             /    \
          30        70
         /  \      /  \
       20    40  60    80
      /  \   /
    10   25 35

Array (0-indexed):

Index:   0    1    2    3    4    5    6    7    8    9
Value: [ 50 | 30 | 70 | 20 | 40 | 60 | 80 | 10 | 25 | 35 ]
         ↑    ↑    ↑    ↑    ↑    ↑    ↑    ↑    ↑    ↑
        root  L0   R0   L1   R1   L2   R2   L3   R3   L4

Where L0/R0 = left/right child of index 0, etc.
```

Observe the elegance: No pointers, no node objects, no memory overhead per node—just a flat array of values. The tree structure is implicit in the positions. Index 0 is the root. Index 1 is the left child of the root. Index 2 is the right child of the root. And so on.
This isn't merely a storage optimization; it's a fundamentally different representation paradigm. The tree exists in the array, but only through the lens of index arithmetic.
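As a quick sanity check, a few lines of JavaScript can recover the tree relationships from the example array alone. The 0-based child formulas `2i+1` and `2i+2` used here are previewed from the next page.

```javascript
const heap = [50, 30, 70, 20, 40, 60, 80, 10, 25, 35];

// Children of index i live at 2i+1 and 2i+2 (0-based).
const leftChild  = (i) => heap[2 * i + 1];
const rightChild = (i) => heap[2 * i + 2];

console.log(heap[0]);       // 50 (the root)
console.log(leftChild(1));  // 20 (left child of 30)
console.log(rightChild(2)); // 80 (right child of 70)
```

No node objects exist anywhere in this snippet; the "tree" is nothing but the array plus arithmetic.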
To truly internalize array-based heap representation, you must develop the ability to see the tree within the array. This mental skill—translating between linear array view and hierarchical tree view—is essential for debugging, tracing algorithms, and building intuition.
Mental model 1: Layered segments
Think of the array as divided into segments, one per tree level:
```
Index:  0 | 1  2 | 3  4  5  6 | 7  8  9  10 11 12 13 |
Level:  0 |  1   |      2     |           3          |
──────────────────────────────────────────────────────
        ↑    ↑         ↑                 ↑
      Root 2 nodes  4 nodes        8 nodes (max)
```

Level k contains indices from (2^k - 1) to (2^(k+1) - 2), inclusive (0-based). For 1-based indexing: level k contains indices from 2^k to 2^(k+1) - 1, inclusive.

Mental model 2: Binary expansion
A powerful way to understand the implicit structure is through binary representation of indices. In 1-based indexing, the path from the root to any node corresponds exactly to the binary representation of its index (excluding the leading 1-bit):
For example, index 11 in binary is 1011:

- Drop the leading 1-bit, leaving 011
- 0 → go left
- 1 → go right
- 1 → go right

Following this path from the root gives you the node at index 11.
| Index | Binary | Path from Root | Level |
|---|---|---|---|
| 1 | 1 | (root) | 0 |
| 2 | 10 | left | 1 |
| 3 | 11 | right | 1 |
| 4 | 100 | left → left | 2 |
| 5 | 101 | left → right | 2 |
| 6 | 110 | right → left | 2 |
| 7 | 111 | right → right | 2 |
| 11 | 1011 | left → right → right | 3 |
When tracing heap operations, being able to quickly identify a node's position in the conceptual tree from its array index (and vice versa) is invaluable. Practice this translation: given index 9 (1-based), where is the node? Answer: binary 1001 → drop the leading 1 → path is left-left-right → level 3, second node from the left.
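The binary-expansion rule translates directly into code. This sketch (the function name is ours) computes the root-to-node path from a 1-based index:

```javascript
// Path from root to a node, given its 1-based index: write the index
// in binary, drop the leading 1-bit, then read 0 = left, 1 = right.
function pathFromIndex(i) {
  const bits = i.toString(2).slice(1); // drop the leading 1-bit
  return [...bits].map((b) => (b === '0' ? 'left' : 'right'));
}

console.log(pathFromIndex(11)); // ['left', 'right', 'right']
console.log(pathFromIndex(9));  // ['left', 'left', 'right']
console.log(pathFromIndex(1));  // [] (the root: no steps needed)
```

The returned path length also equals the node's level, since each bit after the leading 1 corresponds to one edge descended.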
Mental model 3: Parent-child locality
Another useful observation: nodes that are close in the tree tend to be close in the array. Specifically:

- A node and its parent differ in index by roughly a factor of two
- Siblings occupy adjacent array slots
- The top levels of the tree (the most frequently accessed nodes) are packed together at the front of the array
This locality has direct performance implications that we'll explore when discussing cache behavior.
The implicit tree structure isn't magic—it rests on precise mathematical relationships. Let's formalize the key properties that make this representation work.
Property 1: Level calculation
A node at index i (0-based) resides at level ⌊log₂(i + 1)⌋.
For 1-based indexing, a node at index i resides at level ⌊log₂(i)⌋.
This follows from the observation that level k starts at index 2^k - 1 and ends at index 2^(k+1) - 2, so i + 1 lies between 2^k and 2^(k+1) - 1, and taking the floor of its base-2 logarithm recovers k:
```javascript
// 0-based indexing
function getLevel0Based(index) {
  return Math.floor(Math.log2(index + 1));
}

// 1-based indexing
function getLevel1Based(index) {
  return Math.floor(Math.log2(index));
}

// Examples (0-based):
// getLevel0Based(0) = 0  (root)
// getLevel0Based(1) = 1  (level 1)
// getLevel0Based(2) = 1  (level 1)
// getLevel0Based(3) = 2  (level 2)
// getLevel0Based(6) = 2  (level 2)
// getLevel0Based(7) = 3  (level 3)
```

Property 2: Position within level
Given a node at index i (0-based), its position within its level (0-indexed from left) is:
position = i - (2^level - 1) = i - 2^⌊log₂(i+1)⌋ + 1
This calculation subtracts the starting index of the level from the node's index.
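Property 2 translates into a one-line function. This sketch (the function name is ours) reuses the level formula from Property 1:

```javascript
// Level of a 0-based index, per Property 1.
function getLevel0Based(index) {
  return Math.floor(Math.log2(index + 1));
}

// Position within the level (0-indexed from the left), per Property 2:
// subtract the level's starting index, 2^level - 1.
function positionInLevel(i) {
  const level = getLevel0Based(i);
  return i - (2 ** level - 1);
}

console.log(positionInLevel(0)); // 0 (the root)
console.log(positionInLevel(4)); // 1 (second node on level 2)
console.log(positionInLevel(9)); // 2 (third node on level 3)
```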
Property 3: Subtree boundaries
For a node at index i, its subtree contains specific index ranges. The descendants of node i span a contiguous range that can be calculated based on the node's level and position. While the exact formulas are complex, the key insight is that subtrees map to predictable array segments.
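One way to see that subtrees map to predictable segments is to walk the per-level index ranges directly. This sketch (the function name is ours) assumes the 0-based child formulas `2i+1` / `2i+2`, which are covered in depth on the next page:

```javascript
// Collect all array indices in the subtree rooted at index i,
// for a heap of n elements. On each deeper level, the subtree
// occupies one contiguous run [lo, hi] of indices.
function subtreeIndices(i, n) {
  const result = [];
  let lo = i;
  let hi = i;
  while (lo < n) {
    for (let j = lo; j <= Math.min(hi, n - 1); j++) result.push(j);
    lo = 2 * lo + 1; // leftmost descendant one level down
    hi = 2 * hi + 2; // rightmost descendant one level down
  }
  return result;
}

// In the 10-node example tree, the subtree of index 1 (value 30):
console.log(subtreeIndices(1, 10)); // [1, 3, 4, 7, 8, 9]
```

Each iteration of the outer loop visits one level of the subtree, so the ranges really are contiguous per level, as Property 3 claims.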
In practice, you rarely need to compute levels or positions explicitly. Heap algorithms primarily use parent-child navigation formulas, which we'll cover in depth on the next page. The mathematical foundation here serves to deepen understanding, not to add routine computation.
Property 4: Height relationship
For a heap with n elements, the height h is ⌊log₂(n)⌋. This is because:

- A complete tree of height h has at least 2^h nodes (just one node on the last level) and at most 2^(h+1) - 1 nodes (last level full)
- So 2^h ≤ n < 2^(h+1), and taking base-2 logarithms gives h = ⌊log₂(n)⌋
This logarithmic height is why heap operations achieve O(log n) time complexity—they traverse at most h + 1 levels.
```javascript
function heapHeight(n) {
  if (n <= 0) return -1; // Empty heap
  return Math.floor(Math.log2(n));
}

// Examples:
// heapHeight(1)  = 0  (just root)
// heapHeight(2)  = 1  (root + one child)
// heapHeight(3)  = 1  (root + two children)
// heapHeight(4)  = 2  (level 2 started)
// heapHeight(7)  = 2  (level 2 complete)
// heapHeight(8)  = 3  (level 3 started)
// heapHeight(15) = 3  (level 3 complete)
// heapHeight(16) = 4  (level 4 started)
```

You might wonder: "Pointer-based trees work fine for other structures. Why is array representation better for heaps?"
The answer involves multiple dimensions of improvement: memory, performance, simplicity, and cache behavior. Let's examine each.
Memory overhead comparison:
Consider storing 1 million 32-bit integers in a heap:
Pointer-based (with parent pointers):

- Each node: 4-byte value + three 8-byte pointers (left, right, parent) = 28 bytes
- Plus per-allocation overhead and padding (roughly 16 bytes per node is typical)
- Total: roughly 44 bytes per node ≈ 44 MB

Array-based:

- 4 bytes per element, in one contiguous allocation
- Total: 4 MB
That's an 11x memory reduction. For memory-constrained systems or heaps with billions of elements, this difference is transformative.
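The arithmetic behind the 11x figure is simple enough to sketch; the per-node overhead numbers here are estimates, not measurements, and real allocators vary:

```javascript
// Rough back-of-envelope for 1 million 32-bit integers.
const n = 1_000_000;
const arrayBytes = n * 4;         // 4-byte ints, contiguous
const nodeBytes = 4 + 3 * 8 + 16; // value + 3 pointers + est. allocator overhead
console.log((n * nodeBytes) / arrayBytes); // 11
```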
Modern systems have abundant RAM, but memory efficiency still matters. Smaller data structures mean: (1) Better CPU cache utilization, (2) Lower memory bandwidth consumption, (3) Reduced garbage collection pressure, (4) Better scalability for large datasets. A heap that fits in L3 cache performs dramatically differently than one that spills to main memory.
Perhaps the most significant performance benefit of array-based heaps comes from memory contiguity. When heap elements are stored in a single, contiguous array, memory access patterns become highly predictable.
CPU cache fundamentals:
Modern CPUs don't fetch individual bytes from RAM; they fetch cache lines—typically 64 bytes at a time. When you access one element, the CPU speculatively loads its neighbors into cache. This prefetching is:
Heap operations and cache behavior:
Consider a bubbleUp operation (moving an element up the tree after insertion):
```
Bubble-up from index 15 to root (4 swaps needed):

Array-based access pattern:
  Index 15 → Index 7 → Index 3 → Index 1 → Index 0
  Memory:   addr[15] → addr[7] → addr[3] → addr[1] → addr[0]
  All addresses within same 64-byte cache line window!
  Likely 1-2 cache misses total (if heap fits in cache).

Pointer-based access pattern:
  Node* n15 → n15->parent → n7->parent → n3->parent → n1->parent
  Memory:   0x7f8a... → 0x3b2c... → 0xa91f... → 0x5e7d... → 0x12ab...
  Each node potentially in different cache line!
  Potentially 4-5 cache misses.

Cache miss cost: ~100-300 CPU cycles each
Cache hit cost:  ~1-4 CPU cycles
For 5 cache misses vs 1: ~400+ cycles wasted per bubble-up
```

The prefetching bonus:
Because array access patterns are predictable (we're doing simple arithmetic on indices), the CPU's hardware prefetcher can anticipate future accesses and load them proactively. Pointer chasing is fundamentally unpredictable—the CPU can't know where the next pointer leads without dereferencing it.
This is why array-based data structures often outperform linked structures even when both have the same asymptotic complexity. The O(log n) of an array heap and a pointer-based heap are the same in theory, but the constants differ significantly in practice.
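The access pattern in the trace above can be reproduced with a short bubble-up sketch. It assumes the 0-based parent formula `(i - 1) >> 1`, which is covered in depth on the next page, and the `touched` parameter is purely for illustration:

```javascript
// Min-heap bubble-up that records every index it visits.
function bubbleUp(heap, i, touched = []) {
  touched.push(i);
  while (i > 0) {
    const parent = (i - 1) >> 1; // 0-based parent index
    if (heap[parent] <= heap[i]) break; // heap property restored
    [heap[i], heap[parent]] = [heap[parent], heap[i]];
    i = parent;
    touched.push(i);
  }
  return touched;
}

// A 16-element heap with a tiny value just inserted at index 15:
const h = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0];
console.log(bubbleUp(h, 15)); // [15, 7, 3, 1, 0], matching the trace
console.log(h[0]);            // 0, the new minimum is at the root
```

Every index in the sequence is computed by one shift and one subtraction on the previous index; that predictability is exactly what the hardware prefetcher exploits.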
On modern processors: L1 cache access: ~4 cycles, L2 cache: ~12 cycles, L3 cache: ~40 cycles, Main RAM: ~200+ cycles. An algorithm that stays in cache can be 50-100x faster than one that constantly misses. Array-based heaps stay in cache; scattered pointer trees often don't.
Beyond performance, array-based heaps are simply easier to implement correctly. The cognitive load of managing pointers, handling edge cases at tree boundaries, and maintaining parent references disappears.
No pointer management:

- No malloc/free calls for individual nodes
- No parent/child links to keep consistent on every structural change

Simpler swaps:
Heap operations fundamentally involve swapping elements. Compare the swap implementations:
```javascript
// Array-based swap: trivial
function swap(heap, i, j) {
  const temp = heap[i];
  heap[i] = heap[j];
  heap[j] = temp;
}

// Pointer-based swap: complex
function swapNodes(node1, node2) {
  // Must update: node1's parent, node2's parent,
  // node1's children, node2's children,
  // and handle special cases if node1 is parent of node2

  // Handle parent pointers
  if (node1.parent) {
    if (node1.parent.left === node1) {
      node1.parent.left = node2;
    } else {
      node1.parent.right = node2;
    }
  }
  // ... similar for node2's parent

  // Handle children pointers
  if (node1.left) node1.left.parent = node2;
  if (node1.right) node1.right.parent = node2;
  // ... similar for node2's children

  // Swap the parent/child links themselves
  // This gets very complex if node1 is direct parent of node2!
  // ... dozens more lines of careful pointer manipulation
}
```

Error resistance:
Pointer-based tree code is notoriously error-prone. Off-by-one errors in index calculations are frustrating but debuggable. Dangling pointer bugs or corrupted tree structure can be catastrophic and nearly impossible to diagnose.
The array-based approach trades pointer complexity for arithmetic simplicity. Getting the parent formula wrong produces wrong answers but doesn't corrupt memory or crash the program.
In software engineering, the simplest correct solution is usually the best. Array-based heaps exemplify this principle: by choosing the right representation, we eliminate entire categories of bugs and complexity, achieving both correctness and performance.
While array-based heap representation is overwhelmingly advantageous for the heap use case, intellectual honesty demands we acknowledge its limitations.
Limitation 1: Only works for complete binary trees
This representation is not general-purpose. If you need a tree with arbitrary shape—perhaps a BST that becomes unbalanced, or a tree built from specific structural requirements—array representation either doesn't work or wastes significant space.
Limitation 2: Resizing overhead
When the heap grows beyond array capacity, we must allocate a new, larger array and copy all elements. This is an O(n) operation. While amortized analysis shows this averages to O(1) per insertion (by doubling capacity), individual operations can spike in latency.
Pointer-based trees never need bulk copying—each insertion allocates one node.
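JavaScript arrays grow automatically, which hides the copy. This sketch makes the doubling explicit by using a fixed-size `Float64Array` as a stand-in for a manually managed buffer; the class name and initial capacity of 4 are illustrative choices:

```javascript
// Explicit capacity-doubling storage, as a growable heap backing store.
class GrowableHeapStorage {
  constructor() {
    this.data = new Float64Array(4); // fixed-capacity buffer
    this.size = 0;
  }
  push(value) {
    if (this.size === this.data.length) {
      // The O(n) resize: allocate double, copy everything over.
      const bigger = new Float64Array(this.data.length * 2);
      bigger.set(this.data);
      this.data = bigger;
    }
    this.data[this.size++] = value;
  }
}

const s = new GrowableHeapStorage();
for (let v = 0; v < 100; v++) s.push(v);
console.log(s.size, s.data.length); // 100 128
```

Of the 100 insertions, only 5 trigger a copy (at sizes 4, 8, 16, 32, and 64), which is why the amortized cost per insertion stays O(1) even though individual insertions can spike.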
| Aspect | Array-Based | Pointer-Based |
|---|---|---|
| Applicable structures | Complete binary trees only | Any tree shape |
| Memory overhead | Zero per element | ~24 bytes per node (64-bit) |
| Cache performance | Excellent | Poor |
| Insertion (worst case) | O(n) if resize needed | O(1) allocation |
| Deletion | O(log n), swap with last | O(log n), complex pointer updates |
| Implementation complexity | Low | High |
| Storage requirement | Exactly n slots for n elements | ≥ n allocations, scattered |
Limitation 3: Not suitable for persistent data structures
If you need to maintain historical versions of your heap (persistent or immutable heaps), array representation is problematic. Modifications require full array copies to preserve old versions. Pointer-based trees support path-copying techniques for O(log n) persistent updates.
Limitation 4: Merge operation
Merging two array-based heaps requires O(n) time (combine arrays, then heapify). Specialized mergeable heaps (like Fibonacci heaps or Binomial heaps) use pointer-based representations to achieve O(1) or O(log n) merge.
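The O(n) merge can be sketched directly: concatenate the two arrays, then run bottom-up heapify. The `siftDown` helper here assumes the 0-based child formulas `2i+1` / `2i+2` from the next page, and the function names are ours:

```javascript
// Restore the min-heap property at index i by sinking it down.
function siftDown(a, i, n) {
  while (true) {
    let smallest = i;
    const l = 2 * i + 1;
    const r = 2 * i + 2;
    if (l < n && a[l] < a[smallest]) smallest = l;
    if (r < n && a[r] < a[smallest]) smallest = r;
    if (smallest === i) return;
    [a[i], a[smallest]] = [a[smallest], a[i]];
    i = smallest;
  }
}

// Merge two min-heaps: concatenate, then bottom-up heapify in O(n).
function mergeHeaps(h1, h2) {
  const merged = [...h1, ...h2];
  for (let i = (merged.length >> 1) - 1; i >= 0; i--) {
    siftDown(merged, i, merged.length);
  }
  return merged;
}

const m = mergeHeaps([1, 5, 9], [2, 4, 8]);
console.log(m[0]); // 1, the global minimum is at the root
```

Every element of both heaps is touched, so no array-based scheme gets below O(n) here; that is the structural cost mergeable pointer-based heaps avoid.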
The bottom line:
For standard priority queue use cases—where we insert elements, extract the minimum/maximum, and occasionally peek—array-based heaps are unambiguously superior. The limitations only matter for specialized use cases that most applications never encounter.
If you need to merge heaps frequently, consider pairing heaps or Fibonacci heaps. If you need persistent heaps, consider leftist heaps. But for 95% of use cases, the standard array-based binary heap is optimal. Don't over-engineer—use the simple solution when it suffices.
We've established the foundational insight that makes array-based heaps possible: complete binary trees have a gap-free level-order numbering that maps perfectly to array indices. This isn't just a cute observation—it's the basis for one of the most practically important data structure implementations.
Let's consolidate our key takeaways:

- Complete binary trees are gap-free in level-order, so every node maps to a unique array index from 0 to n-1
- The tree structure is implicit: it lives in index arithmetic, not in pointers
- Array storage eliminates per-node memory overhead and delivers excellent cache locality
- The representation's limitations (arbitrary tree shapes, fast merging, persistence) matter only in specialized scenarios
What's next:
Now that we understand why we can represent heaps as arrays, we need the precise formulas for navigating the implicit tree. The next page covers the parent and child index formulas—the arithmetic that lets us traverse the tree structure encoded in the array. These formulas are simple, but understanding their derivation and selecting the right indexing convention (0-based vs 1-based) matters for correct implementation.
You now understand the foundational principle behind array-based heap representation: complete binary trees map perfectly to contiguous arrays, eliminating pointers while preserving structure through index arithmetic. This insight—seemingly simple—underlies one of the most elegant and efficient data structure implementations in computer science.