Throughout our study of trees—binary trees, BSTs, and now heaps—we've used pointer-based representations. Each node contains data plus pointers to its children (and sometimes parent). This is intuitive: the tree structure maps directly to the pointer structure.
But heaps break from this pattern. Despite being tree-structured conceptually, heaps are almost universally implemented as arrays, with no pointers at all. The tree structure is implicit in the array indices.
This isn't a mere implementation trick—it's a fundamental insight about the relationship between complete binary trees and contiguous sequences. Understanding this connection deeply will clarify not just heap implementation, but also segment trees, tournament trees, and other array-based tree structures.
By the end of this page, you will understand why arbitrary binary trees cannot be efficiently stored in arrays, how the complete property creates a bijection between trees and arrays, the formal proof that array storage is always efficient for complete trees, and the practical implementation of implicit tree representation.
Before appreciating why complete trees are special, let's understand why arbitrary binary trees cannot be efficiently stored in arrays using index arithmetic.
The Standard Index Convention:
For any binary tree, we could attempt array storage using the standard level-order convention: the root goes at index 1, and the node at index i has its left child at index 2i and its right child at index 2i + 1.
This formula works perfectly if we're willing to store 'null' or 'empty' markers for missing nodes. But here's the problem: for unbalanced trees, the number of array positions needed grows exponentially worse than the number of nodes.
```
CASE 1: Skewed Tree (Linked List Shape) — n = 4 nodes
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Tree Structure:      Index allocation (using 2i for left, 2i+1 for right):
  (A)                  Position 1:  A (root)
    \                  Position 3:  B (right child of 1)
    (B)                Position 7:  C (right child of 3)
      \                Position 15: D (right child of 7)
      (C)
        \
        (D)

Array (indices 1-15): [A, -, B, -, -, -, C, -, -, -, -, -, -, -, D]

Efficiency: 4 nodes, 15 positions = 26.7% utilization
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CASE 2: Longer Skewed Tree — n = 10 nodes
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
With 10 right-only nodes:
  Node 1  at index 1
  Node 2  at index 3
  Node 3  at index 7
  Node 4  at index 15
  Node 5  at index 31
  ...
  Node 10 at index 2^10 - 1 = 1023

Required array size: 1023 positions for 10 nodes = 0.98% utilization
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CASE 3: General Unbalanced Tree — n = 8 nodes
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
      (1)
     /   \
   (2)   (3)
   /        \
 (4)        (7)
               \
               (15)
               /
            (30)
            /
         (60)

Required index: 60 (at least)
Array size: 60+ positions for 8 nodes ≈ 13% utilization
```

The Exponential Blowup:
For a right-skewed tree of height h, the deepest node lands at index 2^(h+1) − 1, so the array needs 2^(h+1) − 1 positions to store just h + 1 nodes.
Ratio: array size ÷ node count ≈ 2^(h+1) ÷ (h + 1), which is exponential in h.
For h = 20 (a mere 21-node skewed tree), the array needs 2^21 − 1 = 2,097,151 positions, roughly 0.001% utilization.
This is why general binary trees use pointer-based representation—it scales linearly with node count regardless of shape.
The problem is that arbitrary trees have 'gaps' in their level-order numbering. Missing nodes create holes in the index sequence. We either waste space storing these holes, or we need a separate mapping from logical indices to packed storage—which defeats the simplicity benefit of array representation.
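The blowup is easy to verify numerically. A quick sketch (the helper name is ours): the deepest node of an all-right chain of n nodes lands at index 2^n − 1.

```python
def skewed_array_size(n: int) -> int:
    """Array index occupied by the deepest node of an all-right chain of n nodes."""
    index = 1
    for _ in range(n - 1):
        index = 2 * index + 1      # step to the right child: 2i + 1
    return index                    # equals 2**n - 1

assert skewed_array_size(4) == 15      # CASE 1: 4 nodes need 15 slots
assert skewed_array_size(10) == 1023   # CASE 2: 10 nodes need 1023 slots
print(f"utilization for n=10: {10 / skewed_array_size(10):.2%}")
```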
Complete binary trees are precisely the class of trees where the level-order index sequence has no gaps. Let's prove this formally and understand why it enables perfect array storage.
Theorem: Complete ⟺ No Gaps in Level-Order Indices
A binary tree is complete if and only if, when nodes are numbered 1, 2, 3, ... in level-order, the set of indices used is exactly {1, 2, 3, ..., n} for some n.
Proof (⟹ direction, Complete → No Gaps):
Assume T is a complete binary tree with n nodes. We prove the indices are {1, 2, ..., n}.
```
COMPLETE TREE (n = 10): No Gaps
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Level 0:               [1]
                     /     \
Level 1:          [2]       [3]
                 /   \     /   \
Level 2:       [4]   [5] [6]   [7]
              /  \   /
Level 3:    [8] [9][10]

Indices used: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Gaps: NONE
Array: [_, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   (index 0 unused)

Array utilization: 10 nodes / 10 positions = 100%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
NOT COMPLETE (gap at position 6): Has Gaps
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Level 0:               [1]
                     /     \
Level 1:          [2]       [3]
                 /   \         \
Level 2:       [4]   [5]       [7]   ← Position 6 is missing!

Indices used: {1, 2, 3, 4, 5, 7}
Gaps: Position 6 is missing
Array must be: [_, 1, 2, 3, 4, 5, -, 7]

To use index formulas correctly:
  Position 6 must exist in array (storing null/sentinel)
  Otherwise, children of 3 would compute wrong indices
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
VERIFICATION: Why Level-Order Must Be Gapless for Complete
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
The level-order filling rule states:
  "All nodes at level k exist before any node at level k+1"
  "Within level k, all positions fill left-to-right with no gaps"

Consequence:
  If node at position i exists, ALL positions 1 through i-1 must exist.

Therefore: n nodes occupy exactly positions 1, 2, ..., n. No gaps possible.
```

Proof (⟸ direction, No Gaps → Complete):
Assume T has nodes at indices {1, 2, ..., n} with no gaps. Level k occupies the index range 2^k through 2^(k+1) − 1, so a gapless prefix {1, ..., n} fills every level completely except possibly the last, and fills the last level left to right. That is exactly the definition of complete.
The Bijection:
This theorem establishes a one-to-one correspondence between complete binary trees with n nodes and arrays of length n.
Every complete tree maps to exactly one array, and every array of length n defines exactly one complete binary tree. This is the mathematical foundation of implicit heap representation.
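The bijection can be made concrete in a few lines (the helper name is ours): given only n, every parent/child edge of the complete tree is recoverable by index arithmetic alone.

```python
def edges(n: int) -> list[tuple[int, int]]:
    """All (parent, child) pairs of the complete binary tree on indices 1..n."""
    return [(i // 2, i) for i in range(2, n + 1)]

# The shape is fully determined by n; no structure is stored anywhere.
print(edges(6))   # [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]
```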
An implicit data structure is one where relationships between elements are encoded in their positions rather than stored explicitly as pointers. The array representation of complete binary trees is the canonical example of implicit structure.
Explicit vs Implicit Representation:
| Aspect | Explicit (Pointer-Based) | Implicit (Array-Based) |
|---|---|---|
| Parent access | Follow parent pointer | Compute i // 2 |
| Child access | Follow left/right pointer | Compute 2i, 2i+1 |
| Navigation cost | Pointer dereference | Arithmetic + array access |
| Memory for structure | 2-3 pointers per node | Zero (structure is free) |
| Shape constraint | None | Must be complete |
| Locality | Poor (scattered nodes) | Excellent (contiguous) |
```python
from dataclasses import dataclass
from typing import Optional

# ===============================================
# EXPLICIT (POINTER-BASED) TREE NODE
# ===============================================
@dataclass
class ExplicitNode:
    """Traditional pointer-based tree node."""
    value: int
    left: Optional['ExplicitNode'] = None
    right: Optional['ExplicitNode'] = None
    parent: Optional['ExplicitNode'] = None  # Optional, adds more overhead

def explicit_node_memory():
    """Estimate memory usage for pointer-based node."""
    # Python object overhead: ~56 bytes
    # Reference (pointer): 8 bytes each
    # Integer value: ~28 bytes (Python int object)
    # Total with 3 pointers: ~56 + 28 + 3*8 = ~108 bytes per node
    return "~108 bytes per node (Python), ~40 bytes (C struct with pointers)"

# ===============================================
# IMPLICIT (ARRAY-BASED) TREE
# ===============================================
class ImplicitCompleteTree:
    """Array-based complete binary tree with implicit structure."""

    def __init__(self, capacity: int = 16):
        # Index 0 unused for cleaner arithmetic
        self._data = [None] * (capacity + 1)
        self._size = 0

    def _parent(self, i: int) -> int:
        return i >> 1          # i // 2, but faster

    def _left(self, i: int) -> int:
        return i << 1          # 2 * i, but faster

    def _right(self, i: int) -> int:
        return (i << 1) | 1    # 2 * i + 1, but faster

    def get_value(self, i: int) -> Optional[int]:
        if 1 <= i <= self._size:
            return self._data[i]
        return None

    def set_value(self, i: int, value: int):
        if 1 <= i <= self._size:
            self._data[i] = value

    def get_parent_value(self, i: int) -> Optional[int]:
        """Navigate to parent via index arithmetic—no pointer chase."""
        parent_idx = self._parent(i)
        return self.get_value(parent_idx)

    def get_children_values(self, i: int) -> tuple[Optional[int], Optional[int]]:
        """Get children via index arithmetic."""
        left_val = self.get_value(self._left(i))
        right_val = self.get_value(self._right(i))
        return (left_val, right_val)

# ===============================================
# MEMORY COMPARISON
# ===============================================
def memory_comparison(n: int):
    """Compare memory usage for n nodes."""
    # Pointer-based (conservative estimate for compiled languages)
    # Node: value (8 bytes) + 2 pointers (16 bytes) = 24 bytes minimum
    # With parent pointer: 32 bytes
    explicit_per_node = 24  # Without parent
    explicit_total = n * explicit_per_node

    # Array-based
    # Just the values, stored contiguously: 8 bytes per value (64-bit)
    # Plus one unused index 0: negligible
    implicit_per_node = 8
    implicit_total = n * implicit_per_node

    savings_bytes = explicit_total - implicit_total
    savings_percent = (savings_bytes / explicit_total) * 100

    return {
        'nodes': n,
        'explicit_total_kb': explicit_total / 1024,
        'implicit_total_kb': implicit_total / 1024,
        'savings_kb': savings_bytes / 1024,
        'savings_percent': savings_percent,
    }

# Print comparison for various sizes
for n in [1000, 10000, 100000, 1000000]:
    comp = memory_comparison(n)
    print(f"n = {comp['nodes']:,}:")
    print(f"  Explicit: {comp['explicit_total_kb']:.1f} KB")
    print(f"  Implicit: {comp['implicit_total_kb']:.1f} KB")
    print(f"  Savings:  {comp['savings_kb']:.1f} KB ({comp['savings_percent']:.0f}%)")
    print()
```

The Implicit Representation Formulas:
For 1-indexed arrays (index 0 unused):
parent(i) = i / 2 (integer division)
left(i) = 2 * i
right(i) = 2 * i + 1
For 0-indexed arrays (index 0 is root):
parent(i) = (i - 1) / 2 (integer division)
left(i) = 2 * i + 1
right(i) = 2 * i + 2
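As a quick sanity check, here is a sketch of both conventions side by side (function names are illustrative); each 0-indexed formula is the 1-indexed one shifted down by one index:

```python
# 1-indexed convention (bit-shift forms of i // 2, 2*i, 2*i + 1)
def parent1(i): return i >> 1
def left1(i):   return i << 1
def right1(i):  return (i << 1) | 1

# 0-indexed convention
def parent0(i): return (i - 1) // 2
def left0(i):   return 2 * i + 1
def right0(i):  return 2 * i + 2

# Both describe the same tree, offset by one index:
for i in range(2, 100):
    assert left1(i) - 1 == left0(i - 1)
    assert right1(i) - 1 == right0(i - 1)
    assert parent1(i) - 1 == parent0(i - 1)
```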
Most heap implementations use 1-indexing for cleaner formulas, sacrificing one array position.
The 1-indexed formulas can be implemented with bit operations: parent(i) = i >> 1, left(i) = i << 1, right(i) = (i << 1) | 1. Shifts and ORs are among the fastest operations on any CPU, typically one cycle each, whereas a pointer dereference requires a memory access (10-100+ cycles on a cache miss).
The implicit representation only works correctly when the tree is complete. Let's examine exactly what fails when this constraint is violated.
Case Study: Non-Complete Tree Array Representation
Consider a tree where node 5 has a right child but no left child (violating completeness):
```
        (1)
       /   \
     (2)   (3)
    /   \
  (4)   (5)
           \
           (11)   ← Right child only
```
```
PROBLEM: Non-Complete Tree with Index Formulas

Tree Structure:
        1
       / \
      2   3
     / \
    4   5
         \
          11   (right child of 5, no left child)

Level-order indices:
  1  → index 1
  2  → index 2
  3  → index 3
  4  → index 4
  5  → index 5
  (position 10 = left child of 5: EMPTY)
  11 → index 11 (right child of 5 by formula: 2*5+1 = 11)

Array Representation:
Index: 0  1  2  3  4  5  6  7  8  9  10  11
Value: -  1  2  3  4  5  -  -  -  -  -   11

Problems:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. WASTED SPACE: 6 nodes require 11 positions (6/11 ≈ 55% utilization)

2. INDEX HOLES: Indices 6, 7, 8, 9, 10 are empty but must be tracked

3. VALIDITY COMPLEXITY:
   - Can't just check "i <= n" for node existence
   - Need separate tracking of which indices are occupied
   - Storage for "occupied" bitmask adds overhead

4. NAVIGATION ISSUES:
   - left(5) = 10, but position 10 is empty
   - Must check "is position occupied" at every navigation step
   - Loses O(1) simplicity of implicit structure

5. INSERTION CONFUSION:
   - Where does next node go? Not obvious
   - Need to track "first empty index" or similar
   - Breaks the simple "n+1" insertion rule

COMPLETE TREE: Same 6 nodes, but MUST include position 6

        1
       / \
      2   3
     / \  /
    4  5 6   ← Node 6 is mandatory!

Array: [-, 1, 2, 3, 4, 5, 6]
All indices 1-6 used. Zero waste. Simple validity: 1 ≤ i ≤ 6.
```

The Only-Complete Theorem:
Implicit array representation with the parent/child index formulas is space-efficient if and only if the tree is complete.
Proof:
(⟹) If the tree is complete, the level-order theorem gives indices exactly {1, ..., n}, so an array of length n (plus the optional unused slot 0) stores every node with no empty positions: 100% utilization.
(⟸) If utilization is 100%, every position 1 through n holds a node, so the index set is gapless; by the theorem, the tree is complete.
Necessity of Completeness:
If we relax completeness, missing nodes leave holes in the index sequence; we must either reserve array slots for the holes (wasting exponentially much space in the worst case) or maintain a separate occupancy map (reintroducing per-node overhead).
There's no way to have implicit structure, arbitrary shape, AND space efficiency. Completeness is the price for implicit representation.
Heaps don't need arbitrary shape because they don't maintain BST ordering. The heap property (parent ≥ children) doesn't depend on tree shape. So we CAN enforce completeness without losing functionality—and gain major efficiency benefits.
Let's examine the key implementation patterns for array-based complete binary trees, which form the foundation of heap implementations.
Pattern 1: Capacity Management
Arrays have fixed capacity, but heaps need to grow. Standard approach:
```python
from typing import TypeVar, Generic, Optional, Iterator

T = TypeVar('T')

class ArrayCompleteTree(Generic[T]):
    """
    Complete binary tree backed by a dynamic array.
    Forms the structural foundation of binary heap implementations.
    """

    DEFAULT_CAPACITY = 16

    def __init__(self, initial_capacity: int = DEFAULT_CAPACITY):
        """Initialize with given capacity. Position 0 is unused."""
        self._capacity = initial_capacity
        self._data: list[Optional[T]] = [None] * (initial_capacity + 1)
        self._size = 0

    # =========================================
    # SIZE AND CAPACITY MANAGEMENT
    # =========================================

    def __len__(self) -> int:
        """Return number of elements."""
        return self._size

    def is_empty(self) -> bool:
        """Check if tree is empty."""
        return self._size == 0

    def _ensure_capacity(self, required: int) -> None:
        """Grow array if needed. Amortized O(1) via doubling."""
        if required > self._capacity:
            new_capacity = self._capacity * 2
            while new_capacity < required:
                new_capacity *= 2
            new_data: list[Optional[T]] = [None] * (new_capacity + 1)
            for i in range(self._size + 1):
                new_data[i] = self._data[i]
            self._data = new_data
            self._capacity = new_capacity

    # =========================================
    # INDEX ARITHMETIC (1-INDEXED)
    # =========================================

    @staticmethod
    def parent(i: int) -> int:
        """Parent index. Root's parent is 0 (sentinel)."""
        return i >> 1

    @staticmethod
    def left(i: int) -> int:
        """Left child index."""
        return i << 1

    @staticmethod
    def right(i: int) -> int:
        """Right child index."""
        return (i << 1) | 1

    def has_left(self, i: int) -> bool:
        """Check if node at i has a left child."""
        return self.left(i) <= self._size

    def has_right(self, i: int) -> bool:
        """Check if node at i has a right child."""
        return self.right(i) <= self._size

    def is_leaf(self, i: int) -> bool:
        """Check if node at i is a leaf."""
        return not self.has_left(i)

    def is_valid_index(self, i: int) -> bool:
        """Check if index i corresponds to an existing node."""
        return 1 <= i <= self._size

    # =========================================
    # STRUCTURAL OPERATIONS
    # =========================================

    def append(self, value: T) -> int:
        """
        Add value at next position (maintains completeness).
        Returns the index where value was placed.
        """
        self._size += 1
        self._ensure_capacity(self._size)
        self._data[self._size] = value
        return self._size

    def remove_last(self) -> T:
        """Remove and return last element (maintains completeness)."""
        if self.is_empty():
            raise IndexError("Tree is empty")
        value = self._data[self._size]
        self._data[self._size] = None  # Help garbage collection
        self._size -= 1
        return value  # type: ignore

    def get(self, i: int) -> T:
        """Get value at index i."""
        if not self.is_valid_index(i):
            raise IndexError(f"Invalid index: {i}")
        return self._data[i]  # type: ignore

    def set(self, i: int, value: T) -> None:
        """Set value at index i."""
        if not self.is_valid_index(i):
            raise IndexError(f"Invalid index: {i}")
        self._data[i] = value

    def swap(self, i: int, j: int) -> None:
        """Swap values at indices i and j."""
        self._data[i], self._data[j] = self._data[j], self._data[i]

    # =========================================
    # TRAVERSAL
    # =========================================

    def level_order(self) -> Iterator[T]:
        """Iterate in level order (which is just array order!)."""
        for i in range(1, self._size + 1):
            yield self._data[i]  # type: ignore

    def get_level(self, i: int) -> int:
        """Return the level of node at index i (0-based level)."""
        if i < 1:
            raise ValueError("Index must be positive")
        return i.bit_length() - 1

    # =========================================
    # DEBUGGING UTILITIES
    # =========================================

    def _repr_tree(self, i: int = 1, prefix: str = "", is_left: bool = True) -> str:
        """Generate visual tree representation for debugging."""
        if not self.is_valid_index(i):
            return ""
        result = ""
        if self.has_right(i):
            result += self._repr_tree(
                self.right(i), prefix + ("│   " if is_left else "    "), False
            )
        result += prefix + ("└── " if is_left else "┌── ") + str(self._data[i]) + "\n"
        if self.has_left(i):
            result += self._repr_tree(
                self.left(i), prefix + ("    " if is_left else "│   "), True
            )
        return result

    def __repr__(self) -> str:
        if self.is_empty():
            return "ArrayCompleteTree(empty)"
        return f"ArrayCompleteTree(size={self._size}):\n{self._repr_tree()}"

# Demonstration
tree = ArrayCompleteTree[str]()
for char in "ABCDEFGHIJ":
    tree.append(char)

print(f"Size: {len(tree)}")
print(f"Level order: {list(tree.level_order())}")
print(f"Node at 3: {tree.get(3)}, parent: {tree.get(tree.parent(3))}")
print(f"Children of 2: {tree.get(tree.left(2))}, {tree.get(tree.right(2))}")
print()
print(tree)
```

Pattern 2: Bounds Checking
The simplicity of bounds checking is a major benefit:
Node existence is the single comparison 1 ≤ i ≤ n; a left child exists iff 2i ≤ n, a right child iff 2i + 1 ≤ n, and node i is a leaf iff 2i > n. All checks are single comparisons against n—no pointer null checks needed.
Pattern 3: Level Computation
Level of node i: ⌊log₂(i)⌋, which equals bit_length(i) - 1 in most languages. Modern CPUs expose this as a single count-leading-zeros instruction.
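In Python, for example (a small sketch, 1-indexed layout with the root at level 0):

```python
def level(i: int) -> int:
    """0-based level of node i in the 1-indexed layout: equals floor(log2(i))."""
    return i.bit_length() - 1

# Indices 1..8 sit on levels 0, 1, 1, 2, 2, 2, 2, 3
assert [level(i) for i in range(1, 9)] == [0, 1, 1, 2, 2, 2, 2, 3]
```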
The complete binary tree array pattern extends beyond heaps to other important data structures. Understanding this generality reinforces the importance of the pattern.
Segment Trees
Segment trees store aggregate information (sum, min, max, etc.) over array ranges, reusing the 2i/2i+1 child layout: n leaves hold the raw values, n - 1 internal nodes hold the aggregate of their two children, and everything is packed into a single array of about 2n slots.
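To make the connection concrete, here is a minimal iterative sum segment tree sketch (class name and API are ours, not any particular library's), storing leaves at positions n..2n-1 of one array:

```python
class SumSegmentTree:
    """Sum segment tree over a fixed-length sequence, stored in one flat list."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (2 * self.n)
        self.tree[self.n:] = values              # leaves at indices n..2n-1
        for i in range(self.n - 1, 0, -1):       # internal node i aggregates 2i, 2i+1
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, pos, value):
        """Set values[pos] = value, refreshing the leaf-to-root path."""
        i = self.n + pos
        self.tree[i] = value
        while i > 1:
            i >>= 1                              # parent via the same shift trick
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def query(self, lo, hi):
        """Sum over the half-open range [lo, hi)."""
        s = 0
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:                           # lo is a right child: take it, move right
                s += self.tree[lo]
                lo += 1
            if hi & 1:                           # hi is a right child: take hi-1
                hi -= 1
                s += self.tree[hi]
            lo >>= 1
            hi >>= 1
        return s

st = SumSegmentTree([1, 2, 3, 4, 5])
print(st.query(1, 4))    # 2 + 3 + 4 = 9
```

Both `update` and `query` walk at most one root-to-leaf path per endpoint, so they run in O(log n) with zero pointers.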
Tournament Trees
Used for k-way merging and selection: the k leaves hold the current front element of each input run, each internal node records the winner of its children's comparison, and the overall winner sits at the root, all in one implicit array.
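As an illustrative sketch (names are ours), a winner tree for k-way merging can be packed into one array: leaves k..2k-1 hold the current run heads, and each internal node i records the winning leaf among its children 2i and 2i+1. Exhausted runs are represented by an infinity sentinel.

```python
def kway_merge(runs):
    """Merge k sorted runs using an implicit winner (tournament) tree."""
    INF = float('inf')
    k = len(runs)
    iters = [iter(r) for r in runs]
    heads = [next(it, INF) for it in iters]       # current front of each run
    # winner[i] for internal i holds the LEAF index (k..2k-1) of the smaller
    # head in i's subtree; each leaf represents itself.
    winner = list(range(2 * k))

    def key(node):
        return heads[winner[node] - k]

    for i in range(k - 1, 0, -1):                 # build matches bottom-up
        winner[i] = winner[2 * i] if key(2 * i) <= key(2 * i + 1) else winner[2 * i + 1]

    out = []
    while key(1) != INF:                          # overall winner sits at the root
        w = winner[1]
        out.append(heads[w - k])
        heads[w - k] = next(iters[w - k], INF)    # advance the winning run
        i = w >> 1
        while i >= 1:                             # replay matches back to the root
            winner[i] = winner[2 * i] if key(2 * i) <= key(2 * i + 1) else winner[2 * i + 1]
            i >>= 1
    return out

print(kway_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9], [0, 10, 11]]))
```

Each pop replays one leaf-to-root path, so merging n total elements costs O(n log k) comparisons.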
The Common Thread:
All these structures share the insight that completeness enables implicit storage. They differ in what each node stores and in the update and query operations layered on top.
But the structural foundation—complete tree in an array—is the same. Mastering this pattern unlocks understanding of many advanced data structures.
Complete tree array storage is excellent when you only need operations that the structure supports. For BSTs (which need ordered iteration, predecessor/successor, etc.), the complete constraint is incompatible with the ordering requirement, so pointer-based or other representations are necessary.
We've thoroughly explored why the complete binary tree structure enables efficient array-based storage—the key insight that makes heap implementations practical.
What's Next:
With the array representation understood, we turn to the final piece of the complete binary tree story: the height guarantee. We'll prove that complete binary trees always have height O(log n), which directly bounds the time complexity of heap operations.
You now understand the deep connection between complete binary trees and array storage. The gapless index sequence is the key property that enables implicit representation, eliminating pointer overhead while maintaining O(1) access to all structural relationships.