We've established the heap property—the ordering invariant that places the extreme element at the root. But ordering alone doesn't guarantee efficiency. Consider a degenerate tree where every node has only a left child: it could satisfy the heap property while being essentially a linked list, with O(n) height and O(n) operations.
The complete binary tree structure is the second invariant that prevents this catastrophe. It constrains the tree's shape, not its values, ensuring optimal height and enabling a remarkable property: we can represent the entire tree using just an array, with no pointers at all.
This page explores the complete binary tree in depth—what it is, why heaps require it, and how this shape unlocks the array representation that makes heaps memory-efficient and cache-friendly.
By the end of this page, you will: (1) Define a complete binary tree precisely and distinguish it from other tree types, (2) Prove that complete binary trees have optimal height ⌊log₂ n⌋, (3) Understand the level-by-level filling pattern, (4) See how completeness enables pointer-free array representation, (5) Appreciate the cache locality and memory efficiency implications.
Let's establish the precise definition:
Definition (Complete Binary Tree): A binary tree is complete if all levels except possibly the last are completely filled, and all nodes in the last level are as far left as possible.
This deceptively simple definition has profound implications. Let's unpack each part:
Visual Examples:
Complete Binary Tree (valid):

```
         A             Level 0: 1 node (full)
        / \
       B   C           Level 1: 2 nodes (full)
      / \ /
     D  E F            Level 2: 3 nodes (left-filled)
```
NOT Complete (gap in last level):

```
         A             Level 0: 1 node (full)
        / \
       B   C           Level 1: 2 nodes (full)
      /   / \
     D   E   F         Level 2: gap between D and E ✗
```
NOT Complete (missing node in non-last level):

```
         A             Level 0: 1 node (full)
        /
       B               Level 1: only 1 node ✗
      / \
     D   E             Level 2 has nodes but level 1 is incomplete
```
Three related terms that students often confuse:
Complete: All levels full except last; last level left-filled. (What heaps require)
Full: Every node has 0 or 2 children—no nodes have exactly 1 child. (Not required for heaps)
Perfect: All levels completely filled, including the last. (A special case of complete)
The most important consequence of completeness is the height guarantee:
Height Theorem: A complete binary tree with n nodes has height exactly ⌊log₂ n⌋.
This is the optimal height for any binary tree with n nodes—you cannot have a shorter binary tree containing n nodes.
Proof: A binary tree of height h has at most 1 + 2 + 4 + ... + 2^h = 2^(h+1) - 1 nodes (when perfect). A complete tree of height h has at least 2^h nodes (levels 0 through h−1 completely filled, contributing 2^h − 1 nodes, plus at least one node at level h) and at most 2^(h+1) - 1 nodes (when perfect).
Thus: 2^h ≤ n ≤ 2^(h+1) - 1
Taking logs: h ≤ log₂ n < h + 1
Therefore: h = ⌊log₂ n⌋ ∎
Why Height Matters for Performance:
Heap operations (insert, extract, bubble-up, bubble-down) all traverse a path from a node to the root or to a leaf. The length of the longest such path is exactly the height of the tree.
| n | Height ⌊log₂ n⌋ | Operations per heap action |
|---|---|---|
| 7 | 2 | ≤ 3 swaps |
| 15 | 3 | ≤ 4 swaps |
| 1,000 | 9 | ≤ 10 swaps |
| 1,000,000 | 19 | ≤ 20 swaps |
| 1,000,000,000 | 29 | ≤ 30 swaps |
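The heights in this table can be checked directly. A minimal Python sketch (the `heap_height` helper is ours, not from any library; for positive integers, `n.bit_length() - 1` equals ⌊log₂ n⌋):

```python
def heap_height(n):
    # Height of a complete binary tree with n >= 1 nodes: floor(log2(n)).
    assert n >= 1
    return n.bit_length() - 1

for n in (7, 15, 1_000, 1_000_000, 1_000_000_000):
    print(n, heap_height(n))
```

Running this reproduces the height column: 2, 3, 9, 19, 29.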
This is extraordinary. Even for a billion elements, heap operations involve at most 30 comparisons and swaps. The completeness property guarantees this bound—without it, a degenerate tree could require n operations.
Contrast: Non-Complete BST Worst Case
Consider inserting sorted data into an unbalanced BST: 1, 2, 3, 4, 5, ..., n
The result is a degenerate tree (linked list):
```
1
 \
  2
   \
    3
     \
      ...
        n
```
Height = n - 1, operations are O(n). This is why BSTs need explicit balancing (AVL, Red-Black). Heaps sidestep this entirely: the completeness constraint is baked into the structure, eliminating degenerate cases by definition.
The completeness property creates a deterministic filling pattern that's key to understanding heaps:
Nodes fill the tree level by level, left to right within each level: the root first, then both positions of level 1, then level 2, and so on. This is exactly breadth-first (level-order) filling. If we number positions in this order starting from 0, we get:
```
              0                 Level 0: position 0
            /   \
           1     2              Level 1: positions 1, 2
          / \   / \
         3   4 5   6            Level 2: positions 3, 4, 5, 6
        / \ / \
       7  8 9  10               Level 3: positions 7, 8, 9, 10
```
The positions 0, 1, 2, 3, 4, 5, ... correspond exactly to array indices!
This is the insight that enables array-based heap representation: a complete binary tree with nodes at positions 0 through n-1 maps directly to an array of size n.
For a node at index i (0-indexed):
• Parent index: ⌊(i - 1) / 2⌋
• Left child index: 2i + 1
• Right child index: 2i + 2
These formulas work because of the level-order filling. The arithmetic relationships between parent and children are fixed and computable—no pointers needed!
Deriving the Index Formulas:
Left Child Formula (2i + 1): Level-order filling assigns children in the same order as their parents. If a node is at position i, then positions 0 through i are all filled, and the children of positions 0 through i − 1 occupy positions 1 through 2(i − 1) + 2 = 2i. The next free position is therefore 2i + 1, which becomes position i's left child; its right child follows immediately at 2i + 2.
Parent Formula (⌊(i − 1) / 2⌋): This is the inverse of the child formulas. Given child position i: if i is a left child, then i = 2p + 1, so p = (i − 1)/2 exactly; if i is a right child, then i = 2p + 2, so p = (i − 2)/2, which is (i − 1)/2 rounded down.
Both cases give ⌊(i - 1) / 2⌋. This elegant unification works because the two children of a parent occupy consecutive positions, and integer division absorbs the difference.
| Node Index | Parent Index | Left Child | Right Child |
|---|---|---|---|
| 0 (root) | undefined | 1 | 2 |
| 1 | 0 | 3 | 4 |
| 2 | 0 | 5 | 6 |
| 3 | 1 | 7 | 8 |
| 4 | 1 | 9 | 10 |
| 5 | 2 | 11 | 12 |
| 9 | 4 | 19 | 20 |
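The table above follows mechanically from the formulas. A minimal sketch, with hypothetical helper names `parent`, `left`, and `right`:

```python
def parent(i):
    # Valid for i > 0; the root (index 0) has no parent.
    return (i - 1) // 2

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

# Reproduce a few rows of the table.
for i in (0, 1, 2, 3, 4, 5, 9):
    print(i, parent(i) if i > 0 else None, left(i), right(i))
```

Note that `//` (floor division) implements the ⌊·⌋ in the parent formula.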
The completeness property enables one of the most elegant representations in all of computer science: storing a tree as a contiguous array with no pointers.
Traditionally, trees require: a left-child pointer, a right-child pointer, and often a parent pointer in every node, plus a separate heap allocation per node.
With array representation: only the values are stored, in one contiguous block; parent and child relationships are computed from indices using the formulas above.
Example: Array Representation
Consider this heap:
```
         90
        /  \
      75    80
     /  \   /
   60   55 70
```
As an array: [90, 75, 80, 60, 55, 70]
The relationships, read straight from the indices: 90 (index 0) has children 75 (index 1) and 80 (index 2); 75 has children 60 (index 3) and 55 (index 4); 80 has left child 70 (index 5).
The array IS the heap. No additional structure needed.
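One way to see that the array is the whole structure: the heap property can be verified using nothing but index arithmetic. A sketch (the `is_max_heap` name is ours):

```python
def is_max_heap(a):
    # Max-heap check: every element must be >= both of its children,
    # located purely by index arithmetic (2i+1 and 2i+2).
    n = len(a)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[i] < a[child]:
                return False
    return True

print(is_max_heap([90, 75, 80, 60, 55, 70]))  # → True
```

No tree traversal, no pointers: the loop inspects the array directly.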
For a heap of 1 million integers (4 bytes each):
Pointer-based tree: 4 bytes (value) + 24 bytes (3 pointers) = 28 bytes/node = 28 MB
Array-based heap: 4 bytes (value) = 4 MB
Savings: 7x less memory! Plus, the array is contiguous, so sequential access hits L1/L2 cache almost every time. Pointer-based trees scatter nodes randomly in memory, causing cache misses.
Theoretical analysis tells us heaps are O(log n). But on real hardware, constant factors matter enormously, and cache locality is where heaps truly shine.
The Memory Hierarchy Reality:
A cache miss is 50-100x slower than a cache hit. Data structures that maximize cache hits vastly outperform those that don't, even with identical big-O complexity.
Why Array-Based Heaps Win:
Spatial Locality: When we access index i, the CPU loads a cache line (64 bytes typically) containing i and its neighbors. For a bubble-up operation: indices i, parent(i), parent(parent(i)), ... are often on the same or adjacent cache lines near the root.
Sequential Access in Heapify: Building a heap via bottom-up heapify accesses array elements largely in order (indices n/2 down to 0). This is cache-optimal—each cache line is loaded once and used completely.
Predictable Access Patterns: The CPU's prefetcher can anticipate heap access patterns because they're arithmetically determined. It preloads the next cache line before you need it.
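The bottom-up heapify mentioned above, walking from index n/2 down to 0, can be sketched as follows (a minimal max-heap version with hypothetical helper names; production code would use a library such as Python's `heapq`):

```python
def sift_down(a, i, n):
    # Push a[i] down until both children are smaller (max-heap order).
    while True:
        largest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[child] > a[largest]:
                largest = child
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def build_heap(a):
    # Walk backward from the last internal node to the root; each
    # iteration touches array elements largely in descending order.
    n = len(a)
    for i in range(n // 2, -1, -1):
        sift_down(a, i, n)
    return a

print(build_heap([3, 9, 2, 1, 4, 5]))  # → [9, 4, 5, 1, 3, 2]
```

The descending index sweep is exactly the cache-friendly, largely sequential pass described above.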
Contrast with Pointer-Based Trees: each step of a traversal dereferences a pointer to an essentially arbitrary heap address, so consecutive nodes rarely share a cache line and the prefetcher cannot predict the next access.
Benchmarks consistently show array-based binary heaps outperforming Fibonacci heaps for practical graph sizes, despite Fibonacci heaps having better theoretical bounds. The reason: cache misses. A 'slower' O(log n) operation that stays in cache beats a 'faster' O(1) amortized operation that causes cache misses.
Quantifying the Difference:
Consider extracting the minimum 1 million times from a 1-million element heap:
Array-based heap: each extraction walks roughly 20 levels of a contiguous array, and the upper levels stay resident in cache, so most comparisons are cache hits.
Pointer-based tree (hypothetical): the same ~20 hops per extraction each chase a pointer to a scattered address, so most of them are cache misses at 50-100x the cost of a hit.
Same algorithm, 8x slower—purely from memory access patterns.
This is why production heap implementations universally use arrays. The completeness property makes this possible.
Completeness isn't just an initial property—it must be maintained as elements are inserted and removed. The heap operations are specifically designed to preserve this invariant.
Insertion: Always at the Next Position
When inserting a new element, it always goes at index size (the current size of the heap, 0-indexed). This is the 'next' position in level-order.
Before (size = 6):

```
        [0]
       /   \
     [1]   [2]
     /  \   /
   [3] [4] [5]
```

After insert (size = 7):

```
        [0]
       /   \
     [1]   [2]
     /  \   / \
   [3] [4] [5] [6] ← new element
```
The new element is placed at index 6, which is the leftmost empty position at the current level. Completeness preserved.
The new element might violate the heap property (be larger than its parent in a max-heap), so we bubble it up. But the shape remains complete throughout—we never create gaps.
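The append-then-bubble-up step can be sketched in a few lines (a max-heap sketch; `insert` is a hypothetical helper, not a library call):

```python
def insert(heap, value):
    # Completeness: the new element always goes at index len(heap),
    # the next level-order slot. No gaps are ever created.
    heap.append(value)
    i = len(heap) - 1
    # Bubble up while the new element exceeds its parent (max-heap).
    while i > 0 and heap[(i - 1) // 2] < heap[i]:
        p = (i - 1) // 2
        heap[i], heap[p] = heap[p], heap[i]
        i = p

h = [90, 75, 80, 60, 55, 70]
insert(h, 85)
print(h)  # → [90, 75, 85, 60, 55, 70, 80]
```

Note that the bubble-up only swaps values along the path to the root; the shape, and therefore the array layout, never changes.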
Extraction: Remove Root, Replace with Last
When extracting the root (max or min), we:
1. Save the root's value to return.
2. Move the last element (index size − 1) to the root.
3. Decrement size.
4. Bubble the new root down to restore the heap property.

Before (size = 7):

```
         90
        /  \
      75    80
     /  \   / \
   60   55 70  45 ← last element
```

After moving the last element to the root (size = 6):

```
         45  ← moved from last position
        /  \
      75    80
     /  \   /
   60   55 70
```
The last element fills the gap at the root, preserving completeness. It's now at the wrong position (violates heap property), but bubbling down fixes that while maintaining the complete shape.
Why move the last element to the root, rather than promote a child? Because promoting a child would create a gap in the middle of the tree, violating completeness. The last element is specifically chosen because removing it doesn't create a gap—it's already at the boundary. This maintains the array's contiguous property.
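The four extraction steps plus the bubble-down can be sketched together (a max-heap sketch; `extract_max` is a hypothetical name):

```python
def extract_max(heap):
    # 1. Save the root; 2. move the last element to the root;
    # 3. shrink by one; 4. bubble the new root down.
    top = heap[0]
    last = heap.pop()          # removing the last slot never creates a gap
    if heap:
        heap[0] = last
        i, n = 0, len(heap)
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and heap[child] > heap[largest]:
                    largest = child
            if largest == i:
                break
            heap[i], heap[largest] = heap[largest], heap[i]
            i = largest
    return top

h = [90, 75, 80, 60, 55, 70, 45]
print(extract_max(h))  # → 90
print(h)               # → [80, 75, 70, 60, 55, 45]
```

`pop()` removes exactly the boundary element, which is why completeness is preserved without any special-casing.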
The heap's two invariants—shape (completeness) and order (heap property)—work together in a beautiful way. Neither alone is sufficient; together they enable everything heaps can do.
| Property | What It Guarantees | What It Enables |
|---|---|---|
| Shape (Completeness) | Height = ⌊log₂ n⌋ | O(log n) path length for operations |
| Shape (Completeness) | No gaps in level-order | Array representation with index arithmetic |
| Shape (Completeness) | Deterministic structure for n nodes | Single valid shape, operations are uniform |
| Order (Heap Property) | Root is extreme element | O(1) access to max or min |
| Order (Heap Property) | Transitivity along paths | Bubbling up/down restores order |
| Order (Heap Property) | Subtrees are heaps | Local repairs, inductive correctness |
The Synergy: completeness guarantees that every root-to-leaf path is short, and the heap property guarantees that bubbling an element along such a path restores order; together they deliver O(log n) operations on a plain array.
Thought Experiment: What If We Relaxed Each?
Without completeness (arbitrary tree shape): the height could degrade to O(n), and the gap-free level-order layout that array indexing depends on would be lost.
Without heap property (just a complete binary tree): the shape stays optimal, but the maximum or minimum could be anywhere, so finding it would require scanning all n elements.
The heap's power comes from both constraints working in concert.
The binary heap illustrates a key data structure design principle: carefully chosen invariants that complement each other can produce remarkably efficient structures. Shape + ordering = efficient priority queue. Neither property alone is remarkable, but their combination is powerful.
We've explored the complete binary tree structure—the shape invariant that makes heaps efficient and practical. The key insights: completeness fills every level except possibly the last, with the last level packed to the left; this forces height ⌊log₂ n⌋; level-order positions map directly to array indices, with parent and children reachable by arithmetic alone; the contiguous array makes heaps memory-efficient and cache-friendly; and insertion and extraction preserve completeness by always working at the last position.
What's next:
With both invariants understood—heap property (ordering) and completeness (shape)—we now examine why shape matters from a performance perspective. The next page synthesizes how these properties combine to guarantee the efficiency that makes heaps indispensable.
You now understand the complete binary tree structure that heaps require. You can explain why completeness guarantees O(log n) height, how level-order filling enables array representation, and why cache locality makes array-based heaps vastly outperform pointer-based alternatives. Next, we'll see why this shape is so critical through the lens of performance.