When you use a stack in Java, you're using an array. When you use a heap in C++, you're using an array. When you query a hash table, you're using an array of buckets. When you process an image, you're using an array of pixels.
Arrays aren't just a data structure—they're the data structure upon which most others are built. This final page of Module 1 makes this connection explicit, showing exactly how the structures you'll learn in future chapters rest on the array foundation you now understand.
By the end of this page, you will understand how stacks, queues, heaps, hash tables, graphs, and other structures are implemented using arrays, why this implementation choice is made, and how this knowledge helps you reason about performance and behavior.
There's a distinction in data structures between the abstract concept and the concrete implementation: an abstract data type (ADT) defines what operations a structure supports and how it behaves, while the implementation determines how those operations are actually carried out in memory.
For most ADTs, arrays are the most common and often the most efficient implementation choice.
| ADT | Key Operations | Common Array Implementation |
|---|---|---|
| Stack | push, pop, peek | Array with top index; push/pop at end |
| Queue | enqueue, dequeue | Circular array with head/tail indices |
| Deque | add/remove at both ends | Circular array with bi-directional growth |
| Priority Queue | insert, extractMin | Binary heap stored as array |
| Hash Table | get, put, remove | Array of buckets with probing/chaining |
| Set | add, remove, contains | Hash table (array of buckets) |
| Dynamic Array | flexible size, random access | Array with capacity tracking + resize |
| Matrix | 2D access, transpose, multiply | 1D or 2D array with index arithmetic |
| Graph | vertices, edges, traversal | Adjacency matrix (2D array) or adjacency list (array of arrays) |
Why know the implementation?
Understanding that your stack is an array helps you predict its performance, estimate its memory footprint, anticipate when it will resize or overflow, and debug it quickly when something goes wrong.
The abstraction protects you from implementation details, but knowing the implementation makes you a more effective engineer.
A stack follows Last-In-First-Out (LIFO) order: the most recently added element is the first removed. With an array, these operations are trivially efficient.
```
Array-Based Stack Implementation:

Array:  [ A | B | C |   |   |   ]
Index:    0   1   2   3   4   5
                  ↑
                 top

Operations:
• push(D): array[++top] = D    → [ A | B | C | D |   |   ]  top=3
• pop():   return array[top--] → returns D, top=2
• peek():  return array[top]   → returns C (no change)

Time Complexity:
• push: O(1) - single array write
• pop:  O(1) - single array read
• peek: O(1) - single array read

Space: Single contiguous array. No pointers. Cache-friendly.
```

Why arrays work perfectly for stacks:
A single top variable manages the entire structure: push and pop touch only the end of the array, so nothing ever shifts and every operation is one indexed read or write.

Real-world usage:
Function call stacks, undo systems, expression evaluation, backtracking algorithms—all typically use array-based stacks for their efficiency.
The famous 'stack overflow' error occurs when the call stack (implemented as an array in memory) runs out of space. Understanding stacks as arrays helps you understand why recursion depth is limited and why increasing the stack size allocation raises that limit.
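To make the mechanics concrete, here is a minimal sketch of an array-backed stack in Java. The class name, the int element type, and the fixed capacity are illustrative choices, not any library's API.

```java
// Minimal fixed-capacity, array-backed stack (illustrative sketch).
public class ArrayStack {
    private final int[] data;
    private int top = -1;          // index of the most recently pushed element

    public ArrayStack(int capacity) {
        data = new int[capacity];  // one contiguous block; no per-element nodes
    }

    public void push(int value) {
        if (top == data.length - 1) throw new IllegalStateException("stack is full");
        data[++top] = value;       // O(1): single array write
    }

    public int pop() {
        if (top < 0) throw new IllegalStateException("stack is empty");
        return data[top--];        // O(1): single array read
    }

    public int peek() {
        if (top < 0) throw new IllegalStateException("stack is empty");
        return data[top];          // O(1): read without changing state
    }
}
```

Production versions (such as Java's ArrayDeque) add automatic resizing, but the core push/pop mechanics are the same.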
A queue follows First-In-First-Out (FIFO) order: elements are removed in the order they were added. A naive array implementation would require O(n) shifting on dequeue, but a circular array solves this elegantly.
```
Circular Array Queue:

Array:  [ D | E |   |   | A | B | C ]
Index:    0   1   2   3   4   5   6
                  ↑       ↑
                tail     head

Logical order: A → B → C → D → E  (head to tail)

Operations:
• enqueue(F): array[tail] = F; tail = (tail+1) % size
• dequeue():  item = array[head]; head = (head+1) % size

Wrap-around behavior:
• When tail reaches the end, it wraps to index 0
• When head reaches the end, it wraps to index 0
• This creates a "circular" buffer

Time Complexity:
• enqueue: O(1) - single array write + modulo
• dequeue: O(1) - single array read + modulo

No shifting! Both ends are O(1) via index arithmetic.
```

The circular array insight:
The problem with a naive array queue is that dequeue from the front requires shifting all elements left—O(n). The solution: don't shift. Instead, use two pointers (head and tail) and let them wrap around using modulo arithmetic.
This transforms queue operations from O(n) to O(1) while maintaining the benefits of array storage (cache locality, simple allocation).
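A minimal sketch of this idea in Java follows. The CircularQueue name, int elements, and fixed capacity are assumptions made for illustration; tracking a count (rather than a separate tail index) is one common way to distinguish a full queue from an empty one.

```java
// Fixed-capacity queue backed by a circular array (illustrative sketch).
public class CircularQueue {
    private final int[] data;
    private int head = 0;   // index of the next element to dequeue
    private int count = 0;  // number of elements currently stored

    public CircularQueue(int capacity) {
        data = new int[capacity];
    }

    public void enqueue(int value) {
        if (count == data.length) throw new IllegalStateException("queue is full");
        int tail = (head + count) % data.length;  // wraps around instead of shifting
        data[tail] = value;                        // O(1): one write plus a modulo
        count++;
    }

    public int dequeue() {
        if (count == 0) throw new IllegalStateException("queue is empty");
        int value = data[head];
        head = (head + 1) % data.length;           // advance head with wrap-around
        count--;
        return value;                              // O(1): no elements were moved
    }
}
```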
Applications:

Breadth-first search, task scheduling, producer-consumer pipelines, and fixed-size streaming buffers (for keyboard, audio, or network input) all typically rely on array-backed queues.
A binary heap is conceptually a complete binary tree, but it's almost universally implemented as an array. This is one of the most elegant array applications in computer science.
```
Binary Min-Heap (Conceptual Tree):

          1
        /   \
       3     2
      / \   /
     7   6 4

Binary Min-Heap (Array Storage):

Index:    0    1    2    3    4    5
        ┌────┬────┬────┬────┬────┬────┐
Values: │ 1  │ 3  │ 2  │ 7  │ 6  │ 4  │
        └────┴────┴────┴────┴────┴────┘

Parent-Child Relationships (0-indexed):
• Parent of i:      (i - 1) / 2   (integer division)
• Left child of i:  2i + 1
• Right child of i: 2i + 2

Examples:
• Node 1 (value 3): parent = (1-1)/2 = 0, children = 3, 4
• Node 2 (value 2): parent = (2-1)/2 = 0, children = 5, 6

No pointers stored! Relationships are computed.
```

Why this works:
A complete binary tree has a special property: nodes can be numbered level by level, left to right, with no gaps. This numbering maps directly to array indices.
The parent-child formula replaces explicit pointers. Instead of storing node.left = other_node, you compute left_child_index = 2i + 1. This arithmetic is fast (a multiply and an add), costs no memory per node, and keeps the entire tree packed into one contiguous, cache-friendly block.
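The sketch below shows that index arithmetic in Java, together with the sift-up step an insertion needs. The class and method names are illustrative, not the standard library's.

```java
import java.util.ArrayList;
import java.util.List;

// Binary min-heap stored in a growable array; parent/child links are computed, never stored.
public class ArrayMinHeap {
    private final List<Integer> data = new ArrayList<>();

    static int parent(int i)     { return (i - 1) / 2; }
    static int leftChild(int i)  { return 2 * i + 1; }
    static int rightChild(int i) { return 2 * i + 2; }

    public void insert(int value) {
        data.add(value);                          // place in the next free slot
        int i = data.size() - 1;
        // Sift up: swap with the parent while we are smaller than it.
        while (i > 0 && data.get(i) < data.get(parent(i))) {
            int p = parent(i);
            int tmp = data.get(p);
            data.set(p, data.get(i));
            data.set(i, tmp);
            i = p;                                // climbs at most O(log n) levels
        }
    }

    public int peekMin() {
        return data.get(0);                       // the minimum is always at index 0
    }
}
```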
Performance implications:
A million-element heap as an array uses ~4 MB (for 32-bit ints). As linked nodes with two pointers each, it would use ~32 MB (7x more). The array version also has dramatically better cache behavior during heapify operations.
When you use Java's PriorityQueue, C++'s priority_queue, or Python's heapq, you're using an array underneath. Understanding this helps you reason about when priority queues are appropriate and what their memory costs are.
A hash table provides near-O(1) key-value lookup. At its core, it's an array of "buckets," where a hash function determines which bucket holds a given key.
```
Hash Table with Chaining:

Bucket Array (size 8):
Index:    0      1      2      3      4      5      6      7
       ┌──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┐
       │ null │ K1→  │ null │ K2→  │ null │ K3→  │ null │ null │
       │      │ V1   │      │ V2   │      │ V3   │      │      │
       └──────┴───┬──┴──────┴──────┴──────┴───┬──┴──────┴──────┘
                  │                           │
                  ↓                           ↓
             ┌────────┐                  ┌────────┐
             │ K4→V4  │                  │ K5→V5  │
             └────────┘                  └────────┘
            (collision)                 (collision)

Lookup process:
1. hash("mykey") = 5
2. Access buckets[5] in O(1)
3. Search bucket chain for "mykey"

The array access is always O(1).
Chain search is O(k) where k is chain length.
With good hash function and load factor, k ≈ 1.
```

The array's role:
The hash function converts a key into an integer, and a modulo operation converts that integer into a valid array index. The array enables O(1) access to the bucket:
index = hash(key) % array.length
bucket = array[index]
Without the array, hash tables couldn't provide O(1) access. The hash function just produces a number; the array makes that number useful.
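Here is a minimal, hedged sketch of that lookup path in Java using chaining. The ChainedHashMap and Entry names, the String key/value types, the fixed capacity, and the use of Math.floorMod to keep the computed index non-negative are all illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Minimal chained hash map (String -> String) showing the array's role in lookup.
public class ChainedHashMap {
    private static class Entry {
        final String key;
        String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    // The bucket array: each slot holds a (usually tiny) chain of entries.
    private final List<List<Entry>> buckets = new ArrayList<>();

    public ChainedHashMap(int capacity) {
        for (int i = 0; i < capacity; i++) buckets.add(new ArrayList<>());
    }

    private int indexFor(String key) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    public void put(String key, String value) {
        List<Entry> chain = buckets.get(indexFor(key));    // O(1) array access
        for (Entry e : chain) {
            if (Objects.equals(e.key, key)) { e.value = value; return; }
        }
        chain.add(new Entry(key, value));                   // collision: append to the chain
    }

    public String get(String key) {
        for (Entry e : buckets.get(indexFor(key))) {        // O(1) to reach the bucket
            if (Objects.equals(e.key, key)) return e.value; // O(k) within the chain
        }
        return null;
    }
}
```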
Open addressing (alternative):
Some hash tables don't use chains. Instead, they store entries directly in the array and probe for empty slots on collision. This is even more array-centric—the entire table is a single flat array.
When a hash table resizes (due to load factor threshold), it allocates a larger array and rehashes all entries. This is why hash table insertion is O(1) amortized but occasionally O(n). The array growth strategy is identical to dynamic arrays.
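The resize step itself is plain array work: allocate a larger bucket array and recompute every key's index. Below is a rough sketch, assuming (for illustration only) a simplified chained table that stores bare String keys and doubles its capacity.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the resize-and-rehash step for a chained table of String keys.
public class RehashSketch {
    static List<List<String>> resize(List<List<String>> oldBuckets) {
        int newCapacity = oldBuckets.size() * 2;          // doubling keeps resizes rare
        List<List<String>> newBuckets = new ArrayList<>();
        for (int i = 0; i < newCapacity; i++) newBuckets.add(new ArrayList<>());

        // Every key must be rehashed: its index depends on the table size.
        for (List<String> chain : oldBuckets) {
            for (String key : chain) {
                int index = Math.floorMod(key.hashCode(), newCapacity);
                newBuckets.get(index).add(key);           // O(n) total, but infrequent
            }
        }
        return newBuckets;
    }
}
```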
Graphs represent relationships between entities (vertices connected by edges). Despite being conceptually complex, their implementations typically reduce to arrays.
```
Graph with 4 vertices:

    0 ─── 1
    │     │
    │     │
    3 ─── 2

Adjacency Matrix (2D Array):

        0   1   2   3
      ┌───┬───┬───┬───┐
    0 │ 0 │ 1 │ 0 │ 1 │
      ├───┼───┼───┼───┤
    1 │ 1 │ 0 │ 1 │ 0 │
      ├───┼───┼───┼───┤
    2 │ 0 │ 1 │ 0 │ 1 │
      ├───┼───┼───┼───┤
    3 │ 1 │ 0 │ 1 │ 0 │
      └───┴───┴───┴───┘

Adjacency List (Array of Arrays):

Index 0: [ 1, 3 ]   ← neighbors of vertex 0
Index 1: [ 0, 2 ]   ← neighbors of vertex 1
Index 2: [ 1, 3 ]   ← neighbors of vertex 2
Index 3: [ 0, 2 ]   ← neighbors of vertex 3

Both are arrays. The matrix is a flat 2D array.
The list is an array where each element is also an array.
```

The array advantage:
The adjacency matrix answers "is there an edge between u and v?" with a single index calculation, while the adjacency list keeps each vertex's neighbors in one compact block that is fast to scan. Even graph algorithms like Dijkstra's use priority queues (heaps, which are arrays). The entire graph processing pipeline runs on array foundations.
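As a concrete sketch, here are both representations of the four-vertex graph above in Java; the class and variable names are illustrative.

```java
// Two array-backed views of the same 4-vertex cycle graph: 0-1, 1-2, 2-3, 3-0.
public class GraphRepresentations {
    public static void main(String[] args) {
        // Adjacency matrix: a flat 2D array; an edge check is one index calculation.
        int[][] matrix = {
            {0, 1, 0, 1},
            {1, 0, 1, 0},
            {0, 1, 0, 1},
            {1, 0, 1, 0},
        };
        boolean connected = matrix[0][3] == 1;   // O(1) edge lookup

        // Adjacency list: an array whose elements are arrays of neighbors.
        int[][] neighbors = {
            {1, 3},   // vertex 0
            {0, 2},   // vertex 1
            {1, 3},   // vertex 2
            {0, 2},   // vertex 3
        };
        for (int v : neighbors[0]) {             // iterate the neighbors of vertex 0
            System.out.println("0 -> " + v);
        }
        System.out.println("0 and 3 connected: " + connected);
    }
}
```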
Why does it matter that all these structures use arrays underneath? Because this knowledge gives you implementation intuition—the ability to predict behavior and make informed decisions.
The expert mindset:
Novice developers think "I'm using a priority queue." Expert developers think "I'm using a binary heap stored in an array, where elements bubble up and down to restore order, the parent of index i sits at (i-1)/2, and insert/extract cost O(log n)."
This deeper understanding isn't academic—it's practical. It helps you choose the right structure, configure it properly, debug it quickly, and optimize it when necessary.
As you learn each new data structure in upcoming chapters, always ask: 'How would this be implemented with an array?' Even for linked structures, ask 'What would the array alternative look like, and what trade-offs does each make?' This habit develops implementation-aware thinking.
We've completed Module 1: What Is an Array? Let's consolidate everything we've learned across all four pages: what an array is, why it plays such a fundamental role, how prevalent arrays are in everyday programming, and, from this page, how stacks, queues, heaps, hash tables, and graphs are all built on top of them.
What's next:
With a complete understanding of what arrays are, the next module dives into Array Memory Model & Contiguous Storage. We'll explore exactly how arrays are laid out in memory, how address calculation works, and why this enables the O(1) random access that makes arrays so powerful.
You've completed Module 1: What Is an Array? You now have a deep, comprehensive understanding of arrays—their definition, their fundamental role, their prevalence in programming, and how they serve as the foundation for other data structures. This knowledge will inform everything you learn about data structures going forward.