If you were to strip away every complex data structure ever invented—linked lists, trees, hash tables, graphs—and ask yourself what single structure could rebuild them all, the answer would be the array.
Arrays are deceptively simple. At first glance, they're just numbered boxes holding values. But this simplicity is precisely what makes them powerful. Every database index, every image pixel buffer, every function call stack, every GPU shader—all ultimately rely on arrays. Understanding arrays isn't just learning one data structure; it's understanding the substrate upon which modern computing is built.
By the end of this page, you will understand the precise, formal definition of an array, its four essential properties (fixed size, ordered elements, homogeneous type, contiguous storage), and why each property exists not as arbitrary design but as a deliberate trade-off enabling unparalleled performance.
An array is a data structure consisting of a collection of elements, each identified by an index (or key), where:

- the number of elements is fixed at creation time,
- the elements have a defined order, with position determined by index,
- every element is of the same type (and therefore the same size), and
- the elements are stored in contiguous memory locations.
This definition distinguishes arrays from other collection types. Lists may be resizable, dictionaries use keys instead of indices, and sets are unordered. Arrays commit to a specific set of constraints that unlock unique performance characteristics.
In computer science, precise definitions aren't pedantry—they're contracts. When you know something is an array (not a list, not a vector, not a buffer abstraction), you can immediately reason about its memory layout, access patterns, and performance guarantees. This is why elite engineers obsess over terminology.
Mathematical notation:
Formally, an array A of size n can be denoted as:
$$A = [a_0, a_1, a_2, ..., a_{n-1}]$$
Where:

- $a_i$ denotes the element stored at index $i$, and
- $n$ is the number of elements (the array's size), with valid indices running from $0$ to $n-1$.
This notation captures the essence: an ordered sequence of n elements where position determines identity.
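As a concrete companion to the notation, here is a minimal C sketch (the size and values are arbitrary, chosen only for illustration) that declares a fixed-size array and reads an element by index:

```c
#include <stdio.h>

int main(void) {
    /* A = [a_0, a_1, a_2, a_3] with n = 4: size and element type are fixed at creation. */
    int a[4] = {42, 7, 13, 99};

    /* Position is part of identity: a[2] is the third element, counting from index 0. */
    printf("a[2] = %d\n", a[2]);   /* prints 13 */
    return 0;
}
```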
The most distinguishing property of a classical array is its fixed size. When you create an array, you must specify exactly how many elements it will hold, and that number cannot change throughout the array's lifetime.
This might seem like a limitation—and in some ways, it is—but it's a strategic limitation that enables critical optimizations.
The trade-off:
Fixed size means you must know (or estimate) how many elements you need before creating the array. If you underestimate, you can't fit all your data. If you overestimate, you waste memory.
This trade-off is fundamental to computer science: flexibility costs performance. Arrays sacrifice flexibility for speed. When you need dynamic sizing, you use different structures (dynamic arrays, linked lists) that introduce their own overhead.
Historical context:
In early computing, memory was precious and expensive. Arrays emerged as the natural fit because programmers did know their data sizes in advance (processing fixed-format punch cards, predefined sensor counts, etc.). The fixed-size constraint wasn't a limitation—it was a feature that matched the problem domain.
Many modern languages offer 'dynamic arrays' (ArrayList in Java, vector in C++, list in Python) that appear to resize. These are abstractions built on top of fixed-size arrays—they allocate a fixed array internally and create larger ones when needed, copying elements over. The underlying reality is still fixed-size chunks of memory.
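To make that underlying reality concrete, here is a minimal sketch of how such a wrapper might work in C; the `DynArray` type, the `grow` helper, and the doubling growth factor are illustrative assumptions, not any particular library's implementation:

```c
#include <stdlib.h>
#include <string.h>

/* A toy dynamic array of ints: a fixed-size block plus bookkeeping. */
typedef struct {
    int    *data;      /* contiguous, fixed-size allocation */
    size_t  size;      /* elements currently in use */
    size_t  capacity;  /* elements the current allocation can hold */
} DynArray;

/* When the fixed block is full, allocate a larger one and copy everything over. */
static int grow(DynArray *a) {
    size_t new_capacity = a->capacity ? a->capacity * 2 : 4;
    int *new_data = malloc(new_capacity * sizeof *new_data);
    if (!new_data) return -1;
    if (a->data) memcpy(new_data, a->data, a->size * sizeof *new_data);
    free(a->data);
    a->data = new_data;
    a->capacity = new_capacity;
    return 0;
}

int dynarray_push(DynArray *a, int value) {
    if (a->size == a->capacity && grow(a) != 0) return -1;
    a->data[a->size++] = value;
    return 0;
}
```

The apparent resizing is really a succession of fixed-size allocations: whenever capacity runs out, a larger fixed block is created and every existing element is copied into it, which is why growth is amortized rather than free.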
In an array, order matters. The element at index 0 is fundamentally different from the element at index 1, even if they hold the same value. Position is part of identity.
This ordered nature distinguishes arrays from sets (where only membership matters) and maps (where keys, not positions, identify elements).
Why order is computationally powerful:
Ordering enables positional reasoning. You can say "the element before X" or "the element after X" or "the elements between positions i and j." This positional vocabulary is essential for:

- binary search, which repeatedly narrows a range of positions,
- sorting, which rearranges elements into a defined positional order, and
- range operations such as slicing and sliding windows.
Without ordering, none of these algorithms would work. The notion of "sorted" itself requires order. Arrays make order explicit and cheap to maintain.
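To see how much work ordering does, consider binary search, which depends entirely on being able to reason about positions in a sorted array; a standard sketch in C:

```c
#include <stddef.h>

/* Returns the index of `target` in the sorted array `a` of length `n`, or -1 if absent. */
long binary_search(const int *a, size_t n, int target) {
    size_t lo = 0, hi = n;               /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] == target) return (long)mid;
        if (a[mid] < target)  lo = mid + 1;   /* target can only lie after mid */
        else                  hi = mid;       /* target can only lie before mid */
    }
    return -1;
}
```

Each comparison discards half of the remaining positions, and "the remaining positions" is only a meaningful idea because the elements have a defined order.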
Most programming languages use zero-based indexing (first element at index 0). This isn't arbitrary—it reflects the underlying memory model where the index represents offset from the starting address. The first element has offset 0, the second has offset 1, and so on. Some languages (Lua, MATLAB, R) use one-based indexing for mathematical convenience, but zero-based indexing dominates systems programming.
In a true array, every element must be of the same type. An array of integers contains only integers. An array of characters contains only characters. You cannot mix types within a single array.
This constraint—called homogeneity—might seem limiting compared to languages that allow mixed-type lists. But homogeneity is what makes arrays fast.
An int32 array has 4 bytes per element, always. This uniformity enables instant address calculation.

The address calculation formula:
Given an array starting at memory address base with elements of size s bytes, the address of element at index i is:
$$\text{address}(i) = \text{base} + (i \times s)$$
This formula works only because element size is constant. If elements had different sizes, you'd need to sum the sizes of all preceding elements—an O(n) operation instead of O(1).
Example:

For an int32 array starting at base address 1000 (so s = 4), the element at index 3 lives at address 1000 + (3 × 4) = 1012.
This calculation happens in a single CPU cycle. It's the foundation of array efficiency.
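The same arithmetic is visible directly in C, where indexing is defined as base-plus-offset address calculation (the printed addresses will vary between runs):

```c
#include <stdio.h>

int main(void) {
    int a[4] = {42, 7, 13, 99};

    /* &a[3] is, by definition, the base address plus 3 * sizeof(int) bytes. */
    printf("base address:    %p\n", (void *)&a[0]);
    printf("address of a[3]: %p\n", (void *)&a[3]);
    printf("difference:      %zu bytes\n",
           (size_t)((char *)&a[3] - (char *)&a[0]));   /* 3 * sizeof(int) = 12 */
    return 0;
}
```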
Python lists, JavaScript arrays, and similar structures allow mixed types. These aren't arrays in the classical sense—they're typically arrays of pointers to objects that can be of any type. The performance characteristics differ significantly. When performance matters, languages offer typed alternatives (NumPy arrays in Python, TypedArrays in JavaScript).
The fourth defining property of arrays is contiguous storage: all elements are stored in adjacent memory locations with no gaps between them.
If your array starts at memory address 1000 and each element is 4 bytes, then:

- element 0 occupies addresses 1000 through 1003,
- element 1 occupies addresses 1004 through 1007,
- element 2 occupies addresses 1008 through 1011, and
- element 3 occupies addresses 1012 through 1015.
There's no wasted space, no pointers between elements, no indirection. Just data, one element after another.
```
Array of 4 integers starting at address 1000:

Memory Address:    1000    1004    1008    1012    1016    1020   ...
                ├───────┼───────┼───────┼───────┼───────┼───────┤
Values:         │   42  │    7  │   13  │   99  │  ???  │  ???  │
                └───────┴───────┴───────┴───────┴───────┴───────┘
                    ↑       ↑       ↑       ↑
                   [0]     [1]     [2]     [3]

Each element occupies exactly 4 bytes.
No gaps. No pointers. Pure data.

Compare to a linked list:

┌─────┬───────┐     ┌─────┬───────┐     ┌─────┬───────┐
│ 42  │ ptr ──┼────▶│  7  │ ptr ──┼────▶│ 13  │ null  │
└─────┴───────┘     └─────┴───────┘     └─────┴───────┘
Address: 2000       Address: 5624      Address: 3312

Elements scattered in memory. Each needs a pointer.
Extra overhead. Cache-unfriendly. No instant indexing.
```

Why contiguity is a superpower:
Contiguous storage transforms array access from a memory scavenger hunt into instant retrieval:
O(1) Random Access — Any element can be accessed in constant time using the address formula. No following chains of pointers.
Cache Locality — Modern CPUs don't fetch individual bytes; they fetch cache lines (typically 64 bytes). When you access element 0, elements 1-15 might already be in cache. Sequential array access achieves near-memory bandwidth.
Prefetching — CPUs detect sequential access patterns and prefetch upcoming elements before you request them. This hides memory latency.
Memory Efficiency — No metadata per element. A 100-element int array uses exactly 400 bytes (for 32-bit ints), not 800+ bytes like a linked list with pointers.
Vectorization — SIMD instructions can load multiple contiguous elements in a single operation and process them in parallel.
| Operation | Array (Contiguous) | Linked List (Non-Contiguous) |
|---|---|---|
| Access element at index i | O(1) — single calculation | O(n) — traverse up to n pointers |
| Memory per element | Only the data size | Data + pointer (8 bytes on 64-bit) |
| Cache behavior | Excellent — sequential prefetching | Poor — unpredictable jumps |
| SIMD processing | Fully supported | Not applicable |
| Memory fragmentation | None (single allocation) | Potentially severe |
In modern computing, the speed difference between cache and main memory is roughly 100x. Between cache and disk, it's 10,000x or more. Contiguous arrays are designed to exploit cache—they keep related data together. This single property often matters more than algorithmic complexity for real-world performance.
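As a small illustration of the access pattern contiguity enables, summing an int array touches memory strictly sequentially, so each fetched cache line serves many consecutive iterations (the loop below is a generic sketch, not tied to any particular compiler or CPU):

```c
#include <stddef.h>

/* Sum a contiguous array. Each cache-line fetch (typically 64 bytes, i.e. 16
 * 4-byte ints) covers many consecutive iterations, and the hardware prefetcher
 * can stream in the next lines ahead of time. A compiler may also vectorize
 * this loop with SIMD instructions, since the elements sit side by side. */
long long sum(const int *a, size_t n) {
    long long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];   /* address = base + i * sizeof(int), computed directly */
    }
    return total;
}
```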
Each array property in isolation is useful, but together they create something greater than the sum of their parts. The four properties form a cohesive system where each enables and reinforces the others:

- Fixed size makes a single, contiguous allocation possible in the first place.
- Homogeneous types give every element the same size, so an index maps to an exact byte offset.
- Contiguous storage turns that offset into a real memory address, delivering O(1) access and cache-friendly layout.
- Ordering gives indices their meaning, so positional algorithms (searching, sorting, slicing) have something to work with.

Remove any property and the system breaks:

- Without a fixed size, elements must eventually be moved or scattered, threatening contiguity.
- Without homogeneity, element sizes vary and the address formula no longer works.
- Without contiguity, you lose constant-time access, cache locality, and SIMD-friendly layout.
- Without ordering, an index no longer identifies a particular element, and positional reasoning collapses.
This is why arrays are exactly what they are. They're not a compromise or a default—they're a carefully optimized sweet spot where constraints enable performance.
Understanding why arrays have these specific properties helps you understand when to use them and when to reach for something else. If you need dynamic sizing, you pay for it with dynamic arrays. If you need mixed types, you pay for it with pointer indirection. Every departure from the pure array model has a cost.
To fully understand what arrays are, it helps to clarify what they are not. Many structures get called "arrays" loosely, but they have fundamentally different properties:
| Structure | Fixed Size? | Homogeneous? | Contiguous? | O(1) Access? |
|---|---|---|---|---|
| True Array | ✓ Yes | ✓ Yes | ✓ Yes | ✓ Yes |
| Dynamic Array (vector) | ✗ No (resizable) | ✓ Yes | ✓ Yes | ✓ Yes (amortized) |
| Python List | ✗ No | ✗ No (mixed types) | ✓ Yes (pointers) | ✓ Yes (pointer array) |
| Linked List | ✗ No | ✓ Yes | ✗ No | ✗ No — O(n) |
| Hash Map | ✗ No | ✗ No | Partially | ✓ Yes (average) |
| Set | ✗ No | ✓ Yes | Varies | ✓ Yes (average) |
Important distinctions:
Dynamic/Resizable Arrays — These grow as needed but internally use fixed-size arrays with periodic reallocation. They're arrays with a wrapper, not a fundamentally different structure.
Associative Arrays (Maps/Dictionaries) — The term "associative array" is misleading. These use keys (not indices) and have completely different implementations (hash tables, trees).
Language-Specific 'Arrays' — JavaScript's Array is highly dynamic and sparse-capable. PHP's arrays are really hash maps. True arrays appear mainly in languages like C and Rust, and in other low-level contexts.
Multi-Dimensional Arrays — These are arrays of arrays (or contiguous blocks indexed by multiple dimensions). The principles extend, but implementation details vary.
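For instance, under the common row-major convention (rows stored one after another in a single contiguous block), the one-dimensional address formula extends directly. For a two-dimensional array with cols columns and element size s:

$$\text{address}(i, j) = \text{base} + ((i \times \text{cols}) + j) \times s$$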
When someone says 'array' in conversation, ask yourself: Do they mean a true fixed-size, contiguous, homogeneous array? Or something with array-like syntax but different performance characteristics? This distinction often explains surprising performance differences.
We've established the complete formal definition of an array. Let's consolidate:

- Fixed size: capacity is set at creation and never changes.
- Ordered elements: position is part of an element's identity, with zero-based indices reflecting memory offsets.
- Homogeneous type: every element has the same type and size, enabling the constant-time address formula.
- Contiguous storage: elements sit side by side in memory, giving O(1) access, cache locality, and SIMD friendliness.
What's next:
Now that we understand what arrays are, the next page explores why they're considered the most fundamental data structure. We'll see how nearly every other data structure—stacks, queues, heaps, hash tables, even graphs—can be (and often is) implemented using arrays as their foundation. Arrays aren't just a data structure; they're the data structure upon which computer science is built.
You now understand the precise formal definition of an array and why each of its four defining properties exists. This foundation will inform everything that follows—from understanding memory models to predicting performance characteristics to recognizing when arrays are (or aren't) the right choice.