Throughout this module, we've established what non-primitive data structures are, how they enable logical grouping and relationships, and why abstraction is essential. Now it's time to survey the actual terrain—the specific data structures you'll encounter and master throughout this curriculum.
Think of this page as a map of the territory ahead. We won't dive deep into implementation details or complex operations here—dedicated chapters will do that. Instead, we'll establish a clear mental model of each major structure: what it is, what it's for, and where it fits in the broader landscape.
By the end of this page, you'll have a comprehensive overview that will make every subsequent chapter feel like revisiting a familiar friend rather than meeting a stranger.
This page surveys seven fundamental categories of non-primitive data structures: Arrays, Strings, Linked Lists, Stacks, Queues, Trees, and Graphs. For each, you'll learn its essential nature, core use cases, key operations, and trade-offs—providing the conceptual foundation for deep dives to come.
The Array is the most fundamental non-primitive data structure—so fundamental that it's typically built into every programming language and directly supported by hardware.
What Is an Array?
An array is a contiguous block of memory containing a fixed number of elements of the same type. Elements are accessed by their index—a zero-based position number.
Index: 0 1 2 3 4
┌───────┬───────┬───────┬───────┬───────┐
Array: │ 42 │ 17 │ 93 │ 8 │ 56 │
└───────┴───────┴───────┴───────┴───────┘
↑
Base Address
Why Arrays Matter:
| Operation | Time Complexity | Notes |
|---|---|---|
| Access by index | O(1) | Direct calculation from base address |
| Search (unsorted) | O(n) | Must check each element |
| Search (sorted) | O(log n) | Binary search applicable |
| Insert at end | O(1)* | If space available; O(n) if resize needed |
| Insert at position | O(n) | Must shift all subsequent elements |
| Delete at position | O(n) | Must shift all subsequent elements |
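To make these costs concrete, here is a small sketch using a Python list (a dynamic array); the values match the diagram above:

```python
import bisect

# A Python list is a dynamic array: contiguous storage, O(1) index access.
arr = [42, 17, 93, 8, 56]

# Access by index: O(1), computed directly from the base address.
assert arr[2] == 93

# Insert at a middle position: O(n), every later element shifts right.
arr.insert(1, 99)
assert arr == [42, 99, 17, 93, 8, 56]

# Append at the end: amortized O(1) (occasional O(n) resize).
arr.append(7)

# Binary search on sorted data: O(log n).
sorted_arr = sorted(arr)
assert sorted_arr[bisect.bisect_left(sorted_arr, 93)] == 93
```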
Array Variants: static arrays (fixed size), dynamic arrays (resize automatically as elements are added), and multi-dimensional arrays (arrays of arrays, such as matrices).
When to Use Arrays:
✅ Need fast random access by position
✅ Collection size is known or changes infrequently
✅ Memory efficiency is important
✅ Sequential processing is common
❌ Frequent insertions/deletions in the middle
❌ Collection size is highly variable
❌ Need fast key-based (not index-based) lookup
Arrays should be your default choice until you have a reason to use something else. Their cache efficiency and simplicity make them faster than theoretically 'better' structures in many real scenarios. A linked list with O(1) insertion is often slower than an array with O(n) insertion due to cache effects.
The String is conceptually an array of characters, but treated as a distinct data structure due to its unique importance and specialized operations.
What Is a String?
A string is an ordered sequence of characters representing text. Unlike a generic array, strings carry semantic meaning and support text-specific operations.
Index: 0 1 2 3 4
┌─────┬─────┬─────┬─────┬─────┐
String: │ H │ e │ l │ l │ o │
└─────┴─────┴─────┴─────┴─────┘
Why Strings Are Special:
| Operation | Time Complexity | Notes |
|---|---|---|
| Access character | O(1) | If fixed-width encoding; O(n) for some UTF encodings |
| Concatenation | O(n+m) | Creates new string of combined length |
| Substring | O(k) | k = substring length; may share memory in some implementations |
| Search (naive) | O(n*m) | n = string length, m = pattern length |
| Search (KMP/Boyer-Moore) | O(n+m) | KMP guarantees O(n+m); Boyer-Moore is often sublinear in practice |
| Comparison | O(min(n,m)) | Character-by-character until difference |
String Concepts to Master:
When to Use String-Specific Structures:
✅ Text processing, parsing, formatting
✅ User-facing input and output
✅ File and network operations
✅ When semantic text operations (split, trim, case conversion) are needed
❌ Numeric data that looks like text should be converted to numbers
❌ Binary data should use byte arrays, not strings
Strings seem simple until you encounter Unicode. A 'character' might be one byte (ASCII), two bytes (some UTF-16), or four bytes (many emojis). String length might differ from byte count, display width, and grapheme cluster count. String processing in a globalized world requires understanding these distinctions.
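A quick Python sketch makes these distinctions concrete — code-point count, byte count, and on-screen grapheme count can all disagree:

```python
s = "héllo"
assert len(s) == 5                         # five code points
assert len(s.encode("utf-8")) == 6         # 'é' takes 2 bytes in UTF-8

# A family emoji: man + ZWJ + woman + ZWJ + girl renders as ONE grapheme cluster.
family = "👨\u200d👩\u200d👧"
assert len(family) == 5                    # five code points...
assert len(family.encode("utf-8")) == 18   # ...eighteen bytes, one visible "character"
```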
The Linked List sacrifices array's random access for flexible insertion and deletion, using pointers to connect elements scattered in memory.
What Is a Linked List?
A linked list is a sequence of nodes, each containing data and a reference (pointer) to the next node. Elements are not contiguous—they can be anywhere in memory.
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Data: 42 │ │ Data: 17 │ │ Data: 93 │
│ Next: ───────┼───→│ Next: ───────┼───→│ Next: null │
└──────────────┘ └──────────────┘ └──────────────┘
Head Tail
Linked List Variants: singly linked (each node points only to the next), doubly linked (nodes also point back to the previous node), and circular (the last node points back to the first).
| Operation | Singly Linked | Doubly Linked | Notes |
|---|---|---|---|
| Access by index | O(n) | O(n) | Must traverse from head or tail |
| Insert at head | O(1) | O(1) | Just update pointers |
| Insert at tail | O(1)* | O(1) | *O(1) with tail pointer |
| Insert at position | O(n) | O(n) | O(1) if position known |
| Delete by reference | O(n)/O(1) | O(1) | Singly: need previous; Doubly: immediate |
| Search | O(n) | O(n) | Must traverse |
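A minimal singly linked list sketch, illustrating why head insertion is O(1) (one pointer update, no shifting) while traversal is O(n):

```python
class Node:
    """A singly linked list node: data plus a pointer to the next node."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def push_front(head, data):
    """O(1) insertion at the head: the new node simply points at the old head."""
    return Node(data, head)

def to_list(head):
    """O(n) traversal from head to tail."""
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out

# Build 42 → 17 → 93 by pushing in reverse order.
head = None
for value in (93, 17, 42):
    head = push_front(head, value)
assert to_list(head) == [42, 17, 93]
```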
Linked List Trade-offs:
Advantages:
Disadvantages:
When to Use Linked Lists:
✅ Frequent insertions/deletions at ends
✅ Implementing stacks, queues, deques
✅ When you need to insert/delete without shifting
✅ When memory is fragmented
❌ Need fast random access
❌ Cache efficiency matters
❌ Memory overhead is a concern
In theory, linked lists shine with O(1) insertions. In practice, cache misses make each pointer traversal expensive (100+ cycles vs 1 cycle for sequential array access). Profile before choosing linked lists over arrays—the 'slower' array is often faster.
The Stack is not just a data structure—it's a behavioral constraint. It restricts how elements are added and removed, modeling many real-world patterns.
What Is a Stack?
A stack is a Last-In, First-Out (LIFO) structure where elements are added and removed from the same end (the 'top'). Think of a stack of plates: you can only add to or take from the top.
┌───────┐
Top → │ 56 │ ← Most recently added
├───────┤
│ 8 │
├───────┤
│ 93 │
├───────┤
Bottom → │ 42 │ ← First added (last to be removed)
└───────┘
Core Stack Operations:
push(item) — Add item to top
pop() — Remove and return top item
peek() / top() — Return top item without removing
isEmpty() — Check if stack is empty
All operations are O(1) in any reasonable implementation.
Why Stacks Matter — Use Cases:
Stack Implementations: array-based (push and pop at the end of a dynamic array) or linked-list-based (push and pop at the head).
Both provide O(1) for all operations. Array-based is usually preferred for cache efficiency.
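As an illustration, Python's built-in list works as an array-backed stack, with all four core operations in O(1):

```python
stack = []                   # a Python list used as an array-backed stack

stack.append(42)             # push
stack.append(93)             # push
assert stack[-1] == 93       # peek: inspect the top without removing it
assert stack.pop() == 93     # pop: remove and return the top item (LIFO)
assert stack.pop() == 42
assert len(stack) == 0       # isEmpty
```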
When to Use Stacks:
✅ LIFO access pattern is required
✅ Managing nested structures (recursion, scopes, tags)
✅ Backtracking algorithms
✅ Undo functionality
❌ Need access to elements other than top
❌ Need FIFO (use queue instead)
❌ Need random access
Every recursive algorithm implicitly uses the call stack. You can convert any recursive algorithm to iterative by explicitly managing a stack. Understanding stacks deeply means understanding recursion deeply, and vice versa.
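As a sketch of that equivalence, here is a recursive nesting-depth computation (which leans on the implicit call stack) alongside an iterative version that manages an explicit stack; both are illustrative helpers, not from the original text:

```python
def depth_recursive(node):
    """Nesting depth of a value: 0 for a scalar, 1 + deepest child for a list.
    The recursion uses the implicit call stack."""
    if not isinstance(node, list):
        return 0
    return 1 + max((depth_recursive(child) for child in node), default=0)

def depth_iterative(node):
    """Same computation, but with an explicit stack of (item, depth) pairs."""
    best = 0
    stack = [(node, 0)]
    while stack:
        item, d = stack.pop()
        if isinstance(item, list):
            best = max(best, d + 1)
            for child in item:
                stack.append((child, d + 1))
    return best

assert depth_recursive([1, [2, [3]]]) == depth_iterative([1, [2, [3]]]) == 3
```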
The Queue models a different behavioral constraint: fairness. The first to arrive is the first to be served.
What Is a Queue?
A queue is a First-In, First-Out (FIFO) structure where elements are added at one end (the 'rear') and removed from the other (the 'front'). Think of a line at a ticket counter.
Dequeue Enqueue
↓ ↓
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│Front: 42│ → │ 17 │ → │ 93 │ → │Rear: 56 │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
Core Queue Operations:
enqueue(item) / offer(item) — Add item to rear
dequeue() / poll() — Remove and return front item
peek() / front() — Return front item without removing
isEmpty() — Check if queue is empty
All operations are O(1) in proper implementations.
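In Python, for instance, collections.deque gives O(1) operations at both ends (a plain list's pop(0) is O(n) because every remaining element shifts):

```python
from collections import deque

queue = deque()
queue.append(42)               # enqueue at the rear
queue.append(17)
queue.append(93)
assert queue[0] == 42          # peek at the front
assert queue.popleft() == 42   # dequeue from the front (FIFO)
assert queue.popleft() == 17
```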
Why Queues Matter — Use Cases:
Queue Variants: circular queues (ring buffers), deques (double-ended queues), and priority queues (highest-priority element is served first).
Queue Implementations: circular arrays (ring buffers) or linked lists with head and tail pointers.
When to Use Queues:
✅ FIFO processing required
✅ Order of arrival matters
✅ Buffering between producer and consumer
✅ BFS and level-order traversals
❌ Need access to arbitrary elements
❌ Need LIFO (use stack)
❌ Need priority-based processing (use priority queue)
This is one of the most important mental models in algorithms. BFS explores level-by-level using a queue (process all neighbors before going deeper). DFS explores depth-first using a stack (go as deep as possible before backtracking). The only difference is the data structure—the algorithm logic is nearly identical.
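A sketch of how close the two traversals really are — swapping which end of the container we pop from turns BFS into DFS, while the rest of the code stays identical (the example graph is hypothetical):

```python
from collections import deque

def traverse(graph, start, mode):
    """BFS when mode == 'bfs' (pop from the front), DFS when 'dfs' (pop from the back).
    Only the pop end differs; the surrounding logic is the same."""
    frontier = deque([start])
    seen = {start}
    order = []
    while frontier:
        node = frontier.popleft() if mode == "bfs" else frontier.pop()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
assert traverse(graph, "A", "bfs") == ["A", "B", "C", "D"]   # level by level
assert traverse(graph, "A", "dfs") == ["A", "C", "D", "B"]   # deep first
```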
Trees model hierarchical relationships—structures where entities have parent-child connections forming a branching pattern.
What Is a Tree?
A tree is a connected, acyclic graph with a single root node; every other node has exactly one parent, forming a branching hierarchy:
[ROOT]
/ | \
[A] [B] [C]
/ \ |
[D] [E] [F]
/ \
[G] [H]
Tree Terminology:
Why Trees Matter:
Trees naturally model countless real-world structures:
Important Tree Variants: binary trees, binary search trees (BSTs), balanced trees (AVL, Red-Black), heaps, and tries. For a balanced BST, the typical complexities are:
| Operation | Complexity | Notes |
|---|---|---|
| Search | O(log n) | Binary search principle |
| Insert | O(log n) | Find position, add node, rebalance |
| Delete | O(log n) | Find node, restructure, rebalance |
| Find Min/Max | O(log n) | Follow leftmost/rightmost path |
| Traversal (in-order, pre-order, post-order) | O(n) | Visit all nodes |
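A minimal BST sketch showing the halving principle behind these complexities. Note this version does no rebalancing, so the O(log n) bounds in the table hold only while the tree stays balanced (real implementations add AVL or Red-Black rebalancing):

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Walk down: smaller keys go left, larger go right. O(height) steps."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Each comparison discards one whole subtree — the search space halves."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None

root = None
for k in (50, 30, 70, 20, 40):
    root = insert(root, k)
assert search(root, 40) and not search(root, 99)
```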
The magic of trees comes from halving the search space at each level. A balanced tree with n nodes has height ~log₂(n). 1 million nodes? Just 20 levels. 1 billion? About 30. This logarithmic efficiency is why trees underpin databases, file systems, and search engines.
Graphs are the most general relationship structure—capable of representing any pattern of connections between entities.
What Is a Graph?
A graph consists of a set of vertices (nodes) and a set of edges connecting pairs of vertices.
Unlike trees, graphs have no restrictions on connectivity:
[A] ←——→ [B]
↑ ↓
| [C] ←——→ [D]
↓ |
[E] ←—————┘
Graph Characteristics: edges may be directed or undirected, weighted or unweighted, and the graph may contain cycles or be acyclic (a DAG).
Real-World Graph Examples:
Graph Representations:
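The two standard representations are the adjacency list (O(V + E) space, fast neighbor iteration — the usual default) and the adjacency matrix (O(V²) space, O(1) edge lookup). A sketch using a small hypothetical directed graph:

```python
# Adjacency list: each vertex maps to the vertices it points to.
adj_list = {
    "A": ["B"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["E"],
    "E": ["A"],
}

# Build the equivalent adjacency matrix: matrix[u][v] == 1 iff edge u → v exists.
nodes = sorted(adj_list)
index = {v: i for i, v in enumerate(nodes)}
matrix = [[0] * len(nodes) for _ in nodes]
for u, neighbors in adj_list.items():
    for v in neighbors:
        matrix[index[u]][index[v]] = 1

assert matrix[index["B"]][index["C"]] == 1   # edge B → C exists: O(1) lookup
assert matrix[index["C"]][index["B"]] == 0   # directed: no C → B edge
```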
| Algorithm | Purpose | Complexity |
|---|---|---|
| BFS | Level-order traversal, shortest path (unweighted) | O(V + E) |
| DFS | Explore paths, detect cycles, topological sort | O(V + E) |
| Dijkstra | Shortest path (weighted, non-negative) | O((V + E) log V) |
| Bellman-Ford | Shortest path (handles negative weights) | O(VE) |
| A* | Shortest path with heuristic | Varies by heuristic |
| Topological Sort | Order DAG respecting dependencies | O(V + E) |
| Union-Find | Check connectivity, find components | O(α(n)) per operation |
Arrays, linked lists, and trees are all special cases of graphs. An array is a linear graph. A tree is an acyclic connected graph with a root. Understanding graphs means understanding the most general form of data relationships.
With all these structures in mind, how do you choose the right one? Here's a decision framework:
Step 1: Identify the Relationship Type
Step 2: Identify Critical Operations
Step 3: Consider Constraints
| If You Need... | Consider... |
|---|---|
| Fast random access by index | Array, dynamic array |
| Fast insertion/deletion at ends | Deque, circular buffer |
| LIFO access | Stack |
| FIFO access | Queue |
| Fast key-value lookup (unordered) | Hash table |
| Ordered key-value with range queries | Balanced BST, skip list |
| Priority-based access | Heap / Priority queue |
| Prefix-based string operations | Trie |
| Hierarchical data | Tree |
| Arbitrary connections, path finding | Graph |
When unsure, start with the simplest structure that works (often an array or hash table). Optimize only when you have evidence of performance problems. Premature optimization with complex structures often backfires—the 'better' structure is slower due to overhead and worse cache behavior.
This page has provided a bird's-eye view of the non-primitive data structure landscape. The chapters ahead will dive deep into each:
Chapter 4: Strings — Deep exploration of string representation, operations, pattern matching, and algorithms.
Chapter 5: Arrays — Memory model, static vs dynamic arrays, two-pointer techniques, sliding windows, multi-dimensional arrays.
Chapter 6: Linked Lists — Node-based thinking, singly vs doubly linked, circular variants, two-pointer techniques for lists.
Chapter 7: Stacks — Implementation patterns, expression evaluation, parsing applications, monotonic stacks.
Chapter 8: Queues — Circular queues, deques, priority queues, queue-based algorithms.
Chapters 11-15: Trees — Binary trees, BSTs, balanced trees (AVL, Red-Black), heaps, tries.
Chapters 16, 23-24: Graphs — Representation, traversal, shortest paths, spanning trees, advanced algorithms.
The Learning Strategy:
For each structure, you'll learn:
This consistent framework will help you build a coherent mental model where each structure is connected to the others, not isolated.
You now have a map of the territory. Every subsequent chapter fills in details on this map. When you learn about AVL trees, you'll know they're balanced BST variants for O(log n) guarantees. When you encounter Dijkstra's algorithm, you'll know it operates on weighted graphs. This context makes learning faster and more meaningful.
Let's consolidate our survey of non-primitive data structures:
Module Complete:
This concludes Module 5: Non-Primitive Data Structures (Conceptual Overview). You now understand:
With this foundation, you're prepared to dive deep into each structure in the chapters ahead, understanding not just how they work but why they're designed the way they are.
Congratulations! You've completed the conceptual foundation for non-primitive data structures. You are now equipped with the mental framework to understand every data structure this curriculum will cover. The deep dives begin with the next chapter.