Loading learning content...
Every software system manipulates collections. Users in a database. Products in a catalog. Messages in a queue. Files in a directory. At the heart of nearly every application lies the fundamental need to traverse collections—to visit each element, process it, and move to the next.
Yet collections come in wildly different forms. Arrays store elements contiguously in memory. Linked lists chain elements through pointers. Trees organize elements hierarchically. Graphs connect elements through arbitrary relationships. Hash tables scatter elements across buckets.
The problem: How do you write code that can traverse any collection without knowing (or caring about) its internal structure?
By the end of this page, you will understand why exposing collection internals for traversal creates fragile, tightly-coupled code. You'll see how different traversal requirements (forward, backward, filtered, transformed) compound the problem. Most importantly, you'll understand why we need a pattern that separates 'how to traverse' from 'what to traverse.'
When developers first encounter the need to traverse a collection, the most intuitive approach is to expose the collection's internal structure directly. If you have an array, you expose the array. If you have a linked list, you expose the head node. Let the client code figure out how to traverse it.
This seems perfectly reasonable for simple cases. Consider a notification system that needs to send alerts to all users:
12345678910111213141516171819202122232425262728293031323334353637
# Naive Approach: Exposing Internal Structure class UserRepository: """Repository that stores users in an internal array""" def __init__(self): self._users: list = [] # Internal storage - an array def add_user(self, user: User) -> None: self._users.append(user) def get_users(self) -> list: """Exposes internal array directly""" return self._users # ⚠️ Exposes implementation detail class NotificationService: """Service that sends notifications to all users""" def __init__(self, user_repository: UserRepository): self.user_repository = user_repository def notify_all(self, message: str) -> None: # Client code must know how to traverse an array users = self.user_repository.get_users() for i in range(len(users)): # Array-specific traversal users[i].send_notification(message) # Usagerepository = UserRepository()repository.add_user(User("alice@example.com"))repository.add_user(User("bob@example.com")) notification_service = NotificationService(repository)notification_service.notify_all("System maintenance tonight!")This works. But it creates a hidden dependency: NotificationService now knows that UserRepository stores users in a list. This knowledge is embedded in the for i in range(len(users)) loop.
What happens when requirements change?
When you expose a collection's internal structure, every piece of client code becomes coupled to that structure. Change the structure, and you must change every client. In large systems, this creates cascading changes that can affect dozens or hundreds of files.
Suppose your system grows. You now have millions of users, and array operations become performance bottlenecks. The team decides to switch to a linked list for O(1) insertions and deletions. Or perhaps you adopt a tree structure for sorted access. Or a hash table for O(1) lookups by ID.
Suddenly, every client that traverses users must change:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546
# After switching from array to linked list... class UserNode: """Node in a linked list of users""" def __init__(self, user: User): self.user = user self.next: Optional['UserNode'] = None class UserRepository: """Now stores users in a linked list""" def __init__(self): self._head: Optional[UserNode] = None # Changed internal structure! self._tail: Optional[UserNode] = None def add_user(self, user: User) -> None: new_node = UserNode(user) if self._tail: self._tail.next = new_node else: self._head = new_node self._tail = new_node def get_head(self) -> Optional[UserNode]: """New API - exposes linked list structure""" return self._head class NotificationService: """Service that sends notifications to all users""" def __init__(self, user_repository: UserRepository): self.user_repository = user_repository def notify_all(self, message: str) -> None: # ❌ Old array traversal code is BROKEN # users = self.user_repository.get_users() # for i in range(len(users)): # users[i].send_notification(message) # ✅ Must rewrite for linked list traversal current = self.user_repository.get_head() while current is not None: # Linked list traversal current.user.send_notification(message) current = current.nextThe damage is extensive:
NotificationService must be rewrittenEvery single place in your codebase that traversed the user collection is now broken. And this is for one collection in one repository.
This scenario violates a fundamental principle of object-oriented design: encapsulation. The internal representation of an object should be hidden from clients. By exposing whether users are stored in an array, linked list, tree, or any other structure, we've leaked implementation details that clients should never know.
Real systems don't have just one type of collection. They have many—often with different structures optimized for different access patterns. Consider an e-commerce platform:
| Collection | Optimal Structure | Access Pattern | Traversal Method |
|---|---|---|---|
| Products | Hash Map (by ID) | O(1) lookup | Iterate over values() |
| Categories | Tree (hierarchical) | Parent-child navigation | DFS/BFS traversal |
| Order Queue | Linked List / Deque | FIFO processing | Pop from head |
| Price Tiers | Sorted Array | Range queries | Binary search bounds |
| User Sessions | LRU Cache | Recent access | Custom eviction order |
| Product Tags | Set | Membership test | Unordered iteration |
Now imagine writing a reporting module that needs to generate summaries across all these collections. Without abstraction, the reporting code becomes a maze of collection-specific traversal logic:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051
# Without Iterator Pattern: Collection-Specific Chaos class ReportGenerator: """Generates reports across multiple collection types""" def __init__( self, product_repo: ProductRepository, # Uses HashMap category_repo: CategoryRepository, # Uses Tree order_queue: OrderQueue, # Uses LinkedList price_tiers: PriceTierRepository, # Uses SortedArray ): self.product_repo = product_repo self.category_repo = category_repo self.order_queue = order_queue self.price_tiers = price_tiers def generate_daily_summary(self) -> Report: report = Report() # Traverse products (HashMap) - must know it's a dict products = self.product_repo.get_product_map() for product_id, product in products.items(): report.add_product_metric(product) # Traverse categories (Tree) - must know it's a tree root = self.category_repo.get_root_category() self._traverse_category_tree(root, report) # Traverse orders (LinkedList) - must know it's a linked list current_order = self.order_queue.get_head() while current_order: report.add_order_metric(current_order.data) current_order = current_order.next # Traverse price tiers (SortedArray) - must know it's an array tiers = self.price_tiers.get_tiers_array() for i in range(len(tiers)): report.add_tier_metric(tiers[i]) return report def _traverse_category_tree(self, node: CategoryNode, report: Report): """Recursive tree traversal - tree-specific logic""" if node is None: return report.add_category_metric(node.category) for child in node.children: self._traverse_category_tree(child, report)The report generator must understand four completely different traversal mechanisms. Developers working on this code must mentally context-switch between array indices, dictionary iteration, linked list pointers, and recursive tree traversal—all in one method. This cognitive load increases bugs and slows development.
The problem deepens when you consider that traversal itself has many variations. It's not just about visiting every element—it's about how you visit them:
Without proper abstraction, each traversal variation requires collection-specific implementations. For a system with 5 collection types and 8 traversal variations, you potentially need 40 different traversal implementations—each tightly coupled to both the collection structure and the traversal type.
123456789101112131415161718192021222324252627282930
# The Combinatorial Explosion of Traversal Implementations # For Array-based collection:def traverse_array_forward(arr): ...def traverse_array_reverse(arr): ...def traverse_array_filtered(arr, predicate): ...def traverse_array_paginated(arr, page_size): ... # For LinkedList-based collection:def traverse_linked_list_forward(head): ...def traverse_linked_list_reverse(head): ... # Requires O(n) per step or doubly-linkeddef traverse_linked_list_filtered(head, predicate): ...def traverse_linked_list_paginated(head, page_size): ... # For Tree-based collection:def traverse_tree_preorder(root): ...def traverse_tree_inorder(root): ...def traverse_tree_postorder(root): ...def traverse_tree_level_order(root): ...def traverse_tree_filtered(root, predicate): ...def traverse_tree_paginated(root, page_size): ... # For Graph-based collection:def traverse_graph_dfs(start_node): ...def traverse_graph_bfs(start_node): ...def traverse_graph_topological(graph): ...def traverse_graph_filtered(start_node, predicate): ... # And so on for every combination...# This is UNMAINTAINABLE!Every time you add a new collection type, you must implement all traversal variations. Every time you add a new traversal variation, you must implement it for all collection types. The maintenance burden grows multiplicatively, not additively.
One of the most powerful features of object-oriented programming is polymorphism—the ability to treat different types uniformly through a common interface. But without proper iteration abstraction, collections break polymorphism.
Consider a function that should work with any collection of items:
123456789101112131415161718192021222324252627282930313233343536373839404142434445
# Broken Polymorphism: Cannot Write Generic Code def calculate_total_value(collection) -> Decimal: """ Should calculate total value of any collection of items. But HOW do we traverse the collection? """ total = Decimal("0") # We must check the collection type and use specific traversal! if isinstance(collection, list): for i in range(len(collection)): total += collection[i].get_value() elif isinstance(collection, dict): for key, item in collection.items(): total += item.get_value() elif isinstance(collection, LinkedList): current = collection.head while current: total += current.data.get_value() current = current.next elif isinstance(collection, BinaryTree): # Need recursive traversal def traverse(node): nonlocal total if node: total += node.data.get_value() traverse(node.left) traverse(node.right) traverse(collection.root) else: raise TypeError(f"Unknown collection type: {type(collection)}") return total # The function "works" but violates every principle:# ❌ Open/Closed Principle: Must modify function for new collection types# ❌ Single Responsibility: One function handles multiple traversal algorithms# ❌ Liskov Substitution: Cannot substitute collections freely# ❌ Dependency Inversion: Depends on concrete collection implementationsThe isinstance checks are a code smell. They indicate that our abstraction is incomplete. We're forced to peek inside the collection to figure out how to traverse it, which defeats the purpose of having different collection types in the first place.
What we really want:
12345678910111213141516171819202122232425262728
# What We Want: True Polymorphic Iteration def calculate_total_value(collection: Iterable[Item]) -> Decimal: """ Calculate total value of ANY collection of items. We don't care HOW the collection stores items. We only care that we can iterate over them. """ total = Decimal("0") # Single, uniform traversal mechanism for item in collection: total += item.get_value() return total # This function works with:# - Arrays/Lists# - Linked Lists# - Trees (any traversal order)# - Hash Maps (over values)# - Sets# - Queues# - Database result sets# - File line iterators# - Network stream chunks# - ANY collection that provides iteration!The Iterator Pattern gives us exactly this: a uniform interface for traversing any collection, without exposing how the collection is structured internally. Client code simply asks for elements one at a time, and the iterator handles all the details of navigation.
The iteration problem isn't theoretical—it causes real pain in production systems. Here are patterns we see repeatedly:
In each case, the fundamental issue is the same: business logic becomes polluted with traversal mechanics. The code that should focus on 'what to do with each element' is instead preoccupied with 'how to get the next element.'
Let's crystallize the problem the Iterator Pattern solves. We need to satisfy multiple requirements simultaneously:
The formal problem statement:
Provide a way to access the elements of an aggregate object sequentially without exposing its underlying representation.
This is exactly what the Gang of Four identified in their seminal work on design patterns. The Iterator Pattern provides a solution that satisfies all six requirements above while maintaining clean object-oriented design.
The Iterator Pattern is one of the original 23 patterns documented by the Gang of Four in 1994. It has become so fundamental that most modern programming languages build iteration directly into their syntax (Python's 'for...in', Java's enhanced for loop, JavaScript's 'for...of'). Understanding the pattern helps you leverage these features more effectively.
Before we dive into the full pattern implementation in the next page, let's preview the conceptual solution.
The key insight is to extract the traversal mechanism into its own object. Instead of the collection exposing its internals, the collection provides an 'iterator' object that knows how to traverse it. Clients interact only with the iterator, never with the collection's internal structure.
12345678910111213141516171819202122232425262728
# Conceptual Preview of the Iterator Solution # The Iterator abstracts traversalclass Iterator: def has_next(self) -> bool: """Returns True if more elements exist""" ... def next(self) -> Any: """Returns the next element and advances position""" ... # Collections provide iterators instead of exposing internalsclass Collection: def create_iterator(self) -> Iterator: """Factory method: creates an iterator for this collection""" ... # Client code uses only the Iterator interfacedef process_items(collection: Collection): iterator = collection.create_iterator() while iterator.has_next(): item = iterator.next() # Process item - no knowledge of collection structure! process(item)Why this works:
create_iterator() methodcreate_iterator() returns a fresh iterator with independent stateThe Iterator Pattern introduces a level of indirection between clients and collections. This indirection is the source of all its benefits. Clients depend on the abstract Iterator interface, not on concrete collection implementations. This is the Dependency Inversion Principle in action.
We've explored the fundamental challenges of collection traversal that motivate the Iterator Pattern:
What's next:
In the next page, we'll implement the Iterator Pattern in full detail. You'll see the complete structure—Iterator interface, Concrete Iterators, Aggregate interfaces, and Concrete Aggregates. We'll build iterators for multiple collection types and demonstrate how client code becomes simple, uniform, and completely decoupled from collection internals.
You now understand the problem of uniform collection traversal—why exposing collection internals creates fragile, coupled systems, and why we need an abstraction layer for iteration. Next, we'll build the solution: the Iterator Pattern's complete architecture.