In previous pages, we examined how to read array elements through linear traversal and stop early when conditions are met. But the true power of traversal emerges when we use it to create new data.
Most real-world programs don't just inspect data—they transform it. Extracting specific elements, converting values to new forms, grouping items by properties, or computing derived arrays are everyday tasks in software development. These operations share a common pattern: traversing an input while building output.
Mastering this pattern—what functional programmers call map, filter, and reduce—gives you a powerful mental model for data transformation that transcends specific programming languages and appears in databases, data pipelines, and virtually every domain of software engineering.
By the end of this page, you will understand how to collect elements into new arrays during traversal, transform values using map operations, filter elements using predicates, perform complex reductions and aggregations, and compose these operations into elegant data transformation pipelines.
The most fundamental productive traversal is collection: building a new array containing selected elements from the original. Unlike simple aggregation (computing a single value), collection produces structured output that preserves or reorganizes the input.
The Core Collect Pattern:
result = []
for element in input:
    if should_include(element):
        result.append(element)
return result
This pattern has three key components: an empty result that accumulates output, a condition that decides whether each element is included, and an append step that builds the new array.
The power comes from what you can vary: the selection logic, what you add (the original element or a transformed version), and how you organize the output.
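For example, here is a minimal sketch of that flexibility (collect_transformed is a hypothetical helper, not part of the listing below): it varies both the condition and what gets appended by collecting transformed versions of only the matching elements.

def collect_transformed(arr, should_include, transform):
    """Append transform(x) for each x that passes the check, in one pass."""
    result = []
    for x in arr:
        if should_include(x):
            result.append(transform(x))  # Append a transformed element, not the original
    return result

# Squares of the even numbers only
print(collect_transformed([5, 2, 8, 1, 9, 6], lambda x: x % 2 == 0, lambda x: x * x))  # [4, 64, 36]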
# The Collect Pattern: Building New Arrays
# =========================================

def collect_basic(arr, predicate):
    """
    Collect all elements satisfying predicate.
    This is the essence of filtering.

    Time: O(n) - must examine every element
    Space: O(k) - k = number of elements collected
    """
    result = []
    for x in arr:
        if predicate(x):
            result.append(x)
    return result

def collect_evens(arr):
    """Practical example: collect all even numbers."""
    return collect_basic(arr, lambda x: x % 2 == 0)

def collect_with_indices(arr, predicate):
    """
    Collect both elements AND their indices.
    Useful when position matters.
    """
    result = []
    for i, x in enumerate(arr):
        if predicate(x):
            result.append((i, x))  # Store (index, value) pairs
    return result

def collect_unique(arr):
    """
    Collect unique elements, preserving first occurrence order.
    Uses a set for O(1) lookup but preserves order.
    """
    seen = set()
    result = []
    for x in arr:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

def collect_every_nth(arr, n):
    """
    Collect every nth element (subsampling).
    """
    result = []
    for i in range(0, len(arr), n):
        result.append(arr[i])
    return result

def collect_while(arr, predicate):
    """
    Collect elements from the start while predicate holds.
    Stops at first failing element.
    """
    result = []
    for x in arr:
        if predicate(x):
            result.append(x)
        else:
            break  # Stop collecting
    return result

# Demonstration
if __name__ == "__main__":
    numbers = [5, 2, 8, 1, 9, 3, 7, 2, 6, 4]

    print(f"Original: {numbers}")
    print(f"Evens: {collect_evens(numbers)}")
    print(f"Greater than 5 with indices: {collect_with_indices(numbers, lambda x: x > 5)}")
    print(f"Unique (order preserved): {collect_unique(numbers)}")
    print(f"Every 3rd element: {collect_every_nth(numbers, 3)}")
    print(f"While < 8: {collect_while(numbers, lambda x: x < 8)}")

Collecting creates new arrays, consuming O(k) extra space where k is the number of collected elements. In the worst case (collecting all elements), this is O(n). When memory is constrained, consider streaming/generator approaches that yield elements one at a time rather than building complete arrays.
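As a minimal sketch of that streaming alternative (collect_lazy is a hypothetical name, not part of the listing above), a generator version of the collect pattern yields matching elements one at a time instead of building a list:

def collect_lazy(arr, predicate):
    """Yield elements satisfying predicate one at a time (O(1) extra space)."""
    for x in arr:
        if predicate(x):
            yield x

# Elements are produced only as the consuming loop asks for them
for value in collect_lazy(range(10), lambda x: x % 2 == 0):
    print(value)  # 0, 2, 4, 6, 8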
Mapping transforms each element of an array into a new value, producing an output array of the same length. Unlike filtering (which reduces elements), mapping preserves structure while changing content.
The Core Map Pattern:
result = []
for element in input:
    result.append(transform(element))
return result
The transformation function is the heart of mapping. It takes one value and produces one value—potentially of a different type. This simple contract enables powerful abstractions.
# The Map Pattern: Transforming Every Element
# ============================================

def map_basic(arr, transform):
    """
    Apply transform function to every element.

    Invariant: len(result) == len(arr)
    Time: O(n) where each transform is O(1)
    Space: O(n) - output has same size as input
    """
    result = []
    for x in arr:
        result.append(transform(x))
    return result

# Practical transformations
def double_all(arr):
    """Double each number."""
    return map_basic(arr, lambda x: x * 2)

def to_strings(arr):
    """Convert each element to its string representation."""
    return map_basic(arr, str)

def extract_property(arr, prop_name):
    """Extract a single property from objects/dicts."""
    return map_basic(arr, lambda obj: obj.get(prop_name))

def clamp_values(arr, low, high):
    """Clamp each value to [low, high] range."""
    def clamp(x):
        if x < low:
            return low
        if x > high:
            return high
        return x
    return map_basic(arr, clamp)

def normalize(arr):
    """
    Normalize values to [0, 1] range.

    Note: Requires two passes - one to find min/max, one to transform.
    This is common for relative transforms.
    """
    if len(arr) == 0:
        return []
    min_val = min(arr)
    max_val = max(arr)
    range_val = max_val - min_val
    if range_val == 0:
        return [0.5] * len(arr)  # All same value
    return map_basic(arr, lambda x: (x - min_val) / range_val)

def map_with_index(arr, transform):
    """
    Transform using both value AND index.
    Useful for position-dependent transformations.
    """
    result = []
    for i, x in enumerate(arr):
        result.append(transform(i, x))
    return result

# Demonstration
if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]

    print(f"Original: {numbers}")
    print(f"Doubled: {double_all(numbers)}")
    print(f"As strings: {to_strings(numbers)}")
    print(f"Clamped to [2, 4]: {clamp_values(numbers, 2, 4)}")
    print(f"Normalized: {normalize(numbers)}")

    # Index-aware transformation
    indexed = map_with_index(numbers, lambda i, x: f"{i}: {x}")
    print(f"With indices: {indexed}")

    # Extract from objects
    users = [
        {'name': 'Alice', 'age': 30},
        {'name': 'Bob', 'age': 25},
        {'name': 'Carol', 'age': 35}
    ]
    names = extract_property(users, 'name')
    print(f"Names: {names}")

A key property of mapping is 1:1 correspondence—each input element produces exactly one output element at the same position. This means len(output) == len(input), and output[i] is derived from input[i]. This structural preservation makes reasoning about mapped data straightforward.
Filtering produces a subset of the input array containing only elements that satisfy a condition. While collect is the imperative implementation, filter is the conceptual abstraction—selecting elements based on a predicate function.
Properties of Filtering:
- The output is a subset of the input: anywhere from 0 to n elements survive.
- The relative order of surviving elements is preserved.
- The elements themselves are unchanged; only membership changes.
- The predicate is a pure yes/no question asked of each element.
# The Filter Pattern: Selecting Subsets
# ======================================

def filter_basic(arr, predicate):
    """
    Keep only elements where predicate returns True.

    Time: O(n) - check every element
    Space: O(k) - k elements pass filter (0 <= k <= n)
    """
    result = []
    for x in arr:
        if predicate(x):
            result.append(x)
    return result

# Common filtering operations
def remove_none(arr):
    """Remove None values."""
    return filter_basic(arr, lambda x: x is not None)

def filter_by_type(arr, target_type):
    """Keep only elements of specified type."""
    return filter_basic(arr, lambda x: isinstance(x, target_type))

def filter_in_range(arr, low, high):
    """Keep elements in [low, high] inclusive."""
    return filter_basic(arr, lambda x: low <= x <= high)

def remove_duplicates_stable(arr):
    """
    Remove duplicates while preserving order.
    First occurrence of each value is kept.
    """
    seen = set()
    result = []
    for x in arr:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

def filter_by_indices(arr, indices):
    """
    Keep only elements at specified indices.
    """
    index_set = set(indices)
    result = []
    for i, x in enumerate(arr):
        if i in index_set:
            result.append(x)
    return result

def partition(arr, predicate):
    """
    Split array into two: elements passing and failing predicate.
    Returns: (passing_elements, failing_elements)

    This is filter + anti-filter in one pass.
    """
    passing = []
    failing = []
    for x in arr:
        if predicate(x):
            passing.append(x)
        else:
            failing.append(x)
    return passing, failing

def filter_first_n(arr, predicate, n):
    """
    Get first n elements matching predicate.
    Combines filtering with limit (early termination).
    """
    result = []
    for x in arr:
        if predicate(x):
            result.append(x)
            if len(result) >= n:
                break  # Got enough
    return result

# Demonstration
if __name__ == "__main__":
    mixed = [1, 'hello', 3.14, None, 42, 'world', None, 7]
    numbers = [5, 12, 3, 18, 9, 24, 6, 15, 2, 21]

    print(f"Original: {mixed}")
    print(f"Without None: {remove_none(mixed)}")
    print(f"Integers only: {filter_by_type(mixed, int)}")

    print(f"\nNumbers: {numbers}")
    print(f"In range [5, 15]: {filter_in_range(numbers, 5, 15)}")

    with_dups = [1, 2, 3, 2, 4, 3, 5, 1]
    print(f"\n{with_dups} without duplicates: {remove_duplicates_stable(with_dups)}")

    # Partition example
    evens, odds = partition(numbers, lambda x: x % 2 == 0)
    print(f"\nPartitioned {numbers}:")
    print(f"  Evens: {evens}")
    print(f"  Odds: {odds}")

    # Limited filter
    first_3_over_10 = filter_first_n(numbers, lambda x: x > 10, 3)
    print(f"\nFirst 3 numbers > 10: {first_3_over_10}")

Filter changes the quantity of elements (keeping a subset), while map changes the quality of elements (transforming values). Filter: "Which elements do I keep?" Map: "What do I do to each element?" Different questions, different patterns.
Reduction (also called folding) combines all elements of an array into a single value. This is the most general aggregation pattern—sum, product, maximum, minimum, string concatenation, and even map and filter can be expressed as reductions.
The Core Reduce Pattern:
accumulator = initial_value
for element in input:
    accumulator = combine(accumulator, element)
return accumulator
The combine function takes the running result and the current element, producing a new running result. The initial value establishes the starting point of the computation.
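For instance, here is a short trace of summing [1, 2, 3] with addition as the combine function and 0 as the initial value; the runnable line uses Python's built-in functools.reduce, which behaves like the reduce_basic helper in the listing below (with a different argument order).

from functools import reduce

# Trace of reducing [1, 2, 3] with combine(acc, x) = acc + x, initial = 0:
#   acc = 0              (initial value)
#   acc = 0 + 1 = 1      (combine with 1)
#   acc = 1 + 2 = 3      (combine with 2)
#   acc = 3 + 3 = 6      (combine with 3)
print(reduce(lambda acc, x: acc + x, [1, 2, 3], 0))  # 6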
# The Reduce Pattern: Collapsing Arrays to Single Values
# ========================================================

def reduce_basic(arr, combine, initial):
    """
    Reduce array to single value using combine function.
    combine(accumulator, current) -> new_accumulator

    Time: O(n) assuming O(1) combine
    Space: O(1) - only accumulator stored
    """
    accumulator = initial
    for x in arr:
        accumulator = combine(accumulator, x)
    return accumulator

# Classic reductions
def sum_all(arr):
    """Sum: reduce with addition, initial 0."""
    return reduce_basic(arr, lambda acc, x: acc + x, 0)

def product_all(arr):
    """Product: reduce with multiplication, initial 1."""
    return reduce_basic(arr, lambda acc, x: acc * x, 1)

def max_value(arr):
    """Maximum: reduce keeping larger value."""
    if not arr:
        return None
    return reduce_basic(arr[1:], lambda acc, x: x if x > acc else acc, arr[0])

def min_value(arr):
    """Minimum: reduce keeping smaller value."""
    if not arr:
        return None
    return reduce_basic(arr[1:], lambda acc, x: x if x < acc else acc, arr[0])

def concat_strings(arr, separator=""):
    """Concatenate strings with separator."""
    if not arr:
        return ""
    return reduce_basic(arr[1:], lambda acc, x: acc + separator + x, arr[0])

def count_occurrences(arr, target):
    """Count how many times target appears."""
    return reduce_basic(arr, lambda acc, x: acc + (1 if x == target else 0), 0)

# Advanced reductions
def build_frequency_map(arr):
    """
    Build a dictionary counting occurrences of each element.
    This shows reduce building a complex structure, not just a number.
    """
    def update_counts(counts, x):
        counts[x] = counts.get(x, 0) + 1
        return counts
    return reduce_basic(arr, update_counts, {})

def flatten(nested_arr):
    """
    Flatten one level of nesting.
    [[1, 2], [3], [4, 5]] -> [1, 2, 3, 4, 5]
    """
    return reduce_basic(nested_arr, lambda acc, x: acc + x, [])

def group_by(arr, key_fn):
    """
    Group elements by a key function.

    Example: group_by([1, 2, 3, 4, 5], lambda x: x % 2)
    -> {0: [2, 4], 1: [1, 3, 5]}
    """
    def add_to_group(groups, x):
        key = key_fn(x)
        if key not in groups:
            groups[key] = []
        groups[key].append(x)
        return groups
    return reduce_basic(arr, add_to_group, {})

# Map and filter as special cases of reduce
def map_via_reduce(arr, transform):
    """Map implemented using reduce."""
    return reduce_basic(arr, lambda acc, x: acc + [transform(x)], [])

def filter_via_reduce(arr, predicate):
    """Filter implemented using reduce."""
    return reduce_basic(arr, lambda acc, x: acc + [x] if predicate(x) else acc, [])

# Demonstration
if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    words = ["hello", "world", "python"]

    print(f"Numbers: {numbers}")
    print(f"Sum: {sum_all(numbers)}")
    print(f"Product: {product_all(numbers)}")
    print(f"Max: {max_value(numbers)}")
    print(f"Min: {min_value(numbers)}")

    print(f"\nWords: {words}")
    print(f"Joined: {concat_strings(words, ' ')}")

    data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
    print(f"\nData: {data}")
    print(f"Frequency map: {build_frequency_map(data)}")

    grouped = group_by(numbers, lambda x: "even" if x % 2 == 0 else "odd")
    print(f"Grouped by parity: {grouped}")

    nested = [[1, 2], [3, 4], [5]]
    print(f"\nNested: {nested}")
    print(f"Flattened: {flatten(nested)}")

Reduce is the most general traversal pattern. In theory, any operation expressible with a loop can be expressed as a reduce. This is why functional programming languages treat reduce (or fold) as fundamental. Understanding reduce deeply unlocks the ability to express complex logic declaratively.
The real power emerges when you compose these patterns. Complex data transformations are often expressible as a chain of simple operations: filter to select, map to transform, reduce to aggregate.
The Pipeline Pattern:
data → filter → map → reduce → result
Each stage takes the output of the previous stage as input. This creates a readable, maintainable data transformation pipeline.
# Composing Filter, Map, and Reduce
# ===================================

# Basic implementations for composition
def my_filter(arr, predicate):
    return [x for x in arr if predicate(x)]

def my_map(arr, transform):
    return [transform(x) for x in arr]

def my_reduce(arr, combine, initial):
    acc = initial
    for x in arr:
        acc = combine(acc, x)
    return acc

# Example 1: Process sales data
# "What is the total revenue from orders over $100?"
def total_high_value_revenue(orders):
    """
    Pipeline:
    1. Filter orders > $100
    2. Map to extract amount
    3. Reduce to sum
    """
    high_value = my_filter(orders, lambda o: o['amount'] > 100)
    amounts = my_map(high_value, lambda o: o['amount'])
    total = my_reduce(amounts, lambda a, b: a + b, 0)
    return total

# Example 2: Text processing
# "Get uppercase versions of words with more than 4 letters"
def transform_long_words(words):
    """
    Pipeline:
    1. Filter words with len > 4
    2. Map to uppercase
    """
    long_words = my_filter(words, lambda w: len(w) > 4)
    uppercased = my_map(long_words, lambda w: w.upper())
    return uppercased

# Example 3: Complex aggregation
# "Get the average score of passing students"
def average_passing_score(students):
    """
    Pipeline:
    1. Filter students with score >= 60 (passing)
    2. Map to extract scores
    3. Reduce to compute average
    """
    passing = my_filter(students, lambda s: s['score'] >= 60)
    if not passing:
        return 0
    scores = my_map(passing, lambda s: s['score'])
    total = my_reduce(scores, lambda a, b: a + b, 0)
    return total / len(scores)

# The chain pattern - fluent style
class DataPipeline:
    """
    Wrapper enabling fluent chaining of operations.
    """
    def __init__(self, data):
        self.data = list(data)

    def filter(self, predicate):
        self.data = [x for x in self.data if predicate(x)]
        return self  # Return self for chaining

    def map(self, transform):
        self.data = [transform(x) for x in self.data]
        return self

    def reduce(self, combine, initial):
        acc = initial
        for x in self.data:
            acc = combine(acc, x)
        return acc  # Terminal operation

    def collect(self):
        return self.data  # Terminal operation

# Same examples using fluent style
def total_high_value_fluent(orders):
    """Fluent pipeline version."""
    return (DataPipeline(orders)
            .filter(lambda o: o['amount'] > 100)
            .map(lambda o: o['amount'])
            .reduce(lambda a, b: a + b, 0))

# Demonstration
if __name__ == "__main__":
    # Sales data example
    orders = [
        {'id': 1, 'amount': 50},
        {'id': 2, 'amount': 150},
        {'id': 3, 'amount': 75},
        {'id': 4, 'amount': 200},
        {'id': 5, 'amount': 120}
    ]
    total = total_high_value_revenue(orders)
    print(f"Total revenue from orders > $100: ${total}")

    # Same with fluent API
    total_fluent = total_high_value_fluent(orders)
    print(f"Fluent version: ${total_fluent}")

    # Text processing
    words = ['cat', 'elephant', 'dog', 'rhinoceros', 'ant']
    long_upper = transform_long_words(words)
    print(f"\nLong words uppercased: {long_upper}")

    # Student scores
    students = [
        {'name': 'Alice', 'score': 85},
        {'name': 'Bob', 'score': 55},
        {'name': 'Carol', 'score': 72},
        {'name': 'David', 'score': 48},
        {'name': 'Eve', 'score': 91}
    ]
    avg = average_passing_score(students)
    print(f"\nAverage passing score: {avg:.1f}")

    # Fluent chaining demo
    result = (DataPipeline([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
              .filter(lambda x: x % 2 == 0)  # Keep evens
              .map(lambda x: x * x)          # Square them
              .collect())                    # Get result
    print(f"\nEvens squared: {result}")

Filter is like SQL's WHERE clause (select rows). Map is like SELECT with transformations (choose/modify columns). Reduce is like aggregate functions (SUM, COUNT, AVG). Group_by is like SQL's GROUP BY. This mental model helps when switching between code and databases.
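To make the analogy concrete, here is a small sketch (using a hypothetical orders table, not one from the examples above) pairing a SQL query with its filter → map → reduce equivalent:

# SQL:  SELECT SUM(amount) FROM orders WHERE amount > 100;
orders = [{'amount': 50}, {'amount': 150}, {'amount': 200}]

high_value = [o for o in orders if o['amount'] > 100]  # WHERE amount > 100  (filter)
amounts = [o['amount'] for o in high_value]            # SELECT amount       (map)
total = sum(amounts)                                   # SUM(...)            (reduce)
print(total)  # 350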
While chaining filter-map-reduce operations is elegant, it's important to understand the performance implications and optimize when necessary.
Multiple Passes vs. Single Pass:
Naive chaining creates intermediate arrays and makes multiple passes:
data → [filter pass] → temp1 → [map pass] → temp2 → [reduce pass] → result
This is O(n) time per operation (good), but creates O(n) intermediate storage per step (potentially wasteful).
# Efficiency Optimizations for Data Pipelines
# =============================================

# Naive chaining (multiple passes, intermediate arrays)
def naive_pipeline(data):
    """
    Three passes, two intermediate arrays.

    Time: O(n) + O(k) + O(k) where k = filtered count
    Space: O(k) + O(k) for intermediates
    """
    filtered = [x for x in data if x > 0]    # Pass 1
    transformed = [x * 2 for x in filtered]  # Pass 2
    result = sum(transformed)                # Pass 3
    return result

# Single-pass fusion (one loop, no intermediates)
def fused_pipeline(data):
    """
    One pass, no intermediate arrays.

    Time: O(n)
    Space: O(1)
    """
    result = 0
    for x in data:
        if x > 0:                  # Filter logic
            transformed = x * 2    # Map logic
            result += transformed  # Reduce logic
    return result

# Generator-based lazy evaluation (single pass, streaming)
def lazy_pipeline(data):
    """
    Generators don't create intermediate arrays.
    Each element flows through the entire pipeline.

    Time: O(n)
    Space: O(1) - no intermediate storage
    """
    filtered = (x for x in data if x > 0)    # Generator (lazy)
    transformed = (x * 2 for x in filtered)  # Generator (lazy)
    result = sum(transformed)                # Consumes generators
    return result

# When to use each approach
"""
NAIVE CHAINING:
- Readable and maintainable
- Good for prototyping
- Acceptable for small datasets
- When debugging (can inspect intermediates)

FUSED PIPELINE:
- Best performance
- More complex logic in one place
- Harder to modify individual steps
- Use for hot paths in production

LAZY GENERATORS:
- Best of both worlds
- Maintains composability
- No intermediate storage
- Python-specific (other languages have similar constructs)
"""

# Benchmarking comparison
import time

def benchmark():
    data = list(range(-500000, 500000))  # 1 million elements

    # Benchmark naive
    start = time.time()
    result1 = naive_pipeline(data)
    naive_time = time.time() - start

    # Benchmark fused
    start = time.time()
    result2 = fused_pipeline(data)
    fused_time = time.time() - start

    # Benchmark lazy
    start = time.time()
    result3 = lazy_pipeline(data)
    lazy_time = time.time() - start

    print(f"Results: naive={result1}, fused={result2}, lazy={result3}")
    print(f"Times: naive={naive_time:.4f}s, fused={fused_time:.4f}s, lazy={lazy_time:.4f}s")
    print(f"Speedup (fused vs naive): {naive_time/fused_time:.2f}x")

# Order matters for efficiency
def order_matters_example():
    """
    Filter placement affects performance.
    Filter early to reduce work for subsequent operations.
    """
    data = list(range(1000000))

    # BAD: Expensive map on all elements, then filter
    def bad_order():
        mapped = [x ** 2 for x in data]            # 1M expensive operations
        filtered = [x for x in mapped if x < 100]  # Filter after
        return filtered

    # GOOD: Filter first, then expensive map on fewer elements
    def good_order():
        filtered = [x for x in data if x < 10]  # Filter to ~10 elements
        mapped = [x ** 2 for x in filtered]     # Only 10 operations
        return mapped

    # Compare
    start = time.time()
    bad_order()
    bad_time = time.time() - start

    start = time.time()
    good_order()
    good_time = time.time() - start

    print(f"\nOrder matters:")
    print(f"  Map then filter: {bad_time:.4f}s")
    print(f"  Filter then map: {good_time:.4f}s")

if __name__ == "__main__":
    benchmark()
    order_matters_example()

When composing operations, filter as early as possible to reduce the dataset before expensive transformations. If you filter 90% of elements, all subsequent operations only process 10% of the original data. This simple reordering can yield 10x speedups.
The collect/map/filter/reduce patterns appear throughout real-world software development. Recognizing these patterns helps you leverage existing tools and write more idiomatic code.
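Before the larger examples, it is worth noting that Python already ships these patterns as built-ins and comprehensions; the short sketch below (not part of the original examples) shows the standard equivalents of the hand-written helpers used throughout this page.

from functools import reduce

numbers = [5, 2, 8, 1, 9]

evens = list(filter(lambda x: x % 2 == 0, numbers))  # filter: keep a subset
doubled = list(map(lambda x: x * 2, numbers))        # map: transform every element
total = reduce(lambda acc, x: acc + x, numbers, 0)   # reduce: collapse to one value

# List comprehensions express filter + map in a single idiomatic step
doubled_evens = [x * 2 for x in numbers if x % 2 == 0]

print(evens, doubled, total, doubled_evens)  # [2, 8] [10, 4, 16, 2, 18] 25 [4, 16]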
# Real-World Pattern Examples
# ============================

# Example: E-commerce order processing
def process_orders(orders):
    """
    Business requirement: Get total revenue from completed orders
    placed in 2024 or later, excluding test accounts.
    """
    is_real_order = lambda o: not o.get('is_test', False)
    is_completed = lambda o: o['status'] == 'completed'
    is_after_2024 = lambda o: o['year'] >= 2024

    # Pipeline: filter test accounts -> filter completed ->
    # filter year -> map to amount -> reduce to sum
    real_orders = [o for o in orders if is_real_order(o)]
    completed = [o for o in real_orders if is_completed(o)]
    recent = [o for o in completed if is_after_2024(o)]
    amounts = [o['amount'] for o in recent]
    total = sum(amounts)

    return total

# Example: Log analysis
def analyze_errors(log_entries):
    """
    Find error distribution by service for this week.
    """
    from collections import defaultdict
    from datetime import datetime, timedelta

    one_week_ago = datetime.now() - timedelta(days=7)

    # Filter: errors only, within time window
    errors = [
        entry for entry in log_entries
        if entry['level'] == 'ERROR' and entry['timestamp'] > one_week_ago
    ]

    # Group by service, count occurrences
    error_counts = defaultdict(int)
    for entry in errors:
        error_counts[entry['service']] += 1

    return dict(error_counts)

# Example: User analytics
def calculate_engagement_score(user_activities):
    """
    Compute engagement score based on weighted activity.
    """
    weights = {
        'view': 1,
        'like': 2,
        'comment': 3,
        'share': 5,
        'purchase': 10
    }

    # Map activities to weighted scores, then sum
    scores = [weights.get(activity['type'], 0) for activity in user_activities]
    total_score = sum(scores)

    return total_score

# Example: Feature flag rollout
def get_feature_eligible_users(users, feature_config):
    """
    Determine which users should see a new feature.
    """
    # Filter by multiple criteria
    eligible = []
    for user in users:
        # Check region
        if feature_config.get('regions') and user['region'] not in feature_config['regions']:
            continue
        # Check user tier
        if feature_config.get('tiers') and user['tier'] not in feature_config['tiers']:
            continue
        # Check percentage rollout (by user ID hash)
        if feature_config.get('percentage'):
            if hash(user['id']) % 100 >= feature_config['percentage']:
                continue
        eligible.append(user)

    # Map to just user IDs
    return [u['id'] for u in eligible]

# Demonstration
if __name__ == "__main__":
    # Order processing example
    orders = [
        {'id': 1, 'amount': 100, 'status': 'completed', 'year': 2024, 'is_test': False},
        {'id': 2, 'amount': 200, 'status': 'completed', 'year': 2024, 'is_test': True},
        {'id': 3, 'amount': 150, 'status': 'pending', 'year': 2024, 'is_test': False},
        {'id': 4, 'amount': 300, 'status': 'completed', 'year': 2024, 'is_test': False},
        {'id': 5, 'amount': 50, 'status': 'completed', 'year': 2023, 'is_test': False},
    ]
    revenue = process_orders(orders)
    print(f"Total qualifying revenue: ${revenue}")  # Expected: $400 (orders 1 and 4 only)

You've now mastered the art of transforming data during traversal. These patterns—collect, map, filter, and reduce—are the building blocks of data processing in every programming paradigm and domain.
What's Next:
Now that you can transform data during traversal, the final piece is building output arrays: understanding how to efficiently construct new arrays as you iterate, handle growing collections, and apply these patterns when the output structure differs from the input. The next page explores these techniques in depth.
You've mastered collecting, mapping, filtering, and reducing—the four fundamental patterns for productive traversal—and you can now express complex data transformations as compositions of simple operations.