In previous pages, we examined how to read array elements through linear traversal and stop early when conditions are met. But the true power of traversal emerges when we use it to create new data.
Most real-world programs don't just inspect data—they transform it. Extracting specific elements, converting values to new forms, grouping items by properties, or computing derived arrays are everyday tasks in software development. These operations share a common pattern: traversing an input while building output.
Mastering this pattern—what functional programmers call map, filter, and reduce—gives you a powerful mental model for data transformation that transcends specific programming languages and appears in databases, data pipelines, and virtually every domain of software engineering.
By the end of this page, you will understand how to collect elements into new arrays during traversal, transform values using map operations, filter elements using predicates, perform complex reductions and aggregations, and compose these operations into elegant data transformation pipelines.
The most fundamental productive traversal is collection: building a new array containing selected elements from the original. Unlike simple aggregation (computing a single value), collection produces structured output that preserves or reorganizes the input.
The Core Collect Pattern:
result = []
for element in input:
    if should_include(element):
        result.append(element)
return result
This pattern has three key components: an empty result that accumulates output, a condition that decides whether each element is included, and an append step that builds the new array.
The power comes from what you can vary: the selection logic, what you add (the original element or a transformed version), and how you organize the output.
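For example, here is a minimal sketch of that flexibility (collect_transformed is a hypothetical helper, not part of the listing below): it varies both the condition and what gets appended by collecting transformed versions of only the matching elements.

def collect_transformed(arr, should_include, transform):
    """Append transform(x) for each x that passes the check, in one pass."""
    result = []
    for x in arr:
        if should_include(x):
            result.append(transform(x))  # Append a transformed element, not the original
    return result

# Squares of the even numbers only
print(collect_transformed([5, 2, 8, 1, 9, 6], lambda x: x % 2 == 0, lambda x: x * x))  # [4, 64, 36]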
# The Collect Pattern: Building New Arrays
# =========================================

def collect_basic(arr, predicate):
    """
    Collect all elements satisfying predicate.
    This is the essence of filtering.

    Time: O(n) - must examine every element
    Space: O(k) - k = number of elements collected
    """
    result = []
    for x in arr:
        if predicate(x):
            result.append(x)
    return result

def collect_evens(arr):
    """Practical example: collect all even numbers."""
    return collect_basic(arr, lambda x: x % 2 == 0)

def collect_with_indices(arr, predicate):
    """
    Collect both elements AND their indices.
    Useful when position matters.
    """
    result = []
    for i, x in enumerate(arr):
        if predicate(x):
            result.append((i, x))  # Store (index, value) pairs
    return result

def collect_unique(arr):
    """
    Collect unique elements, preserving first occurrence order.
    Uses a set for O(1) lookup but preserves order.
    """
    seen = set()
    result = []
    for x in arr:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

def collect_every_nth(arr, n):
    """
    Collect every nth element (subsampling).
    """
    result = []
    for i in range(0, len(arr), n):
        result.append(arr[i])
    return result

def collect_while(arr, predicate):
    """
    Collect elements from the start while predicate holds.
    Stops at first failing element.
    """
    result = []
    for x in arr:
        if predicate(x):
            result.append(x)
        else:
            break  # Stop collecting
    return result

# Demonstration
if __name__ == "__main__":
    numbers = [5, 2, 8, 1, 9, 3, 7, 2, 6, 4]

    print(f"Original: {numbers}")
    print(f"Evens: {collect_evens(numbers)}")
    print(f"Greater than 5 with indices: {collect_with_indices(numbers, lambda x: x > 5)}")
    print(f"Unique (order preserved): {collect_unique(numbers)}")
    print(f"Every 3rd element: {collect_every_nth(numbers, 3)}")
    print(f"While < 8: {collect_while(numbers, lambda x: x < 8)}")

Collecting creates new arrays, consuming O(k) extra space where k is the number of collected elements. In the worst case (collecting all elements), this is O(n). When memory is constrained, consider streaming/generator approaches that yield elements one at a time rather than building complete arrays.
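As a minimal sketch of that streaming alternative (collect_lazy is a hypothetical name, not part of the listing above), a generator version of the collect pattern yields matching elements one at a time instead of building a list:

def collect_lazy(arr, predicate):
    """Yield elements satisfying predicate one at a time (O(1) extra space)."""
    for x in arr:
        if predicate(x):
            yield x

# Elements are produced only as the consuming loop asks for them
for value in collect_lazy(range(10), lambda x: x % 2 == 0):
    print(value)  # 0, 2, 4, 6, 8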
Mapping transforms each element of an array into a new value, producing an output array of the same length. Unlike filtering (which reduces elements), mapping preserves structure while changing content.
The Core Map Pattern:
result = []
for element in input:
    result.append(transform(element))
return result
The transformation function is the heart of mapping. It takes one value and produces one value—potentially of a different type. This simple contract enables powerful abstractions.
# The Map Pattern: Transforming Every Element
# ============================================

def map_basic(arr, transform):
    """
    Apply transform function to every element.

    Invariant: len(result) == len(arr)
    Time: O(n) where each transform is O(1)
    Space: O(n) - output has same size as input
    """
    result = []
    for x in arr:
        result.append(transform(x))
    return result

# Practical transformations
def double_all(arr):
    """Double each number."""
    return map_basic(arr, lambda x: x * 2)

def to_strings(arr):
    """Convert each element to its string representation."""
    return map_basic(arr, str)

def extract_property(arr, prop_name):
    """Extract a single property from objects/dicts."""
    return map_basic(arr, lambda obj: obj.get(prop_name))

def clamp_values(arr, low, high):
    """Clamp each value to [low, high] range."""
    def clamp(x):
        if x < low:
            return low
        if x > high:
            return high
        return x
    return map_basic(arr, clamp)

def normalize(arr):
    """
    Normalize values to [0, 1] range.

    Note: Requires two passes - one to find min/max, one to transform.
    This is common for relative transforms.
    """
    if len(arr) == 0:
        return []
    min_val = min(arr)
    max_val = max(arr)
    range_val = max_val - min_val
    if range_val == 0:
        return [0.5] * len(arr)  # All same value
    return map_basic(arr, lambda x: (x - min_val) / range_val)

def map_with_index(arr, transform):
    """
    Transform using both value AND index.
    Useful for position-dependent transformations.
    """
    result = []
    for i, x in enumerate(arr):
        result.append(transform(i, x))
    return result

# Demonstration
if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]

    print(f"Original: {numbers}")
    print(f"Doubled: {double_all(numbers)}")
    print(f"As strings: {to_strings(numbers)}")
    print(f"Clamped to [2, 4]: {clamp_values(numbers, 2, 4)}")
    print(f"Normalized: {normalize(numbers)}")

    # Index-aware transformation
    indexed = map_with_index(numbers, lambda i, x: f"{i}: {x}")
    print(f"With indices: {indexed}")

    # Extract from objects
    users = [
        {'name': 'Alice', 'age': 30},
        {'name': 'Bob', 'age': 25},
        {'name': 'Carol', 'age': 35}
    ]
    names = extract_property(users, 'name')
    print(f"Names: {names}")

A key property of mapping is 1:1 correspondence—each input element produces exactly one output element at the same position. This means len(output) == len(input), and output[i] is derived from input[i]. This structural preservation makes reasoning about mapped data straightforward.
Filtering produces a subset of the input array containing only elements that satisfy a condition. While collect is the imperative implementation, filter is the conceptual abstraction—selecting elements based on a predicate function.
Properties of Filtering:
- The output is a subset of the input: anywhere from 0 to n elements survive.
- The relative order of surviving elements is preserved.
- The elements themselves are unchanged; only membership changes.
- The predicate is a pure yes/no question asked of each element.
# The Filter Pattern: Selecting Subsets
# ======================================

def filter_basic(arr, predicate):
    """
    Keep only elements where predicate returns True.

    Time: O(n) - check every element
    Space: O(k) - k elements pass filter (0 <= k <= n)
    """
    result = []
    for x in arr:
        if predicate(x):
            result.append(x)
    return result

# Common filtering operations
def remove_none(arr):
    """Remove None values."""
    return filter_basic(arr, lambda x: x is not None)

def filter_by_type(arr, target_type):
    """Keep only elements of specified type."""
    return filter_basic(arr, lambda x: isinstance(x, target_type))

def filter_in_range(arr, low, high):
    """Keep elements in [low, high] inclusive."""
    return filter_basic(arr, lambda x: low <= x <= high)

def remove_duplicates_stable(arr):
    """
    Remove duplicates while preserving order.
    First occurrence of each value is kept.
    """
    seen = set()
    result = []
    for x in arr:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

def filter_by_indices(arr, indices):
    """
    Keep only elements at specified indices.
    """
    index_set = set(indices)
    result = []
    for i, x in enumerate(arr):
        if i in index_set:
            result.append(x)
    return result

def partition(arr, predicate):
    """
    Split array into two: elements passing and failing predicate.
    Returns: (passing_elements, failing_elements)

    This is filter + anti-filter in one pass.
    """
    passing = []
    failing = []
    for x in arr:
        if predicate(x):
            passing.append(x)
        else:
            failing.append(x)
    return passing, failing

def filter_first_n(arr, predicate, n):
    """
    Get first n elements matching predicate.
    Combines filtering with limit (early termination).
    """
    result = []
    for x in arr:
        if predicate(x):
            result.append(x)
            if len(result) >= n:
                break  # Got enough
    return result

# Demonstration
if __name__ == "__main__":
    mixed = [1, 'hello', 3.14, None, 42, 'world', None, 7]
    numbers = [5, 12, 3, 18, 9, 24, 6, 15, 2, 21]

    print(f"Original: {mixed}")
    print(f"Without None: {remove_none(mixed)}")
    print(f"Integers only: {filter_by_type(mixed, int)}")

    print(f"\nNumbers: {numbers}")
    print(f"In range [5, 15]: {filter_in_range(numbers, 5, 15)}")

    with_dups = [1, 2, 3, 2, 4, 3, 5, 1]
    print(f"\n{with_dups} without duplicates: {remove_duplicates_stable(with_dups)}")

    # Partition example
    evens, odds = partition(numbers, lambda x: x % 2 == 0)
    print(f"\nPartitioned {numbers}:")
    print(f"  Evens: {evens}")
    print(f"  Odds: {odds}")

    # Limited filter
    first_3_over_10 = filter_first_n(numbers, lambda x: x > 10, 3)
    print(f"\nFirst 3 numbers > 10: {first_3_over_10}")

Filter changes the quantity of elements (keeping a subset), while map changes the quality of elements (transforming values). Filter: "Which elements do I keep?" Map: "What do I do to each element?" Different questions, different patterns.
Reduction (also called folding) combines all elements of an array into a single value. This is the most general aggregation pattern—sum, product, maximum, minimum, string concatenation, and even map and filter can be expressed as reductions.
The Core Reduce Pattern:
accumulator = initial_value
for element in input:
    accumulator = combine(accumulator, element)
return accumulator
The combine function takes the running result and the current element, producing a new running result. The initial value establishes the starting point of the computation.
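For instance, here is a short trace of summing [1, 2, 3] with addition as the combine function and 0 as the initial value; the runnable line uses Python's built-in functools.reduce, which behaves like the reduce_basic helper in the listing below (with a different argument order).

from functools import reduce

# Trace of reducing [1, 2, 3] with combine(acc, x) = acc + x, initial = 0:
#   acc = 0              (initial value)
#   acc = 0 + 1 = 1      (combine with 1)
#   acc = 1 + 2 = 3      (combine with 2)
#   acc = 3 + 3 = 6      (combine with 3)
print(reduce(lambda acc, x: acc + x, [1, 2, 3], 0))  # 6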
# The Reduce Pattern: Collapsing Arrays to Single Values
# ========================================================

def reduce_basic(arr, combine, initial):
    """
    Reduce array to single value using combine function.
    combine(accumulator, current) -> new_accumulator

    Time: O(n) assuming O(1) combine
    Space: O(1) - only accumulator stored
    """
    accumulator = initial
    for x in arr:
        accumulator = combine(accumulator, x)
    return accumulator

# Classic reductions
def sum_all(arr):
    """Sum: reduce with addition, initial 0."""
    return reduce_basic(arr, lambda acc, x: acc + x, 0)

def product_all(arr):
    """Product: reduce with multiplication, initial 1."""
    return reduce_basic(arr, lambda acc, x: acc * x, 1)

def max_value(arr):
    """Maximum: reduce keeping larger value."""
    if not arr:
        return None
    return reduce_basic(arr[1:], lambda acc, x: x if x > acc else acc, arr[0])

def min_value(arr):
    """Minimum: reduce keeping smaller value."""
    if not arr:
        return None
    return reduce_basic(arr[1:], lambda acc, x: x if x < acc else acc, arr[0])

def concat_strings(arr, separator=""):
    """Concatenate strings with separator."""
    if not arr:
        return ""
    return reduce_basic(arr[1:], lambda acc, x: acc + separator + x, arr[0])

def count_occurrences(arr, target):
    """Count how many times target appears."""
    return reduce_basic(arr, lambda acc, x: acc + (1 if x == target else 0), 0)

# Advanced reductions
def build_frequency_map(arr):
    """
    Build a dictionary counting occurrences of each element.
    This shows reduce building a complex structure, not just a number.
    """
    def update_counts(counts, x):
        counts[x] = counts.get(x, 0) + 1
        return counts
    return reduce_basic(arr, update_counts, {})

def flatten(nested_arr):
    """
    Flatten one level of nesting.
    [[1, 2], [3], [4, 5]] -> [1, 2, 3, 4, 5]
    """
    return reduce_basic(nested_arr, lambda acc, x: acc + x, [])

def group_by(arr, key_fn):
    """
    Group elements by a key function.

    Example: group_by([1, 2, 3, 4, 5], lambda x: x % 2)
    -> {0: [2, 4], 1: [1, 3, 5]}
    """
    def add_to_group(groups, x):
        key = key_fn(x)
        if key not in groups:
            groups[key] = []
        groups[key].append(x)
        return groups
    return reduce_basic(arr, add_to_group, {})

# Map and filter as special cases of reduce
def map_via_reduce(arr, transform):
    """Map implemented using reduce."""
    return reduce_basic(arr, lambda acc, x: acc + [transform(x)], [])

def filter_via_reduce(arr, predicate):
    """Filter implemented using reduce."""
    return reduce_basic(arr, lambda acc, x: acc + [x] if predicate(x) else acc, [])

# Demonstration
if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    words = ["hello", "world", "python"]

    print(f"Numbers: {numbers}")
    print(f"Sum: {sum_all(numbers)}")
    print(f"Product: {product_all(numbers)}")
    print(f"Max: {max_value(numbers)}")
    print(f"Min: {min_value(numbers)}")

    print(f"\nWords: {words}")
    print(f"Joined: {concat_strings(words, ' ')}")

    data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
    print(f"\nData: {data}")
    print(f"Frequency map: {build_frequency_map(data)}")

    grouped = group_by(numbers, lambda x: "even" if x % 2 == 0 else "odd")
    print(f"Grouped by parity: {grouped}")

    nested = [[1, 2], [3, 4], [5]]
    print(f"\nNested: {nested}")
    print(f"Flattened: {flatten(nested)}")

Reduce is the most general traversal pattern. In theory, any operation expressible with a loop can be expressed as a reduce. This is why functional programming languages treat reduce (or fold) as fundamental. Understanding reduce deeply unlocks the ability to express complex logic declaratively.
The real power emerges when you compose these patterns. Complex data transformations are often expressible as a chain of simple operations: filter to select, map to transform, reduce to aggregate.
The Pipeline Pattern:
data → filter → map → reduce → result
Each stage takes the output of the previous stage as input. This creates a readable, maintainable data transformation pipeline.
# Composing Filter, Map, and Reduce
# ===================================

# Basic implementations for composition
def my_filter(arr, predicate):
    return [x for x in arr if predicate(x)]

def my_map(arr, transform):
    return [transform(x) for x in arr]

def my_reduce(arr, combine, initial):
    acc = initial
    for x in arr:
        acc = combine(acc, x)
    return acc

# Example 1: Process sales data
# "What is the total revenue from orders over $100?"
def total_high_value_revenue(orders):
    """
    Pipeline:
    1. Filter orders > $100
    2. Map to extract amount
    3. Reduce to sum
    """
    high_value = my_filter(orders, lambda o: o['amount'] > 100)
    amounts = my_map(high_value, lambda o: o['amount'])
    total = my_reduce(amounts, lambda a, b: a + b, 0)
    return total

# Example 2: Text processing
# "Get uppercase versions of words with more than 4 letters"
def transform_long_words(words):
    """
    Pipeline:
    1. Filter words with len > 4
    2. Map to uppercase
    """
    long_words = my_filter(words, lambda w: len(w) > 4)
    uppercased = my_map(long_words, lambda w: w.upper())
    return uppercased

# Example 3: Complex aggregation
# "Get the average score of passing students"
def average_passing_score(students):
    """
    Pipeline:
    1. Filter students with score >= 60 (passing)
    2. Map to extract scores
    3. Reduce to compute average
    """
    passing = my_filter(students, lambda s: s['score'] >= 60)
    if not passing:
        return 0
    scores = my_map(passing, lambda s: s['score'])
    total = my_reduce(scores, lambda a, b: a + b, 0)
    return total / len(scores)

# The chain pattern - fluent style
class DataPipeline:
    """
    Wrapper enabling fluent chaining of operations.
    """
    def __init__(self, data):
        self.data = list(data)

    def filter(self, predicate):
        self.data = [x for x in self.data if predicate(x)]
        return self  # Return self for chaining

    def map(self, transform):
        self.data = [transform(x) for x in self.data]
        return self

    def reduce(self, combine, initial):
        acc = initial
        for x in self.data:
            acc = combine(acc, x)
        return acc  # Terminal operation

    def collect(self):
        return self.data  # Terminal operation

# Same examples using fluent style
def total_high_value_fluent(orders):
    """Fluent pipeline version."""
    return (DataPipeline(orders)
            .filter(lambda o: o['amount'] > 100)
            .map(lambda o: o['amount'])
            .reduce(lambda a, b: a + b, 0))

# Demonstration
if __name__ == "__main__":
    # Sales data example
    orders = [
        {'id': 1, 'amount': 50},
        {'id': 2, 'amount': 150},
        {'id': 3, 'amount': 75},
        {'id': 4, 'amount': 200},
        {'id': 5, 'amount': 120}
    ]
    total = total_high_value_revenue(orders)
    print(f"Total revenue from orders > $100: ${total}")

    # Same with fluent API
    total_fluent = total_high_value_fluent(orders)
    print(f"Fluent version: ${total_fluent}")

    # Text processing
    words = ['cat', 'elephant', 'dog', 'rhinoceros', 'ant']
    long_upper = transform_long_words(words)
    print(f"\nLong words uppercased: {long_upper}")

    # Student scores
    students = [
        {'name': 'Alice', 'score': 85},
        {'name': 'Bob', 'score': 55},
        {'name': 'Carol', 'score': 72},
        {'name': 'David', 'score': 48},
        {'name': 'Eve', 'score': 91}
    ]
    avg = average_passing_score(students)
    print(f"\nAverage passing score: {avg:.1f}")

    # Fluent chaining demo
    result = (DataPipeline([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
              .filter(lambda x: x % 2 == 0)  # Keep evens
              .map(lambda x: x * x)          # Square them
              .collect())                    # Get result
    print(f"\nEvens squared: {result}")

Filter is like SQL's WHERE clause (select rows). Map is like SELECT with transformations (choose/modify columns). Reduce is like aggregate functions (SUM, COUNT, AVG). Group_by is like SQL's GROUP BY. This mental model helps when switching between code and databases.
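To make the analogy concrete, here is a small sketch (using a hypothetical orders table, not one from the examples above) pairing a SQL query with its filter → map → reduce equivalent:

# SQL:  SELECT SUM(amount) FROM orders WHERE amount > 100;
orders = [{'amount': 50}, {'amount': 150}, {'amount': 200}]

high_value = [o for o in orders if o['amount'] > 100]  # WHERE amount > 100  (filter)
amounts = [o['amount'] for o in high_value]            # SELECT amount       (map)
total = sum(amounts)                                   # SUM(...)            (reduce)
print(total)  # 350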
While chaining filter-map-reduce operations is elegant, it's important to understand the performance implications and optimize when necessary.
Multiple Passes vs. Single Pass:
Naive chaining creates intermediate arrays and makes multiple passes:
data → [filter pass] → temp1 → [map pass] → temp2 → [reduce pass] → result
This is O(n) time per operation (good), but creates O(n) intermediate storage per step (potentially wasteful).
# Efficiency Optimizations for Data Pipelines
# =============================================

# Naive chaining (multiple passes, intermediate arrays)
def naive_pipeline(data):
    """
    Three passes, two intermediate arrays.

    Time: O(n) + O(k) + O(k) where k = filtered count
    Space: O(k) + O(k) for intermediates
    """
    filtered = [x for x in data if x > 0]    # Pass 1
    transformed = [x * 2 for x in filtered]  # Pass 2
    result = sum(transformed)                # Pass 3
    return result

# Single-pass fusion (one loop, no intermediates)
def fused_pipeline(data):
    """
    One pass, no intermediate arrays.

    Time: O(n)
    Space: O(1)
    """
    result = 0
    for x in data:
        if x > 0:                  # Filter logic
            transformed = x * 2    # Map logic
            result += transformed  # Reduce logic
    return result

# Generator-based lazy evaluation (single pass, streaming)
def lazy_pipeline(data):
    """
    Generators don't create intermediate arrays.
    Each element flows through the entire pipeline.

    Time: O(n)
    Space: O(1) - no intermediate storage
    """
    filtered = (x for x in data if x > 0)    # Generator (lazy)
    transformed = (x * 2 for x in filtered)  # Generator (lazy)
    result = sum(transformed)                # Consumes generators
    return result

# When to use each approach
"""
NAIVE CHAINING:
- Readable and maintainable
- Good for prototyping
- Acceptable for small datasets
- When debugging (can inspect intermediates)

FUSED PIPELINE:
- Best performance
- More complex logic in one place
- Harder to modify individual steps
- Use for hot paths in production

LAZY GENERATORS:
- Best of both worlds
- Maintains composability
- No intermediate storage
- Python-specific (other languages have similar constructs)
"""

# Benchmarking comparison
import time

def benchmark():
    data = list(range(-500000, 500000))  # 1 million elements

    # Benchmark naive
    start = time.time()
    result1 = naive_pipeline(data)
    naive_time = time.time() - start

    # Benchmark fused
    start = time.time()
    result2 = fused_pipeline(data)
    fused_time = time.time() - start

    # Benchmark lazy
    start = time.time()
    result3 = lazy_pipeline(data)
    lazy_time = time.time() - start

    print(f"Results: naive={result1}, fused={result2}, lazy={result3}")
    print(f"Times: naive={naive_time:.4f}s, fused={fused_time:.4f}s, lazy={lazy_time:.4f}s")
    print(f"Speedup (fused vs naive): {naive_time/fused_time:.2f}x")

# Order matters for efficiency
def order_matters_example():
    """
    Filter placement affects performance.
    Filter early to reduce work for subsequent operations.
    """
    data = list(range(1000000))

    # BAD: Expensive map on all elements, then filter
    def bad_order():
        mapped = [x ** 2 for x in data]            # 1M expensive operations
        filtered = [x for x in mapped if x < 100]  # Filter after
        return filtered

    # GOOD: Filter first, then expensive map on fewer elements
    def good_order():
        filtered = [x for x in data if x < 10]  # Filter to ~10 elements
        mapped = [x ** 2 for x in filtered]     # Only 10 operations
        return mapped

    # Compare
    start = time.time()
    bad_order()
    bad_time = time.time() - start

    start = time.time()
    good_order()
    good_time = time.time() - start

    print(f"\nOrder matters:")
    print(f"  Map then filter: {bad_time:.4f}s")
    print(f"  Filter then map: {good_time:.4f}s")

if __name__ == "__main__":
    benchmark()
    order_matters_example()

When composing operations, filter as early as possible to reduce the dataset before expensive transformations. If you filter 90% of elements, all subsequent operations only process 10% of the original data. This simple reordering can yield 10x speedups.
The collect/map/filter/reduce patterns appear throughout real-world software development. Recognizing these patterns helps you leverage existing tools and write more idiomatic code.
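Before the larger examples, it is worth noting that Python already ships these patterns as built-ins and comprehensions; the short sketch below (not part of the original examples) shows the standard equivalents of the hand-written helpers used throughout this page.

from functools import reduce

numbers = [5, 2, 8, 1, 9]

evens = list(filter(lambda x: x % 2 == 0, numbers))  # filter: keep a subset
doubled = list(map(lambda x: x * 2, numbers))        # map: transform every element
total = reduce(lambda acc, x: acc + x, numbers, 0)   # reduce: collapse to one value

# List comprehensions express filter + map in a single idiomatic step
doubled_evens = [x * 2 for x in numbers if x % 2 == 0]

print(evens, doubled, total, doubled_evens)  # [2, 8] [10, 4, 16, 2, 18] 25 [4, 16]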
# Real-World Pattern Examples
# ============================

# Example: E-commerce order processing
def process_orders(orders):
    """
    Business requirement: Get total revenue from completed orders
    placed in 2024 or later, excluding test accounts.
    """
    is_real_order = lambda o: not o.get('is_test', False)
    is_completed = lambda o: o['status'] == 'completed'
    is_after_2024 = lambda o: o['year'] >= 2024

    # Pipeline: filter test accounts -> filter completed ->
    # filter year -> map to amount -> reduce to sum
    real_orders = [o for o in orders if is_real_order(o)]
    completed = [o for o in real_orders if is_completed(o)]
    recent = [o for o in completed if is_after_2024(o)]
    amounts = [o['amount'] for o in recent]
    total = sum(amounts)

    return total

# Example: Log analysis
def analyze_errors(log_entries):
    """
    Find error distribution by service for this week.
    """
    from collections import defaultdict
    from datetime import datetime, timedelta

    one_week_ago = datetime.now() - timedelta(days=7)

    # Filter: errors only, within time window
    errors = [
        entry for entry in log_entries
        if entry['level'] == 'ERROR' and entry['timestamp'] > one_week_ago
    ]

    # Group by service, count occurrences
    error_counts = defaultdict(int)
    for entry in errors:
        error_counts[entry['service']] += 1

    return dict(error_counts)

# Example: User analytics
def calculate_engagement_score(user_activities):
    """
    Compute engagement score based on weighted activity.
    """
    weights = {
        'view': 1,
        'like': 2,
        'comment': 3,
        'share': 5,
        'purchase': 10
    }

    # Map activities to weighted scores, then sum
    scores = [weights.get(activity['type'], 0) for activity in user_activities]
    total_score = sum(scores)

    return total_score

# Example: Feature flag rollout
def get_feature_eligible_users(users, feature_config):
    """
    Determine which users should see a new feature.
    """
    # Filter by multiple criteria
    eligible = []
    for user in users:
        # Check region
        if feature_config.get('regions') and user['region'] not in feature_config['regions']:
            continue
        # Check user tier
        if feature_config.get('tiers') and user['tier'] not in feature_config['tiers']:
            continue
        # Check percentage rollout (by user ID hash)
        if feature_config.get('percentage'):
            if hash(user['id']) % 100 >= feature_config['percentage']:
                continue
        eligible.append(user)

    # Map to just user IDs
    return [u['id'] for u in eligible]

# Demonstration
if __name__ == "__main__":
    # Order processing example
    orders = [
        {'id': 1, 'amount': 100, 'status': 'completed', 'year': 2024, 'is_test': False},
        {'id': 2, 'amount': 200, 'status': 'completed', 'year': 2024, 'is_test': True},
        {'id': 3, 'amount': 150, 'status': 'pending', 'year': 2024, 'is_test': False},
        {'id': 4, 'amount': 300, 'status': 'completed', 'year': 2024, 'is_test': False},
        {'id': 5, 'amount': 50, 'status': 'completed', 'year': 2023, 'is_test': False},
    ]
    revenue = process_orders(orders)
    print(f"Total qualifying revenue: ${revenue}")  # Expected: $400 (orders 1 and 4 only)

You've now mastered the art of transforming data during traversal. These patterns—collect, map, filter, and reduce—are the building blocks of data processing in every programming paradigm and domain.
What's Next:
Now that you can transform data during traversal, the final piece is building output arrays: understanding how to efficiently construct new arrays as you iterate, handle growing collections, and apply these patterns when the output structure differs from the input. The next page explores these techniques in depth.
You've mastered collecting, mapping, filtering, and reducing—the four fundamental patterns for productive traversal—and you can now express complex data transformations as compositions of simple operations.