Henry Ford revolutionized manufacturing with the assembly line: rather than one worker building an entire car, workers specialize—one installs the engine, another the wheels, another the doors. Each station works concurrently on different cars, maximizing throughput while each car still progresses through all steps in order.
This industrial insight translates directly to concurrent programming as the pipeline pattern. Instead of processing each data item sequentially through all stages, we have multiple stages executing concurrently, each handling one step of the processing. As one stage finishes with item N, it passes the result downstream and immediately begins processing item N+1.
By the end of this page, you will understand the pipeline pattern deeply: its structure and semantics, how it achieves parallelism through staging, the critical role of buffers between stages, throughput and latency analysis, implementation patterns across languages, and where pipelines appear in real systems from compilers to video processing.
The pipeline pattern organizes concurrent processing as a sequence of stages connected by channels. Each stage performs a specific transformation on data items, passing results to the next stage.
Formal Definition:
A pipeline consists of a source that produces input items, an ordered sequence of stages that each apply one transformation, channels connecting consecutive stages, and a sink that consumes the final output.
Key Characteristics:
| Component | Responsibility | Concurrency Property |
|---|---|---|
| Stage | Perform one step of processing | Runs in dedicated thread/goroutine |
| Channel | Buffer data between stages | Thread-safe queue (producer-consumer) |
| Source | Generate or receive input data | May be external (I/O) or internal |
| Sink | Consume final output | May be external (I/O) or aggregation |
The Pipeline Invariant:
Each data item passes through all stages in order, but multiple items are processed concurrently across different stages.
This is the key insight: while item ordering is preserved (item 1 exits before item 2), stages process different items simultaneously. With N stages, up to N items can be "in flight" at once.
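To make the invariant concrete, here is a minimal sketch (with hypothetical stage names) of a two-stage pipeline in Go: once the pipeline fills, both stages work on different items at the same time, yet output order is preserved.

```go
package main

import "fmt"

func main() {
	// Stage 1: emit items 1..3
	nums := make(chan int)
	go func() {
		for i := 1; i <= 3; i++ {
			nums <- i
		}
		close(nums)
	}()

	// Stage 2: square each item; runs concurrently with stage 1
	squares := make(chan int)
	go func() {
		for n := range nums {
			squares <- n * n
		}
		close(squares)
	}()

	// Sink: items exit in the same order they entered
	for s := range squares {
		fmt.Println(s)
	}
	// prints 1, 4, 9 in order
}
```

While the sink prints item N, stage 2 may already be squaring item N+1 and stage 1 emitting item N+2 — three items in flight across three positions.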
Comparison with Sequential Processing:
Pipelines improve THROUGHPUT (items per second), not LATENCY (time for one item). Each item still passes through all stages sequentially. But after the pipeline fills, we complete one item per slowest-stage-time instead of one item per total-time. For large workloads, this is a massive improvement.
Understanding pipeline performance requires careful analysis of throughput, latency, and the impact of stage imbalance.
Key Performance Metrics:
Latency (time for one item to traverse the entire pipeline):
Latency = Σ(stage_time[i]) + Σ(queue_wait_time[i])
For a pipeline with stages taking t₁, t₂, t₃ time, the minimum latency for one item is t₁ + t₂ + t₃, plus any time spent queued between stages.
Throughput (items processed per unit time):
Throughput = 1 / max(stage_time[i]) // Bottleneck determines throughput
The slowest stage (bottleneck) determines overall throughput. Faster stages must wait.
```python
# Example: 3-stage pipeline

# Stage processing times (seconds per item)
stage_1_time = 0.1  # Parse: 100ms
stage_2_time = 0.3  # Transform: 300ms (BOTTLENECK)
stage_3_time = 0.1  # Validate: 100ms

# Sequential processing
sequential_time_per_item = stage_1_time + stage_2_time + stage_3_time
# = 0.5 seconds per item
# For 1000 items: 500 seconds total

# Pipeline processing
pipeline_latency = stage_1_time + stage_2_time + stage_3_time
# = 0.5 seconds (same as sequential for one item)

pipeline_throughput = 1 / max(stage_1_time, stage_2_time, stage_3_time)
# = 1 / 0.3 = 3.33 items per second
# For 1000 items: ~300 seconds + 0.5 seconds startup

# Speedup = sequential_time / pipeline_time
# = 500 / 300.5 ≈ 1.66x improvement

# With balanced stages (all 0.167s):
balanced_throughput = 1 / 0.167
# = 6 items per second
# For 1000 items: ~167 seconds
# Speedup = 500 / 167 ≈ 3x improvement (equals number of stages!)
```

The Bottleneck Problem:
The slowest stage determines throughput. Faster stages spend time idle, waiting for the bottleneck to accept output or provide input.
Example with imbalanced stages:
| Stage | Time/Item | Utilization | Idle Time/Cycle |
|---|---|---|---|
| Stage 1 (Parse) | 100ms | 33% | 200ms waiting |
| Stage 2 (Transform) | 300ms | 100% | 0ms (bottleneck) |
| Stage 3 (Validate) | 100ms | 33% | 200ms waiting |
Strategies to Address Bottlenecks:

- Replicate the bottleneck stage: run multiple parallel instances and merge their output (fan-out/fan-in).
- Split the slow stage into smaller sub-stages to rebalance the pipeline.
- Optimize the stage itself (better algorithms, caching, batching).

With 3 parallel instances of Stage 2 (300ms / 3 = 100ms effective), every stage now takes 100ms: the pipeline is balanced, and throughput rises from 3.33 to 10 items per second.
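As a quick sanity check on this arithmetic, the sketch below (helper name `effectiveBottleneck` is mine, not from the text) computes the effective bottleneck and throughput before and after replicating the slow stage, using the stage times from the earlier example:

```go
package main

import "fmt"

// effectiveBottleneck returns the slowest effective stage time in ms,
// where each stage's time is divided by its replica count.
func effectiveBottleneck(timesMs, replicas []float64) float64 {
	worst := 0.0
	for i, t := range timesMs {
		if eff := t / replicas[i]; eff > worst {
			worst = eff
		}
	}
	return worst
}

func main() {
	times := []float64{100, 300, 100} // Parse, Transform, Validate

	before := effectiveBottleneck(times, []float64{1, 1, 1})
	after := effectiveBottleneck(times, []float64{1, 3, 1}) // 3x Stage 2

	fmt.Printf("before: %.0fms bottleneck, %.2f items/s\n", before, 1000/before)
	fmt.Printf("after:  %.0fms bottleneck, %.2f items/s\n", after, 1000/after)
	// before: 300ms bottleneck, 3.33 items/s
	// after:  100ms bottleneck, 10.00 items/s
}
```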
Maximum speedup from pipelining = number of stages. A 4-stage pipeline can achieve at most 4x throughput vs sequential. But this only happens with perfectly balanced stages. In practice, imbalance limits gains, and the slowest stage dominates performance.
The channels between pipeline stages are bounded buffers. Their sizing profoundly affects pipeline behavior, memory usage, and latency characteristics.
Buffer Sizing Considerations:
| Buffer Size | Memory | Latency | Throughput | Behavior |
|---|---|---|---|---|
| 0 (synchronous) | Minimal | Lower (no queueing) | Limited by slowest combo | Stages synchronize on every handoff |
| 1 | Low | Low | Allows some decoupling | Minimal buffering |
| N (moderate) | Moderate | Moderate | Absorbs burst variations | Good balance |
| ∞ (unbounded) | Unbounded | Potentially huge | Producers never block | Risk of memory exhaustion |
Back-Pressure Mechanism:
Bounded buffers create back-pressure—when a downstream stage is slow, its input buffer fills, causing the upstream stage to block. This propagates upstream, eventually throttling the source.
Back-pressure is essential for:

- Bounding memory usage: items cannot accumulate without limit.
- Pacing the source: producers slow down to match the bottleneck's consumption rate.
- Keeping latency predictable: items do not languish in ever-growing queues.
```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Buffered channels create back-pressure
	stage1_to_2 := make(chan int, 5) // Buffer size 5
	stage2_to_3 := make(chan int, 5)
	done := make(chan bool)

	// Stage 1: Fast producer (10ms per item)
	go func() {
		for i := 0; i < 100; i++ {
			time.Sleep(10 * time.Millisecond)
			stage1_to_2 <- i // Blocks if buffer full
			fmt.Printf("Stage 1: produced %d\n", i)
		}
		close(stage1_to_2)
	}()

	// Stage 2: Slow processor (100ms per item) - BOTTLENECK
	go func() {
		for item := range stage1_to_2 {
			time.Sleep(100 * time.Millisecond) // Slow processing
			stage2_to_3 <- item * 2
		}
		close(stage2_to_3)
	}()

	// Stage 3: Fast consumer (10ms per item)
	go func() {
		for item := range stage2_to_3 {
			fmt.Printf("Stage 3: consumed %d\n", item)
		}
		done <- true
	}()

	<-done

	// Without back-pressure (unbounded buffers):
	// Stage 1 would produce all 100 items immediately
	// Memory usage spikes, items wait ages in buffer

	// With back-pressure (bounded buffers):
	// Stage 1 blocks after producing ~5 items
	// Production paces with bottleneck consumption
	// Steady memory usage, predictable latency
}
```

Buffer Sizing Guidelines:
Start Small: Begin with buffer size 1 or a few items. Add capacity only if profiling shows blocking.
Match to Variance: Larger buffers help when stage processing times are variable. They absorb bursts without blocking.
Consider Memory Footprint: If items are large (images, documents), even small buffer counts consume significant memory.
Profile Under Load: Optimal buffer size depends on actual workload characteristics. Measure contention and throughput.
Prefer Bounded: Almost always use bounded buffers. Unbounded buffers hide backpressure problems that will eventually cause failures.
Unbounded buffers are seductive—they never block producers. But they hide problems. If producers outpace consumers, memory grows without bound. Eventually, the system dies from memory exhaustion or unbounded latency. Bounded buffers force you to confront and solve rate mismatches.
Go's channels and goroutines are ideally suited for pipeline implementation. The language's concurrency primitives map directly to pipeline concepts.
Complete Pipeline Example:
```go
package main

import (
	"fmt"
	"strings"
)

// Pipeline stage type: takes input channel, returns output channel
type Stage func(<-chan string) <-chan string

// Stage 1: Read lines (source)
func source(lines []string) <-chan string {
	out := make(chan string)
	go func() {
		for _, line := range lines {
			out <- line
		}
		close(out) // Signal completion
	}()
	return out
}

// Stage 2: Trim whitespace
func trimStage(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		for line := range in {
			out <- strings.TrimSpace(line)
		}
		close(out)
	}()
	return out
}

// Stage 3: Convert to uppercase
func upperStage(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		for line := range in {
			out <- strings.ToUpper(line)
		}
		close(out)
	}()
	return out
}

// Stage 4: Filter empty lines
func filterEmptyStage(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		for line := range in {
			if len(line) > 0 {
				out <- line
			}
		}
		close(out)
	}()
	return out
}

// Stage 5: Add line numbers
func numberStage(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		n := 1
		for line := range in {
			out <- fmt.Sprintf("%d: %s", n, line)
			n++
		}
		close(out)
	}()
	return out
}

// Compose pipeline from stages
func pipeline(input []string, stages ...Stage) <-chan string {
	ch := source(input)
	for _, stage := range stages {
		ch = stage(ch)
	}
	return ch
}

func main() {
	input := []string{
		"  hello world  ",
		"",
		"  foo bar  ",
		"   ",
		"baz qux",
	}

	// Build pipeline: source -> trim -> upper -> filter -> number
	results := pipeline(input,
		trimStage,
		upperStage,
		filterEmptyStage,
		numberStage,
	)

	// Consume results
	for result := range results {
		fmt.Println(result)
	}
	// Output:
	// 1: HELLO WORLD
	// 2: FOO BAR
	// 3: BAZ QUX
}
```

Key Go Pipeline Idioms:
Channel Closure Propagates: When a stage closes its output channel, downstream stages' range loops terminate, cascading shutdown through the pipeline.
Each Stage Owns Its Output Channel: The stage that creates a channel is responsible for closing it.
Composable Design: Stages have uniform signature (channel in → channel out), enabling flexible composition.
No Explicit Thread Management: Goroutines are lightweight; spawn one per stage without concern.
A common extension is fan-out (one stage feeding multiple parallel workers) and fan-in (multiple sources merging into one channel). This enables parallel processing of a bottleneck stage while maintaining the pipeline structure. Go's select statement makes fan-in straightforward.
```go
package main

import "sync"

// process is a placeholder for the heavy per-item work
func process(item int) int {
	return item * item
}

// Fan-out: split work across multiple workers
func fanOut(in <-chan int, numWorkers int) []<-chan int {
	workers := make([]<-chan int, numWorkers)
	for i := 0; i < numWorkers; i++ {
		workers[i] = worker(in) // Each worker reads from same input
	}
	return workers
}

// Worker processes items
func worker(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		for item := range in {
			out <- process(item) // Heavy processing...
		}
		close(out)
	}()
	return out
}

// Fan-in: merge multiple channels into one
func fanIn(channels ...<-chan int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	for _, ch := range channels {
		wg.Add(1)
		go func(c <-chan int) {
			defer wg.Done()
			for item := range c {
				out <- item
			}
		}(ch)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// Usage: parallelize slow stage
func parallelPipeline(in <-chan int) <-chan int {
	workers := fanOut(in, 4)    // 4 parallel workers
	merged := fanIn(workers...) // Merge results
	return merged
}
```

While Go's channels are particularly well-suited for pipelines, the pattern is implementable in any language with concurrency support.
```python
import queue
import threading
from typing import Callable, Iterator

class PipelineStage:
    """A pipeline stage that processes items from input queue to output queue."""

    def __init__(self, func: Callable, input_q: queue.Queue, output_q: queue.Queue):
        self.func = func
        self.input_q = input_q
        self.output_q = output_q
        self.thread = threading.Thread(target=self._run)

    def _run(self):
        while True:
            item = self.input_q.get()
            if item is None:  # Sentinel for shutdown
                self.output_q.put(None)
                break
            result = self.func(item)
            if result is not None:
                self.output_q.put(result)

    def start(self):
        self.thread.start()

    def join(self):
        self.thread.join()

def create_pipeline(source: Iterator, *stages: Callable, buffer_size: int = 10):
    """Create a pipeline from a source and series of stage functions."""
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(stages) + 1)]
    stage_objects = []

    # Create stages
    for i, func in enumerate(stages):
        stage = PipelineStage(func, queues[i], queues[i + 1])
        stage_objects.append(stage)
        stage.start()

    # Feed source
    def feed_source():
        for item in source:
            queues[0].put(item)
        queues[0].put(None)  # Sentinel

    feed_thread = threading.Thread(target=feed_source)
    feed_thread.start()

    # Return output iterator
    def output_iterator():
        while True:
            item = queues[-1].get()
            if item is None:
                break
            yield item
        for stage in stage_objects:
            stage.join()
        feed_thread.join()

    return output_iterator()

# Example usage
def trim(s):
    return s.strip()

def upper(s):
    return s.upper()

def filter_empty(s):
    return s if len(s) > 0 else None

lines = ["  hello  ", "", "  world  ", "   ", "foo"]
results = create_pipeline(iter(lines), trim, upper, filter_empty)
for result in results:
    print(result)  # HELLO, WORLD, FOO
```

Production pipelines must handle errors gracefully and support cancellation. Naively ignoring these concerns leads to resource leaks and hung goroutines.
Error Handling Strategies:
```go
package main

import (
	"context"
	"fmt"
	"strconv"
)

// Result type for error-aware pipeline
type Result struct {
	Value int
	Err   error
}

// Stage with error handling and cancellation
func parseNumbers(ctx context.Context, in <-chan string) <-chan Result {
	out := make(chan Result)
	go func() {
		defer close(out)
		for {
			select {
			case <-ctx.Done():
				return // Pipeline cancelled
			case s, ok := <-in:
				if !ok {
					return // Input exhausted
				}
				n, err := strconv.Atoi(s)
				select {
				case out <- Result{Value: n, Err: err}:
				case <-ctx.Done():
					return
				}
			}
		}
	}()
	return out
}

func doubleNumbers(ctx context.Context, in <-chan Result) <-chan Result {
	out := make(chan Result)
	go func() {
		defer close(out)
		for {
			select {
			case <-ctx.Done():
				return
			case r, ok := <-in:
				if !ok {
					return
				}
				if r.Err != nil {
					out <- r // Propagate error
					continue
				}
				// Process valid value
				out <- Result{Value: r.Value * 2, Err: nil}
			}
		}
	}()
	return out
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Source
	source := make(chan string)
	go func() {
		defer close(source)
		for _, s := range []string{"1", "2", "three", "4", "5"} {
			select {
			case source <- s:
			case <-ctx.Done():
				return
			}
		}
	}()

	// Pipeline
	parsed := parseNumbers(ctx, source)
	doubled := doubleNumbers(ctx, parsed)

	// Consume
	errorCount := 0
	for r := range doubled {
		if r.Err != nil {
			fmt.Printf("Error: %v\n", r.Err)
			errorCount++
			if errorCount > 2 {
				cancel() // Too many errors, cancel pipeline
			}
		} else {
			fmt.Printf("Result: %d\n", r.Value)
		}
	}
}
```

Go's context package is the idiomatic way to propagate cancellation through pipelines. Every stage checks ctx.Done() in its select. When the context is cancelled, all stages return promptly, closing their output channels and allowing goroutines to be garbage collected.
The pipeline pattern appears throughout computing systems. Recognizing these applications helps you apply the pattern appropriately.
Unix shell pipes are the canonical example: `cat file | grep pattern | sort | uniq`. Each command is a stage; pipes are channels. The original pipeline pattern implementation!

The CPU itself uses pipelining: Fetch → Decode → Execute → Memory → Writeback. Multiple instructions are in flight at different stages, dramatically increasing throughput. Pipeline stalls (hazards) occur when dependencies force waiting—exactly like software pipeline bottlenecks.
The pipeline pattern is a powerful approach to structuring concurrent computation. It improves throughput by processing multiple items simultaneously across stages while maintaining orderly, comprehensible code. Let's consolidate the key insights:

- Pipelines improve throughput, not per-item latency; the slowest stage sets the throughput ceiling.
- The maximum speedup equals the number of stages, and only perfectly balanced stages achieve it.
- Bounded buffers between stages provide back-pressure, keeping memory usage and latency predictable.
- Bottleneck stages can be parallelized with fan-out/fan-in while preserving the pipeline structure.
- Production pipelines need explicit error propagation and cancellation (in Go, via the context package).
What's Next:
Having explored sequential processing patterns (producer-consumer, reader-writer, pipeline), we'll next examine the map-reduce pattern—a powerful approach for parallel data processing that partitions work, processes in parallel, and aggregates results. This pattern underlies distributed computing frameworks from Hadoop to modern stream processing systems.
You now deeply understand the pipeline pattern—from its assembly-line inspiration through performance analysis to production implementation. Pipelines are essential for high-throughput data processing, appearing everywhere from shell commands to video encoding. Next, we'll explore map-reduce for parallel aggregation.