In query execution, operators can be classified by how they consume input and produce output. Some operators can begin producing results immediately upon receiving the first input tuple—these are pipelineable (non-blocking) operators. Others must consume their entire input before producing any output—these are blocking operators.
Sort-merge join occupies an interesting middle ground: its merge phase is non-blocking, but its sort phase is blocking. This distinction has profound implications for query execution strategy, memory usage, response time, and parallelization. Understanding the blocking nature of sort-merge join is essential for designing efficient query plans.
By the end of this page, you will understand what makes an operator blocking, why sorting is inherently blocking, how this affects query pipelines and response time, strategies for mitigating blocking behavior, and how modern systems handle blocking operators through pipelining and materialization decisions.
To understand why sorting is blocking, let's first establish a clear taxonomy of operator types based on their input/output behavior:
Non-Blocking (Pipelineable) Operators:
These operators can produce output tuples as they receive input tuples, without waiting for the complete input. They maintain local state but don't require the full dataset.
| Operator | Behavior | Memory Requirement |
|---|---|---|
| Selection (σ) | Process each tuple immediately, output if condition met | O(1) - no state |
| Projection (π) | Extract columns from each tuple, output immediately | O(1) - no state |
| Merge Join (phase 2) | Compare current tuples, advance cursors, output matches | O(partition size) worst case |
| Limit | Pass through tuples until limit reached, then stop | O(1) - just counter |
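As a concrete sketch of how these pipelineable operators compose (the helper names are illustrative, not from any particular engine), each stage can be written as a Python generator: a tuple flows through the entire chain before the next one is even read, and each stage keeps only O(1) state.

```python
def scan(rows):
    for row in rows:
        yield row                      # produce tuples one at a time

def select(stream, predicate):
    for row in stream:
        if predicate(row):
            yield row                  # O(1) state: no buffering

def project(stream, columns):
    for row in stream:
        yield tuple(row[c] for c in columns)

def limit(stream, n):
    for i, row in enumerate(stream):
        if i >= n:
            return                     # stop pulling from upstream entirely
        yield row

rows = [(1, 'a', 10), (2, 'b', 5), (3, 'c', 20), (4, 'd', 1)]
pipeline = limit(project(select(scan(rows), lambda r: r[2] >= 5), [0, 2]), 2)
print(list(pipeline))  # [(1, 10), (2, 5)]
```

Because `limit` simply stops pulling, the scan never even reads the fourth row—early termination falls out of the generator structure for free.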
Blocking Operators:
These operators must consume their entire input before producing any output. They need to see all data to determine the correct output.
| Operator | Why Blocking | Memory Requirement |
|---|---|---|
| Sort | Last input tuple might be first output tuple | O(n) or external |
| Aggregate (GROUP BY) | Final aggregate unknown until all groups processed | O(groups) |
| Hash Join (build) | Build hash table from entire inner relation | O(inner relation) |
| Distinct (hash-based) | Must see all tuples to know which are unique | O(distinct values) |
| Union (distinct) | Must see all tuples to eliminate duplicates (UNION ALL, by contrast, is non-blocking) | O(distinct values) |
Some operators are partially blocking. Hash join is blocking on the build side (must complete hash table before probing) but non-blocking on the probe side. Sort-merge join is blocking for the sort phases but non-blocking for the merge phase.
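To make this partial-blocking shape concrete, here is a minimal hash join sketch (illustrative, not any engine's implementation): the build loop is the blocking half, while the probe loop streams matches out as soon as each probe tuple arrives.

```python
from collections import defaultdict

def hash_join(build_side, probe_side, key):
    # BLOCKING: the entire build side is consumed before any output
    table = defaultdict(list)
    for row in build_side:
        table[key(row)].append(row)
    # NON-BLOCKING: each probe tuple can emit matches immediately
    for row in probe_side:
        for match in table.get(key(row), []):
            yield (match, row)

R = [(1, 'r1'), (2, 'r2')]
S = [(2, 's1'), (1, 's2'), (3, 's3')]
print(list(hash_join(R, S, key=lambda r: r[0])))
# [((2, 'r2'), (2, 's1')), ((1, 'r1'), (1, 's2'))]
```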
The Pipeline Breaker Concept:
In query execution, operators form a pipeline—data flows from one operator to the next. A blocking operator acts as a pipeline breaker. Data cannot flow past a blocking operator until it has consumed all input. This creates a materialization point where all intermediate data must exist simultaneously.
Sorting is the canonical blocking operator. To understand why, consider the fundamental problem: we cannot know which tuple should be output first until we've seen all input tuples.
Proof by Example:
Suppose we're sorting by key and have seen these tuples: (10), (5), (20), (3)
The last input tuple might belong at the beginning of the sorted output. Therefore, we must wait for all input before producing any output.
Formal Argument:
For a comparison-based sort on n elements, the first output element is the minimum of the entire input—and the minimum cannot be known until every element has been examined, because any unseen element might compare lower than everything seen so far. Output therefore cannot begin until the input is exhausted.
```python
# Illustration: Why sorting must wait for all input

def non_blocking_filter(input_stream, predicate):
    """
    Selection is non-blocking: output immediately when condition met.
    """
    for row in input_stream:
        if predicate(row):
            yield row  # Immediate output!

def blocking_sort(input_stream, key):
    """
    Sorting is blocking: must collect everything before outputting.
    """
    # Phase 1: Consume ALL input (blocking)
    all_rows = []
    for row in input_stream:
        all_rows.append(row)
    # At this point, input is exhausted but no output produced yet

    # Phase 2: Sort
    all_rows.sort(key=key)

    # Phase 3: Finally produce output
    for row in all_rows:
        yield row

# The key difference:
# - non_blocking_filter yields during input consumption
# - blocking_sort only yields AFTER all input is consumed

# Timeline for 1 million tuples:
#
# Non-blocking filter:
#   Time 0:       Input tuple 1  → Output tuple 1 (if matches)
#   Time 1:       Input tuple 2  → Output tuple 2 (if matches)
#   ...
#   Time 999999:  Input tuple 1M → Output (if matches)
#
# Blocking sort:
#   Time 0:       Input tuple 1  → (buffered, no output)
#   Time 1:       Input tuple 2  → (buffered, no output)
#   ...
#   Time 999999:  Input tuple 1M → (buffered, no output)
#   Time 1000000: Sort begins    → (no output)
#   Time 1000000+sort_time:   Output tuple 1
#   Time 1000000+sort_time+1: Output tuple 2
#   ...
```

For blocking operators, "time to first output" equals "time to complete". A sort over 1 billion tuples produces zero output until all 1 billion are processed. This has significant UX implications for interactive queries.
Special Case: Top-K Queries:
Interestingly, ORDER BY ... LIMIT K can be handled far more cheaply: maintain a heap of the top K elements seen so far. The operator remains blocking—we still must consume all input—but it needs only O(K) memory rather than O(n), and once input ends the result is ready after an O(K log K) sort of the heap contents instead of a full O(n log n) sort.
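A minimal top-K sketch using Python's `heapq` (assuming an ascending ORDER BY over plain numeric keys):

```python
import heapq

def top_k_ascending(stream, k):
    """ORDER BY key LIMIT k with O(k) memory: keep a max-heap
    (negated values) of the k smallest keys seen so far."""
    heap = []                                 # holds at most k entries
    for value in stream:
        if len(heap) < k:
            heapq.heappush(heap, -value)      # negate -> max-heap behavior
        elif -heap[0] > value:                # current max beats new value
            heapq.heapreplace(heap, -value)
    # Still blocking on input, but the final sort is only O(k log k)
    return sorted(-v for v in heap)

print(top_k_ascending(iter([10, 5, 20, 3, 8, 1]), 3))  # [1, 3, 5]
```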
The blocking nature of sort-merge join's sort phase creates cascading effects throughout query execution:
Effect 1: Materialization Requirement
Blocking operators force intermediate result materialization. For a query plan:
Project → Sort → Filter → Scan
Data flows from Scan through Filter (non-blocking pipeline), but must be fully materialized at Sort before Project receives any input.
```python
def execute_pipeline_with_blocking(plan):
    """
    Execution with a blocking operator (sort) in the middle.
    """
    # Phase 1: Everything below the sort runs as a pipeline
    print("Phase 1: Executing sub-plan below sort...")
    filter_to_sort_buffer = []
    for page in scan("table"):
        for row in page:
            if filter_predicate(row):
                filter_to_sort_buffer.append(row)
                # Note: We must buffer here - can't pass to sort yet

    print(f"Buffered {len(filter_to_sort_buffer)} tuples for sort")
    # Memory usage peaks here!

    # Phase 2: Sort (blocking)
    print("Phase 2: Sorting...")
    sorted_data = sort(filter_to_sort_buffer, key=sort_key)
    # Original buffer can now be freed

    # Phase 3: Everything above sort runs as a pipeline
    print("Phase 3: Executing sub-plan above sort...")
    for row in sorted_data:
        projected = project(row)
        yield projected  # Streaming output

# For sort-merge join specifically:
def execute_sort_merge_join(R, S, join_key, buffer_pages):
    """
    Sort-merge join execution showing blocking points.
    """
    # BLOCKING: Sort R
    print("Sorting R (blocking)...")
    sorted_R = external_sort(R, join_key, buffer_pages // 2)
    # All of R must be processed before continuing

    # BLOCKING: Sort S
    print("Sorting S (blocking)...")
    sorted_S = external_sort(S, join_key, buffer_pages // 2)
    # All of S must be processed before merge

    # NON-BLOCKING: Merge phase
    print("Merging (streaming)...")
    for match in merge_join(sorted_R, sorted_S, join_key):
        yield match  # Results stream out immediately
```

Effect 2: Memory Pressure
Blocking operators create memory pressure because their input must be materialized (in memory or on disk). For large datasets this forces external algorithms: sorted runs spill to temporary files, and the extra write-and-read of every tuple adds I/O cost that a purely streaming pipeline would avoid.
Effect 3: Response Time vs Throughput Trade-off
Blocking affects response time differently than throughput:
For queries with LIMIT, blocking operators prevent early termination. A SELECT * FROM R JOIN S LIMIT 10 with sort-merge join must complete both sorts even though only 10 results are needed. Optimizers may prefer nested loop join for small LIMIT values.
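This early-termination difference is easy to demonstrate. Wrapping the input in a counting generator (a small illustrative experiment, not any engine's implementation) shows how many tuples each strategy actually consumes under a LIMIT of 10:

```python
from itertools import islice

def counting_source(n, counter):
    """Yield 0..n-1 while recording how many tuples were produced."""
    for i in range(n):
        counter[0] += 1
        yield i

# Non-blocking pipeline with LIMIT: early termination
c1 = [0]
first_10 = list(islice(counting_source(1_000_000, c1), 10))
print(c1[0])  # 10 -- only 10 tuples ever produced

# Blocking sort under the same LIMIT: must consume everything
c2 = [0]
first_10_sorted = list(islice(sorted(counting_source(1_000_000, c2)), 10))
print(c2[0])  # 1000000 -- sort forced full consumption before LIMIT applied
```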
Query optimizers think in terms of pipeline stages separated by pipeline breakers. Understanding how sort-merge join creates breaking points helps in plan analysis:
Single Sort-Merge Join:
```
   Output (streaming)
           ↑
  Merge (non-blocking)
    ↑            ↑
 Sort R        Sort S
 (BREAK)       (BREAK)
    ↑            ↑
 Scan R        Scan S
```
This plan has three pipeline stages—Scan R feeding Sort R, Scan S feeding Sort S, and Merge streaming to Output—separated by two breaking points (the sorts):
```python
from dataclasses import dataclass
from typing import List, Set
from enum import Enum

class OperatorType(Enum):
    SCAN = "scan"
    FILTER = "filter"
    PROJECT = "project"
    SORT = "sort"
    MERGE_JOIN = "merge_join"
    HASH_JOIN_BUILD = "hash_build"
    HASH_JOIN_PROBE = "hash_probe"
    AGGREGATE = "aggregate"

BLOCKING_OPERATORS = {
    OperatorType.SORT,
    OperatorType.HASH_JOIN_BUILD,
    OperatorType.AGGREGATE,
}

@dataclass(eq=False)  # identity-based hash so Operators can live in sets
class Operator:
    type: OperatorType
    children: List['Operator']
    estimated_rows: int

    @property
    def is_blocking(self) -> bool:
        return self.type in BLOCKING_OPERATORS

def identify_pipeline_stages(root: Operator) -> List[Set[Operator]]:
    """
    Identify pipeline stages in a query plan.
    Returns a list of stages; each stage is a set of operators
    that can execute together without blocking.
    """
    stages: List[Set[Operator]] = []
    current_stage: Set[Operator] = set()

    def traverse(op: Operator):
        # Children run before this operator (bottom-up execution)
        for child in op.children:
            traverse(child)
        if op.is_blocking:
            # A blocking operator ends the pipeline feeding into it
            if current_stage:
                stages.append(current_stage.copy())
                current_stage.clear()
            # The blocking operator is its own "stage" (the breaking point)
            stages.append({op})
        else:
            # A non-blocking operator joins the current stage
            current_stage.add(op)

    traverse(root)
    if current_stage:
        stages.append(current_stage)
    return stages

def analyze_sort_merge_plan():
    """
    Analyze a sort-merge join plan for pipeline stages.
    """
    # Build plan: Scan R → Sort R → Merge ← Sort S ← Scan S
    scan_r = Operator(OperatorType.SCAN, [], 1000000)
    scan_s = Operator(OperatorType.SCAN, [], 5000000)
    sort_r = Operator(OperatorType.SORT, [scan_r], 1000000)
    sort_s = Operator(OperatorType.SORT, [scan_s], 5000000)
    merge = Operator(OperatorType.MERGE_JOIN, [sort_r, sort_s], 500000)

    stages = identify_pipeline_stages(merge)

    print("Pipeline stages for Sort-Merge Join:")
    for i, stage in enumerate(stages):
        ops = [op.type.value for op in stage]
        print(f"  Stage {i+1}: {ops}")

    # Output:
    #   Stage 1: ['scan']        (Scan R, feeds into Sort R)
    #   Stage 2: ['sort']        (Sort R - blocking)
    #   Stage 3: ['scan']        (Scan S, feeds into Sort S)
    #   Stage 4: ['sort']        (Sort S - blocking)
    #   Stage 5: ['merge_join']  (Merge - consumes both sorted inputs)

analyze_sort_merge_plan()
```

Multi-Way Joins and Blocking:
Blocking effects compound in complex queries. Consider joining three tables:
```
          Output
            ↑
      Merge(R⋈S, T)
       ↑         ↑
  Sort(R⋈S)    Sort T    ← Two more sorts!
   (BREAK)     (BREAK)
       ↑         ↑
  Merge(R,S)   Scan T
   ↑      ↑
 Sort R  Sort S
 (BREAK) (BREAK)
   ↑      ↑
 Scan R  Scan S
```
This plan has 4 blocking sorts! Each must complete before its consumer can proceed. The alternative (hash join) might have fewer blocking points.
The term 'interesting order' refers to sort orders that benefit downstream operators. If the result of Merge(R,S) is already sorted on the key needed for joining with T, we might skip Sort(R⋈S)—eliminating a pipeline breaker and potentially halving the cost of the second join.
While we can't eliminate the inherent blocking nature of sorting, several strategies can mitigate its impact:
Strategy 1: Leverage Existing Sort Orders
The most effective mitigation is avoiding sorting entirely. Pre-sorted data sources eliminate blocking:
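A planner might encode this check roughly as follows—a hypothetical `ensure_order` helper (the names are illustrative, not from any real engine) that inserts a Sort node only when the child cannot already deliver the required order, e.g. from an index scan or a previous merge join:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PlanNode:
    name: str
    output_order: Optional[List[str]]  # columns the output is sorted on
    children: List['PlanNode'] = field(default_factory=list)

def ensure_order(child: PlanNode, required: List[str]) -> PlanNode:
    """Insert a Sort node only if the child's existing order
    doesn't already cover the required column prefix."""
    have = child.output_order or []
    if have[:len(required)] == required:
        return child                   # order satisfied: no pipeline breaker
    return PlanNode("Sort", required, [child])

index_scan = PlanNode("IndexScan(R.k)", ["k"])
heap_scan = PlanNode("SeqScan(S)", None)

print(ensure_order(index_scan, ["k"]).name)  # IndexScan(R.k) -- sort elided
print(ensure_order(heap_scan, ["k"]).name)   # Sort -- breaker required
```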
Strategy 2: Parallel Execution
While sorting is blocking, we can parallelize within and across sorts:
```python
import concurrent.futures
from typing import List, Iterator

def parallel_sort_merge_join(R_source, S_source, join_key, num_workers: int = 4):
    """
    Parallel execution of sort-merge join.
    Both sorts run concurrently, reducing wall-clock time.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
        # Submit both sorts concurrently
        future_R = executor.submit(external_sort, R_source, join_key)
        future_S = executor.submit(external_sort, S_source, join_key)

        # Wait for both to complete (still blocking overall)
        sorted_R = future_R.result()
        sorted_S = future_S.result()

    # Merge phase (still sequential, but could parallelize by ranges)
    yield from merge_join(sorted_R, sorted_S, join_key)

def parallel_external_sort(source, key, num_workers: int = 4):
    """
    Parallel external merge sort.
    - Phase 1: Partition data, sort each partition in parallel
    - Phase 2: Merge sorted partitions
    """
    # Partition input for parallel sorting
    partitions = partition_data(source, num_workers)

    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
        # Sort all partitions concurrently
        futures = [
            executor.submit(in_memory_sort, partition, key)
            for partition in partitions
        ]
        sorted_partitions = [f.result() for f in futures]

    # K-way merge of sorted partitions
    return k_way_merge(sorted_partitions)

def range_partitioned_merge(sorted_R, sorted_S, join_key, ranges):
    """
    Parallel merge using range partitioning.
    If data is range-partitioned on the join key, each range can be
    merged independently in parallel.
    """
    results = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = []
        for (low, high) in ranges:
            # Extract tuples in this key range from both relations
            R_range = filter_range(sorted_R, join_key, low, high)
            S_range = filter_range(sorted_S, join_key, low, high)

            # Merge this range on a separate thread
            future = executor.submit(
                list, merge_join(R_range, S_range, join_key)
            )
            futures.append(future)

        # Collect results (already in correct order due to range design)
        for future in futures:
            results.extend(future.result())
    return results
```

Strategy 3: Graceful Degradation
For interactive queries, consider plan alternatives that trade total execution time for better first-row latency:
| Query Pattern | Best for First Row | Best for Throughput | Trade-off |
|---|---|---|---|
| Small LIMIT | Nested Loop with Index | Hash Join | Scan more pages but get results immediately |
| Large result set | Hash Join | Sort-Merge (if sorted needed) | Build phase blocks but faster after |
| Pre-sorted, no LIMIT | Sort-Merge | Sort-Merge | No trade-off needed |
| Unsorted, need sorted output | Hash + Sort after | Sort-Merge | Sort-merge yields sorted output for free; hash + sort only sorts the (often smaller) join result |
Modern databases increasingly use adaptive query execution: when run-time row counts diverge from the optimizer's estimates, the engine can adjust the plan—for example, deferring the choice between nested loop and hash join until the actual input size is observed, as SQL Server's adaptive joins do.
Hash join is also partially blocking, but with different characteristics. Understanding the difference helps optimizers choose:
Hash Join Blocking Profile:
Hash join blocks only on its build side: the inner (typically smaller) relation must be fully consumed to construct the hash table, but once the table exists, every probe tuple can produce output immediately.
Comparison:
| Aspect | Sort-Merge Join | Hash Join |
|---|---|---|
| Blocking Phases | Sort R, Sort S (both blocking) | Build only (inner relation) |
| Non-blocking Phases | Merge (after sorts) | Probe (after build) |
| First Row Latency | After BOTH relations sorted | After inner relation built |
| Asymmetry | Symmetric (both sides sorted) | Asymmetric (inner builds, outer probes) |
| Memory During Block | External sort temp files | Hash table in memory |
| Can Start Probe Early | No - both must be sorted | Yes - as soon as build completes |
```python
def analyze_blocking_timeline():
    """
    Compare blocking timelines for sort-merge vs hash join.
    Scenario: Join R (1000 pages) with S (5000 pages)
    """
    # Time units (arbitrary, proportional to work)
    R_scan_time = 1000
    R_sort_time = 2000    # External sort with merge passes
    S_scan_time = 5000
    S_sort_time = 10000
    merge_time = 6000     # Merge both sorted relations
    build_time = 1000     # Build hash table from R
    probe_time = 5000     # Probe hash table with S

    # Sort-Merge Timeline:
    #
    #   Time 0-3000:       Sort R (scan + sort)  - No output
    #   Time 0-15000:      Sort S (scan + sort)  - No output (parallel possible)
    #   Time 15000:        Both sorts complete
    #   Time 15000-21000:  Merge                 - Output streaming
    #
    #   First row at: 15000
    #   Last row at:  21000
    sort_merge_first_row = max(R_scan_time + R_sort_time,
                               S_scan_time + S_sort_time)
    sort_merge_complete = sort_merge_first_row + merge_time

    print("Sort-Merge Join:")
    print(f"  First row at: {sort_merge_first_row}")
    print(f"  Complete at:  {sort_merge_complete}")

    # Hash Join Timeline:
    #
    #   Time 0-2000:     Build from R (scan + hash)  - No output
    #   Time 2000-7000:  Probe with S                - Output streaming
    #
    #   First row at: 2000 (as soon as build done)
    #   Last row at:  7000
    hash_join_first_row = R_scan_time + build_time
    hash_join_complete = hash_join_first_row + probe_time

    print("Hash Join:")
    print(f"  First row at: {hash_join_first_row}")
    print(f"  Complete at:  {hash_join_complete}")

    # Analysis
    print(f"First row: hash join delivers it "
          f"{sort_merge_first_row / hash_join_first_row:.1f}x earlier")
    print(f"Completion: hash join finishes in "
          f"{hash_join_complete / sort_merge_complete:.0%} of the time")

analyze_blocking_timeline()
```

Hash join's asymmetry (only the build side blocks) means first rows arrive much earlier—as soon as the smaller relation is processed. This is why hash join is often preferred for OLTP queries where fast response matters, while sort-merge shines for batch analytics where total throughput matters.
The blocking nature of sort-merge join's sort phase is a fundamental characteristic that affects query planning and execution strategy. The key insights: sorting cannot emit output until all input is consumed, so each sort is a pipeline breaker; the merge phase itself streams; first-row latency therefore equals total sort time, which penalizes LIMIT queries and interactive use; and the main mitigations are exploiting existing sort orders, parallelizing the sorts, and using top-K heaps when only a prefix of the sorted output is needed.
Looking Ahead:
With the blocking characteristics understood, we'll conclude with a comprehensive examination of sort-merge join's advantages—exploring the scenarios where its unique properties make it the optimal choice despite the blocking overhead.
You now understand the blocking nature of sort-merge join in complete detail—from theoretical foundations to practical implications. This knowledge is essential for query plan analysis, performance optimization, and understanding optimizer decisions.