Every join algorithm has its strengths. Nested loop join excels with small tables and indexed access. Hash join dominates for large unsorted equi-joins. But sort-merge join occupies a unique niche in the query optimization landscape—one where its distinctive properties provide advantages that no other algorithm can match.
After understanding the algorithm mechanics, cost model, and blocking behavior, we can now synthesize this knowledge to answer the practical question: When should you choose sort-merge join? This page provides a comprehensive analysis of sort-merge join's advantages, complete with decision frameworks and real-world guidance.
By the end of this page, you will understand sort-merge join's unique advantages: its efficiency with pre-sorted inputs, its ability to handle non-equi-joins, its sorted output property, its predictable memory behavior, and its suitability for parallel execution. You'll develop a decision framework for choosing sort-merge join in practice.
The single greatest advantage of sort-merge join is its efficiency when input relations are already sorted on the join key. When sorting is eliminated, sort-merge join becomes the most efficient join algorithm, with cost reducing to simply reading both relations once.
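When both inputs arrive sorted, the join collapses to the merge step alone. As a concrete reference, here is a minimal sketch of that merge for an equi-join, assuming rows are dicts keyed by column name (the subtle part is emitting the cross product of each run of duplicate keys):

```python
from typing import Dict, Iterator, List

def merge_join(R_sorted: List[Dict], S_sorted: List[Dict], key: str) -> Iterator[Dict]:
    """Merge step of sort-merge join: both inputs already sorted on `key`."""
    i, j = 0, 0
    while i < len(R_sorted) and j < len(S_sorted):
        r_val, s_val = R_sorted[i][key], S_sorted[j][key]
        if r_val < s_val:
            i += 1
        elif r_val > s_val:
            j += 1
        else:
            # Find the full run of equal keys on each side, then emit
            # their cross product so duplicate keys are handled correctly.
            i_end = i
            while i_end < len(R_sorted) and R_sorted[i_end][key] == r_val:
                i_end += 1
            j_end = j
            while j_end < len(S_sorted) and S_sorted[j_end][key] == s_val:
                j_end += 1
            for r in R_sorted[i:i_end]:
                for s in S_sorted[j:j_end]:
                    yield {**r, **s}
            i, j = i_end, j_end
```

Each input is read exactly once (duplicate runs aside), which is where the b_R + b_S merge cost in the examples below comes from.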
When Data is Pre-Sorted:
Clustered Index on Join Key: If both tables have clustered indexes (index-organized tables) on the join columns, the base table data is physically sorted on the key. Scanning the table via the index produces tuples in sorted order.
Covering Index Scan: If all needed columns are in an index that includes the join key, the index itself provides sorted access.
Previous Operation Output: The result of a previous sort-merge join or ORDER BY clause may already be sorted on the key.
Partitioned Tables: Range-partitioned tables on the join key have sorted data within each partition.
```python
def cost_comparison_pre_sorted():
    """
    Compare join costs with and without pre-sorted input.
    """
    b_R = 10_000   # 10,000 pages
    b_S = 50_000   # 50,000 pages
    M = 1_000      # 1,000 buffer pages

    # Case 1: Neither relation sorted
    #   Sort R: 2 × 10,000 × 2 = 40,000 (1 merge pass)
    #   Sort S: 2 × 50,000 × 2 = 200,000 (1 merge pass)
    #   Merge:  10,000 + 50,000 = 60,000
    unsorted_cost = 40_000 + 200_000 + 60_000  # 300,000

    # Case 2: Both relations sorted (e.g., clustered index scan)
    #   Sort R: 0 (already sorted)
    #   Sort S: 0 (already sorted)
    #   Merge:  10,000 + 50,000 = 60,000
    sorted_cost = 0 + 0 + 60_000  # 60,000

    # Case 3: Hash join (for comparison)
    #   Partition: 2 × (10,000 + 50,000) = 120,000
    #   Probe:     10,000 + 50,000 = 60,000
    hash_cost = 120_000 + 60_000  # 180,000

    print("Join Cost Comparison:")
    print(f"  Sort-Merge (unsorted): {unsorted_cost:,} page I/Os")
    print(f"  Sort-Merge (sorted):   {sorted_cost:,} page I/Os")
    print(f"  Hash Join:             {hash_cost:,} page I/Os")
    print()
    print(f"Pre-sorted advantage: {(unsorted_cost - sorted_cost) / unsorted_cost:.1%} cost reduction")
    print(f"vs Hash (sorted): {(hash_cost - sorted_cost) / hash_cost:.1%} cheaper than hash")

# Results:
#   Sort-Merge (unsorted): 300,000 page I/Os
#   Sort-Merge (sorted):   60,000 page I/Os   <-- 5× cheaper!
#   Hash Join:             180,000 page I/Os
#
# Pre-sorted advantage: 80.0% cost reduction
# vs Hash (sorted): 66.7% cheaper than hash
```

When designing indexes, consider join patterns. A clustered index on a frequently joined column eliminates the sorting cost for every join on that column. This can reduce join costs by 80% or more, as seen in the example.
Partial Pre-Sorting:
Even if only one relation is pre-sorted, sort-merge still benefits:
```python
# One relation sorted:
#   Sort R: 0 (already sorted)
#   Sort S: 200,000 (still need to sort)
#   Merge:  60,000
#   Total:  260,000 (vs 300,000 unsorted, a 13% savings)
```
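The same arithmetic generalizes. A small helper using this page's cost model (each unsorted input pays one full sort at 4 × pages, i.e., two read/write passes; the merge then reads both inputs once) reproduces all three scenarios:

```python
def sort_merge_cost(b_R: int, b_S: int,
                    R_sorted: bool = False, S_sorted: bool = False) -> int:
    """Sort-merge join I/O cost under this page's model: 2 passes x 2 I/Os
    per page for each sort that is still needed, plus one read of each
    input for the merge."""
    sort_cost = (0 if R_sorted else 4 * b_R) + (0 if S_sorted else 4 * b_S)
    return sort_cost + b_R + b_S

# Reproduces the numbers above:
#   sort_merge_cost(10_000, 50_000)                  -> 300,000 (neither sorted)
#   sort_merge_cost(10_000, 50_000, R_sorted=True)   -> 260,000 (one sorted)
#   sort_merge_cost(10_000, 50_000, True, True)      ->  60,000 (both sorted)
```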
Optimizers track 'interesting orders' to propagate sort order information through query plans and identify these opportunities.
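As a rough illustration of the idea (a sketch, not any particular optimizer's implementation), a plan node can carry an output-order property that parent operators consult before adding a sort:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class PlanNode:
    op: str
    output_order: Optional[Tuple[str, ...]]  # columns the output is sorted on
    cost: int

def maybe_add_sort(child: PlanNode, required: Tuple[str, ...],
                   sort_cost: int) -> PlanNode:
    """Insert a Sort node only if the child's order doesn't already satisfy it."""
    if child.output_order and child.output_order[:len(required)] == required:
        return child  # interesting order already satisfied: the sort is free
    return PlanNode("Sort", required, child.cost + sort_cost)

# A merge join on customer_id emits output ordered on that key...
merge = PlanNode("MergeJoin(customer_id)", ("customer_id",), 260_000)
# ...so an ORDER BY customer_id on top costs nothing extra:
final = maybe_add_sort(merge, ("customer_id",), sort_cost=240_000)
assert final is merge
```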
Sort-merge join has a unique property among join algorithms: its output is sorted on the join key. This property can eliminate subsequent sort operations, creating cascading savings throughout the query plan.
Downstream Operations That Benefit:
ORDER BY on the join column: if the query ends with ORDER BY join_column, no additional sort is needed.
GROUP BY on the join column: a streaming (sort-based) aggregate can consume the sorted output directly, avoiding a hash aggregate.
Subsequent joins on the same key: the sorted output can feed another sort-merge join without re-sorting.
```python
def query_plan_comparison():
    """
    Compare query plans with and without sort-merge join.

    Query:
        SELECT customer_id, SUM(amount)
        FROM orders JOIN customers ON orders.customer_id = customers.id
        GROUP BY customer_id
        ORDER BY customer_id
    """
    b_orders = 50_000
    b_customers = 10_000
    M = 1_000

    # Plan A: Hash Join + Sort + Hash Aggregate + Sort
    # (Hash join output is unsorted)
    join_cost = 3 * (b_orders + b_customers)           # 180,000
    sort_for_group = 2 * (b_orders + b_customers) * 2  # ~240,000 (estimate)
    # Could use hash aggregate (no sort needed), but ORDER BY requires a sort
    sort_for_order = 2 * (b_orders + b_customers) * 2  # ~240,000
    plan_a_cost = join_cost + sort_for_order
    # With streaming aggregate after sort: join + 1 sort
    plan_a_optimized = 180_000 + 240_000  # 420,000

    # Plan B: Sort-Merge Join (customers sorted by clustered index)
    # Output is sorted by customer_id!
    sort_orders = 2 * b_orders * 2       # 200,000
    sort_customers = 0                   # Clustered index
    merge_cost = b_orders + b_customers  # 60,000
    # Output is sorted - stream aggregate AND no ORDER BY sort needed!
    plan_b_cost = sort_orders + merge_cost  # Total: 260,000

    print("Query Plan Cost Comparison:")
    print("-" * 50)
    print("Plan A (Hash Join):")
    print("  Hash Join:          180,000")
    print("  Sort for ORDER BY:  240,000")
    print("  Stream Aggregate:   (included)")
    print(f"  TOTAL:              {plan_a_optimized:,}")
    print()
    print("Plan B (Sort-Merge Join):")
    print("  Sort Orders:        200,000")
    print("  Merge:              60,000")
    print("  Aggregate:          (streaming, no cost)")
    print("  ORDER BY:           (already sorted, FREE!)")
    print(f"  TOTAL:              {plan_b_cost:,}")
    print()
    print(f"Sort-Merge saves: {plan_a_optimized - plan_b_cost:,} I/Os "
          f"({100 * (plan_a_optimized - plan_b_cost) / plan_a_optimized:.1f}%)")

# Output:
#   Plan B saves: 160,000 I/Os (38.1%)
#   The sorted output eliminates both GROUP BY and ORDER BY sorts!
```

Query optimizers track 'interesting orders'—sort orders that benefit downstream operations. When evaluating join methods, they consider not just the join cost but the total plan cost, including downstream operations. Sort-merge often wins when its sorted output eliminates later sorts.
Chain of Joins:
For multi-way joins on the same key, the sorted output cascades:

```sql
SELECT * FROM A
  JOIN B ON A.id = B.a_id
  JOIN C ON A.id = C.a_id
  JOIN D ON A.id = D.a_id
ORDER BY A.id
```

With sort-merge, the first join's output is already sorted on A.id, so each subsequent join can merge against it without re-sorting the intermediate result, and the final ORDER BY is satisfied for free. A hash join pipeline would require an additional sort at the end.
Hash join only works for equality conditions (=). For non-equality comparisons (<, >, <=, >=, BETWEEN), hash tables provide no benefit. Sort-merge join, however, handles these conditions efficiently due to its sorted access pattern.
Common Non-Equi-Join Patterns:
| Join Type | Example | Use Case |
|---|---|---|
| Range Join | A.date BETWEEN B.start AND B.end | Event containment, validity periods |
| Band Join | |A.value - B.value| < threshold | Similarity matching, fuzzy joins |
| Inequality Join | A.salary > B.budget | Budget analysis, threshold comparisons |
| Temporal Join | A.timestamp < B.timestamp | Ordering events, causal analysis |
| Cross Apply with Range | B.price < A.max_price | Product matching, filtering |
```python
from typing import Dict, Iterator, List

# Rows are represented as dicts keyed by column name;
# joined tuples are merged dicts.

def sort_merge_range_join(R: List[Dict], S: List[Dict],
                          r_key: str, s_start: str, s_end: str) -> Iterator[Dict]:
    """
    Range join: R.key BETWEEN S.start AND S.end

    Sort R by r_key and S by s_start, then sweep through R while
    maintaining the set of "active" S tuples whose range could contain
    the current R value. Much more efficient than nested loop when
    both inputs are sorted.
    """
    R_sorted = sorted(R, key=lambda t: t[r_key])
    S_sorted = sorted(S, key=lambda t: t[s_start])

    s_idx = 0
    active_S = []  # "active" S tuples whose range includes the current position

    for r in R_sorted:
        r_val = r[r_key]

        # Add S tuples whose start <= r_val
        while s_idx < len(S_sorted) and S_sorted[s_idx][s_start] <= r_val:
            active_S.append(S_sorted[s_idx])
            s_idx += 1

        # Remove S tuples whose end < r_val
        active_S = [s for s in active_S if s[s_end] >= r_val]

        # All remaining active_S tuples match this R tuple
        for s in active_S:
            yield {**r, **s}

def sort_merge_inequality_join(R: List[Dict], S: List[Dict],
                               r_key: str, s_key: str,
                               operator: str = '<') -> Iterator[Dict]:
    """
    Inequality join: R.key < S.key (or <=; > and >= are symmetric).

    For '<': each R tuple joins with all S tuples having a larger key.
    Sort both; because R is processed in ascending order, the start of
    the valid S range only ever moves forward.

    Warning: can produce O(n^2) output even with an optimal algorithm!
    """
    R_sorted = sorted(R, key=lambda t: t[r_key])
    S_sorted = sorted(S, key=lambda t: t[s_key])

    s_start = 0  # Track where valid S tuples begin
    for r in R_sorted:
        r_val = r[r_key]

        # Advance s_start past S tuples that don't satisfy the condition
        if operator == '<':
            while s_start < len(S_sorted) and S_sorted[s_start][s_key] <= r_val:
                s_start += 1
        elif operator == '<=':
            while s_start < len(S_sorted) and S_sorted[s_start][s_key] < r_val:
                s_start += 1

        # Output all matching S tuples
        for s in S_sorted[s_start:]:
            yield {**r, **s}

# Example use case: find all orders that exceeded the customer's credit limit
#   SELECT * FROM orders o JOIN customers c
#   ON o.customer_id = c.id AND o.amount > c.credit_limit

def efficient_band_join(R: List[Dict], S: List[Dict],
                        r_key: str, s_key: str, epsilon: float) -> Iterator[Dict]:
    """
    Band join: |R.key - S.key| <= epsilon

    Sort both relations. For each R tuple, only examine S tuples in the
    band [r_key - epsilon, r_key + epsilon].

    Time: O(|R| + |S| + |output|) when the band is small.
    """
    R_sorted = sorted(R, key=lambda t: t[r_key])
    S_sorted = sorted(S, key=lambda t: t[s_key])

    s_start = 0  # Start of the valid S range
    for r in R_sorted:
        r_val = r[r_key]
        lower, upper = r_val - epsilon, r_val + epsilon

        # Advance s_start to the first S tuple in range
        while s_start < len(S_sorted) and S_sorted[s_start][s_key] < lower:
            s_start += 1

        # Output all S tuples in range
        s_idx = s_start
        while s_idx < len(S_sorted) and S_sorted[s_idx][s_key] <= upper:
            yield {**r, **S_sorted[s_idx]}
            s_idx += 1
```

Non-equi-joins can produce massive output: R.a < S.b might produce O(|R| × |S|) tuples in the worst case. Optimizers estimate output cardinality carefully and may reject plans with enormous expected outputs. The algorithm is efficient, but the join semantics may inherently require large results.
Sort-merge join has more predictable memory behavior than hash join, which is valuable in memory-constrained environments and for query planning:
Hash Join Memory Challenges:
```python
def compare_memory_behavior():
    """
    Compare memory usage patterns of hash join vs sort-merge.
    """
    # Scenario: Join with skewed data
    #   R: 1 million rows, 100MB
    #   S: 10 million rows, 1GB (heavily skewed - 30% share one key value)

    # Hash Join Memory Analysis:
    #   Build phase creates a hash table for R (100MB base).
    #   With 30% of S sharing one key value:
    #     - That hash bucket has 300,000 entries sharing collision lists
    #     - Causes severe hash table degradation
    #     - May trigger recursive partitioning
    #   Memory: unpredictable, could spike 2-3x expected

    # Sort-Merge Memory Analysis:
    #   Sort R: uses the allocated buffer (say 100MB), produces runs
    #   Sort S: uses the same buffer, produces runs (more runs, same buffer)
    #   Merge:  uses the buffer for a K-way merge
    #
    #   Skew handling:
    #     - 30% matching keys create a large group of duplicates
    #     - Handled by spilling to a temp file
    #     - Same buffer size, just more I/O
    #   Memory: exactly as predicted by the buffer allocation

    print("Memory Behavior Comparison:")
    print("-" * 50)
    print("Hash Join:")
    print("  Base estimate:    100 MB for hash table")
    print("  Actual (skewed):  200-300 MB (collision lists)")
    print("  Worst case:       recursive partitioning, unpredictable")
    print()
    print("Sort-Merge Join:")
    print("  Buffer allocation: 100 MB")
    print("  Actual usage:      100 MB (exactly)")
    print("  Skew impact:       more runs, more I/O, same memory")

def memory_planning_example():
    """
    How predictable memory helps query planning.
    """
    # System has 2GB for query processing, running 10 concurrent
    # queries, so each gets a 200MB budget.

    # With hash join:
    #   - Hard to guarantee 200MB is enough
    #   - One query's hash table might balloon
    #   - Risk of OOM or excessive spilling

    # With sort-merge:
    #   - Each query uses exactly its 200MB buffer
    #   - Sort produces runs that fit in the budget
    #   - Merge uses a bounded number of buffers
    #   - No risk of exceeding the allocation

    print("\nQuery Memory Planning:")
    print("-" * 50)
    print("Available: 2 GB for 10 concurrent queries")
    print("Per-query budget: 200 MB")
    print()
    print("Hash Join risks:")
    print("  - Cannot guarantee budget compliance")
    print("  - Skew or estimation errors cause overruns")
    print()
    print("Sort-Merge guarantees:")
    print("  - Exact 200 MB usage per query")
    print("  - External sort handles any input size")
    print("  - Bounded, predictable behavior")
```

In enterprise databases with resource governors that enforce per-query or per-user memory limits, sort-merge join's predictable memory usage makes it easier to comply with those limits. Hash join's potential for memory spikes can trigger throttling or query cancellation.
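The predictability claim can be made quantitative. Under the standard external-sort recurrence (a sketch, assuming M buffer pages: one run-formation pass, then (M-1)-way merge passes), memory stays fixed at M pages no matter how large the input grows; only the pass count increases:

```python
import math

def external_sort_passes(b: int, M: int) -> int:
    """Number of passes to sort b pages with M buffer pages:
    run formation creates ceil(b/M) runs, then each merge pass
    combines up to M-1 runs until a single run remains."""
    runs = math.ceil(b / M)
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (M - 1))
        passes += 1
    return passes

# Memory is M pages in every case; growth shows up only as extra I/O:
#   external_sort_passes(50_000, 1_000)     -> 2  (the "1 merge pass" above)
#   external_sort_passes(5_000_000, 1_000)  -> 3
```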
Sort-merge join is inherently well-suited for parallel execution, particularly the sort phase. Modern multi-core systems and distributed databases leverage this for scalability:
Parallel Sort Phase:
External sort parallelizes naturally:
```python
import concurrent.futures
from typing import Iterator, List, Tuple

def parallel_distributed_sort_merge_join(
        R_partitions: List[str],   # R data files across nodes
        S_partitions: List[str],   # S data files across nodes
        join_key: str,
        num_workers: int) -> Iterator:
    """
    Distributed parallel sort-merge join.

    Strategy:
      1. Range-partition both R and S on the join key
      2. Each worker sorts and merges its partition range
      3. Concatenate results (already in global sorted order)

    (determine_ranges and repartition_by_ranges are elided helpers.)
    """
    # Step 1: Sample to determine key ranges
    key_ranges = determine_ranges(R_partitions + S_partitions,
                                  join_key, num_workers)

    # Step 2: Redistribute data to workers by key range
    R_by_range = repartition_by_ranges(R_partitions, join_key, key_ranges)
    S_by_range = repartition_by_ranges(S_partitions, join_key, key_ranges)

    # Step 3: Each worker sorts and joins its range independently
    with concurrent.futures.ProcessPoolExecutor(max_workers=num_workers) as executor:
        futures = []
        for i in range(num_workers):
            future = executor.submit(
                sort_merge_join_partition,
                R_by_range[i], S_by_range[i], join_key
            )
            futures.append((i, future))

        # Collect results in order (ranges are ordered)
        for i, future in sorted(futures, key=lambda x: x[0]):
            yield from future.result()

def sort_merge_join_partition(R_data, S_data, join_key):
    """
    Standard sort-merge join on a single partition.
    Runs on a single worker/node.
    """
    # Local sort
    R_sorted = sorted(R_data, key=lambda t: t[join_key])
    S_sorted = sorted(S_data, key=lambda t: t[join_key])

    # Standard merge
    return list(merge_join(R_sorted, S_sorted, join_key))

def parallel_sort_single_relation(data, key, num_workers):
    """
    Parallel sort of a single relation on a multi-core machine.

      1. Partition data into N chunks
      2. Sort each chunk in parallel
      3. Merge sorted chunks (k_way_merge is an elided helper)
    """
    chunk_size = -(-len(data) // num_workers)  # ceil division; no tail dropped
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)]

    # Parallel sort
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
        sorted_chunks = list(executor.map(
            lambda chunk: sorted(chunk, key=lambda x: x[key]),
            chunks
        ))

    # K-way merge of sorted chunks
    return k_way_merge(sorted_chunks, key)

# The merge phase is also parallelizable with range-based parallelism
def parallel_merge_range_partitioned(
        sorted_R: str,
        sorted_S: str,
        join_key: str,
        key_ranges: List[Tuple],
        num_workers: int) -> Iterator:
    """
    Parallel merge using non-overlapping key ranges.

    Each worker merges tuples within its assigned key range (merge_range
    is an elided helper). Results are concatenated; no final merge is
    needed because the ranges are disjoint.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
        futures = []
        for i, (low, high) in enumerate(key_ranges):
            future = executor.submit(
                merge_range, sorted_R, sorted_S, join_key, low, high
            )
            futures.append((i, future))

        # Yield in order
        for i, future in sorted(futures, key=lambda x: x[0]):
            yield from future.result()
```

Advantages for Parallel Systems:

Independent workers: range partitioning on the join key lets each worker sort and merge its range with no coordination.
No final merge: worker outputs cover disjoint, ordered key ranges, so concatenating them yields globally sorted output.
Both phases scale: the sort phase and the merge phase can each be parallelized.
Hash join is also parallelizable (partition by hash, each worker builds/probes its partition). The parallel efficiency is comparable. The choice between them in parallel systems often comes down to other factors: sorted output, non-equi-joins, or pre-sorted input.
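For comparison, here is a minimal sketch of that hash-partitioned parallel equi-join (illustrative names, dict rows as elsewhere on this page). Note that, unlike the range-partitioned sort-merge above, the concatenated output carries no useful sort order:

```python
import concurrent.futures
from collections import defaultdict
from typing import Dict, List

def partitioned_hash_join(R: List[Dict], S: List[Dict],
                          r_key: str, s_key: str,
                          num_workers: int) -> List[Dict]:
    """Parallel hash join: route rows to partitions by hash of the join
    key, then build and probe each partition pair independently."""
    R_parts = [[] for _ in range(num_workers)]
    S_parts = [[] for _ in range(num_workers)]
    for r in R:
        R_parts[hash(r[r_key]) % num_workers].append(r)
    for s in S:
        S_parts[hash(s[s_key]) % num_workers].append(s)

    def join_partition(R_part: List[Dict], S_part: List[Dict]) -> List[Dict]:
        table = defaultdict(list)
        for r in R_part:                    # build side
            table[r[r_key]].append(r)
        return [{**r, **s}                  # probe side
                for s in S_part
                for r in table.get(s[s_key], [])]

    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as ex:
        results = list(ex.map(join_partition, R_parts, S_parts))
    return [row for part in results for row in part]
```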
Based on all the advantages discussed, here is a practical decision framework for choosing sort-merge join:
Strongly Prefer Sort-Merge When:
| Condition | Why Sort-Merge Wins | Savings |
|---|---|---|
| Both inputs pre-sorted (index) | Skip sorting entirely | 70-90% cost reduction |
| Query has ORDER BY on join key | Sorted output eliminates final sort | 1 full sort eliminated |
| Non-equi-join (>, <, BETWEEN) | Only algorithm that handles efficiently | Hash join not applicable |
| Subsequent join on same key | Sorted output reused | 1 sort eliminated per join |
| GROUP BY on join key | Stream aggregate possible | Hash aggregate overhead saved |
| Memory-constrained system | Predictable memory usage | No OOM risk, no spilling surprises |
Prefer Hash Join When:
Inputs are unsorted and nothing downstream needs sorted order.
The query has a small LIMIT, so only a few matches are needed.
Memory is abundant and I/O is slow, favoring the single build/probe pass.

```python
def recommend_join_algorithm(
        R_stats: RelationStats,
        S_stats: RelationStats,
        join_condition: JoinCondition,
        query_context: QueryContext) -> str:
    """
    Recommend a join algorithm based on input characteristics.

    This is a simplified heuristic; real optimizers use cost estimation.
    (RelationStats, JoinCondition, QueryContext, NESTED_LOOP_THRESHOLD,
    and has_index are assumed to be defined elsewhere.)
    """
    # Rule 1: Non-equi-join requires sort-merge or nested loop
    if not join_condition.is_equality:
        if R_stats.is_sorted and S_stats.is_sorted:
            return "SORT_MERGE"   # Best case for a range join
        elif R_stats.num_tuples * S_stats.num_tuples < NESTED_LOOP_THRESHOLD:
            return "NESTED_LOOP"
        else:
            return "SORT_MERGE"   # Must sort, but still efficient

    # Rule 2: Pre-sorted inputs strongly favor sort-merge
    if R_stats.is_sorted and S_stats.is_sorted:
        return "SORT_MERGE"       # Cannot beat merge cost of b_R + b_S

    # Rule 3: ORDER BY on the join key favors sort-merge
    # (even one pre-sorted input adds partial sort savings here)
    if query_context.has_order_by_on_join_key:
        return "SORT_MERGE"       # Sorted output eliminates the final sort

    # Rule 4: GROUP BY on the join key favors sort-merge
    if query_context.has_group_by_on_join_key:
        return "SORT_MERGE"       # Enables a stream aggregate

    # Rule 5: Small LIMIT favors hash (or index nested loop)
    if query_context.limit and query_context.limit < 100:
        if has_index(S_stats, join_condition.S_key):
            return "INDEX_NESTED_LOOP"
        return "HASH_JOIN"

    # Default: for an unsorted equi-join, hash join usually wins
    return "HASH_JOIN"

# Decision summary:
#
# SORT_MERGE_INDICATORS:
#   - Pre-sorted inputs (clustered index, previous sort)
#   - ORDER BY / GROUP BY on the join key
#   - Non-equality join conditions
#   - Subsequent joins on the same key
#   - Memory-constrained execution
#   - Highly skewed data (memory stays bounded)
#
# HASH_JOIN_INDICATORS:
#   - Unsorted inputs, no order needed
#   - Small LIMIT clause
#   - Abundant memory, slow I/O
```

If your workload has many joins on a particular column with ORDER BY or GROUP BY on the same column, consider a clustered index on that column. This enables sort-merge join to operate at its most efficient: merge cost only, with sorted output for downstream operations.
We've completed a comprehensive exploration of sort-merge join—from algorithm mechanics to production optimization. Consolidating the key advantages:

Pre-sorted inputs: when both inputs arrive sorted, cost collapses to a single scan of each relation.
Sorted output: eliminates downstream sorts for ORDER BY, GROUP BY, and subsequent joins on the same key.
Non-equi-joins: handles range, band, and inequality conditions that hash join cannot.
Predictable memory: uses exactly its buffer allocation, regardless of skew or estimation errors.
Parallel execution: both the sort and merge phases parallelize cleanly via range partitioning.
Module Complete:
You now have mastery of sort-merge join—understanding when it excels, why it works, how to analyze its cost, and how to optimize for it. This knowledge enables you to:

Choose the right join algorithm for a given query and data layout.
Design indexes that eliminate sorting costs for frequently joined columns.
Recognize query plans that exploit sort order, and estimate their costs.
Plan memory budgets with confidence in constrained or concurrent environments.
Congratulations! You've mastered sort-merge join from theory to practice. You understand external sorting, the merge algorithm, cost analysis, blocking behavior, and when to choose this powerful join method. This knowledge places you among engineers who truly understand database internals.