Query optimization is fundamentally a cost-minimization problem. To choose between nested loop join, hash join, and sort-merge join, the optimizer must accurately estimate the cost of each alternative. This requires precise mathematical models that capture the real-world performance characteristics of each algorithm.
Sort-merge join's cost is uniquely structured: high upfront sorting cost, minimal merge cost. This makes it an excellent choice when inputs are already sorted or when sorted output is required. Understanding the cost formula empowers you to predict when sort-merge join excels and when other methods win.
By the end of this page, you will understand the complete cost model for sort-merge join: I/O costs for sorting and merging, CPU costs for comparisons and tuple manipulation, how pre-sorted input dramatically reduces cost, the effect of buffer memory on performance, and direct comparison with alternative join methods.
Database cost models quantify performance using several metrics. Understanding these metrics is essential before diving into sort-merge join's specific costs:
Primary Cost Metrics:
| Metric | Symbol | Description | Typical Magnitude |
|---|---|---|---|
| Page I/O | 1 page | Read or write a single disk page | ~0.1ms (SSD), ~10ms (HDD) |
| Sequential Scan | — | Read N pages sequentially | ~N × 0.01ms (SSD), N × 0.1ms (HDD) |
| Random Access | — | Seek + read one page | ~0.1ms (SSD), ~10ms (HDD) |
| Tuple Comparison | — | Compare two sort keys | ~100 CPU cycles |
| Tuple Copy | — | Copy tuple between buffers | ~1000 CPU cycles |
| Hash Computation | — | Compute hash of join key | ~200 CPU cycles |
Why I/O Dominates:
For magnetic disks (HDD), a page read takes ~10 ms. In that time, a modern 4 GHz CPU executes ~40 million cycles. Even with SSDs (~0.1 ms per random access), a single I/O costs as much as roughly 100,000 simple CPU operations.
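A back-of-the-envelope sketch of this gap, using the latency figures above (the 4 GHz clock rate is an illustrative assumption):

```python
# Rough comparison of I/O latency vs. CPU throughput.
# Assumed figures: 4 GHz CPU, 10 ms HDD page read, 0.1 ms SSD random access.
CPU_HZ = 4e9
HDD_READ_S = 10e-3
SSD_READ_S = 0.1e-3

cycles_per_hdd_read = CPU_HZ * HDD_READ_S  # CPU cycles elapsed per HDD page read
cycles_per_ssd_read = CPU_HZ * SSD_READ_S  # CPU cycles elapsed per SSD random read

print(f"Cycles per HDD page read:   {cycles_per_hdd_read:,.0f}")  # 40,000,000
print(f"Cycles per SSD random read: {cycles_per_ssd_read:,.0f}")  # 400,000
```

Even in the SSD case, one random I/O costs hundreds of thousands of cycles, which is why counting page I/Os is a reasonable first-order cost model.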
Consequently, cost models primarily count page I/O operations. CPU costs are considered only when I/O ceases to be the bottleneck—for example, in in-memory databases, on very fast SSDs, or when the data is already resident in the buffer pool.
Standard Notation:
We use the following notation throughout this analysis:

- b_R, b_S — number of pages (blocks) occupied by relations R and S
- |R|, |S| — number of tuples in R and S
- M — number of buffer pages available in memory

We use 'page' and 'block' interchangeably. Both refer to the unit of disk I/O (typically 4-16 KB). The notation b_R means 'blocks of R' — the number of disk pages needed to store relation R.
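The page count b_R follows directly from the tuple count, tuple size, and page size; a minimal sketch (8 KB pages and no per-page overhead are simplifying assumptions):

```python
import math

def pages_needed(num_tuples: int, tuple_size: int, page_size: int = 8192) -> int:
    """b_R: pages needed to store a relation, assuming tuples don't span pages
    and ignoring page header overhead."""
    tuples_per_page = page_size // tuple_size
    return math.ceil(num_tuples / tuples_per_page)

# 500,000 tuples of 200 bytes: 40 tuples fit per 8 KB page
print(pages_needed(500_000, 200))  # 12500
```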
The total cost of sort-merge join decomposes into three phases:
Total Cost = Sort R + Sort S + Merge
Phase 1: Sort R (if not already sorted)
Using external merge sort:
Cost_Sort_R = 2 × b_R × (1 + ⌈log_{M-1}(⌈b_R/M⌉)⌉)
Breakdown: the initial pass reads all b_R pages and writes ⌈b_R/M⌉ sorted runs (2 × b_R I/Os); each of the ⌈log_{M-1}(⌈b_R/M⌉)⌉ merge passes again reads and writes every page (another 2 × b_R I/Os per pass).
Phase 2: Sort S (if not already sorted)
Identical formula:
Cost_Sort_S = 2 × b_S × (1 + ⌈log_{M-1}(⌈b_S/M⌉)⌉)
Phase 3: Merge Phase
The merge reads each sorted relation exactly once:
Cost_Merge = b_R + b_S
No writes are counted if we're streaming output to the next pipeline stage. If output is materialized:
Cost_Merge_Materialized = b_R + b_S + b_Output
```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class RelationStats:
    """Statistics for a relation."""
    num_tuples: int   # |R| or |S|
    num_pages: int    # b_R or b_S
    is_sorted: bool   # Already sorted on join key?
    tuple_size: int   # Average tuple size in bytes

def calculate_sort_cost(num_pages: int, buffer_pages: int,
                        is_sorted: bool = False) -> int:
    """
    Calculate I/O cost to sort a relation.
    Returns number of page I/Os required.
    """
    if is_sorted:
        return 0  # No sorting needed

    b = num_pages
    M = buffer_pages

    # Number of initial runs
    num_runs = math.ceil(b / M)

    if num_runs <= 1:
        # Fits in memory - single pass sort
        return 2 * b  # Read once, write once

    # Number of merge passes needed
    # Fan-in is (M-1) because one buffer is for output
    merge_passes = math.ceil(math.log(num_runs) / math.log(M - 1))

    # Total: initial pass + merge passes, each reading and writing all data
    return 2 * b * (1 + merge_passes)

def calculate_merge_cost(stats_R: RelationStats, stats_S: RelationStats,
                         materialize_output: bool = False,
                         output_pages: Optional[int] = None) -> int:
    """
    Calculate I/O cost for merge phase.

    Args:
        stats_R: Statistics for left relation
        stats_S: Statistics for right relation
        materialize_output: Whether to write output to disk
        output_pages: If materializing, number of output pages

    Returns:
        Number of page I/Os for merge phase
    """
    # Read both relations once
    cost = stats_R.num_pages + stats_S.num_pages

    if materialize_output:
        if output_pages is None:
            raise ValueError("Must provide output_pages when materializing")
        cost += output_pages

    return cost

def calculate_sort_merge_join_cost(
        stats_R: RelationStats,
        stats_S: RelationStats,
        buffer_pages: int,
        materialize_output: bool = False,
        output_pages: Optional[int] = None) -> dict:
    """
    Calculate complete sort-merge join cost.
    Returns breakdown of costs by phase.
    """
    sort_R_cost = calculate_sort_cost(
        stats_R.num_pages, buffer_pages, stats_R.is_sorted
    )
    sort_S_cost = calculate_sort_cost(
        stats_S.num_pages, buffer_pages, stats_S.is_sorted
    )
    merge_cost = calculate_merge_cost(
        stats_R, stats_S, materialize_output, output_pages
    )
    total_cost = sort_R_cost + sort_S_cost + merge_cost

    return {
        'sort_R': sort_R_cost,
        'sort_S': sort_S_cost,
        'merge': merge_cost,
        'total': total_cost,
        'breakdown': f"Sort R: {sort_R_cost} + Sort S: {sort_S_cost} "
                     f"+ Merge: {merge_cost} = {total_cost}"
    }

# Example calculations
if __name__ == "__main__":
    # Scenario: Join 100MB table with 500MB table, 10MB buffer
    # Assume 8KB pages
    PAGE_SIZE = 8 * 1024

    R = RelationStats(
        num_tuples=500_000,
        num_pages=100_000_000 // PAGE_SIZE,  # ~12,207 pages
        is_sorted=False,
        tuple_size=200
    )
    S = RelationStats(
        num_tuples=2_500_000,
        num_pages=500_000_000 // PAGE_SIZE,  # ~61,035 pages
        is_sorted=False,
        tuple_size=200
    )
    buffer_pages = 10_000_000 // PAGE_SIZE  # ~1,220 pages

    result = calculate_sort_merge_join_cost(R, S, buffer_pages)
    print(result['breakdown'])
```

For unsorted inputs, sorting typically accounts for 70-90% of the total cost. This is why pre-sorted input (from an index or previous operation) makes sort-merge join dramatically cheaper.
Let's work through several realistic scenarios to build intuition for sort-merge join costs:
Example 1: Small Tables, Moderate Buffer
```python
# Example 1: b_R = 1000, b_S = 500, M = 100

# Sort R:
# - Initial runs: ceil(1000/100) = 10 runs
# - Merge passes: ceil(log_99(10)) = ceil(0.502) = 1 pass
# - Cost: 2 × 1000 × (1 + 1) = 4,000 page I/Os

# Sort S:
# - Initial runs: ceil(500/100) = 5 runs
# - Merge passes: ceil(log_99(5)) = ceil(0.351) = 1 pass
# - Cost: 2 × 500 × (1 + 1) = 2,000 page I/Os

# Merge:
# - Cost: 1000 + 500 = 1,500 page I/Os

# TOTAL: 4,000 + 2,000 + 1,500 = 7,500 page I/Os
```

Example 2: Large Tables, Small Buffer
```python
# Example 2: b_R = 10,000, b_S = 50,000, M = 100

# Sort R:
# - Initial runs: ceil(10,000/100) = 100 runs
# - Merge passes: ceil(log_99(100)) = ceil(1.002) = 2 passes
# - Cost: 2 × 10,000 × (1 + 2) = 60,000 page I/Os

# Sort S:
# - Initial runs: ceil(50,000/100) = 500 runs
# - Merge passes: ceil(log_99(500)) = ceil(1.352) = 2 passes
# - Cost: 2 × 50,000 × (1 + 2) = 300,000 page I/Os

# Merge:
# - Cost: 10,000 + 50,000 = 60,000 page I/Os

# TOTAL: 60,000 + 300,000 + 60,000 = 420,000 page I/Os

# Note: Sorting costs 360,000 of 420,000 total (86%)!
```

Example 3: Pre-Sorted Input (Index Available)
Same tables as Example 2, but both are already sorted on join key:
```python
# Example 3: b_R = 10,000, b_S = 50,000, M = 100
# BOTH RELATIONS ALREADY SORTED

# Sort R: 0 (already sorted)
# Sort S: 0 (already sorted)

# Merge: 10,000 + 50,000 = 60,000 page I/Os

# TOTAL: 60,000 page I/Os

# SAVINGS: 420,000 - 60,000 = 360,000 page I/Os (85.7% reduction!)
```

| Scenario | Sort R | Sort S | Merge | Total | Notes |
|---|---|---|---|---|---|
| Example 1 (small) | 4,000 | 2,000 | 1,500 | 7,500 | 1 merge pass each |
| Example 2 (large) | 60,000 | 300,000 | 60,000 | 420,000 | 2 merge passes each |
| Example 3 (sorted) | 0 | 0 | 60,000 | 60,000 | 85% cheaper than Ex 2 |
In Example 2, with only 100 buffer pages for 50,000-page S, we need 500 initial runs and 2 merge passes. Doubling the buffer to 200 pages would reduce runs to 250, still needing 2 passes. But with 1,000 pages, we'd have 50 runs and only 1 pass—reducing S's sort cost from 300,000 to 200,000 I/Os.
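The buffer-size figures above can be checked directly against the sort-cost formula; a short sketch:

```python
import math

def sort_io_cost(b: int, M: int) -> tuple:
    """Return (initial runs, merge passes, total I/O) for external sort
    of b pages with M buffer pages."""
    if M >= b:
        return 1, 0, 2 * b  # whole relation fits in memory
    runs = math.ceil(b / M)
    passes = math.ceil(math.log(runs) / math.log(M - 1)) if runs > 1 else 0
    return runs, passes, 2 * b * (1 + passes)

# Sorting S from Example 2 (b_S = 50,000) with increasing buffer sizes
for M in (100, 200, 1_000):
    runs, passes, cost = sort_io_cost(50_000, M)
    print(f"M={M}: {runs} runs, {passes} passes, {cost:,} I/Os")
# M=100:  500 runs, 2 passes, 300,000 I/Os
# M=200:  250 runs, 2 passes, 300,000 I/Os
# M=1000:  50 runs, 1 pass,  200,000 I/Os
```

Doubling the buffer to 200 pages halves the run count but not the pass count, so the cost is unchanged; only at 1,000 pages does a merge pass disappear.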
To understand when sort-merge join excels, we must compare it with the alternatives:
Nested Loop Join (Block Nested Loop):
Cost_BNL = b_R + ⌈b_R / (M-2)⌉ × b_S
For each block of R (using M-2 buffers), scans all of S. Very expensive for large tables without indexes.
Hash Join (Simple Hash):
Cost_Hash = 3 × (b_R + b_S) [if both fit after partitioning]
Partition phase: read and write both relations once (2× each).
Probe phase: read both relations once (1× each).
With recursive partitioning for large tables, more passes needed.
Example comparison, with b_R = 1,000, b_S = 5,000, and M = 100:

| Algorithm | Cost Formula | Total I/O | Competitive When |
|---|---|---|---|
| Block Nested Loop | 1,000 + 11 × 5,000 | 56,000 | Small inner table, or index available |
| Hash Join | 3 × (1,000 + 5,000) | 18,000 | Unsorted equi-join, adequate memory |
| Sort-Merge (unsorted) | 2×1,000×2 + 2×5,000×2 + 6,000 | 30,000 | Sorted output needed, or many joins |
| Sort-Merge (sorted) | 0 + 0 + 6,000 | 6,000 | Inputs already sorted (index) |
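The table's totals can be reproduced from the three cost formulas; a sketch using b_R = 1,000, b_S = 5,000, and M = 100 (the figures implied by the table):

```python
import math

b_R, b_S, M = 1_000, 5_000, 100

def sort_cost(b, M):
    """External sort I/O: 2 × b × (1 + merge passes)."""
    if M >= b:
        return 2 * b
    runs = math.ceil(b / M)
    passes = math.ceil(math.log(runs) / math.log(M - 1)) if runs > 1 else 0
    return 2 * b * (1 + passes)

# Block nested loop: one scan of R, plus one scan of S per block-group of R
bnl = b_R + math.ceil(b_R / (M - 2)) * b_S

# Simple hash join: partition (read+write both), then probe (read both)
hash_join = 3 * (b_R + b_S)

# Sort-merge: sort both relations, then one merging scan of each
smj_unsorted = sort_cost(b_R, M) + sort_cost(b_S, M) + b_R + b_S
smj_sorted = b_R + b_S  # inputs already sorted, no sort cost

print(bnl, hash_join, smj_unsorted, smj_sorted)
# 56000 18000 30000 6000
```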
When Sort-Merge Join Wins:

- Inputs are already sorted on the join key (e.g., delivered by an index or a previous sort)
- The query requires sorted output, or later operators can reuse the sort order
- Memory is too constrained for hash join to avoid recursive partitioning
Modern query optimizers evaluate all these factors—input sizes, sort orders, available indexes, memory, and output requirements—to choose the best join algorithm. Understanding the cost model helps you predict and influence these decisions through query hints, index design, and schema optimization.
While I/O dominates for disk-based databases, CPU costs become significant for in-memory databases and fast SSDs. Let's analyze the CPU component of sort-merge join:
Sorting Phase CPU Cost:
In-memory sorting uses O(n log n) comparisons. The total sorting CPU cost is approximately:
CPU_Sort = |R| × log(|R|) × C_compare + Copy_overhead
Merge Phase CPU Cost:
Merging is linear in output size:
CPU_Merge = (|R| + |S|) × C_compare + |Output| × C_concat
```python
import math
from dataclasses import dataclass

@dataclass
class CPUCostParams:
    """CPU cost parameters (in CPU cycles)."""
    key_extract: int = 50      # Extract sort key from tuple
    key_compare: int = 100     # Compare two keys
    tuple_copy: int = 500      # Copy one tuple in memory
    tuple_concat: int = 300    # Concatenate two tuples for output
    heap_operation: int = 200  # Heap push/pop in external sort

def estimate_sort_cpu_cost(num_tuples: int, num_pages: int,
                           buffer_pages: int,
                           params: CPUCostParams) -> int:
    """
    Estimate CPU cycles for external merge sort.

    Includes:
    - In-memory sorting of each run (quicksort)
    - Heap operations for k-way merge
    """
    # Phase 1: In-memory sorting
    # Each run contains approximately (buffer_pages × tuples_per_page) tuples
    tuples_per_page = num_tuples / num_pages
    tuples_per_run = buffer_pages * tuples_per_page
    num_runs = math.ceil(num_pages / buffer_pages)

    # Quicksort: ~1.4 × n × log2(n) comparisons on average
    sort_comparisons = 1.4 * tuples_per_run * math.log2(max(tuples_per_run, 1))
    sort_cost_per_run = sort_comparisons * (
        params.key_extract * 2 + params.key_compare
    )
    total_sort_cost = num_runs * sort_cost_per_run

    # Phase 2: Merge (if multiple runs)
    if num_runs <= 1:
        return int(total_sort_cost)

    # K-way merge: each tuple does log2(K) heap operations
    # for each merge pass
    num_passes = math.ceil(math.log(num_runs) /
                           math.log(max(buffer_pages - 1, 2)))
    heap_cost = num_tuples * num_passes * (
        params.heap_operation * math.log2(min(num_runs, buffer_pages - 1))
        + params.key_compare
    )
    return int(total_sort_cost + heap_cost)

def estimate_merge_cpu_cost(num_R: int, num_S: int, output_tuples: int,
                            params: CPUCostParams) -> int:
    """
    Estimate CPU cycles for merge-join phase.

    Assumes unique keys (no cross-product). With duplicates,
    add |output| × C_concat for concatenation.
    """
    # Direct merge: ~(|R| + |S|) comparisons
    compare_cost = (num_R + num_S) * (
        params.key_extract * 2 + params.key_compare
    )
    # Output concatenation
    output_cost = output_tuples * params.tuple_concat
    return compare_cost + output_cost

# Example: Compare CPU costs
R_tuples = 1_000_000
S_tuples = 5_000_000
R_pages = 10_000
S_pages = 50_000
M = 1_000
output = 500_000  # Estimated join result size

params = CPUCostParams()

sort_R_cpu = estimate_sort_cpu_cost(R_tuples, R_pages, M, params)
sort_S_cpu = estimate_sort_cpu_cost(S_tuples, S_pages, M, params)
merge_cpu = estimate_merge_cpu_cost(R_tuples, S_tuples, output, params)

print(f"Sort R CPU: {sort_R_cpu:,} cycles ({sort_R_cpu/1e9:.2f} billion)")
print(f"Sort S CPU: {sort_S_cpu:,} cycles ({sort_S_cpu/1e9:.2f} billion)")
print(f"Merge CPU: {merge_cpu:,} cycles ({merge_cpu/1e9:.2f} billion)")
# At 3 GHz: cycles / 3e9 = seconds
```

When CPU Costs Matter:

- In-memory databases, where there is no disk I/O to dominate
- Very fast storage (e.g., NVMe SSDs), where I/O latency approaches CPU cost
- Pre-sorted inputs, where the cheap merge phase leaves CPU as the main expense
Modern CPUs offer SIMD instructions that compare multiple keys in parallel. Parallel sorting algorithms use multiple cores to sort chunks concurrently. These optimizations can reduce CPU time by 4-16× but are often hidden inside optimized library functions.
Buffer memory profoundly affects sort-merge join performance. Let's analyze how memory allocation impacts cost:
The Memory Sweet Spot:
External sort cost drops dramatically when memory crosses certain thresholds. Consider sorting a 10,000-page relation:
| Buffer Pages (M) | Initial Runs | Merge Passes | Total I/O | Reduction from M=50 |
|---|---|---|---|---|
| 50 | 200 | 2 | 60,000 | — |
| 100 | 100 | 2 | 60,000 | 0% |
| 200 | 50 | 1 | 40,000 | 33% |
| 500 | 20 | 1 | 40,000 | 33% |
| 1,000 | 10 | 1 | 40,000 | 33% |
| 5,000 | 2 | 1 | 40,000 | 33% |
| 10,000+ | 1 | 0 | 20,000 | 67% |
Observations:
Diminishing returns: Going from 50 to 200 buffer pages reduces cost by 33%. Going from 200 to 1,000 provides no additional benefit (same 1 pass).
Step function: Cost drops at thresholds where merge passes decrease. Between thresholds, extra memory provides minimal benefit.
In-memory threshold: When M ≥ b_R, we avoid merge passes entirely—the biggest single improvement.
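The step function above can be reproduced by sweeping M over the table's values (b = 10,000 pages, as in the table):

```python
import math

def sort_io(b: int, M: int) -> int:
    """Total I/O for external sort of b pages with M buffer pages."""
    if M >= b:
        return 2 * b  # fits in memory: one read pass, one write pass
    runs = math.ceil(b / M)
    passes = math.ceil(math.log(runs) / math.log(M - 1)) if runs > 1 else 0
    return 2 * b * (1 + passes)

b = 10_000
for M in (50, 100, 200, 500, 1_000, 5_000, 10_000):
    print(f"M={M:>6}: {sort_io(b, M):>7,} I/Os")
# Cost stays flat between thresholds and drops only when a merge pass
# is eliminated: 60,000 → 40,000 → 20,000.
```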
Memory Allocation Strategy:
Given a fixed memory budget for a complex query with multiple joins:
```python
import math

def optimal_memory_allocation(b_R: int, b_S: int, total_memory: int) -> dict:
    """
    Allocate buffer memory between two sorts to minimize total cost.

    This is a simplified heuristic. Production optimizers use more
    sophisticated cost-based allocation.
    """
    # Check if we can fit everything
    if total_memory >= b_R + b_S:
        return {
            'R_memory': b_R,
            'S_memory': b_S,
            'merge_memory': 0,
            'strategy': 'both fit in memory'
        }

    # Proportional allocation (simple but effective)
    ratio = b_R / (b_R + b_S)
    R_memory = int(total_memory * ratio)
    S_memory = total_memory - R_memory

    # Ensure minimum allocation for each
    min_allocation = max(3, total_memory // 20)  # At least 3 pages or 5%
    R_memory = max(R_memory, min_allocation)
    S_memory = max(S_memory, min_allocation)

    # Adjust to not exceed total
    if R_memory + S_memory > total_memory:
        excess = (R_memory + S_memory) - total_memory
        if R_memory > S_memory:
            R_memory -= excess
        else:
            S_memory -= excess

    # Calculate resulting merge passes for each allocation
    def passes(b, M):
        if M >= b:
            return 0
        runs = math.ceil(b / M)
        if runs <= 1:
            return 0
        return math.ceil(math.log(runs) / math.log(M - 1))

    return {
        'R_memory': R_memory,
        'S_memory': S_memory,
        'R_passes': passes(b_R, R_memory),
        'S_passes': passes(b_S, S_memory),
        'strategy': 'proportional allocation'
    }

# Example
result = optimal_memory_allocation(
    b_R=10_000, b_S=50_000, total_memory=5_000
)
print(f"R memory: {result['R_memory']} pages ({result['R_passes']} passes)")
print(f"S memory: {result['S_memory']} pages ({result['S_passes']} passes)")
```

In a multi-query environment, multiple operations compete for buffer memory. The optimizer's memory estimates may not match runtime reality. Adaptive execution engines monitor actual memory usage and may switch strategies mid-query if needed.
A solid understanding of sort-merge join costs empowers you to predict performance, design efficient schemas, and understand optimizer decisions. The essential formulas:

- Sort cost per relation: 2 × b × (1 + ⌈log_{M-1}(⌈b/M⌉)⌉) page I/Os
- Merge cost: b_R + b_S page I/Os (plus b_Output if the result is materialized)
- Pre-sorted inputs eliminate the sort terms entirely, leaving only b_R + b_S
Looking Ahead:
With cost analysis complete, we'll examine the blocking nature of sort-merge join—a key characteristic that affects query pipelining and execution strategy. Understanding this property is crucial for designing efficient query execution plans.
You now have a comprehensive understanding of sort-merge join cost analysis—from I/O formulas to CPU considerations, from concrete examples to memory optimization strategies. This quantitative foundation is essential for query optimization and performance engineering.