Database Management SystemsQuery Optimization

Equivalence Rules

LevelIntermediate

Duration60 mins

TopicQuery Optimization

4 / 5

Join Commutativity

The Freedom to Swap

At first glance, join commutativity seems trivially obvious: A ⋈ B = B ⋈ A. Of course, the result is the same regardless of which table we list first. But this mathematical equivalence has profound implications for query execution.

The order in which relations appear in a join doesn't just affect notation—it affects how the join is physically executed. In a nested-loop join, the left operand becomes the outer loop and the right operand becomes the inner loop. In a hash join, typically one side builds the hash table and the other probes it. These implementation choices have dramatically different performance characteristics depending on relation sizes, available memory, and index structures.

Join commutativity gives the optimizer freedom to choose the optimal physical arrangement for each join, transforming a potentially slow execution into a fast one without changing the query's meaning.

What You Will Learn

By the end of this page, you will understand: (1) why join commutativity matters for physical execution, (2) how different join algorithms exploit operand ordering, (3) the cost implications of inner vs. outer relation choice, and (4) when commutativity is limited or doesn't apply.

The Commutativity Property Formalized

Formal Statement

For natural join:

R ⋈ S ≡ S ⋈ R

For theta join:

R ⋈_θ S ≡ S ⋈_θ R  (where θ is symmetric or appropriately transformed)

For equi-join:

R ⋈_{R.a=S.b} S ≡ S ⋈_{S.b=R.a} R

Why Does This Hold?

The join operation is defined as selecting tuples from the Cartesian product that satisfy the join condition. Since the Cartesian product itself is commutative (R × S contains the same pairs as S × R, just in different order), and the join condition is applied symmetrically, the result set must be identical.

What About Attribute Order?

Strictly speaking, R ⋈ S and S ⋈ R might have attributes in different orders. In practice:

SQL's result column order follows SELECT list, not FROM order
Optimizers can add projections to reorder attributes
For semantic equivalence, we care about the set of result tuples, not their physical ordering

This means commutativity holds for all practical purposes in query optimization.

Commutativity of Different Join Types
Join Type	Commutative?	Condition	Notes
Natural Join (⋈)	Yes	Always	Equal common attributes
Inner Equi-Join	Yes	Always	Transform a=b to b=a
Inner Theta Join	Yes	Symmetric condition	θ must be rewritten appropriately
Cross Join (×)	Yes	Always	All pair combinations identical
LEFT OUTER JOIN	No	—	R ⟕ S ≠ S ⟕ R
RIGHT OUTER JOIN	No	—	R ⟖ S ≠ S ⟖ R
FULL OUTER JOIN	Yes	Always	Both sides preserved symmetrically

Outer Joins Break Commutativity

LEFT and RIGHT outer joins are not commutative! LEFT JOIN preserves all rows from the left table; swapping operands changes which rows are preserved. FULL OUTER JOIN, however, is commutative because it preserves unmatched rows from both sides.

Physical Execution Implications

The value of commutativity becomes clear when we examine how different join algorithms work.

Nested Loop Join

In a nested loop join:

for each tuple r in R (outer):
    for each tuple s in S (inner):
        if r and s satisfy join condition:
            output (r, s)

The asymmetry: The outer relation R is scanned once. The inner relation S is scanned once for each tuple of R.

If R has 100 tuples and S has 10,000 tuples: S is scanned 100 times
If we swap: R is scanned 10,000 times

Optimal choice: Make the smaller relation the outer, or exploit indexes on the inner relation.

Hash Join

In a hash join:

// Build phase
for each tuple r in R (build side):
    insert r into hash table keyed on join attributes

// Probe phase  
for each tuple s in S (probe side):
    look up matching tuples in hash table
    output joined tuples

The asymmetry: The build side constructs an in-memory hash table. If the build side is too large, the table may spill to disk, causing massive slowdown.

Optimal choice: Make the smaller relation the build side. The hash table should fit in memory.

Nested Loop: Outer Table

•Scanned once sequentially
•Should be smaller (if no index on inner)
•Drives the loop iteration count
•Each tuple triggers inner scan
•Memory: buffer pages for outer

Nested Loop: Inner Table

•Scanned multiple times
•Best if indexed on join column
•Each scan finds matching tuples
•Benefits from caching/buffer reuse
•Memory: index structure + buffer

Sort-Merge Join

Sort-merge join is more symmetric:

sort R on join attributes
sort S on join attributes
merge: scan both sorted relations, matching equal keys

Both relations are scanned once during the merge phase. However, sorting costs depend on relation sizes, and if one relation is already sorted (e.g., from an index), commuting might avoid a redundant sort.

Index Nested Loop Join

When an index exists on the join column:

for each tuple r in R (outer):
    use index on S to find matching tuples (inner)

The strong asymmetry: The indexed relation should be the inner. If you swap, you lose the index benefit.

Optimal choice: Non-indexed relation as outer, indexed relation as inner. Commutativity enables this arrangement regardless of how the query was written.

The Optimizer's Freedom

Commutativity lets the optimizer choose: for nested loops, make the smaller or filtered table outer; for hash joins, build on the smaller table; for index nested loops, put the indexed table as inner. Without commutativity, the query's textual order would dictate a potentially terrible execution.

Cost Model Considerations

The optimizer uses cost models to decide which commutation is beneficial. Let's examine the cost formulas.

Nested Loop Join Costs

Basic nested loop (no blocking, no index):

Cost(R ⋈ S) = |R| + |R| × |S|  (tuple-at-a-time)
            = P_R + |R| × P_S  (page-at-a-time, worst case)

Where P_R, P_S are page counts. Clearly, having the smaller relation as outer is better.

Block nested loop (using B memory pages for outer):

Cost = P_R + ⌈P_R / (B-2)⌉ × P_S

Fewer outer pages means fewer scans of the inner.

Index nested loop (index on inner's join column):

Cost = P_R + |R| × (index lookup cost + tuple fetch cost)

Index lookup is ~3–4 I/Os for B-tree. If the outer is after selection pushdown (fewer tuples), cost is very low.

Hash Join Costs

Cost = 3 × (P_R + P_S)  (for partitioned hash join)

Symmetric in pages, but:

Memory pressure from build side hash table
If build side fits in memory: Cost = P_R + P_S
If build side exceeds memory: multiple passes needed

Optimal: Build on the smaller relation to maximize chance of in-memory build.

Cost Comparison Example

Analysis

Given:
  Table R: 10,000 rows, 1,000 pages
  Table S: 1,000,000 rows, 50,000 pages
  Memory buffer: 100 pages
  Index on S.join_column
 
Option 1: R as outer in nested loop (with index on S)
  Cost = P_R + |R| × (index lookup)
       = 1,000 + 10,000 × 4
       = 41,000 I/Os
 
Option 2: S as outer in nested loop (with index on S - can't use!)
  Cost = P_S + |S| × (full scan R for each S tuple, if no index on R)
       = 50,000 + 1,000,000 × 1,000
       = ~1 billion I/Os  (DISASTER!)
 
Commutativity enables Option 1:
  - Use outer-to-inner relationship that exploits the index
  - Reduce cost by factor of ~24,000

Statistics Are Critical

The optimizer needs accurate table sizes, row counts, and index availability to make good commutation decisions. Stale statistics can lead to terrible choices. Always keep statistics updated (ANALYZE in PostgreSQL, UPDATE STATISTICS in SQL Server).

Multi-Way Joins and Commutativity

When a query joins three or more tables, commutativity at each join step compounds with associativity to create a large search space of equivalent plans.

Example: Three-Way Join

For query: R ⋈ S ⋈ T

Consider just binary join orderings:

(R ⋈ S) ⋈ T with R outer for first join
(R ⋈ S) ⋈ T with S outer for first join
(S ⋈ R) ⋈ T with S outer for first join (same as 2 after commuting)
R ⋈ (S ⋈ T) with R outer for second join ... and so on

Each intermediate join can have its operands commuted. The optimizer explores combinations to find the overall best plan.

The Explosion

For n tables:

Number of join orders (tree shapes): Catalan number C_{n-1}
Number of commutations per join: 2
Total plans: O(4^n / n^{1.5}) without algorithm selection

Optimizers use dynamic programming (for small n) or heuristics (for large n) to search this space efficiently.

Join Plan Space Growth
Tables (n)	Join Orders	With Commutation	With Algorithms (3 choices)
2	1	2	6
3	2	4	36
4	5	10	270
5	14	28	2,268
6	42	84	20,412
10	4,862	9,724	~5 million
15	2.7 million	5.4 million	~billions

Why Commutativity Matters Even More in Multi-Way Joins

In multi-way joins, an early poor commutation decision propagates:

If the first join produces a huge intermediate result (wrong operand order), subsequent joins process more data
Conversely, optimal commutation at each stage keeps intermediate results small

The optimizer evaluates costs for promising combinations, applying commutativity where it helps reduce intermediate result sizes or enables index usage.

Outer Joins and Semi-Joins

Commutativity has limitations for certain join variants. Understanding these limitations is crucial for correct optimization.

Outer Join Non-Commutativity

LEFT OUTER JOIN (⟕)

R ⟕ S ≢ S ⟕ R

Why? LEFT JOIN preserves all rows from R, padding with NULLs when no match exists in S. Swapping changes which table is preserved.

Example:

R = {1, 2, 3}
S = {2, 3, 4}

R ⟕ S = {(1, NULL), (2, 2), (3, 3)}
S ⟕ R = {(2, 2), (3, 3), (4, NULL)}  -- DIFFERENT!

Implication: The optimizer cannot commute outer joins freely. The SQL author's intent (which side to preserve) must be honored.

RIGHT OUTER JOIN

Similarly non-commutative:

R ⟖ S ≢ S ⟖ R

However, there's a useful transformation:

R ⟕ S ≡ S ⟖ R  (converting between LEFT and RIGHT)

This isn't commutativity but a different equivalence that can help normalize plans.

FULL OUTER JOIN

R ⟗ S ≡ S ⟗ R  (IS commutative)

Full outer join preserves unmatched rows from both sides, so it doesn't matter which is listed first.

Semi-Joins and Anti-Joins

Semi-join (⋉): Returns tuples from R that have at least one match in S. No S columns appear in output.

R ⋉ S ≢ S ⋉ R (NOT commutative in result schema)

However, for EXISTS subqueries:

SELECT * FROM R WHERE EXISTS (SELECT 1 FROM S WHERE R.a = S.a)

The semi-join implementation can choose which table to scan first, as long as the output is correctly R's tuples.

Anti-join (▷): Returns tuples from R with no match in S (NOT EXISTS pattern).

R ▷ S ≢ S ▷ R (NOT commutative)

Anti-joins have even more restricted transformation options.

Optimizer Workarounds

Even when commutativity doesn't apply to the logical operator, physical algorithms may have flexibility. For instance, a semi-join can use either table as the hash table build side internally, as long as it correctly returns R's tuples. The optimizer exploits such physical-level freedoms.

Join Condition Symmetry

When commuting a theta join, the join condition may need to be transformed to remain semantically correct.

Symmetric Conditions

Equality conditions are naturally symmetric:

R.a = S.b ⟺ S.b = R.a

The optimizer can commute freely.

Asymmetric Conditions

Inequality conditions must be inverted:

R.a < S.b when commuted becomes S.b > R.a

Transformation table:

Original	Commuted Equivalent
`R.a = S.b`	`S.b = R.a`
`R.a < S.b`	`S.b > R.a`
`R.a <= S.b`	`S.b >= R.a`
`R.a > S.b`	`S.b < R.a`
`R.a >= S.b`	`S.b <= R.a`
`R.a != S.b`	`S.b != R.a`

Complex Conditions

Conjunctions and disjunctions transform component-wise:

(R.a = S.b) AND (R.c > S.d) 
⟺ (S.b = R.a) AND (S.d < R.c)

The optimizer applies these transformations automatically when considering commutation.

Expression Complexity

For complex join conditions involving functions, CASE expressions, or subqueries, the optimizer must verify that the condition remains evaluable after commutation. Conditions that are inherently asymmetric (e.g., involving row numbers or lateral references) may prevent commutation.

Cross Joins Require Special Care

A cross join (Cartesian product) has no join condition:

R × S ≡ S × R

Completely commutative, but:

Should rarely appear in well-written queries
Often indicates a missing WHERE clause
Optimizer may try to find implied join conditions from WHERE to convert to a proper join

When a cross join is intentional (generating combinations), commutation chooses optimal scan order.

Practical Examples

Let's examine how commutativity affects real query plans.

Join Commutativity in Practice
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- Query as written
SELECT c.name, o.order_date
FROM Orders o   -- 10 million rows
JOIN Customers c ON o.customer_id = c.id  -- 100,000 rows
WHERE c.region = 'West';
 
-- The optimizer might commute this logically to:
-- Customers (small, filtered) JOIN Orders (large)
 
-- Physical plan (PostgreSQL EXPLAIN output style):
/*
Hash Join  (cost=...)
  Hash Cond: (o.customer_id = c.id)
  ->  Seq Scan on orders o  (cost=...)
        -- 10M rows probe
  ->  Hash  (cost=...)
        ->  Seq Scan on customers c  (cost=...)
              Filter: (region = 'West')
              -- ~10K rows after filter, BUILD side
              
Notice: customers is the BUILD side (smaller after filtering)
This is the result of:
1. Selection pushdown: region='West' applied to customers
2. Commutativity: customers as build (inner conceptually)
*/

Example 2: Index Nested Loop

SELECT e.name, d.dept_name
FROM Departments d    -- 100 rows, no index on id
JOIN Employees e ON e.dept_id = d.id;  -- 100K rows, index on dept_id

Optimal plan:

Departments as outer (100 iterations)
For each department, use index on Employees.dept_id
~100 index lookups vs. 100K full scans

The optimizer commutes to put Departments as the driving (outer) table.

Reading Query Plans

In EXPLAIN output, look for which table appears as the 'outer' or 'build' side. If it's not the smaller table (especially post-filtering), there may be a statistics issue or missing index preventing optimal commutation.

Summary: Exploiting Operand Order

Join commutativity gives the optimizer essential flexibility in choosing how to execute joins physically. Let's consolidate the key insights:

Key Takeaways

•Inner joins are commutative — R ⋈ S ≡ S ⋈ R holds for natural, equi, and theta joins with proper condition transformation.
•Physical execution is asymmetric — Join algorithms treat operands differently (outer vs. inner, build vs. probe). Commutativity lets us choose wisely.
•Smaller table as build/outer — Generally, make the smaller (especially post-filtering) relation the build side for hash joins or outer for nested loops.
•Indexes drive decisions — Put the indexed table as the inner for index nested loop joins to exploit index lookups.
•Outer joins are NOT commutative — LEFT/RIGHT joins preserve specific sides; the optimizer cannot swap them.
•Multi-way joins compound — Commutativity combines with associativity to create large search spaces for multi-table joins.

What's Next

We've covered how to swap operands within a single join. Next, we'll explore join associativity—the ability to regroup joins in multi-table queries. This is the foundation of join ordering optimization, one of the most impactful aspects of query optimization.

Page Complete

You now understand join commutativity and its profound impact on query execution efficiency. The SQL order of tables in FROM/JOIN clauses doesn't constrain execution—the optimizer exploits commutativity to find the best physical arrangement.

4 / 5

Loading learning content...

Database Management SystemsQuery Optimization

Equivalence Rules

LevelIntermediate

Duration60 mins

TopicQuery Optimization

4 / 5

Join Commutativity

The Freedom to Swap

What You Will Learn

The Commutativity Property Formalized

Formal Statement

For natural join:

R ⋈ S ≡ S ⋈ R

For theta join:

R ⋈_θ S ≡ S ⋈_θ R  (where θ is symmetric or appropriately transformed)

For equi-join:

R ⋈_{R.a=S.b} S ≡ S ⋈_{S.b=R.a} R

Why Does This Hold?

What About Attribute Order?

Strictly speaking, R ⋈ S and S ⋈ R might have attributes in different orders. In practice:

SQL's result column order follows SELECT list, not FROM order
Optimizers can add projections to reorder attributes
For semantic equivalence, we care about the set of result tuples, not their physical ordering

This means commutativity holds for all practical purposes in query optimization.

Commutativity of Different Join Types
Join Type	Commutative?	Condition	Notes
Natural Join (⋈)	Yes	Always	Equal common attributes
Inner Equi-Join	Yes	Always	Transform a=b to b=a
Inner Theta Join	Yes	Symmetric condition	θ must be rewritten appropriately
Cross Join (×)	Yes	Always	All pair combinations identical
LEFT OUTER JOIN	No	—	R ⟕ S ≠ S ⟕ R
RIGHT OUTER JOIN	No	—	R ⟖ S ≠ S ⟖ R
FULL OUTER JOIN	Yes	Always	Both sides preserved symmetrically

Outer Joins Break Commutativity

Physical Execution Implications

The value of commutativity becomes clear when we examine how different join algorithms work.

Nested Loop Join

In a nested loop join:

for each tuple r in R (outer):
    for each tuple s in S (inner):
        if r and s satisfy join condition:
            output (r, s)

The asymmetry: The outer relation R is scanned once. The inner relation S is scanned once for each tuple of R.

If R has 100 tuples and S has 10,000 tuples: S is scanned 100 times
If we swap: R is scanned 10,000 times

Optimal choice: Make the smaller relation the outer, or exploit indexes on the inner relation.

Hash Join

In a hash join:

// Build phase
for each tuple r in R (build side):
    insert r into hash table keyed on join attributes

// Probe phase  
for each tuple s in S (probe side):
    look up matching tuples in hash table
    output joined tuples

The asymmetry: The build side constructs an in-memory hash table. If the build side is too large, the table may spill to disk, causing massive slowdown.

Optimal choice: Make the smaller relation the build side. The hash table should fit in memory.

Nested Loop: Outer Table

•Scanned once sequentially
•Should be smaller (if no index on inner)
•Drives the loop iteration count
•Each tuple triggers inner scan
•Memory: buffer pages for outer

Nested Loop: Inner Table

•Scanned multiple times
•Best if indexed on join column
•Each scan finds matching tuples
•Benefits from caching/buffer reuse
•Memory: index structure + buffer

Sort-Merge Join

Sort-merge join is more symmetric:

sort R on join attributes
sort S on join attributes
merge: scan both sorted relations, matching equal keys

Index Nested Loop Join

When an index exists on the join column:

for each tuple r in R (outer):
    use index on S to find matching tuples (inner)

The strong asymmetry: The indexed relation should be the inner. If you swap, you lose the index benefit.

Optimal choice: Non-indexed relation as outer, indexed relation as inner. Commutativity enables this arrangement regardless of how the query was written.

The Optimizer's Freedom

Cost Model Considerations

The optimizer uses cost models to decide which commutation is beneficial. Let's examine the cost formulas.

Nested Loop Join Costs

Basic nested loop (no blocking, no index):

Cost(R ⋈ S) = |R| + |R| × |S|  (tuple-at-a-time)
            = P_R + |R| × P_S  (page-at-a-time, worst case)

Where P_R, P_S are page counts. Clearly, having the smaller relation as outer is better.

Block nested loop (using B memory pages for outer):

Cost = P_R + ⌈P_R / (B-2)⌉ × P_S

Fewer outer pages means fewer scans of the inner.

Index nested loop (index on inner's join column):

Cost = P_R + |R| × (index lookup cost + tuple fetch cost)

Index lookup is ~3–4 I/Os for B-tree. If the outer is after selection pushdown (fewer tuples), cost is very low.

Hash Join Costs

Cost = 3 × (P_R + P_S)  (for partitioned hash join)

Symmetric in pages, but:

Memory pressure from build side hash table
If build side fits in memory: Cost = P_R + P_S
If build side exceeds memory: multiple passes needed

Optimal: Build on the smaller relation to maximize chance of in-memory build.

Cost Comparison Example

Analysis

Given:
  Table R: 10,000 rows, 1,000 pages
  Table S: 1,000,000 rows, 50,000 pages
  Memory buffer: 100 pages
  Index on S.join_column
 
Option 1: R as outer in nested loop (with index on S)
  Cost = P_R + |R| × (index lookup)
       = 1,000 + 10,000 × 4
       = 41,000 I/Os
 
Option 2: S as outer in nested loop (with index on S - can't use!)
  Cost = P_S + |S| × (full scan R for each S tuple, if no index on R)
       = 50,000 + 1,000,000 × 1,000
       = ~1 billion I/Os  (DISASTER!)
 
Commutativity enables Option 1:
  - Use outer-to-inner relationship that exploits the index
  - Reduce cost by factor of ~24,000

Statistics Are Critical

Multi-Way Joins and Commutativity

When a query joins three or more tables, commutativity at each join step compounds with associativity to create a large search space of equivalent plans.

Example: Three-Way Join

For query: R ⋈ S ⋈ T

Consider just binary join orderings:

(R ⋈ S) ⋈ T with R outer for first join
(R ⋈ S) ⋈ T with S outer for first join
(S ⋈ R) ⋈ T with S outer for first join (same as 2 after commuting)
R ⋈ (S ⋈ T) with R outer for second join ... and so on

Each intermediate join can have its operands commuted. The optimizer explores combinations to find the overall best plan.

The Explosion

For n tables:

Number of join orders (tree shapes): Catalan number C_{n-1}
Number of commutations per join: 2
Total plans: O(4^n / n^{1.5}) without algorithm selection

Optimizers use dynamic programming (for small n) or heuristics (for large n) to search this space efficiently.

Join Plan Space Growth
Tables (n)	Join Orders	With Commutation	With Algorithms (3 choices)
2	1	2	6
3	2	4	36
4	5	10	270
5	14	28	2,268
6	42	84	20,412
10	4,862	9,724	~5 million
15	2.7 million	5.4 million	~billions

Why Commutativity Matters Even More in Multi-Way Joins

In multi-way joins, an early poor commutation decision propagates:

If the first join produces a huge intermediate result (wrong operand order), subsequent joins process more data
Conversely, optimal commutation at each stage keeps intermediate results small

The optimizer evaluates costs for promising combinations, applying commutativity where it helps reduce intermediate result sizes or enables index usage.

Outer Joins and Semi-Joins

Commutativity has limitations for certain join variants. Understanding these limitations is crucial for correct optimization.

Outer Join Non-Commutativity

LEFT OUTER JOIN (⟕)

R ⟕ S ≢ S ⟕ R

Why? LEFT JOIN preserves all rows from R, padding with NULLs when no match exists in S. Swapping changes which table is preserved.

Example:

R = {1, 2, 3}
S = {2, 3, 4}

R ⟕ S = {(1, NULL), (2, 2), (3, 3)}
S ⟕ R = {(2, 2), (3, 3), (4, NULL)}  -- DIFFERENT!

Implication: The optimizer cannot commute outer joins freely. The SQL author's intent (which side to preserve) must be honored.

RIGHT OUTER JOIN

Similarly non-commutative:

R ⟖ S ≢ S ⟖ R

However, there's a useful transformation:

R ⟕ S ≡ S ⟖ R  (converting between LEFT and RIGHT)

This isn't commutativity but a different equivalence that can help normalize plans.

FULL OUTER JOIN

R ⟗ S ≡ S ⟗ R  (IS commutative)

Full outer join preserves unmatched rows from both sides, so it doesn't matter which is listed first.

Semi-Joins and Anti-Joins

Semi-join (⋉): Returns tuples from R that have at least one match in S. No S columns appear in output.

R ⋉ S ≢ S ⋉ R (NOT commutative in result schema)

However, for EXISTS subqueries:

SELECT * FROM R WHERE EXISTS (SELECT 1 FROM S WHERE R.a = S.a)

The semi-join implementation can choose which table to scan first, as long as the output is correctly R's tuples.

Anti-join (▷): Returns tuples from R with no match in S (NOT EXISTS pattern).

R ▷ S ≢ S ▷ R (NOT commutative)

Anti-joins have even more restricted transformation options.

Optimizer Workarounds

Join Condition Symmetry

When commuting a theta join, the join condition may need to be transformed to remain semantically correct.

Symmetric Conditions

Equality conditions are naturally symmetric:

R.a = S.b ⟺ S.b = R.a

The optimizer can commute freely.

Asymmetric Conditions

Inequality conditions must be inverted:

R.a < S.b when commuted becomes S.b > R.a

Transformation table:

Original	Commuted Equivalent
`R.a = S.b`	`S.b = R.a`
`R.a < S.b`	`S.b > R.a`
`R.a <= S.b`	`S.b >= R.a`
`R.a > S.b`	`S.b < R.a`
`R.a >= S.b`	`S.b <= R.a`
`R.a != S.b`	`S.b != R.a`

Complex Conditions

Conjunctions and disjunctions transform component-wise:

(R.a = S.b) AND (R.c > S.d) 
⟺ (S.b = R.a) AND (S.d < R.c)

The optimizer applies these transformations automatically when considering commutation.

Expression Complexity

Cross Joins Require Special Care

A cross join (Cartesian product) has no join condition:

R × S ≡ S × R

Completely commutative, but:

Should rarely appear in well-written queries
Often indicates a missing WHERE clause
Optimizer may try to find implied join conditions from WHERE to convert to a proper join

When a cross join is intentional (generating combinations), commutation chooses optimal scan order.

Practical Examples

Let's examine how commutativity affects real query plans.

Join Commutativity in Practice
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- Query as written
SELECT c.name, o.order_date
FROM Orders o   -- 10 million rows
JOIN Customers c ON o.customer_id = c.id  -- 100,000 rows
WHERE c.region = 'West';
 
-- The optimizer might commute this logically to:
-- Customers (small, filtered) JOIN Orders (large)
 
-- Physical plan (PostgreSQL EXPLAIN output style):
/*
Hash Join  (cost=...)
  Hash Cond: (o.customer_id = c.id)
  ->  Seq Scan on orders o  (cost=...)
        -- 10M rows probe
  ->  Hash  (cost=...)
        ->  Seq Scan on customers c  (cost=...)
              Filter: (region = 'West')
              -- ~10K rows after filter, BUILD side
              
Notice: customers is the BUILD side (smaller after filtering)
This is the result of:
1. Selection pushdown: region='West' applied to customers
2. Commutativity: customers as build (inner conceptually)
*/

Example 2: Index Nested Loop

SELECT e.name, d.dept_name
FROM Departments d    -- 100 rows, no index on id
JOIN Employees e ON e.dept_id = d.id;  -- 100K rows, index on dept_id

Optimal plan:

Departments as outer (100 iterations)
For each department, use index on Employees.dept_id
~100 index lookups vs. 100K full scans

The optimizer commutes to put Departments as the driving (outer) table.

Reading Query Plans

Summary: Exploiting Operand Order

Join commutativity gives the optimizer essential flexibility in choosing how to execute joins physically. Let's consolidate the key insights:

Key Takeaways

•Inner joins are commutative — R ⋈ S ≡ S ⋈ R holds for natural, equi, and theta joins with proper condition transformation.
•Physical execution is asymmetric — Join algorithms treat operands differently (outer vs. inner, build vs. probe). Commutativity lets us choose wisely.
•Smaller table as build/outer — Generally, make the smaller (especially post-filtering) relation the build side for hash joins or outer for nested loops.
•Indexes drive decisions — Put the indexed table as the inner for index nested loop joins to exploit index lookups.
•Outer joins are NOT commutative — LEFT/RIGHT joins preserve specific sides; the optimizer cannot swap them.
•Multi-way joins compound — Commutativity combines with associativity to create large search spaces for multi-table joins.

What's Next

Page Complete

4 / 5