Loading learning content...
At first glance, join commutativity seems trivially obvious: A ⋈ B = B ⋈ A. Of course, the result is the same regardless of which table we list first. But this mathematical equivalence has profound implications for query execution.
The order in which relations appear in a join doesn't just affect notation—it affects how the join is physically executed. In a nested-loop join, the left operand becomes the outer loop and the right operand becomes the inner loop. In a hash join, typically one side builds the hash table and the other probes it. These implementation choices have dramatically different performance characteristics depending on relation sizes, available memory, and index structures.
Join commutativity gives the optimizer freedom to choose the optimal physical arrangement for each join, transforming a potentially slow execution into a fast one without changing the query's meaning.
By the end of this page, you will understand: (1) why join commutativity matters for physical execution, (2) how different join algorithms exploit operand ordering, (3) the cost implications of inner vs. outer relation choice, and (4) when commutativity is limited or doesn't apply.
Formal Statement
For natural join:
R ⋈ S ≡ S ⋈ R
For theta join:
R ⋈_θ S ≡ S ⋈_θ R (where θ is symmetric or appropriately transformed)
For equi-join:
R ⋈_{R.a=S.b} S ≡ S ⋈_{S.b=R.a} R
Why Does This Hold?
The join operation is defined as selecting tuples from the Cartesian product that satisfy the join condition. Since the Cartesian product itself is commutative (R × S contains the same pairs as S × R, just in different order), and the join condition is applied symmetrically, the result set must be identical.
What About Attribute Order?
Strictly speaking, R ⋈ S and S ⋈ R might have attributes in different orders. In practice:
This means commutativity holds for all practical purposes in query optimization.
| Join Type | Commutative? | Condition | Notes |
|---|---|---|---|
| Natural Join (⋈) | Yes | Always | Equal common attributes |
| Inner Equi-Join | Yes | Always | Transform a=b to b=a |
| Inner Theta Join | Yes | Symmetric condition | θ must be rewritten appropriately |
| Cross Join (×) | Yes | Always | All pair combinations identical |
| LEFT OUTER JOIN | No | — | R ⟕ S ≠ S ⟕ R |
| RIGHT OUTER JOIN | No | — | R ⟖ S ≠ S ⟖ R |
| FULL OUTER JOIN | Yes | Always | Both sides preserved symmetrically |
LEFT and RIGHT outer joins are not commutative! LEFT JOIN preserves all rows from the left table; swapping operands changes which rows are preserved. FULL OUTER JOIN, however, is commutative because it preserves unmatched rows from both sides.
The value of commutativity becomes clear when we examine how different join algorithms work.
In a nested loop join:
for each tuple r in R (outer):
for each tuple s in S (inner):
if r and s satisfy join condition:
output (r, s)
The asymmetry: The outer relation R is scanned once. The inner relation S is scanned once for each tuple of R.
Optimal choice: Make the smaller relation the outer, or exploit indexes on the inner relation.
In a hash join:
// Build phase
for each tuple r in R (build side):
insert r into hash table keyed on join attributes
// Probe phase
for each tuple s in S (probe side):
look up matching tuples in hash table
output joined tuples
The asymmetry: The build side constructs an in-memory hash table. If the build side is too large, the table may spill to disk, causing massive slowdown.
Optimal choice: Make the smaller relation the build side. The hash table should fit in memory.
Sort-merge join is more symmetric:
sort R on join attributes
sort S on join attributes
merge: scan both sorted relations, matching equal keys
Both relations are scanned once during the merge phase. However, sorting costs depend on relation sizes, and if one relation is already sorted (e.g., from an index), commuting might avoid a redundant sort.
When an index exists on the join column:
for each tuple r in R (outer):
use index on S to find matching tuples (inner)
The strong asymmetry: The indexed relation should be the inner. If you swap, you lose the index benefit.
Optimal choice: Non-indexed relation as outer, indexed relation as inner. Commutativity enables this arrangement regardless of how the query was written.
Commutativity lets the optimizer choose: for nested loops, make the smaller or filtered table outer; for hash joins, build on the smaller table; for index nested loops, put the indexed table as inner. Without commutativity, the query's textual order would dictate a potentially terrible execution.
The optimizer uses cost models to decide which commutation is beneficial. Let's examine the cost formulas.
Basic nested loop (no blocking, no index):
Cost(R ⋈ S) = |R| + |R| × |S| (tuple-at-a-time)
= P_R + |R| × P_S (page-at-a-time, worst case)
Where P_R, P_S are page counts. Clearly, having the smaller relation as outer is better.
Block nested loop (using B memory pages for outer):
Cost = P_R + ⌈P_R / (B-2)⌉ × P_S
Fewer outer pages means fewer scans of the inner.
Index nested loop (index on inner's join column):
Cost = P_R + |R| × (index lookup cost + tuple fetch cost)
Index lookup is ~3–4 I/Os for B-tree. If the outer is after selection pushdown (fewer tuples), cost is very low.
Cost = 3 × (P_R + P_S) (for partitioned hash join)
Symmetric in pages, but:
Optimal: Build on the smaller relation to maximize chance of in-memory build.
12345678910111213141516171819
Given: Table R: 10,000 rows, 1,000 pages Table S: 1,000,000 rows, 50,000 pages Memory buffer: 100 pages Index on S.join_column Option 1: R as outer in nested loop (with index on S) Cost = P_R + |R| × (index lookup) = 1,000 + 10,000 × 4 = 41,000 I/Os Option 2: S as outer in nested loop (with index on S - can't use!) Cost = P_S + |S| × (full scan R for each S tuple, if no index on R) = 50,000 + 1,000,000 × 1,000 = ~1 billion I/Os (DISASTER!) Commutativity enables Option 1: - Use outer-to-inner relationship that exploits the index - Reduce cost by factor of ~24,000The optimizer needs accurate table sizes, row counts, and index availability to make good commutation decisions. Stale statistics can lead to terrible choices. Always keep statistics updated (ANALYZE in PostgreSQL, UPDATE STATISTICS in SQL Server).
When a query joins three or more tables, commutativity at each join step compounds with associativity to create a large search space of equivalent plans.
Example: Three-Way Join
For query: R ⋈ S ⋈ T
Consider just binary join orderings:
(R ⋈ S) ⋈ T with R outer for first join(R ⋈ S) ⋈ T with S outer for first join(S ⋈ R) ⋈ T with S outer for first join (same as 2 after commuting)R ⋈ (S ⋈ T) with R outer for second join
... and so onEach intermediate join can have its operands commuted. The optimizer explores combinations to find the overall best plan.
The Explosion
For n tables:
Optimizers use dynamic programming (for small n) or heuristics (for large n) to search this space efficiently.
| Tables (n) | Join Orders | With Commutation | With Algorithms (3 choices) |
|---|---|---|---|
| 2 | 1 | 2 | 6 |
| 3 | 2 | 4 | 36 |
| 4 | 5 | 10 | 270 |
| 5 | 14 | 28 | 2,268 |
| 6 | 42 | 84 | 20,412 |
| 10 | 4,862 | 9,724 | ~5 million |
| 15 | 2.7 million | 5.4 million | ~billions |
Why Commutativity Matters Even More in Multi-Way Joins
In multi-way joins, an early poor commutation decision propagates:
The optimizer evaluates costs for promising combinations, applying commutativity where it helps reduce intermediate result sizes or enables index usage.
Commutativity has limitations for certain join variants. Understanding these limitations is crucial for correct optimization.
LEFT OUTER JOIN (⟕)
R ⟕ S ≢ S ⟕ R
Why? LEFT JOIN preserves all rows from R, padding with NULLs when no match exists in S. Swapping changes which table is preserved.
Example:
R = {1, 2, 3}
S = {2, 3, 4}
R ⟕ S = {(1, NULL), (2, 2), (3, 3)}
S ⟕ R = {(2, 2), (3, 3), (4, NULL)} -- DIFFERENT!
Implication: The optimizer cannot commute outer joins freely. The SQL author's intent (which side to preserve) must be honored.
Similarly non-commutative:
R ⟖ S ≢ S ⟖ R
However, there's a useful transformation:
R ⟕ S ≡ S ⟖ R (converting between LEFT and RIGHT)
This isn't commutativity but a different equivalence that can help normalize plans.
R ⟗ S ≡ S ⟗ R (IS commutative)
Full outer join preserves unmatched rows from both sides, so it doesn't matter which is listed first.
Semi-join (⋉): Returns tuples from R that have at least one match in S. No S columns appear in output.
R ⋉ S ≢ S ⋉ R (NOT commutative in result schema)
However, for EXISTS subqueries:
SELECT * FROM R WHERE EXISTS (SELECT 1 FROM S WHERE R.a = S.a)
The semi-join implementation can choose which table to scan first, as long as the output is correctly R's tuples.
Anti-join (▷): Returns tuples from R with no match in S (NOT EXISTS pattern).
R ▷ S ≢ S ▷ R (NOT commutative)
Anti-joins have even more restricted transformation options.
Even when commutativity doesn't apply to the logical operator, physical algorithms may have flexibility. For instance, a semi-join can use either table as the hash table build side internally, as long as it correctly returns R's tuples. The optimizer exploits such physical-level freedoms.
When commuting a theta join, the join condition may need to be transformed to remain semantically correct.
Equality conditions are naturally symmetric:
R.a = S.b ⟺ S.b = R.a
The optimizer can commute freely.
Inequality conditions must be inverted:
R.a < S.b when commuted becomes S.b > R.a
Transformation table:
| Original | Commuted Equivalent |
|---|---|
R.a = S.b | S.b = R.a |
R.a < S.b | S.b > R.a |
R.a <= S.b | S.b >= R.a |
R.a > S.b | S.b < R.a |
R.a >= S.b | S.b <= R.a |
R.a != S.b | S.b != R.a |
Conjunctions and disjunctions transform component-wise:
(R.a = S.b) AND (R.c > S.d)
⟺ (S.b = R.a) AND (S.d < R.c)
The optimizer applies these transformations automatically when considering commutation.
For complex join conditions involving functions, CASE expressions, or subqueries, the optimizer must verify that the condition remains evaluable after commutation. Conditions that are inherently asymmetric (e.g., involving row numbers or lateral references) may prevent commutation.
A cross join (Cartesian product) has no join condition:
R × S ≡ S × R
Completely commutative, but:
When a cross join is intentional (generating combinations), commutation chooses optimal scan order.
Let's examine how commutativity affects real query plans.
12345678910111213141516171819202122232425
-- Query as writtenSELECT c.name, o.order_dateFROM Orders o -- 10 million rowsJOIN Customers c ON o.customer_id = c.id -- 100,000 rowsWHERE c.region = 'West'; -- The optimizer might commute this logically to:-- Customers (small, filtered) JOIN Orders (large) -- Physical plan (PostgreSQL EXPLAIN output style):/*Hash Join (cost=...) Hash Cond: (o.customer_id = c.id) -> Seq Scan on orders o (cost=...) -- 10M rows probe -> Hash (cost=...) -> Seq Scan on customers c (cost=...) Filter: (region = 'West') -- ~10K rows after filter, BUILD side Notice: customers is the BUILD side (smaller after filtering)This is the result of:1. Selection pushdown: region='West' applied to customers2. Commutativity: customers as build (inner conceptually)*/Example 2: Index Nested Loop
SELECT e.name, d.dept_name
FROM Departments d -- 100 rows, no index on id
JOIN Employees e ON e.dept_id = d.id; -- 100K rows, index on dept_id
Optimal plan:
The optimizer commutes to put Departments as the driving (outer) table.
In EXPLAIN output, look for which table appears as the 'outer' or 'build' side. If it's not the smaller table (especially post-filtering), there may be a statistics issue or missing index preventing optimal commutation.
Join commutativity gives the optimizer essential flexibility in choosing how to execute joins physically. Let's consolidate the key insights:
What's Next
We've covered how to swap operands within a single join. Next, we'll explore join associativity—the ability to regroup joins in multi-table queries. This is the foundation of join ordering optimization, one of the most impactful aspects of query optimization.
You now understand join commutativity and its profound impact on query execution efficiency. The SQL order of tables in FROM/JOIN clauses doesn't constrain execution—the optimizer exploits commutativity to find the best physical arrangement.