Database Management SystemsSQL Joins & Subqueries

Outer Joins: Preserving Unmatched Data

LevelIntermediate

Duration60 mins

TopicSQL Joins & Subqueries

3 / 5

FULL OUTER JOIN: Complete Data Preservation

When You Need Everything

We've seen how LEFT JOIN preserves all rows from the left table, and RIGHT JOIN preserves all rows from the right table. But what if you need to preserve rows from both tables—showing all left rows, all right rows, and matching where possible?

This is the domain of FULL OUTER JOIN (commonly written as FULL JOIN or FULL OUTER JOIN).

FULL OUTER JOIN guarantees that every row from both tables appears at least once in the result set. Matching rows are combined; unmatched rows from either side appear with NULLs for the other table's columns.

This is the most 'complete' of the outer joins—no data is lost from either table. It's less commonly used than LEFT JOIN, but it's the essential tool for scenarios like data reconciliation, comparative analysis, and finding discrepancies between datasets.

Learning Objectives

By completing this page, you will:\n• Understand FULL OUTER JOIN mechanics and semantics\n• Master the syntax for FULL OUTER JOIN\n• Recognize classic use cases: data reconciliation, gap analysis, comparative reporting\n• Handle the complex NULL patterns that FULL JOIN produces\n• Understand performance implications and optimization\n• Emulate FULL OUTER JOIN in databases that don't support it

Conceptual Foundation: The Complete Outer Join

FULL OUTER JOIN is the union of LEFT and RIGHT outer joins. To understand it completely, let's build from what we know:

The Fundamental Guarantee:

FULL OUTER JOIN guarantees:

Every row from the left table appears at least once
Every row from the right table appears at least once
Matched rows appear combined with data from both tables

This creates three categories of output rows:

Category	Left Table Columns	Right Table Columns
Matched rows	Actual values	Actual values
Left-only rows	Actual values	NULL
Right-only rows	NULL	Actual values

Set Theory Perspective:

FULL OUTER JOIN represents the union of both tables, with matching where possible:

FULL OUTER JOIN: A ∪ B (all rows from both, matched where possible)

Comparing the outer join types:

Join Type	Set Representation	Preserves
INNER JOIN	A ∩ B	Only matches
LEFT JOIN	A (with B where matches)	All of A
RIGHT JOIN	B (with A where matches)	All of B
FULL OUTER	A ∪ B (matched where possible)	All of A and B

FULL OUTER JOIN is truly the 'complete' join—nothing is filtered out.

Mental Model

Think of FULL OUTER JOIN as asking: "Show me everything from both tables. Where records match, combine them. Where they don't match, show them anyway with NULLs for the missing side."

Syntax Deep Dive: Writing FULL OUTER JOIN

FULL OUTER JOIN follows the same ANSI SQL-92 pattern as the other outer joins.

The ANSI SQL-92 Standard Syntax:

full_join_syntax.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Standard ANSI SQL-92 FULL OUTER JOIN syntax
-- The OUTER keyword is optional
 
SELECT 
    s1.employee_id as q1_employee,
    s1.total_sales as q1_sales,
    s2.employee_id as q2_employee,
    s2.total_sales as q2_sales
FROM Q1_Sales s1
FULL OUTER JOIN Q2_Sales s2
    ON s1.employee_id = s2.employee_id;
 
-- Shortened form (OUTER is implicit)
SELECT 
    s1.employee_id as q1_employee,
    s1.total_sales as q1_sales,
    s2.employee_id as q2_employee,
    s2.total_sales as q2_sales
FROM Q1_Sales s1
FULL JOIN Q2_Sales s2
    ON s1.employee_id = s2.employee_id;
 
-- Result shows ALL employees from BOTH quarters:
-- - Employees who sold in both: shows both values
-- - Employees who sold only in Q1: Q2 columns are NULL
-- - Employees who sold only in Q2: Q1 columns are NULL

Handling the Dual NULL Pattern:

With FULL JOIN, you'll frequently want a unified identifier column that's never NULL. The COALESCE function is essential here:

full_join_coalesce.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Using COALESCE to get a unified key column
SELECT 
    COALESCE(s1.employee_id, s2.employee_id) as employee_id,  -- Never NULL
    s1.total_sales as q1_sales,
    s2.total_sales as q2_sales,
    COALESCE(s1.total_sales, 0) + COALESCE(s2.total_sales, 0) as total_sales
FROM Q1_Sales s1
FULL OUTER JOIN Q2_Sales s2
    ON s1.employee_id = s2.employee_id
ORDER BY employee_id;
 
-- COALESCE(s1.employee_id, s2.employee_id) works because:
-- - If matched: s1.employee_id = s2.employee_id, either works
-- - If left-only: s1.employee_id has value, s2.employee_id is NULL
-- - If right-only: s1.employee_id is NULL, s2.employee_id has value
 
-- The result has a clean, unified employee_id column

Join Key Duality

In FULL OUTER JOIN, the join key columns can be NULL on either side. When you need a reliable identifier for each output row, use COALESCE(left.key, right.key) to always get a value.

Execution Mechanics: How FULL OUTER JOIN Works

Understanding FULL OUTER JOIN execution helps diagnose unexpected results and performance issues.

Conceptual Execution Algorithm:

One way to conceptualize FULL OUTER JOIN is as a combination of LEFT and RIGHT joins:

Perform LEFT OUTER JOIN (preserves all left rows)
Perform RIGHT OUTER JOIN, but only for right rows not already matched
Union the results

Alternatively, using a hash join mindset:

Build a hash table from the right table
For each left row, probe the hash table:
- If match found: output combined row, mark right row as 'used'
- If no match: output left row with NULLs
After all left rows processed, scan right table's hash entries:
- For each 'unused' right row: output with NULL left columns

full_join_execution.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- Sample data to trace execution
-- Q1_Sales:
-- | employee_id | total_sales |
-- |-------------|-------------|
-- | 1           | 10000       |
-- | 2           | 15000       |
-- | 3           | 8000        |
 
-- Q2_Sales:
-- | employee_id | total_sales |
-- |-------------|-------------|
-- | 2           | 12000       |
-- | 3           | 20000       |
-- | 4           | 5000        |
 
SELECT 
    COALESCE(s1.employee_id, s2.employee_id) as employee_id,
    s1.total_sales as q1_sales,
    s2.total_sales as q2_sales
FROM Q1_Sales s1
FULL OUTER JOIN Q2_Sales s2
    ON s1.employee_id = s2.employee_id;
 
-- Execution trace:
--
-- LEFT JOIN phase (all Q1 rows):
-- Employee 1: No match in Q2 → (1, 10000, NULL)
-- Employee 2: Match in Q2    → (2, 15000, 12000)
-- Employee 3: Match in Q2    → (3, 8000, 20000)
--
-- RIGHT-ONLY phase (Q2 rows not yet matched):
-- Employee 4: Not matched    → (4, NULL, 5000)
--
-- Final result set:
-- | employee_id | q1_sales | q2_sales |
-- |-------------|----------|----------|
-- | 1           | 10000    | NULL     |  -- Q1 only
-- | 2           | 15000    | 12000    |  -- Both quarters
-- | 3           | 8000     | 20000    |  -- Both quarters
-- | 4           | NULL     | 5000     |  -- Q2 only

Physical Execution Strategies:

Query optimizers have several approaches for FULL OUTER JOIN:

Strategy	How It Works	Characteristics
Hash Full Join	Hash one table, probe with other, output unmatched from both	Common for medium tables
Merge Full Join	Sort both tables, merge-scan outputting all rows	Efficient when pre-sorted
Nested Loop + Scan	LEFT JOIN via nested loop, then scan for unmatched right	Less common

FULL OUTER JOIN is generally more expensive than LEFT or INNER JOIN because it must track unmatched rows from both sides.

Performance Note

FULL OUTER JOIN typically requires more memory than LEFT JOIN because the database must track which right table rows have been matched. For very large tables, this tracking overhead can be significant.

Real-World Use Cases: When FULL OUTER JOIN Shines

FULL OUTER JOIN is less common than LEFT JOIN, but it's the perfect tool for specific analytical scenarios. Recognizing these patterns helps you choose the right join type.

Pattern 1: Data Reconciliation

Comparing two datasets to find matches and discrepancies—perhaps comparing expected vs. actual data, or comparing data between systems.

pattern_reconciliation.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- Data reconciliation between two systems
-- Goal: Compare expected inventory (ERP) vs actual count (physical inventory)
 
SELECT 
    COALESCE(erp.product_id, inv.product_id) as product_id,
    erp.product_name,
    erp.expected_qty,
    inv.actual_qty,
    COALESCE(inv.actual_qty, 0) - COALESCE(erp.expected_qty, 0) as variance,
    CASE 
        WHEN erp.product_id IS NULL THEN 'Found but not in ERP'
        WHEN inv.product_id IS NULL THEN 'In ERP but not found'
        WHEN inv.actual_qty = erp.expected_qty THEN 'Match'
        WHEN inv.actual_qty < erp.expected_qty THEN 'Shortage'
        ELSE 'Overage'
    END as reconciliation_status
FROM ERP_Inventory erp
FULL OUTER JOIN Physical_Inventory inv
    ON erp.product_id = inv.product_id
ORDER BY 
    CASE 
        WHEN erp.product_id IS NULL OR inv.product_id IS NULL THEN 0
        WHEN inv.actual_qty != erp.expected_qty THEN 1
        ELSE 2
    END,
    ABS(COALESCE(inv.actual_qty, 0) - COALESCE(erp.expected_qty, 0)) DESC;
 
-- This shows:
-- - Products in both systems (matched or with variance)
-- - Products only in ERP (missing from physical count)
-- - Products found physically but not in ERP (unexpected items)

Pattern 2: Comparative Period Analysis

Comparing data across time periods where entities may appear in one period but not another.

pattern_period_comparison.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Comparing customer activity across two periods
-- Some customers are new, some churned, some retained
 
WITH CurrentMonth AS (
    SELECT customer_id, SUM(amount) as current_spend
    FROM Orders
    WHERE order_date >= '2024-03-01' AND order_date < '2024-04-01'
    GROUP BY customer_id
),
PreviousMonth AS (
    SELECT customer_id, SUM(amount) as previous_spend
    FROM Orders
    WHERE order_date >= '2024-02-01' AND order_date < '2024-03-01'
    GROUP BY customer_id
)
SELECT 
    COALESCE(c.customer_id, p.customer_id) as customer_id,
    COALESCE(p.previous_spend, 0) as feb_sales,
    COALESCE(c.current_spend, 0) as mar_sales,
    CASE 
        WHEN p.customer_id IS NULL THEN 'New Customer'
        WHEN c.customer_id IS NULL THEN 'Churned'
        WHEN c.current_spend > p.previous_spend THEN 'Growth'
        WHEN c.current_spend < p.previous_spend THEN 'Decline'
        ELSE 'Stable'
    END as customer_status,
    CASE WHEN p.previous_spend > 0 
        THEN ROUND((c.current_spend - p.previous_spend) / p.previous_spend * 100, 1)
        ELSE NULL 
    END as growth_percentage
FROM CurrentMonth c
FULL OUTER JOIN PreviousMonth p
    ON c.customer_id = p.customer_id
ORDER BY customer_status, customer_id;

Pattern 3: Master Data Merge

Combining data from multiple sources where each source may have unique records.

pattern_data_merge.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- Merging product data from two different vendors
-- Each vendor may have products the other doesn't
 
SELECT 
    COALESCE(v1.sku, v2.sku) as unified_sku,
    COALESCE(v1.product_name, v2.product_name) as product_name,
    v1.vendor1_price,
    v2.vendor2_price,
    CASE 
        WHEN v1.sku IS NOT NULL AND v2.sku IS NOT NULL THEN
            CASE WHEN v1.vendor1_price <= v2.vendor2_price THEN 'Vendor 1' ELSE 'Vendor 2' END
        WHEN v1.sku IS NOT NULL THEN 'Vendor 1 Only'
        ELSE 'Vendor 2 Only'
    END as best_source,
    LEAST(COALESCE(v1.vendor1_price, v2.vendor2_price), 
          COALESCE(v2.vendor2_price, v1.vendor1_price)) as best_price
FROM Vendor1_Products v1
FULL OUTER JOIN Vendor2_Products v2
    ON v1.sku = v2.sku
ORDER BY unified_sku;
 
-- This unified view shows:
-- - Products available from both vendors (with price comparison)
-- - Products only from Vendor 1
-- - Products only from Vendor 2

The FULL JOIN Signal

Whenever you hear requirements like "compare," "reconcile," "find discrepancies," "merge from multiple sources," or "show what's missing from either side"—think FULL OUTER JOIN.

Handling NULL Complexity in FULL OUTER JOIN

FULL OUTER JOIN produces the most complex NULL patterns of any join type. Mastering NULL handling is essential for writing correct FULL JOIN queries.

Three Sources of NULLs:

Left-only rows: All right table columns are NULL
Right-only rows: All left table columns are NULL
Inherent NULLs: Either table may contain NULL values in its data

Distinguishing between "unmatched row NULLs" and "actual data NULLs" requires careful query design.

null_handling_patterns.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Distinguishing NULL sources
-- Use the PRIMARY KEY (never NULL) to determine match status
 
SELECT 
    COALESCE(l.id, r.id) as unified_id,
    l.id as left_id,
    r.id as right_id,
    l.value as left_value,
    r.value as right_value,
    -- Categorize the row based on which PKs are present
    CASE 
        WHEN l.id IS NOT NULL AND r.id IS NOT NULL THEN 'Matched'
        WHEN l.id IS NOT NULL THEN 'Left Only'
        ELSE 'Right Only'
    END as match_type
FROM LeftTable l
FULL OUTER JOIN RightTable r
    ON l.id = r.id;
 
-- A NULL in left_value could mean:
-- 1. Row is right-only (check left_id IS NULL)
-- 2. Row is matched but left_value is actually NULL in the data
 
-- Safe check: Always use the PK for match determination
SELECT 
    COALESCE(l.id, r.id) as unified_id,
    CASE 
        WHEN l.id IS NULL THEN 'N/A (Right Only)'
        WHEN l.value IS NULL THEN 'NULL (in data)'
        ELSE l.value::text
    END as left_value_display
FROM LeftTable l
FULL OUTER JOIN RightTable r
    ON l.id = r.id;

The NULL Trap

Never assume a NULL value means "unmatched row." Always check the primary key of the respective table. If left_table.pk IS NULL, the row came from the right table only. If right_table.pk IS NULL, the row came from the left table only.

Filtering NULL Categories:

You can filter the FULL JOIN result to show only specific categories of rows:

filter_null_categories.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- Show only matched rows (equivalent to INNER JOIN)
SELECT COALESCE(l.id, r.id) as id, l.value, r.value
FROM LeftTable l
FULL OUTER JOIN RightTable r ON l.id = r.id
WHERE l.id IS NOT NULL AND r.id IS NOT NULL;
 
-- Show only left-only rows (things in left but not right)
SELECT l.id, l.value
FROM LeftTable l
FULL OUTER JOIN RightTable r ON l.id = r.id
WHERE r.id IS NULL;
 
-- Show only right-only rows (things in right but not left)
SELECT r.id, r.value
FROM LeftTable l
FULL OUTER JOIN RightTable r ON l.id = r.id
WHERE l.id IS NULL;
 
-- Show unmatched rows from either side
SELECT 
    COALESCE(l.id, r.id) as id,
    CASE WHEN l.id IS NULL THEN 'Right Only' ELSE 'Left Only' END as source
FROM LeftTable l
FULL OUTER JOIN RightTable r ON l.id = r.id
WHERE l.id IS NULL OR r.id IS NULL;
 
-- These patterns are essential for data reconciliation

FULL JOIN NULL Handling Best Practices

•Use COALESCE for unified keys: COALESCE(left.key, right.key) ensures you always have an identifier
•Check PKs, not nullable columns: Determine match status using primary keys that are never NULL in the source data
•Be explicit about NULL meaning: Use CASE expressions to clearly label whether NULL means 'unmatched' or 'actual NULL value'
•Default NULL aggregates: Use COALESCE(SUM(...), 0) when aggregating columns that may be NULL due to unmatched rows

Emulating FULL OUTER JOIN

While most modern databases support FULL OUTER JOIN natively, there are scenarios where you may need to emulate it:

Legacy database versions (e.g., SQLite before 3.39.0, old MySQL versions)
Certain data warehouse systems
Query builder limitations

Emulation Strategy: UNION of LEFT and RIGHT JOINs

FULL OUTER JOIN can be emulated by combining a LEFT JOIN and a RIGHT JOIN, being careful to avoid duplicating matched rows.

emulate_full_join.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
-- Native FULL OUTER JOIN
SELECT 
    COALESCE(l.id, r.id) as id,
    l.value as left_value,
    r.value as right_value
FROM LeftTable l
FULL OUTER JOIN RightTable r
    ON l.id = r.id;
 
-- Emulation using UNION
-- LEFT JOIN (all left rows + matched right rows)
SELECT 
    COALESCE(l.id, r.id) as id,
    l.value as left_value,
    r.value as right_value
FROM LeftTable l
LEFT JOIN RightTable r
    ON l.id = r.id
 
UNION ALL
 
-- RIGHT-ONLY rows (right rows NOT matched by left join)
SELECT 
    r.id as id,
    NULL as left_value,
    r.value as right_value
FROM RightTable r
LEFT JOIN LeftTable l
    ON r.id = l.id
WHERE l.id IS NULL;
 
-- Alternative using UNION (removes duplicates, slower)
SELECT l.id, l.value as left_value, r.value as right_value
FROM LeftTable l
LEFT JOIN RightTable r ON l.id = r.id
 
UNION
 
SELECT r.id, l.value as left_value, r.value as right_value
FROM RightTable r
LEFT JOIN LeftTable l ON r.id = l.id;

UNION vs UNION ALL

We use UNION ALL with explicit filtering (WHERE l.id IS NULL) for the second query to avoid duplicating matched rows while maintaining performance. Using UNION (which removes duplicates) is simpler but slower for large datasets.

Why This Works:

The emulation works because:

First query (LEFT JOIN): Returns all left rows, including matched ones
Second query (RIGHT-only): Returns only right rows that had no match
UNION ALL: Combines without removing duplicates (we've already ensured no duplicates)

The result is identical to FULL OUTER JOIN: all left rows, all right rows, matching where possible.

Performance Considerations

FULL OUTER JOIN has unique performance characteristics due to its requirement to preserve unmatched rows from both sides.

Cost Factors:

FULL OUTER JOIN Performance Factors

•Double-sided tracking: The database must track which rows from BOTH tables have been matched, requiring more memory than LEFT JOIN
•Index utilization limited: Standard B-tree indexes help LEFT JOIN probe the right table. For FULL JOIN, both tables may need scanning for unmatched rows
•Hash join preferred: Hash-based FULL JOIN algorithms are most common; ensure adequate memory for the hash table
•Merge join efficient when sorted: If both tables are sorted on the join key, merge join can process FULL JOIN efficiently

full_join_optimization.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Optimization: Filter tables BEFORE joining
-- Reduces the size of the FULL JOIN operation
 
-- BEFORE: FULL JOIN on large tables, then filter
SELECT COALESCE(a.id, b.id) as id, a.value, b.value
FROM LargeTableA a
FULL OUTER JOIN LargeTableB b ON a.id = b.id
WHERE a.category = 'Electronics' OR b.category = 'Electronics';
 
-- AFTER: Filter first via CTEs, then FULL JOIN on smaller sets
WITH FilteredA AS (
    SELECT id, value, category
    FROM LargeTableA
    WHERE category = 'Electronics'
),
FilteredB AS (
    SELECT id, value, category
    FROM LargeTableB
    WHERE category = 'Electronics'
)
SELECT COALESCE(a.id, b.id) as id, a.value, b.value
FROM FilteredA a
FULL OUTER JOIN FilteredB b ON a.id = b.id;
 
-- Index recommendations for FULL OUTER JOIN
CREATE INDEX idx_tablea_id ON LargeTableA(id);
CREATE INDEX idx_tableb_id ON LargeTableB(id);
 
-- If using hash join, these indexes help less than sorting
-- For merge join strategy, pre-sorted data is most efficient

FULL JOIN Performance Comparison
Join Type	Relative Cost	Memory Usage	Index Benefit
INNER JOIN	Baseline	Low	High (right table index)
LEFT JOIN	Slightly higher	Low-Medium	High (right table index)
FULL OUTER JOIN	Higher	Medium-High	Moderate (both tables)
FULL JOIN emulation (UNION)	Highest	High	Same as LEFT JOINs used

When Native Beats Emulation

Native FULL OUTER JOIN is almost always faster than the UNION-based emulation because the optimizer can use specialized algorithms. Only use emulation when native support is unavailable.

Summary: FULL OUTER JOIN Mastery

FULL OUTER JOIN is the most complete of the outer joins, ensuring no data is lost from either participating table. While less common than LEFT JOIN, it's essential for comparative analysis and data reconciliation.

Key Takeaways

•FULL JOIN preserves all rows from both tables, filling NULLs where no match exists on either side.
•Three categories of output rows: matched (both sides have data), left-only (right side NULL), right-only (left side NULL).
•Use COALESCE for unified identifiers: COALESCE(left.key, right.key) gives you a non-NULL key for each row.
•Check PKs to determine match status, not nullable data columns that could have legitimate NULL values.
•Primary use cases: data reconciliation, period-over-period comparison, merging data from multiple sources.
•Can be emulated with UNION of LEFT JOIN and filtered RIGHT JOIN when native support is unavailable.
•Generally more expensive than LEFT JOIN due to double-sided unmatched row tracking.

What's Next: Preserving Unmatched Rows

Having covered the three outer join types, the next page dives deeper into the concept of preserving unmatched rows—the philosophical and practical underpinning of all outer joins. We'll explore why this capability matters, advanced filtering patterns, and how to think about outer joins in query design.

Page Complete

You now have a comprehensive understanding of FULL OUTER JOIN—the complete outer join that preserves data from both tables. Combined with LEFT and RIGHT JOIN knowledge, you can now handle any outer join scenario in production SQL.

3 / 5

Loading learning content...

Database Management SystemsSQL Joins & Subqueries

Outer Joins: Preserving Unmatched Data

LevelIntermediate

Duration60 mins

TopicSQL Joins & Subqueries

3 / 5

FULL OUTER JOIN: Complete Data Preservation

When You Need Everything

This is the domain of FULL OUTER JOIN (commonly written as FULL JOIN or FULL OUTER JOIN).

Learning Objectives

Conceptual Foundation: The Complete Outer Join

FULL OUTER JOIN is the union of LEFT and RIGHT outer joins. To understand it completely, let's build from what we know:

The Fundamental Guarantee:

FULL OUTER JOIN guarantees:

Every row from the left table appears at least once
Every row from the right table appears at least once
Matched rows appear combined with data from both tables

This creates three categories of output rows:

Category	Left Table Columns	Right Table Columns
Matched rows	Actual values	Actual values
Left-only rows	Actual values	NULL
Right-only rows	NULL	Actual values

Set Theory Perspective:

FULL OUTER JOIN represents the union of both tables, with matching where possible:

FULL OUTER JOIN: A ∪ B (all rows from both, matched where possible)

Comparing the outer join types:

Join Type	Set Representation	Preserves
INNER JOIN	A ∩ B	Only matches
LEFT JOIN	A (with B where matches)	All of A
RIGHT JOIN	B (with A where matches)	All of B
FULL OUTER	A ∪ B (matched where possible)	All of A and B

FULL OUTER JOIN is truly the 'complete' join—nothing is filtered out.

Mental Model

Think of FULL OUTER JOIN as asking: "Show me everything from both tables. Where records match, combine them. Where they don't match, show them anyway with NULLs for the missing side."

Syntax Deep Dive: Writing FULL OUTER JOIN

FULL OUTER JOIN follows the same ANSI SQL-92 pattern as the other outer joins.

The ANSI SQL-92 Standard Syntax:

full_join_syntax.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Standard ANSI SQL-92 FULL OUTER JOIN syntax
-- The OUTER keyword is optional
 
SELECT 
    s1.employee_id as q1_employee,
    s1.total_sales as q1_sales,
    s2.employee_id as q2_employee,
    s2.total_sales as q2_sales
FROM Q1_Sales s1
FULL OUTER JOIN Q2_Sales s2
    ON s1.employee_id = s2.employee_id;
 
-- Shortened form (OUTER is implicit)
SELECT 
    s1.employee_id as q1_employee,
    s1.total_sales as q1_sales,
    s2.employee_id as q2_employee,
    s2.total_sales as q2_sales
FROM Q1_Sales s1
FULL JOIN Q2_Sales s2
    ON s1.employee_id = s2.employee_id;
 
-- Result shows ALL employees from BOTH quarters:
-- - Employees who sold in both: shows both values
-- - Employees who sold only in Q1: Q2 columns are NULL
-- - Employees who sold only in Q2: Q1 columns are NULL

Handling the Dual NULL Pattern:

With FULL JOIN, you'll frequently want a unified identifier column that's never NULL. The COALESCE function is essential here:

full_join_coalesce.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Using COALESCE to get a unified key column
SELECT 
    COALESCE(s1.employee_id, s2.employee_id) as employee_id,  -- Never NULL
    s1.total_sales as q1_sales,
    s2.total_sales as q2_sales,
    COALESCE(s1.total_sales, 0) + COALESCE(s2.total_sales, 0) as total_sales
FROM Q1_Sales s1
FULL OUTER JOIN Q2_Sales s2
    ON s1.employee_id = s2.employee_id
ORDER BY employee_id;
 
-- COALESCE(s1.employee_id, s2.employee_id) works because:
-- - If matched: s1.employee_id = s2.employee_id, either works
-- - If left-only: s1.employee_id has value, s2.employee_id is NULL
-- - If right-only: s1.employee_id is NULL, s2.employee_id has value
 
-- The result has a clean, unified employee_id column

Join Key Duality

In FULL OUTER JOIN, the join key columns can be NULL on either side. When you need a reliable identifier for each output row, use COALESCE(left.key, right.key) to always get a value.

Execution Mechanics: How FULL OUTER JOIN Works

Understanding FULL OUTER JOIN execution helps diagnose unexpected results and performance issues.

Conceptual Execution Algorithm:

One way to conceptualize FULL OUTER JOIN is as a combination of LEFT and RIGHT joins:

Perform LEFT OUTER JOIN (preserves all left rows)
Perform RIGHT OUTER JOIN, but only for right rows not already matched
Union the results

Alternatively, using a hash join mindset:

Build a hash table from the right table
For each left row, probe the hash table:
- If match found: output combined row, mark right row as 'used'
- If no match: output left row with NULLs
After all left rows processed, scan right table's hash entries:
- For each 'unused' right row: output with NULL left columns

full_join_execution.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- Sample data to trace execution
-- Q1_Sales:
-- | employee_id | total_sales |
-- |-------------|-------------|
-- | 1           | 10000       |
-- | 2           | 15000       |
-- | 3           | 8000        |
 
-- Q2_Sales:
-- | employee_id | total_sales |
-- |-------------|-------------|
-- | 2           | 12000       |
-- | 3           | 20000       |
-- | 4           | 5000        |
 
SELECT 
    COALESCE(s1.employee_id, s2.employee_id) as employee_id,
    s1.total_sales as q1_sales,
    s2.total_sales as q2_sales
FROM Q1_Sales s1
FULL OUTER JOIN Q2_Sales s2
    ON s1.employee_id = s2.employee_id;
 
-- Execution trace:
--
-- LEFT JOIN phase (all Q1 rows):
-- Employee 1: No match in Q2 → (1, 10000, NULL)
-- Employee 2: Match in Q2    → (2, 15000, 12000)
-- Employee 3: Match in Q2    → (3, 8000, 20000)
--
-- RIGHT-ONLY phase (Q2 rows not yet matched):
-- Employee 4: Not matched    → (4, NULL, 5000)
--
-- Final result set:
-- | employee_id | q1_sales | q2_sales |
-- |-------------|----------|----------|
-- | 1           | 10000    | NULL     |  -- Q1 only
-- | 2           | 15000    | 12000    |  -- Both quarters
-- | 3           | 8000     | 20000    |  -- Both quarters
-- | 4           | NULL     | 5000     |  -- Q2 only

Physical Execution Strategies:

Query optimizers have several approaches for FULL OUTER JOIN:

Strategy	How It Works	Characteristics
Hash Full Join	Hash one table, probe with other, output unmatched from both	Common for medium tables
Merge Full Join	Sort both tables, merge-scan outputting all rows	Efficient when pre-sorted
Nested Loop + Scan	LEFT JOIN via nested loop, then scan for unmatched right	Less common

FULL OUTER JOIN is generally more expensive than LEFT or INNER JOIN because it must track unmatched rows from both sides.

Performance Note

Real-World Use Cases: When FULL OUTER JOIN Shines

FULL OUTER JOIN is less common than LEFT JOIN, but it's the perfect tool for specific analytical scenarios. Recognizing these patterns helps you choose the right join type.

Pattern 1: Data Reconciliation

Comparing two datasets to find matches and discrepancies—perhaps comparing expected vs. actual data, or comparing data between systems.

pattern_reconciliation.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- Data reconciliation between two systems
-- Goal: Compare expected inventory (ERP) vs actual count (physical inventory)
 
SELECT 
    COALESCE(erp.product_id, inv.product_id) as product_id,
    erp.product_name,
    erp.expected_qty,
    inv.actual_qty,
    COALESCE(inv.actual_qty, 0) - COALESCE(erp.expected_qty, 0) as variance,
    CASE 
        WHEN erp.product_id IS NULL THEN 'Found but not in ERP'
        WHEN inv.product_id IS NULL THEN 'In ERP but not found'
        WHEN inv.actual_qty = erp.expected_qty THEN 'Match'
        WHEN inv.actual_qty < erp.expected_qty THEN 'Shortage'
        ELSE 'Overage'
    END as reconciliation_status
FROM ERP_Inventory erp
FULL OUTER JOIN Physical_Inventory inv
    ON erp.product_id = inv.product_id
ORDER BY 
    CASE 
        WHEN erp.product_id IS NULL OR inv.product_id IS NULL THEN 0
        WHEN inv.actual_qty != erp.expected_qty THEN 1
        ELSE 2
    END,
    ABS(COALESCE(inv.actual_qty, 0) - COALESCE(erp.expected_qty, 0)) DESC;
 
-- This shows:
-- - Products in both systems (matched or with variance)
-- - Products only in ERP (missing from physical count)
-- - Products found physically but not in ERP (unexpected items)

Pattern 2: Comparative Period Analysis

Comparing data across time periods where entities may appear in one period but not another.

pattern_period_comparison.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Comparing customer activity across two periods
-- Some customers are new, some churned, some retained
 
WITH CurrentMonth AS (
    SELECT customer_id, SUM(amount) as current_spend
    FROM Orders
    WHERE order_date >= '2024-03-01' AND order_date < '2024-04-01'
    GROUP BY customer_id
),
PreviousMonth AS (
    SELECT customer_id, SUM(amount) as previous_spend
    FROM Orders
    WHERE order_date >= '2024-02-01' AND order_date < '2024-03-01'
    GROUP BY customer_id
)
SELECT 
    COALESCE(c.customer_id, p.customer_id) as customer_id,
    COALESCE(p.previous_spend, 0) as feb_sales,
    COALESCE(c.current_spend, 0) as mar_sales,
    CASE 
        WHEN p.customer_id IS NULL THEN 'New Customer'
        WHEN c.customer_id IS NULL THEN 'Churned'
        WHEN c.current_spend > p.previous_spend THEN 'Growth'
        WHEN c.current_spend < p.previous_spend THEN 'Decline'
        ELSE 'Stable'
    END as customer_status,
    CASE WHEN p.previous_spend > 0 
        THEN ROUND((c.current_spend - p.previous_spend) / p.previous_spend * 100, 1)
        ELSE NULL 
    END as growth_percentage
FROM CurrentMonth c
FULL OUTER JOIN PreviousMonth p
    ON c.customer_id = p.customer_id
ORDER BY customer_status, customer_id;

Pattern 3: Master Data Merge

Combining data from multiple sources where each source may have unique records.

pattern_data_merge.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- Merging product data from two different vendors
-- Each vendor may have products the other doesn't
 
SELECT 
    COALESCE(v1.sku, v2.sku) as unified_sku,
    COALESCE(v1.product_name, v2.product_name) as product_name,
    v1.vendor1_price,
    v2.vendor2_price,
    CASE 
        WHEN v1.sku IS NOT NULL AND v2.sku IS NOT NULL THEN
            CASE WHEN v1.vendor1_price <= v2.vendor2_price THEN 'Vendor 1' ELSE 'Vendor 2' END
        WHEN v1.sku IS NOT NULL THEN 'Vendor 1 Only'
        ELSE 'Vendor 2 Only'
    END as best_source,
    LEAST(COALESCE(v1.vendor1_price, v2.vendor2_price), 
          COALESCE(v2.vendor2_price, v1.vendor1_price)) as best_price
FROM Vendor1_Products v1
FULL OUTER JOIN Vendor2_Products v2
    ON v1.sku = v2.sku
ORDER BY unified_sku;
 
-- This unified view shows:
-- - Products available from both vendors (with price comparison)
-- - Products only from Vendor 1
-- - Products only from Vendor 2

The FULL JOIN Signal

Whenever you hear requirements like "compare," "reconcile," "find discrepancies," "merge from multiple sources," or "show what's missing from either side"—think FULL OUTER JOIN.

Handling NULL Complexity in FULL OUTER JOIN

FULL OUTER JOIN produces the most complex NULL patterns of any join type. Mastering NULL handling is essential for writing correct FULL JOIN queries.

Three Sources of NULLs:

Left-only rows: All right table columns are NULL
Right-only rows: All left table columns are NULL
Inherent NULLs: Either table may contain NULL values in its data

Distinguishing between "unmatched row NULLs" and "actual data NULLs" requires careful query design.

null_handling_patterns.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Distinguishing NULL sources
-- Use the PRIMARY KEY (never NULL) to determine match status
 
SELECT 
    COALESCE(l.id, r.id) as unified_id,
    l.id as left_id,
    r.id as right_id,
    l.value as left_value,
    r.value as right_value,
    -- Categorize the row based on which PKs are present
    CASE 
        WHEN l.id IS NOT NULL AND r.id IS NOT NULL THEN 'Matched'
        WHEN l.id IS NOT NULL THEN 'Left Only'
        ELSE 'Right Only'
    END as match_type
FROM LeftTable l
FULL OUTER JOIN RightTable r
    ON l.id = r.id;
 
-- A NULL in left_value could mean:
-- 1. Row is right-only (check left_id IS NULL)
-- 2. Row is matched but left_value is actually NULL in the data
 
-- Safe check: Always use the PK for match determination
SELECT 
    COALESCE(l.id, r.id) as unified_id,
    CASE 
        WHEN l.id IS NULL THEN 'N/A (Right Only)'
        WHEN l.value IS NULL THEN 'NULL (in data)'
        ELSE l.value::text
    END as left_value_display
FROM LeftTable l
FULL OUTER JOIN RightTable r
    ON l.id = r.id;

The NULL Trap

Filtering NULL Categories:

You can filter the FULL JOIN result to show only specific categories of rows:

filter_null_categories.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- Show only matched rows (equivalent to INNER JOIN)
SELECT COALESCE(l.id, r.id) as id, l.value, r.value
FROM LeftTable l
FULL OUTER JOIN RightTable r ON l.id = r.id
WHERE l.id IS NOT NULL AND r.id IS NOT NULL;
 
-- Show only left-only rows (things in left but not right)
SELECT l.id, l.value
FROM LeftTable l
FULL OUTER JOIN RightTable r ON l.id = r.id
WHERE r.id IS NULL;
 
-- Show only right-only rows (things in right but not left)
SELECT r.id, r.value
FROM LeftTable l
FULL OUTER JOIN RightTable r ON l.id = r.id
WHERE l.id IS NULL;
 
-- Show unmatched rows from either side
SELECT 
    COALESCE(l.id, r.id) as id,
    CASE WHEN l.id IS NULL THEN 'Right Only' ELSE 'Left Only' END as source
FROM LeftTable l
FULL OUTER JOIN RightTable r ON l.id = r.id
WHERE l.id IS NULL OR r.id IS NULL;
 
-- These patterns are essential for data reconciliation

FULL JOIN NULL Handling Best Practices

•Use COALESCE for unified keys: COALESCE(left.key, right.key) ensures you always have an identifier
•Check PKs, not nullable columns: Determine match status using primary keys that are never NULL in the source data
•Be explicit about NULL meaning: Use CASE expressions to clearly label whether NULL means 'unmatched' or 'actual NULL value'
•Default NULL aggregates: Use COALESCE(SUM(...), 0) when aggregating columns that may be NULL due to unmatched rows

Emulating FULL OUTER JOIN

While most modern databases support FULL OUTER JOIN natively, there are scenarios where you may need to emulate it:

Legacy database versions (e.g., SQLite before 3.39.0, old MySQL versions)
Certain data warehouse systems
Query builder limitations

Emulation Strategy: UNION of LEFT and RIGHT JOINs

FULL OUTER JOIN can be emulated by combining a LEFT JOIN and a RIGHT JOIN, being careful to avoid duplicating matched rows.

emulate_full_join.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
-- Native FULL OUTER JOIN
SELECT 
    COALESCE(l.id, r.id) as id,
    l.value as left_value,
    r.value as right_value
FROM LeftTable l
FULL OUTER JOIN RightTable r
    ON l.id = r.id;
 
-- Emulation using UNION
-- LEFT JOIN (all left rows + matched right rows)
SELECT 
    COALESCE(l.id, r.id) as id,
    l.value as left_value,
    r.value as right_value
FROM LeftTable l
LEFT JOIN RightTable r
    ON l.id = r.id
 
UNION ALL
 
-- RIGHT-ONLY rows (right rows NOT matched by left join)
SELECT 
    r.id as id,
    NULL as left_value,
    r.value as right_value
FROM RightTable r
LEFT JOIN LeftTable l
    ON r.id = l.id
WHERE l.id IS NULL;
 
-- Alternative using UNION (removes duplicates, slower)
SELECT l.id, l.value as left_value, r.value as right_value
FROM LeftTable l
LEFT JOIN RightTable r ON l.id = r.id
 
UNION
 
SELECT r.id, l.value as left_value, r.value as right_value
FROM RightTable r
LEFT JOIN LeftTable l ON r.id = l.id;

UNION vs UNION ALL

Why This Works:

The emulation works because:

First query (LEFT JOIN): Returns all left rows, including matched ones
Second query (RIGHT-only): Returns only right rows that had no match
UNION ALL: Combines without removing duplicates (we've already ensured no duplicates)

The result is identical to FULL OUTER JOIN: all left rows, all right rows, matching where possible.

Performance Considerations

FULL OUTER JOIN has unique performance characteristics due to its requirement to preserve unmatched rows from both sides.

Cost Factors:

FULL OUTER JOIN Performance Factors

•Double-sided tracking: The database must track which rows from BOTH tables have been matched, requiring more memory than LEFT JOIN
•Index utilization limited: Standard B-tree indexes help LEFT JOIN probe the right table. For FULL JOIN, both tables may need scanning for unmatched rows
•Hash join preferred: Hash-based FULL JOIN algorithms are most common; ensure adequate memory for the hash table
•Merge join efficient when sorted: If both tables are sorted on the join key, merge join can process FULL JOIN efficiently

full_join_optimization.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Optimization: Filter tables BEFORE joining
-- Reduces the size of the FULL JOIN operation
 
-- BEFORE: FULL JOIN on large tables, then filter
SELECT COALESCE(a.id, b.id) as id, a.value, b.value
FROM LargeTableA a
FULL OUTER JOIN LargeTableB b ON a.id = b.id
WHERE a.category = 'Electronics' OR b.category = 'Electronics';
 
-- AFTER: Filter first via CTEs, then FULL JOIN on smaller sets
WITH FilteredA AS (
    SELECT id, value, category
    FROM LargeTableA
    WHERE category = 'Electronics'
),
FilteredB AS (
    SELECT id, value, category
    FROM LargeTableB
    WHERE category = 'Electronics'
)
SELECT COALESCE(a.id, b.id) as id, a.value, b.value
FROM FilteredA a
FULL OUTER JOIN FilteredB b ON a.id = b.id;
 
-- Index recommendations for FULL OUTER JOIN
CREATE INDEX idx_tablea_id ON LargeTableA(id);
CREATE INDEX idx_tableb_id ON LargeTableB(id);
 
-- If using hash join, these indexes help less than sorting
-- For merge join strategy, pre-sorted data is most efficient

FULL JOIN Performance Comparison
Join Type	Relative Cost	Memory Usage	Index Benefit
INNER JOIN	Baseline	Low	High (right table index)
LEFT JOIN	Slightly higher	Low-Medium	High (right table index)
FULL OUTER JOIN	Higher	Medium-High	Moderate (both tables)
FULL JOIN emulation (UNION)	Highest	High	Same as LEFT JOINs used

When Native Beats Emulation

Native FULL OUTER JOIN is almost always faster than the UNION-based emulation because the optimizer can use specialized algorithms. Only use emulation when native support is unavailable.

Summary: FULL OUTER JOIN Mastery

Key Takeaways

•FULL JOIN preserves all rows from both tables, filling NULLs where no match exists on either side.
•Three categories of output rows: matched (both sides have data), left-only (right side NULL), right-only (left side NULL).
•Use COALESCE for unified identifiers: COALESCE(left.key, right.key) gives you a non-NULL key for each row.
•Check PKs to determine match status, not nullable data columns that could have legitimate NULL values.
•Primary use cases: data reconciliation, period-over-period comparison, merging data from multiple sources.
•Can be emulated with UNION of LEFT JOIN and filtered RIGHT JOIN when native support is unavailable.
•Generally more expensive than LEFT JOIN due to double-sided unmatched row tracking.

What's Next: Preserving Unmatched Rows

Page Complete

3 / 5