Loading content...
Imagine a common scenario: you receive a batch of data—some records are new, some are updates to existing records. The traditional approach requires checking each record, then executing INSERT or UPDATE accordingly. This creates multiple round-trips, complex application logic, and potential race conditions.
The MERGE statement (also known as UPSERT—a portmanteau of UPDATE and INSERT) solves this elegantly. In a single atomic statement, you can insert new records, update existing ones, and optionally delete records that no longer appear in the source—all based on a matching condition.
This capability is essential for data synchronization, ETL processes, inventory updates, and any scenario where external data must be reconciled with database state.
By the end of this page, you will master the MERGE statement syntax across major database systems, understand WHEN MATCHED and WHEN NOT MATCHED clauses, implement upsert patterns for data synchronization, handle complex matching conditions and multiple actions, and recognize performance considerations and alternatives.
The MERGE statement compares a source dataset against a target table, then performs different actions based on whether rows match or not. It's declarative: you describe the conditions and actions, and the database handles the logic atomically.
The Core Concept:
123456789101112131415161718
-- Conceptual MERGE structureMERGE INTO target_table AS targetUSING source_table AS sourceON target.key_column = source.key_column WHEN MATCHED THEN -- Row exists in both: UPDATE target with source values UPDATE SET target.column1 = source.column1, ... WHEN NOT MATCHED THEN -- Row in source but not target: INSERT new row INSERT (columns...) VALUES (source.columns...); -- The MERGE:-- 1. Joins source and target on the ON condition-- 2. For rows that match: executes the MATCHED action (UPDATE)-- 3. For source rows without match: executes NOT MATCHED action (INSERT)-- 4. All actions are atomic—committed together or not at allThe Problem MERGE Solves:
Without MERGE, the upsert pattern requires multiple statements:
123456789101112131415161718192021222324252627282930313233
-- WITHOUT MERGE: Traditional upsert pattern (problematic) -- Approach 1: Check, then insert or update-- PROBLEM: Race condition between SELECT and INSERT/UPDATEIF EXISTS (SELECT 1 FROM products WHERE product_id = @id)BEGIN UPDATE products SET product_name = @name, unit_price = @price WHERE product_id = @id;ENDELSEBEGIN INSERT INTO products (product_id, product_name, unit_price) VALUES (@id, @name, @price);END -- Approach 2: Try insert, catch duplicate, then update-- PROBLEM: Relies on error handling, performance overheadBEGIN TRY INSERT INTO products (product_id, product_name, unit_price) VALUES (@id, @name, @price);END TRYBEGIN CATCH IF ERROR_NUMBER() = 2627 -- Duplicate key error UPDATE products SET product_name = @name, unit_price = @price WHERE product_id = @id;END CATCH -- Both approaches have issues:-- 1. Multiple statements = multiple round-trips-- 2. Race conditions in concurrent environments-- 3. Complex logic that's hard to maintainSQL:2003 standardized MERGE syntax, though implementations vary. The standard syntax is supported by Oracle, SQL Server, and DB2 with minor variations.
Complete MERGE Syntax:
1234567891011121314151617181920212223
-- Standard MERGE syntax (SQL Server / Oracle style) MERGE INTO target_table AS tUSING source_table AS sON t.key_column = s.key_column -- When row exists in both source and targetWHEN MATCHED THEN UPDATE SET t.column1 = s.column1, t.column2 = s.column2, t.updated_at = CURRENT_TIMESTAMP -- When row exists only in source (new data)WHEN NOT MATCHED BY TARGET THEN INSERT (key_column, column1, column2, created_at) VALUES (s.key_column, s.column1, s.column2, CURRENT_TIMESTAMP) -- When row exists only in target (orphaned data) - SQL Server onlyWHEN NOT MATCHED BY SOURCE THEN DELETE; -- or UPDATE to mark as inactive -- IMPORTANT: SQL Server requires semicolon after MERGE!Practical Example: Product Inventory Sync:
1234567891011121314151617181920212223242526272829303132333435363738394041
-- Scenario: Sync inventory from supplier feed into products table -- Source: Daily supplier feed (temporary table or table variable)CREATE TABLE #supplier_feed ( sku VARCHAR(50) PRIMARY KEY, product_name VARCHAR(200), unit_price DECIMAL(10, 2), quantity INTEGER, last_updated DATETIME); -- Insert sample supplier dataINSERT INTO #supplier_feed VALUES ('SKU001', 'Wireless Mouse', 29.99, 150, '2024-01-15'), ('SKU002', 'USB Keyboard', 49.99, 75, '2024-01-15'), ('SKU003', 'Monitor Stand', 34.99, 200, '2024-01-15'), -- New product ('SKU004', 'Desk Lamp', 24.99, 50, '2024-01-15'); -- New product -- MERGE: Sync supplier feed with products tableMERGE INTO products AS targetUSING #supplier_feed AS sourceON target.sku = source.sku WHEN MATCHED THEN -- Update existing products with new prices and quantities UPDATE SET target.product_name = source.product_name, target.unit_price = source.unit_price, target.units_in_stock = source.quantity, target.last_sync = CURRENT_TIMESTAMP WHEN NOT MATCHED BY TARGET THEN -- Insert new products from supplier INSERT (sku, product_name, unit_price, units_in_stock, created_at, last_sync) VALUES (source.sku, source.product_name, source.unit_price, source.quantity, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP); -- Result: -- Existing products (SKU001, SKU002) are updated-- New products (SKU003, SKU004) are inserted-- All in one atomic operation!SQL Server requires MERGE statements to end with a semicolon. This is unlike other T-SQL statements where the semicolon is optional. Forgetting it causes syntax errors.
While the MERGE concept is universal, syntax varies significantly between database systems. Some databases use alternative syntax rather than the standard MERGE statement.
PostgreSQL uses INSERT ... ON CONFLICT (available since 9.5):
12345678910111213141516171819202122232425262728293031323334353637383940
-- PostgreSQL: INSERT ... ON CONFLICT (Upsert) -- Basic upsert: Insert or update on primary key conflictINSERT INTO products (product_id, product_name, unit_price, units_in_stock)VALUES (101, 'Wireless Mouse', 29.99, 150)ON CONFLICT (product_id) DO UPDATE SET product_name = EXCLUDED.product_name, unit_price = EXCLUDED.unit_price, units_in_stock = EXCLUDED.units_in_stock, updated_at = CURRENT_TIMESTAMP; -- EXCLUDED refers to the row that was attempted to be inserted -- Upsert on unique constraint (by name)INSERT INTO products (product_id, product_name, unit_price)VALUES (102, 'Mechanical Keyboard', 149.99)ON CONFLICT ON CONSTRAINT products_name_uniqueDO UPDATE SET unit_price = EXCLUDED.unit_price; -- Upsert with conditional updateINSERT INTO products (product_id, product_name, unit_price)VALUES (103, 'Monitor', 299.99)ON CONFLICT (product_id) DO UPDATE SET unit_price = EXCLUDED.unit_priceWHERE products.unit_price IS DISTINCT FROM EXCLUDED.unit_price;-- Only updates if price actually changed -- Upsert: Do nothing on conflict (insert if not exists)INSERT INTO products (product_id, product_name, unit_price)VALUES (104, 'Desk Lamp', 24.99)ON CONFLICT (product_id) DO NOTHING; -- PostgreSQL 15+ also supports standard MERGE syntaxMERGE INTO products AS tUSING (VALUES (101, 'Wireless Mouse', 29.99)) AS s(product_id, product_name, unit_price)ON t.product_id = s.product_idWHEN MATCHED THEN UPDATE SET product_name = s.product_nameWHEN NOT MATCHED THEN INSERT VALUES (s.product_id, s.product_name, s.unit_price);MERGE becomes even more powerful with conditional actions, multiple WHEN clauses, and complex source queries.
Conditional Actions:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152
-- MERGE with conditional WHEN MATCHED clauses (SQL Server) MERGE INTO products AS targetUSING staging_products AS sourceON target.product_id = source.product_id -- Update only if data actually changed (avoid unnecessary updates)WHEN MATCHED AND ( target.product_name != source.product_name OR target.unit_price != source.unit_price OR target.category_id != source.category_id) THEN UPDATE SET target.product_name = source.product_name, target.unit_price = source.unit_price, target.category_id = source.category_id, target.updated_at = CURRENT_TIMESTAMP -- When matched but no changes, do nothing (implicit) WHEN NOT MATCHED BY TARGET THEN INSERT (product_id, product_name, unit_price, category_id, created_at) VALUES (source.product_id, source.product_name, source.unit_price, source.category_id, CURRENT_TIMESTAMP); -- ================================================ -- Multiple WHEN MATCHED clauses with different conditionsMERGE INTO inventory AS targetUSING warehouse_shipment AS sourceON target.sku = source.sku -- If quantity drops to zero, mark as out of stockWHEN MATCHED AND target.quantity + source.quantity <= 0 THEN UPDATE SET target.quantity = 0, target.status = 'out_of_stock', target.updated_at = CURRENT_TIMESTAMP -- If quantity is positive, update normallyWHEN MATCHED AND target.quantity + source.quantity > 0 THEN UPDATE SET target.quantity = target.quantity + source.quantity, target.status = CASE WHEN target.quantity + source.quantity < 10 THEN 'low_stock' ELSE 'in_stock' END, target.updated_at = CURRENT_TIMESTAMP WHEN NOT MATCHED THEN INSERT (sku, quantity, status, created_at) VALUES (source.sku, source.quantity, 'in_stock', CURRENT_TIMESTAMP);MERGE with Aggregated Source:
123456789101112131415161718192021222324252627282930313233343536373839
-- MERGE using aggregated/transformed source data -- Update category statistics from ordersMERGE INTO category_stats AS targetUSING ( SELECT p.category_id, COUNT(DISTINCT o.order_id) AS total_orders, SUM(oi.quantity) AS total_units_sold, SUM(oi.quantity * oi.unit_price) AS total_revenue, AVG(oi.unit_price) AS avg_unit_price FROM order_items oi JOIN products p ON oi.product_id = p.product_id JOIN orders o ON oi.order_id = o.order_id WHERE o.order_date >= DATEADD(MONTH, -1, GETDATE()) GROUP BY p.category_id) AS sourceON target.category_id = source.category_id WHEN MATCHED THEN UPDATE SET target.monthly_orders = source.total_orders, target.monthly_units = source.total_units_sold, target.monthly_revenue = source.total_revenue, target.avg_price = source.avg_unit_price, target.last_calculated = CURRENT_TIMESTAMP WHEN NOT MATCHED BY TARGET THEN INSERT (category_id, monthly_orders, monthly_units, monthly_revenue, avg_price, last_calculated) VALUES (source.category_id, source.total_orders, source.total_units_sold, source.total_revenue, source.avg_unit_price, CURRENT_TIMESTAMP) -- Clean up categories with no recent salesWHEN NOT MATCHED BY SOURCE THEN UPDATE SET target.monthly_orders = 0, target.monthly_units = 0, target.monthly_revenue = 0, target.last_calculated = CURRENT_TIMESTAMP;When using multiple WHEN MATCHED clauses with conditions, the first matching condition wins. Order them from most specific to least specific, similar to IF/ELSE IF chains in programming.
SQL Server's MERGE statement supports the OUTPUT clause, which captures what actions were taken. This is invaluable for auditing, logging, and debugging synchronization operations.
OUTPUT Clause with MERGE:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950
-- MERGE with OUTPUT for complete audit trail -- Create audit tableCREATE TABLE product_sync_audit ( audit_id INT IDENTITY PRIMARY KEY, action VARCHAR(10), product_id INT, old_name VARCHAR(200), new_name VARCHAR(200), old_price DECIMAL(10,2), new_price DECIMAL(10,2), sync_timestamp DATETIME2 DEFAULT CURRENT_TIMESTAMP); -- MERGE with OUTPUT into audit tableMERGE INTO products AS targetUSING staging_products AS sourceON target.product_id = source.product_id WHEN MATCHED THEN UPDATE SET target.product_name = source.product_name, target.unit_price = source.unit_price WHEN NOT MATCHED BY TARGET THEN INSERT (product_id, product_name, unit_price) VALUES (source.product_id, source.product_name, source.unit_price) WHEN NOT MATCHED BY SOURCE AND target.is_synced = 1 THEN DELETE -- OUTPUT captures before and after statesOUTPUT $action AS action, -- 'INSERT', 'UPDATE', or 'DELETE' COALESCE(INSERTED.product_id, DELETED.product_id) AS product_id, DELETED.product_name AS old_name, -- NULL for INSERT INSERTED.product_name AS new_name, -- NULL for DELETE DELETED.unit_price AS old_price, INSERTED.unit_price AS new_priceINTO product_sync_audit (action, product_id, old_name, new_name, old_price, new_price); -- Query the audit to see what happenedSELECT * FROM product_sync_audit ORDER BY audit_id; -- Results might show:-- action | product_id | old_name | new_name | old_price | new_price-- UPDATE | 101 | Wireless Mouse | Wireless Mouse v2 | 29.99 | 34.99-- UPDATE | 102 | Keyboard | Mechanical Keyboard| 49.99 | 149.99-- INSERT | 103 | NULL | New Monitor | NULL | 299.99-- DELETE | 999 | Discontinued | NULL | 19.99 | NULLSummary Statistics from OUTPUT:
1234567891011121314151617181920212223242526
-- Capture MERGE results into table variable for summary DECLARE @MergeResults TABLE (action VARCHAR(10)); MERGE INTO products AS targetUSING staging_products AS sourceON target.product_id = source.product_id WHEN MATCHED THEN UPDATE SET target.product_name = source.product_nameWHEN NOT MATCHED BY TARGET THEN INSERT (product_id, product_name) VALUES (source.product_id, source.product_name)WHEN NOT MATCHED BY SOURCE THEN DELETE OUTPUT $action INTO @MergeResults; -- Generate summarySELECT action AS Operation, COUNT(*) AS RowCountFROM @MergeResultsGROUP BY action; -- Results:-- Operation | RowCount-- INSERT | 15-- UPDATE | 42-- DELETE | 3PostgreSQL's INSERT ON CONFLICT also supports RETURNING, though with different semantics. For complex MERGE-like operations with full audit trails, consider using CTEs with RETURNING or separate INSERT/UPDATE statements with RETURNING captured.
Certain upsert patterns appear repeatedly in real applications. Mastering these common patterns accelerates development.
Pattern 1: Insert-or-Update-All:
12345678910111213141516171819202122
-- Pattern 1: Full replacement - update all columns on match -- PostgreSQLINSERT INTO products (product_id, name, price, stock, category, updated_at)VALUES (101, 'Widget', 9.99, 100, 'Gadgets', NOW())ON CONFLICT (product_id) DO UPDATE SET name = EXCLUDED.name, price = EXCLUDED.price, stock = EXCLUDED.stock, category = EXCLUDED.category, updated_at = EXCLUDED.updated_at; -- MySQLINSERT INTO products (product_id, name, price, stock, category, updated_at)VALUES (101, 'Widget', 9.99, 100, 'Gadgets', NOW())ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price), stock = VALUES(stock), category = VALUES(category), updated_at = VALUES(updated_at);Pattern 2: Insert-or-Increment:
123456789101112131415161718192021222324252627282930
-- Pattern 2: Insert new, increment existing (counters, aggregates) -- PostgreSQL: Page view counterINSERT INTO page_views (page_url, view_count, first_view, last_view)VALUES ('/products/widget', 1, NOW(), NOW())ON CONFLICT (page_url) DO UPDATE SET view_count = page_views.view_count + 1, last_view = NOW(); -- MySQL: Inventory adjustmentINSERT INTO inventory (sku, quantity, last_updated)VALUES ('SKU-001', 50, NOW())ON DUPLICATE KEY UPDATE quantity = quantity + VALUES(quantity), last_updated = VALUES(last_updated); -- SQL Server: Running totalsMERGE INTO daily_sales AS targetUSING (SELECT @date AS sale_date, @amount AS amount) AS sourceON target.sale_date = source.sale_date WHEN MATCHED THEN UPDATE SET target.total_amount = target.total_amount + source.amount, target.transaction_count = target.transaction_count + 1 WHEN NOT MATCHED THEN INSERT (sale_date, total_amount, transaction_count) VALUES (source.sale_date, source.amount, 1);Pattern 3: Insert-or-Ignore:
1234567891011121314151617181920212223
-- Pattern 3: Insert only if not exists (idempotent inserts) -- PostgreSQLINSERT INTO users (email, name, created_at)VALUES ('user@example.com', 'John Doe', NOW())ON CONFLICT (email) DO NOTHING;-- Silently ignores if email already exists -- MySQLINSERT IGNORE INTO users (email, name, created_at)VALUES ('user@example.com', 'John Doe', NOW()); -- SQLiteINSERT OR IGNORE INTO users (email, name, created_at)VALUES ('user@example.com', 'John Doe', datetime('now')); -- SQL Server: Conditional MERGEMERGE INTO users AS targetUSING (SELECT 'user@example.com' AS email, 'John Doe' AS name) AS sourceON target.email = source.emailWHEN NOT MATCHED THEN INSERT (email, name, created_at) VALUES (source.email, source.name, GETDATE());Pattern 4: Conditional Update (update only if certain conditions):
12345678910111213141516171819
-- Pattern 4: Update only under certain conditions -- Only update if new price is lower (price protection)INSERT INTO products (product_id, name, price)VALUES (101, 'Widget', 7.99)ON CONFLICT (product_id) DO UPDATE SET price = EXCLUDED.priceWHERE products.price > EXCLUDED.price;-- Only reduces price, never increases -- Only update if source is newer (timestamp-based dedup)INSERT INTO events (event_id, event_data, event_time)VALUES ('evt-123', '{"action": "click"}', '2024-01-15 10:30:00')ON CONFLICT (event_id) DO UPDATE SET event_data = EXCLUDED.event_data, event_time = EXCLUDED.event_timeWHERE events.event_time < EXCLUDED.event_time;-- Only updates if incoming event is newerMERGE operations can be resource-intensive. Understanding performance factors helps optimize synchronization operations.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354
-- Performance optimization techniques -- 1. Ensure index exists on join columnCREATE INDEX idx_products_sku ON products (sku); -- 2. Batch large operations-- Instead of one MERGE with 1 million rows, do 10 batches of 100KDECLARE @BatchSize INT = 100000;DECLARE @Offset INT = 0; WHILE EXISTS ( SELECT 1 FROM staging_products WHERE row_num > @Offset AND row_num <= @Offset + @BatchSize)BEGIN MERGE INTO products AS target USING ( SELECT * FROM staging_products WHERE row_num > @Offset AND row_num <= @Offset + @BatchSize ) AS source ON target.product_id = source.product_id WHEN MATCHED THEN UPDATE SET target.name = source.name WHEN NOT MATCHED THEN INSERT (product_id, name) VALUES (source.product_id, source.name); SET @Offset = @Offset + @BatchSize;END -- 3. Filter source to only changed rowsMERGE INTO products AS targetUSING ( SELECT * FROM staging_products s WHERE NOT EXISTS ( SELECT 1 FROM products p WHERE p.product_id = s.product_id AND p.name = s.name AND p.price = s.price )) AS sourceON target.product_id = source.product_id-- Only processes rows that actually differ -- 4. Consider simple INSERT + UPDATE for massive data-- Sometimes separate statements outperform MERGEBEGIN TRANSACTION; UPDATE products SET name = s.name, price = s.price FROM products p JOIN staging_products s ON p.product_id = s.product_id; INSERT INTO products (product_id, name, price) SELECT s.product_id, s.name, s.price FROM staging_products s LEFT JOIN products p ON s.product_id = p.product_id WHERE p.product_id IS NULL;COMMIT;Despite its elegance, MERGE isn't always the fastest approach. For extremely large datasets or when you know most rows will be updates (or most will be inserts), separate statements may perform better. Benchmark both approaches for your specific workload.
The MERGE statement unifies INSERT, UPDATE, and DELETE into a single atomic operation, solving the classic upsert problem elegantly and efficiently.
What's Next:
Having covered individual DML statements, the next page examines bulk operations—efficient techniques for handling large data volumes through batch processing, bulk inserts, and ETL optimization.
You now have comprehensive knowledge of the MERGE/UPSERT pattern across major database systems. From basic upserts to complex conditional synchronization with audit trails, you can efficiently reconcile external data with your database tables.