Normalization separates data into distinct tables connected by foreign key relationships. This elegant design eliminates redundancy but introduces a fundamental performance cost: every time you need related data, you must perform a join.
Consider an order management system with properly normalized tables:
orders → customers (to display customer name)
orders → shipping_addresses (to display delivery info)
order_items → products (to display product name)
order_items → categories (to display category)
Displaying a complete order with all its line items requires joining five or six tables. At scale, these joins become the dominant factor in query latency: for every request, the database must scan the relevant indexes, match rows across each pair of tables, and materialize the combined result.
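To make the cost concrete, here is a minimal TypeScript sketch (all names hypothetical) that counts rows touched by a naive nested-loop join versus reading pre-joined rows from a single table. Real databases use indexes and hash joins, so the gap is smaller in practice, but the shape of the difference is the same:

```typescript
// Hypothetical sketch: cost of a nested-loop join vs. a single-table read.
// "Cost" here is rows touched -- a rough proxy for work done.

interface Order { orderId: number; customerId: number }
interface Customer { customerId: number; name: string }

// Nested-loop join: for each order, scan customers until a match is found.
function nestedLoopJoinCost(orders: Order[], customers: Customer[]): number {
  let rowsTouched = 0;
  for (const o of orders) {
    for (const c of customers) {
      rowsTouched++;
      if (c.customerId === o.customerId) break;
    }
  }
  return rowsTouched;
}

// Denormalized read: every order row already carries the customer data.
function denormalizedReadCost(orderCount: number): number {
  return orderCount; // one row touched per order, no second table
}

const customers: Customer[] = Array.from({ length: 1000 }, (_, i) => ({
  customerId: i,
  name: `customer-${i}`,
}));
const orders: Order[] = Array.from({ length: 100 }, () => ({
  orderId: 0,
  customerId: 999, // worst case: the match is the last customer scanned
}));

const joinCost = nestedLoopJoinCost(orders, customers); // 100 orders * 1000 scans
const flatCost = denormalizedReadCost(orders.length);   // 100 rows
```

With an index on `customer_id` the join drops to roughly one probe per order, but it still pays per-probe overhead that the single-table read avoids entirely.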
Duplicating foreign key data addresses this by copying frequently-accessed attributes from related tables into the referencing table, eliminating the need for joins in common queries.
By the end of this page, you will understand when and how to duplicate foreign key data for performance optimization. You'll learn to identify high-value duplication candidates, implement synchronization strategies, and manage the consistency challenges that duplication introduces.
Data duplication (also called data copying, embedding, or inlining) means storing copies of data from referenced tables directly in the referencing table. The foreign key relationship remains for referential integrity, but commonly-accessed attributes are duplicated to avoid joins.
Before (normalized):
orders table:
| order_id | customer_id | order_date | status |
customers table:
| customer_id | name | email | loyalty_tier |
Query: SELECT o.*, c.name, c.email
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
After (with duplicated data):
orders table:
| order_id | customer_id | customer_name | customer_email | order_date | status |
Query: SELECT * FROM orders
The duplicated customer_name and customer_email columns eliminate the join for the most common query pattern while customer_id maintains referential integrity for less frequent operations.
| Aspect | Join (Normalized) | Duplication (Denormalized) |
|---|---|---|
| Storage | Minimal - single copy of each value | Increased - values repeated across rows |
| Read Performance | Slower - must combine data from multiple tables | Faster - all needed data in single table |
| Write Performance | Faster - update source once | Slower - must update all copies |
| Data Consistency | Guaranteed - single source of truth | Must be actively maintained |
| Schema Complexity | More tables, cleaner design | Fewer joins, more columns |
| Query Complexity | Explicit joins required | Simpler SELECT statements |
Most production systems are heavily read-biased. A typical web application might perform 100-1000 reads for every write. If duplicating data eliminates joins on 99% of reads, the write overhead of maintaining duplicates is often negligible.
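A back-of-the-envelope model makes this trade-off explicit. The sketch below (all numbers and names are illustrative assumptions, not measurements) compares the join cost avoided across reads against the fan-out cost paid on writes:

```typescript
// Hypothetical cost model: is duplication a net win for a given workload?
// Units are arbitrary "cost units" per operation; all inputs are assumptions.

interface WorkloadModel {
  readsPerWrite: number;   // e.g. 100 reads for every source update
  joinCostPerRead: number; // extra cost each read pays for the join
  fanOut: number;          // duplicated rows touched per source update
  writeCostPerRow: number; // cost to update one duplicated row
}

// Net saving per write cycle: join cost avoided across the reads that
// happen between writes, minus the cost of propagating the write.
function netSavingPerCycle(m: WorkloadModel): number {
  const readSaving = m.readsPerWrite * m.joinCostPerRead;
  const writePenalty = m.fanOut * m.writeCostPerRow;
  return readSaving - writePenalty;
}

// Read-heavy workload with modest fan-out: duplication wins.
const readHeavy = netSavingPerCycle({
  readsPerWrite: 500, joinCostPerRead: 2, fanOut: 50, writeCostPerRow: 1,
}); // 500*2 - 50*1 = 950

// Write-heavy workload with huge fan-out: duplication loses badly.
const writeHeavy = netSavingPerCycle({
  readsPerWrite: 5, joinCostPerRead: 2, fanOut: 100_000, writeCostPerRow: 1,
}); // 5*2 - 100000*1 = -99990
```

The sign of the result flips on the read/write ratio and the fan-out, which is exactly why the next section focuses on measuring both before duplicating anything.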
Not all foreign key attributes are worth duplicating. The decision requires analyzing query patterns, update frequencies, and consistency requirements.
Evaluation criteria for duplication candidates: how often the attribute is read via joins, how often it changes at the source, how strict its consistency requirements are, and how much extra storage the copies would consume. The following queries help quantify each:
```sql
-- Analysis queries to identify duplication candidates

-- 1. Find most frequently joined column combinations
-- (Requires query logging/monitoring setup)
SELECT query_pattern, join_tables, execution_count, avg_execution_time_ms
FROM query_performance_log
WHERE query_pattern LIKE '%JOIN%'
ORDER BY execution_count * avg_execution_time_ms DESC
LIMIT 20;

-- 2. Analyze update frequency of referenced table columns
SELECT column_name,
       COUNT(*) AS update_count,
       MAX(updated_at) AS last_update
FROM audit_log
WHERE table_name = 'customers'
  AND operation = 'UPDATE'
  AND updated_at > CURRENT_DATE - INTERVAL '30 days'
GROUP BY column_name
ORDER BY update_count DESC;

-- 3. Measure join cost for specific query patterns
EXPLAIN ANALYZE
SELECT o.order_id, o.order_date, o.status,
       c.name AS customer_name,
       c.email AS customer_email
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date > CURRENT_DATE - INTERVAL '7 days'
ORDER BY o.order_date DESC
LIMIT 100;

-- Check output for:
-- - Nested Loop vs Hash Join (algorithm choice)
-- - Index usage on join conditions
-- - Total execution time
-- - Rows processed vs rows returned

-- 4. Estimate storage impact of duplication
SELECT COUNT(*) AS row_count,
       COUNT(*) * 50 AS estimated_bytes_for_name,   -- ~50 bytes per name
       COUNT(*) * 100 AS estimated_bytes_for_email,
       pg_size_pretty((COUNT(*) * 150)::bigint) AS total_additional_storage
FROM orders;
```

Never duplicate data based on assumptions. Profile your actual query workload, measure join costs, and quantify the expected benefit. Premature duplication adds complexity without guaranteed performance gains.
There are several patterns for implementing foreign key data duplication, each suited to different scenarios and consistency requirements.
Pattern 1: Point-in-Time Capture
Capture the value at transaction time and never update it. This is appropriate when the historical state is meaningful—what the value was when the action occurred, not what it is now.
```sql
-- Point-in-Time Capture: Values frozen at creation time
-- Example: Order captures product price at time of order

CREATE TABLE order_items (
    order_item_id SERIAL PRIMARY KEY,
    order_id INT NOT NULL REFERENCES orders(order_id),
    product_id INT NOT NULL REFERENCES products(product_id),
    quantity INT NOT NULL,
    -- Captured at order time - intentionally NOT updated when product changes
    unit_price DECIMAL(10, 2) NOT NULL,  -- Price at time of order
    product_name VARCHAR(255) NOT NULL,  -- Name at time of order
    product_sku VARCHAR(50) NOT NULL,    -- SKU at time of order
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Application code captures values during order creation
INSERT INTO order_items (order_id, product_id, quantity, unit_price, product_name, product_sku)
SELECT :order_id,
       p.product_id,
       :quantity,
       p.current_price,  -- Capture current price
       p.name,           -- Capture current name
       p.sku             -- Capture current SKU
FROM products p
WHERE p.product_id = :product_id;

-- Benefits:
-- 1. Historical accuracy - order shows what customer actually saw/paid
-- 2. No synchronization needed - values are intentionally static
-- 3. Audit trail - original values preserved regardless of product changes
-- 4. Legal/compliance - invoices reflect actual transaction terms
```

Pattern 2: Live Synchronized Copy
Maintain a copy that stays synchronized with the source. This requires active maintenance but ensures the duplicate always reflects current state.
```sql
-- Live Synchronized Copy: Duplicates updated when source changes
-- Example: Order displays current customer name

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customers(customer_id),
    -- Synchronized copies - updated when customer data changes
    customer_name VARCHAR(100) NOT NULL,
    customer_email VARCHAR(255) NOT NULL,
    customer_phone VARCHAR(20),
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status VARCHAR(20) DEFAULT 'pending'
);

-- Trigger to synchronize duplicates when source changes
CREATE OR REPLACE FUNCTION sync_customer_data_to_orders()
RETURNS TRIGGER AS $$
BEGIN
    -- Only update if the duplicated fields actually changed
    IF OLD.name IS DISTINCT FROM NEW.name
       OR OLD.email IS DISTINCT FROM NEW.email
       OR OLD.phone IS DISTINCT FROM NEW.phone THEN

        UPDATE orders
        SET customer_name = NEW.name,
            customer_email = NEW.email,
            customer_phone = NEW.phone
        WHERE customer_id = NEW.customer_id;

        -- Log the sync operation for monitoring
        INSERT INTO sync_audit_log (table_name, source_id, operation, synced_at)
        VALUES ('orders', NEW.customer_id, 'customer_update', CURRENT_TIMESTAMP);
    END IF;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customer_data_sync_trigger
AFTER UPDATE ON customers
FOR EACH ROW EXECUTE FUNCTION sync_customer_data_to_orders();

-- Also sync on initial order creation
CREATE OR REPLACE FUNCTION populate_order_customer_data()
RETURNS TRIGGER AS $$
BEGIN
    SELECT name, email, phone
    INTO NEW.customer_name, NEW.customer_email, NEW.customer_phone
    FROM customers
    WHERE customer_id = NEW.customer_id;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_customer_populate_trigger
BEFORE INSERT ON orders
FOR EACH ROW EXECUTE FUNCTION populate_order_customer_data();
```

Pattern 3: Lazy Refresh with Staleness Tolerance
Allow duplicates to become stale and refresh periodically. Appropriate when exact consistency isn't required and refresh cost should be batched.
```sql
-- Lazy Refresh: Periodic batch synchronization
-- Example: Product catalog data in recommendations table

CREATE TABLE product_recommendations (
    recommendation_id SERIAL PRIMARY KEY,
    user_id INT NOT NULL,
    product_id INT NOT NULL REFERENCES products(product_id),
    -- Duplicated product data (may be slightly stale)
    product_name VARCHAR(255),
    product_image_url VARCHAR(500),
    product_category VARCHAR(100),
    product_price DECIMAL(10, 2),
    -- Staleness tracking
    product_data_synced_at TIMESTAMP,
    last_displayed_at TIMESTAMP,  -- used by the selective refresh below
    score DECIMAL(5, 4) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Batch refresh job - run every 15 minutes or hourly
CREATE OR REPLACE FUNCTION refresh_recommendation_product_data()
RETURNS INTEGER AS $$
DECLARE
    rows_updated INTEGER;
BEGIN
    UPDATE product_recommendations pr
    SET product_name = p.name,
        product_image_url = p.image_url,
        product_category = c.name,
        product_price = p.current_price,
        product_data_synced_at = CURRENT_TIMESTAMP
    FROM products p
    JOIN categories c ON p.category_id = c.category_id
    WHERE pr.product_id = p.product_id
      AND (pr.product_data_synced_at IS NULL
           OR pr.product_data_synced_at < p.updated_at);

    GET DIAGNOSTICS rows_updated = ROW_COUNT;

    -- Log refresh statistics
    INSERT INTO batch_sync_log (job_name, rows_affected, completed_at)
    VALUES ('refresh_recommendation_product_data', rows_updated, CURRENT_TIMESTAMP);

    RETURN rows_updated;
END;
$$ LANGUAGE plpgsql;

-- Selective refresh: Only sync high-visibility rows
CREATE OR REPLACE FUNCTION refresh_active_recommendation_data()
RETURNS INTEGER AS $$
DECLARE
    rows_updated INTEGER;
BEGIN
    -- Only refresh recommendations shown in last 24 hours
    UPDATE product_recommendations pr
    SET product_name = p.name,
        product_price = p.current_price,
        product_data_synced_at = CURRENT_TIMESTAMP
    FROM products p
    WHERE pr.product_id = p.product_id
      AND pr.last_displayed_at > CURRENT_TIMESTAMP - INTERVAL '24 hours'
      AND pr.product_data_synced_at < p.updated_at;

    GET DIAGNOSTICS rows_updated = ROW_COUNT;
    RETURN rows_updated;
END;
$$ LANGUAGE plpgsql;
```

| Pattern | Use When | Examples |
|---|---|---|
| Point-in-Time Capture | Historical accuracy matters; legal/audit requirements; values represent transaction terms | Order prices, invoice addresses, contract terms |
| Live Synchronized | Current values always needed; low update frequency at source; strong consistency required | Customer contact info, product availability status |
| Lazy Refresh | Slight staleness acceptable; high volume tables; batch updates preferred | Recommendations, cached listings, search results |
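The decision logic in the table above can be encoded as a small helper. This is a hypothetical TypeScript sketch (names and flags are assumptions, not a standard API), useful as a starting point for documenting per-attribute choices in a schema review:

```typescript
// Hypothetical helper encoding the pattern-selection table above.
type DuplicationPattern = "point-in-time" | "live-synchronized" | "lazy-refresh";

interface AttributeProfile {
  historicalAccuracyRequired: boolean; // legal/audit: value at transaction time
  stalenessTolerable: boolean;         // can readers see slightly old values?
}

function choosePattern(p: AttributeProfile): DuplicationPattern {
  // Historical meaning wins: freeze the value at transaction time.
  if (p.historicalAccuracyRequired) return "point-in-time";
  // Staleness acceptable: cheap batch refresh is enough.
  if (p.stalenessTolerable) return "lazy-refresh";
  // Otherwise the duplicate must track the source: trigger-based sync.
  return "live-synchronized";
}

// Order prices are the legal terms of the transaction -> frozen snapshot.
const forPrices = choosePattern({ historicalAccuracyRequired: true, stalenessTolerable: false });
// Recommendation card data tolerates slight staleness -> batch refresh.
const forRecs = choosePattern({ historicalAccuracyRequired: false, stalenessTolerable: true });
// Customer contact info must be current -> live synchronization.
const forContact = choosePattern({ historicalAccuracyRequired: false, stalenessTolerable: false });
```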
Data duplication inherently creates the possibility of inconsistency—the duplicate may not match the source. Robust implementations require multiple layers of consistency protection.
Layer 1: Transactional Consistency
Wrap source updates and duplicate updates in the same database transaction:
```sql
-- Transactional consistency: All updates in single transaction
BEGIN;

-- Update source table
UPDATE customers
SET name = 'Jane Smith',
    email = 'jane.smith@example.com'
WHERE customer_id = 12345;

-- Update all duplicates in same transaction
UPDATE orders
SET customer_name = 'Jane Smith',
    customer_email = 'jane.smith@example.com'
WHERE customer_id = 12345;

UPDATE support_tickets
SET customer_name = 'Jane Smith',
    customer_email = 'jane.smith@example.com'
WHERE customer_id = 12345;

COMMIT;  -- All changes succeed or all fail

-- Better: Let triggers handle this automatically (see previous examples)
-- The trigger executes within the same transaction as the source UPDATE
```

Layer 2: Consistency Verification
Scheduled jobs to detect and alert on inconsistencies:
```sql
-- Consistency verification: Detect drift between source and duplicates

CREATE TABLE consistency_check_results (
    check_id SERIAL PRIMARY KEY,
    check_name VARCHAR(100) NOT NULL,
    table_name VARCHAR(100) NOT NULL,
    inconsistent_rows INT NOT NULL,
    sample_ids TEXT,
    checked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Check orders.customer_name matches customers.name
CREATE OR REPLACE FUNCTION check_order_customer_consistency()
RETURNS void AS $$
DECLARE
    inconsistent_count INT;
    sample_order_ids TEXT;
BEGIN
    -- Count inconsistencies; keep the first ten IDs as a sample
    -- (STRING_AGG does not accept LIMIT, so slice an aggregated array instead)
    SELECT COUNT(*),
           (ARRAY_AGG(o.order_id ORDER BY o.order_id))[1:10]::TEXT
    INTO inconsistent_count, sample_order_ids
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE o.customer_name != c.name
       OR o.customer_email != c.email;

    -- Log results
    INSERT INTO consistency_check_results (check_name, table_name, inconsistent_rows, sample_ids)
    VALUES ('order_customer_data', 'orders', inconsistent_count, sample_order_ids);

    -- Alert if threshold exceeded
    IF inconsistent_count > 0 THEN
        PERFORM pg_notify('consistency_alert',
            FORMAT('Found %s inconsistent rows in orders.customer_* fields', inconsistent_count));
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Schedule to run hourly (requires the pg_cron extension)
SELECT cron.schedule('order-customer-consistency-check', '0 * * * *',
    'SELECT check_order_customer_consistency()');
```

Layer 3: Automated Repair
When inconsistencies are detected, repair them automatically or flag for review:
```sql
-- Automated repair: Fix detected inconsistencies

CREATE OR REPLACE FUNCTION repair_order_customer_inconsistencies()
RETURNS TABLE (
    order_id INT,
    old_customer_name VARCHAR,
    new_customer_name VARCHAR,
    repaired_at TIMESTAMP
) AS $$
DECLARE
    rows_repaired INT;
BEGIN
    RETURN QUERY
    WITH inconsistent AS (
        SELECT o.order_id,
               o.customer_name AS old_name,
               c.name AS correct_name,
               o.customer_email AS old_email,
               c.email AS correct_email
        FROM orders o
        JOIN customers c ON o.customer_id = c.customer_id
        WHERE o.customer_name != c.name
           OR o.customer_email != c.email
    ),
    repaired AS (
        UPDATE orders o
        SET customer_name = i.correct_name,
            customer_email = i.correct_email
        FROM inconsistent i
        WHERE o.order_id = i.order_id
        RETURNING o.order_id, i.old_name, i.correct_name
    )
    SELECT r.order_id, r.old_name, r.correct_name, CURRENT_TIMESTAMP::TIMESTAMP
    FROM repaired r;

    -- Log repair action (ROW_COUNT reflects the query run by RETURN QUERY;
    -- the CTE above is out of scope once that statement completes)
    GET DIAGNOSTICS rows_repaired = ROW_COUNT;
    INSERT INTO data_repair_log (repair_type, rows_repaired, repaired_at)
    VALUES ('order_customer_sync', rows_repaired, CURRENT_TIMESTAMP);
END;
$$ LANGUAGE plpgsql;

-- Run repair with limits for safety
CREATE OR REPLACE FUNCTION safe_repair_order_customers(max_repairs INT DEFAULT 1000)
RETURNS INT AS $$
DECLARE
    repaired_count INT;
BEGIN
    WITH limited_inconsistent AS (
        SELECT o.order_id
        FROM orders o
        JOIN customers c ON o.customer_id = c.customer_id
        WHERE o.customer_name != c.name
        LIMIT max_repairs
    ),
    repaired AS (
        UPDATE orders o
        SET customer_name = c.name,
            customer_email = c.email
        FROM customers c
        WHERE o.customer_id = c.customer_id
          AND o.order_id IN (SELECT order_id FROM limited_inconsistent)
        RETURNING o.order_id
    )
    SELECT COUNT(*) INTO repaired_count FROM repaired;

    RETURN repaired_count;
END;
$$ LANGUAGE plpgsql;
```

No single consistency mechanism is foolproof. Triggers can be bypassed by direct SQL. Batch jobs can fail. Network issues can interrupt transactions. Layer multiple protections: transactional integrity as primary, verification as detection, repair as correction, and alerting as awareness.
When source data changes, duplicated data must be updated—but this can impact many rows. Managing this update fan-out is crucial for write performance.
Understanding update fan-out:
If a customer has 1,000 orders, updating the customer's name means updating 1,000 order rows. If a product is in 100,000 order items, a product name change affects 100,000 rows. This fan-out ratio determines the write amplification cost of duplication.
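Fan-out also determines how a propagation job behaves once you batch it. The sketch below (hypothetical numbers; batch timing and pause values are assumptions) estimates how many batches a given fan-out produces and roughly how long the batched update runs:

```typescript
// Hypothetical sketch: batch count and duration for a fan-out update.

// Rows rewritten when one source row changes equals its fan-out;
// batching splits that into ceil(fanOut / batchSize) transactions.
function batchesNeeded(fanOut: number, batchSize: number): number {
  return Math.ceil(fanOut / batchSize);
}

// Rough wall-clock estimate: per-batch work plus an inter-batch pause
// (the pause reduces lock pressure on concurrent transactions).
function estimatedDurationMs(
  fanOut: number, batchSize: number, perBatchMs: number, pauseMs: number
): number {
  const n = batchesNeeded(fanOut, batchSize);
  return n * perBatchMs + Math.max(0, n - 1) * pauseMs;
}

// A popular product appearing in 100,000 order items, batch size 1,000:
const popularProductBatches = batchesNeeded(100_000, 1_000); // 100 batches
// Assuming ~50 ms per batch and a 100 ms pause between batches:
const popularProductMs = estimatedDurationMs(100_000, 1_000, 50, 100); // ~15 s
// A typical customer with 12 orders fits in a single batch:
const typicalCustomerBatches = batchesNeeded(12, 1_000); // 1 batch
```

Numbers like these tell you whether a rename propagates in milliseconds or minutes, and therefore whether synchronous, batched, or queued propagation is appropriate.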
```sql
-- Analyze fan-out ratio for different source tables

-- Customer fan-out: How many orders per customer?
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY order_count) AS median_orders,
       PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY order_count) AS p95_orders,
       MAX(order_count) AS max_orders,
       AVG(order_count) AS avg_orders
FROM (
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_id
) customer_orders;

-- Product fan-out: How many order items per product?
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY item_count) AS median_items,
       PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY item_count) AS p95_items,
       MAX(item_count) AS max_items
FROM (
    SELECT product_id, COUNT(*) AS item_count
    FROM order_items
    GROUP BY product_id
) product_items;

-- Identify high-fanout entities that would cause massive updates
SELECT p.product_id, p.name, COUNT(*) AS order_item_count
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.name
HAVING COUNT(*) > 10000
ORDER BY order_item_count DESC;
```

Strategy 1: Batched Updates
For high fan-out updates, process in batches to avoid long-running transactions and lock contention:
```sql
-- Batched update: Process in manageable chunks
-- Note: COMMIT is not allowed inside a function, only inside a procedure
-- (PostgreSQL 11+), so this is written as a procedure and invoked with CALL.

CREATE OR REPLACE PROCEDURE update_product_name_in_order_items(
    p_product_id INT,
    p_new_name VARCHAR,
    p_batch_size INT DEFAULT 1000
)
LANGUAGE plpgsql AS $$
DECLARE
    total_updated INT := 0;
    batch_updated INT;
BEGIN
    LOOP
        -- Update one batch
        WITH batch AS (
            SELECT order_item_id
            FROM order_items
            WHERE product_id = p_product_id
              AND product_name != p_new_name
            LIMIT p_batch_size
            FOR UPDATE SKIP LOCKED  -- Avoid blocking concurrent transactions
        )
        UPDATE order_items
        SET product_name = p_new_name
        WHERE order_item_id IN (SELECT order_item_id FROM batch);

        GET DIAGNOSTICS batch_updated = ROW_COUNT;
        total_updated := total_updated + batch_updated;

        -- Exit when no more rows to update
        EXIT WHEN batch_updated = 0;

        -- Commit each batch to keep transactions short
        COMMIT;

        -- Brief pause between batches to reduce lock pressure
        PERFORM pg_sleep(0.1);  -- 100ms
    END LOOP;

    RAISE NOTICE 'Updated % rows in total', total_updated;
END;
$$;
```

Strategy 2: Async Queue-Based Updates
Decouple source updates from duplicate propagation using a message queue:
```typescript
// Async queue-based duplicate propagation
// (MessageQueue and Database are application-provided abstractions)

interface DataChangeEvent {
  sourceTable: string;
  sourceId: number;
  changedFields: Record<string, { old: any; new: any }>;
  timestamp: Date;
}

class DuplicateDataSynchronizer {
  private queue: MessageQueue;
  private db: Database;

  // Source table update triggers this
  async onSourceDataChange(event: DataChangeEvent) {
    // Enqueue for async processing
    await this.queue.publish('duplicate-sync', event);
  }

  // Worker processes queue messages
  async processDuplicateSync(event: DataChangeEvent) {
    const syncConfig = this.getSyncConfig(event.sourceTable);

    for (const target of syncConfig.targets) {
      await this.batchUpdateTarget(
        target.table,
        target.foreignKey,
        event.sourceId,
        this.mapFields(event.changedFields, target.fieldMapping)
      );
    }
  }

  private async batchUpdateTarget(
    table: string,
    foreignKey: string,
    sourceId: number,
    updates: Record<string, any>,
    batchSize: number = 500
  ) {
    // Keyset pagination: track the last id updated rather than an offset.
    // UPDATE has no ORDER BY/LIMIT in PostgreSQL, so the batch is
    // selected in a subquery.
    let lastId = 0;
    let updated = 0;

    do {
      const setClause = Object.keys(updates)
        .map((k, i) => `${k} = $${i + 3}`)
        .join(', ');

      const result = await this.db.execute(`
        UPDATE ${table}
        SET ${setClause}, sync_updated_at = NOW()
        WHERE id IN (
          SELECT id FROM ${table}
          WHERE ${foreignKey} = $1 AND id > $2
          ORDER BY id
          LIMIT ${batchSize}
        )
        RETURNING id
      `, [sourceId, lastId, ...Object.values(updates)]);

      updated = result.rowCount;
      if (updated > 0) {
        lastId = Math.max(...result.rows.map((r: any) => r.id));
      }

      // Yield to allow other operations
      await new Promise(resolve => setImmediate(resolve));
    } while (updated === batchSize);
  }
}
```

For display-only duplicated data, eventual consistency is often acceptable. A customer's name showing as 'John' for a few seconds after they changed it to 'Jonathan' rarely causes problems. Design your synchronization for the actual consistency requirements, not theoretical perfection.
Let's examine a complete implementation of foreign key data duplication for an e-commerce order display system.
Requirement: Display order history with customer name, shipping address, and product details without joins for every page load.
Before optimization:
```sql
-- Original normalized query for order display
-- Requires 5 table joins

SELECT o.order_id,
       o.order_date,
       o.status,
       c.name AS customer_name,
       c.email AS customer_email,
       sa.street AS shipping_street,
       sa.city AS shipping_city,
       sa.state AS shipping_state,
       sa.postal_code AS shipping_postal,
       oi.quantity,
       oi.unit_price,
       p.name AS product_name,
       p.sku AS product_sku,
       cat.name AS category_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN shipping_addresses sa ON o.shipping_address_id = sa.address_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
JOIN categories cat ON p.category_id = cat.category_id
WHERE o.customer_id = $1
ORDER BY o.order_date DESC
LIMIT 50;

-- EXPLAIN ANALYZE result:
-- Hash Join (cost=1234.56..5678.90)
--   -> Nested Loop (cost=...)
--     -> Index Scan on orders...
--   -> Hash (cost=...)
--     -> Seq Scan on customers...
-- Planning Time: 8.234 ms
-- Execution Time: 156.789 ms
```

After optimization with duplicated data:
```sql
-- Schema with duplicated foreign key data

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customers(customer_id),
    shipping_address_id INT NOT NULL REFERENCES shipping_addresses(address_id),
    -- Duplicated customer data (point-in-time for legal/audit)
    customer_name_snapshot VARCHAR(100) NOT NULL,
    customer_email_snapshot VARCHAR(255) NOT NULL,
    -- Duplicated shipping address (point-in-time - where it was shipped)
    shipping_street VARCHAR(255) NOT NULL,
    shipping_city VARCHAR(100) NOT NULL,
    shipping_state VARCHAR(50) NOT NULL,
    shipping_postal_code VARCHAR(20) NOT NULL,
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status VARCHAR(20) DEFAULT 'pending'
);

CREATE TABLE order_items (
    order_item_id SERIAL PRIMARY KEY,
    order_id INT NOT NULL REFERENCES orders(order_id),
    product_id INT NOT NULL REFERENCES products(product_id),
    quantity INT NOT NULL,
    -- Duplicated product data (point-in-time - what was ordered)
    unit_price_snapshot DECIMAL(10, 2) NOT NULL,
    product_name_snapshot VARCHAR(255) NOT NULL,
    product_sku_snapshot VARCHAR(50) NOT NULL,
    category_name_snapshot VARCHAR(100) NOT NULL
);

-- Optimized query: single table + child table, no external joins
SELECT o.order_id,
       o.order_date,
       o.status,
       o.customer_name_snapshot AS customer_name,
       o.customer_email_snapshot AS customer_email,
       o.shipping_street,
       o.shipping_city,
       o.shipping_state,
       o.shipping_postal_code,
       oi.quantity,
       oi.unit_price_snapshot AS unit_price,
       oi.product_name_snapshot AS product_name,
       oi.product_sku_snapshot AS product_sku,
       oi.category_name_snapshot AS category_name
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.customer_id = $1
ORDER BY o.order_date DESC
LIMIT 50;

-- EXPLAIN ANALYZE result:
-- Nested Loop (cost=0.85..123.45)
--   -> Index Scan using idx_orders_customer_date on orders
--   -> Index Scan using idx_order_items_order_id on order_items
-- Planning Time: 0.456 ms
-- Execution Time: 2.345 ms  -- 67x faster!
```

```sql
-- Order creation procedure: Captures point-in-time snapshots

CREATE OR REPLACE FUNCTION create_order(
    p_customer_id INT,
    p_shipping_address_id INT,
    p_items JSONB  -- Array of {product_id, quantity}
)
RETURNS INT AS $$
DECLARE
    v_order_id INT;
    v_customer RECORD;
    v_address RECORD;
    v_item JSONB;
    v_product RECORD;
BEGIN
    -- Fetch customer data for snapshot
    SELECT name, email INTO v_customer
    FROM customers
    WHERE customer_id = p_customer_id;

    -- Fetch shipping address for snapshot
    SELECT street, city, state, postal_code INTO v_address
    FROM shipping_addresses
    WHERE address_id = p_shipping_address_id;

    -- Create order with snapshots
    INSERT INTO orders (
        customer_id, shipping_address_id,
        customer_name_snapshot, customer_email_snapshot,
        shipping_street, shipping_city, shipping_state, shipping_postal_code
    ) VALUES (
        p_customer_id, p_shipping_address_id,
        v_customer.name, v_customer.email,
        v_address.street, v_address.city, v_address.state, v_address.postal_code
    )
    RETURNING order_id INTO v_order_id;

    -- Create order items with product snapshots
    FOR v_item IN SELECT * FROM jsonb_array_elements(p_items)
    LOOP
        SELECT p.name, p.sku, p.current_price, c.name AS category_name
        INTO v_product
        FROM products p
        JOIN categories c ON p.category_id = c.category_id
        WHERE p.product_id = (v_item->>'product_id')::INT;

        INSERT INTO order_items (
            order_id, product_id, quantity,
            unit_price_snapshot, product_name_snapshot,
            product_sku_snapshot, category_name_snapshot
        ) VALUES (
            v_order_id,
            (v_item->>'product_id')::INT,
            (v_item->>'quantity')::INT,
            v_product.current_price,
            v_product.name,
            v_product.sku,
            v_product.category_name
        );
    END LOOP;

    RETURN v_order_id;
END;
$$ LANGUAGE plpgsql;
```

Query time dropped from 157 ms to 2.3 ms, a 67x improvement. The e-commerce platform can now serve order history pages instantly, improving user experience and reducing database load. The trade-off: slightly larger order tables and snapshot capture during order creation.
Duplicating foreign key data is a powerful technique for eliminating join overhead in read-heavy workloads. When implemented carefully, it can transform query performance while maintaining data integrity.
What's Next:
Duplicating foreign key data eliminates individual joins. But sometimes the right solution is even more aggressive: fully merging related tables to eliminate the relationship entirely. The next page explores merging tables—combining normalized tables into wider structures for maximum read efficiency.
You now understand foreign key data duplication as a denormalization technique. You can identify duplication candidates, implement appropriate patterns, maintain consistency, and handle update fan-out. Next, we explore merging tables for even greater denormalization.