Normalization produces schemas with many related tables, each representing a distinct entity or relationship. This separation is elegant and reduces redundancy, but it comes at a cost: every boundary between tables must be crossed with a join, adding query complexity and execution overhead each time related data is read together.
When related data is almost always accessed together, the separation overhead may outweigh its benefits. Table merging addresses this by combining related tables into a single unified structure, eliminating the relationship overhead entirely.
This is the most aggressive form of denormalization. Unlike derived columns (which add redundant data) or data duplication (which copies specific attributes), table merging eliminates the normalized structure itself. It requires careful consideration and is appropriate only in specific scenarios.
By the end of this page, you will understand when and how to merge normalized tables. You'll learn to identify merge candidates, implement different merge patterns, handle the schema transformation, and manage the trade-offs this aggressive denormalization introduces.
Table merging combines two or more normalized tables into a single denormalized table. The relationship that previously existed between tables is absorbed into the merged structure.
Conceptual transformation:
BEFORE (Normalized):
┌──────────────────┐       ┌──────────────────┐
│ customers        │       │ customer_details │
├──────────────────┤       ├──────────────────┤
│ customer_id (PK) │◄──────│ customer_id (FK) │
│ name             │       │ preferences      │
│ email            │       │ loyalty_tier     │
│ created_at       │       │ notes            │
└──────────────────┘       └──────────────────┘
AFTER (Merged):
┌──────────────────────────────────────────┐
│ customers_merged                         │
├──────────────────────────────────────────┤
│ customer_id (PK)                         │
│ name                                     │
│ email                                    │
│ created_at                               │
│ preferences      (from customer_details) │
│ loyalty_tier     (from customer_details) │
│ notes            (from customer_details) │
└──────────────────────────────────────────┘
The one-to-one relationship between customers and customer_details is eliminated by merging all attributes into a single table.
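Concretely, a lookup that previously required a join becomes a single-table read. A minimal sketch against the hypothetical tables in the diagram above:

```sql
-- BEFORE: fetching a customer with their details requires a join
SELECT c.name, c.email, d.preferences, d.loyalty_tier
FROM customers c
JOIN customer_details d ON c.customer_id = d.customer_id
WHERE c.customer_id = 42;

-- AFTER: the same data comes from one table, no join needed
SELECT name, email, preferences, loyalty_tier
FROM customers_merged
WHERE customer_id = 42;
```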
| Aspect | Separated Tables | Merged Table |
|---|---|---|
| Join requirement | Required for every access | None - all data in one table |
| Storage efficiency | No redundancy | May introduce NULLs or sparse columns |
| Update scope | Update specific table only | Single table, but wider rows |
| Schema clarity | Clear entity boundaries | Blurred entity distinctions |
| Query simplicity | Requires JOIN syntax | Simple SELECT with all columns |
| Index strategy | Separate indexes per table | Wider table may need composite indexes |
Table merging changes your data model fundamentally. While it can be reversed, doing so requires a migration that re-extracts data into normalized forms. Plan merges carefully; they're harder to undo than adding derived columns or duplicating data.
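As a rough illustration of what reversal involves, here is a sketch of un-merging the customers_merged example from the diagram above (the table and column names are assumptions carried over from that example):

```sql
-- Re-extract the absorbed columns into a new detail table
CREATE TABLE customer_details AS
SELECT customer_id, preferences, loyalty_tier, notes
FROM customers_merged
WHERE preferences IS NOT NULL
   OR loyalty_tier IS NOT NULL
   OR notes IS NOT NULL;

-- Restore keys and referential integrity
ALTER TABLE customer_details
    ADD CONSTRAINT customer_details_pk PRIMARY KEY (customer_id),
    ADD CONSTRAINT customer_details_fk
        FOREIGN KEY (customer_id) REFERENCES customers_merged (customer_id);

-- Drop the absorbed columns from the merged table
ALTER TABLE customers_merged
    DROP COLUMN preferences,
    DROP COLUMN loyalty_tier,
    DROP COLUMN notes;
```

Even this simple case needs a data copy, new constraints, and coordinated application changes, which is why reversal is a full migration rather than a quick rollback.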
Not all related tables should be merged. The decision requires analyzing relationship patterns, access patterns, and data characteristics.
Relationship types and merge suitability:

- Mandatory one-to-one: the strongest candidate; every parent row has exactly one child row (Pattern 1 below).
- Optional one-to-one: a candidate, merged with nullable columns; watch the NULL ratio (Pattern 2 below).
- Many-to-one with a small, static lookup: a candidate for embedding the lookup values (Pattern 3 below).
- One-to-many with many children per parent: a poor candidate; merging would duplicate parent data per child.
- Many-to-many: avoid; merging cannot represent the relationship without losing structure.
```sql
-- Analysis queries to identify merge candidates

-- 1. Find foreign key relationships (potential merge candidates)
SELECT
    tc.table_name   AS referencing_table,
    ccu.table_name  AS referenced_table,
    kcu.column_name AS fk_column
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
    ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
    ON ccu.constraint_name = tc.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY';

-- For each candidate, check whether the FK is unique
-- (substitute the real table and column names):
SELECT
    COUNT(DISTINCT your_fk_column) AS unique_fk_values,
    COUNT(*) AS total_rows
FROM your_referencing_table;
-- If unique_fk_values = total_rows, it's one-to-one (or one-to-zero/one)

-- 2. Analyze access patterns: are the tables always joined together?
SELECT
    query_pattern,
    COUNT(*) AS execution_count,
    AVG(execution_time_ms) AS avg_time
FROM query_log
WHERE query_pattern LIKE '%orders%'
  AND query_pattern LIKE '%order_details%'
GROUP BY query_pattern
ORDER BY execution_count DESC;

-- 3. Measure join overhead for candidate relationships
EXPLAIN ANALYZE
SELECT o.*, od.*
FROM orders o
JOIN order_details od ON o.order_id = od.order_id
WHERE o.order_id = 12345;

-- Compare with merged table performance:
EXPLAIN ANALYZE
SELECT * FROM orders_merged
WHERE order_id = 12345;

-- 4. Check the NULL ratio in optional relationships
SELECT
    COUNT(*) AS total_rows,
    COUNT(related_table_fk) AS rows_with_relationship,
    (COUNT(*) - COUNT(related_table_fk))::FLOAT / COUNT(*) * 100 AS null_percentage
FROM main_table;
-- A high NULL percentage signals an optional relationship - merge with caution
```

The strongest indicator for merging is consistent shared access. If 95%+ of queries to table A also join table B, the separation creates overhead with minimal benefit. Conversely, if table B is often accessed independently (for updates, reports, or specific features), keep the separation.
Different relationship types require different merge strategies. Let's examine the common patterns.
Pattern 1: Full Merge (Mandatory One-to-One)
When every parent row has exactly one child row, merge all columns into a single table:
```sql
-- Pattern 1: Full Merge - Mandatory One-to-One Relationship
-- Example: Users and User Profiles (every user has exactly one profile)

-- BEFORE: Separated tables
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(128) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE user_profiles (
    profile_id SERIAL PRIMARY KEY,
    user_id INT NOT NULL UNIQUE REFERENCES users(user_id),
    display_name VARCHAR(100) NOT NULL,
    bio TEXT,
    avatar_url VARCHAR(500),
    timezone VARCHAR(50) DEFAULT 'UTC',
    notification_preferences JSONB,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- AFTER: Merged table
CREATE TABLE users_merged (
    user_id SERIAL PRIMARY KEY,
    -- From original users table
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(128) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    -- From user_profiles table (absorbed)
    display_name VARCHAR(100) NOT NULL,
    bio TEXT,
    avatar_url VARCHAR(500),
    timezone VARCHAR(50) DEFAULT 'UTC',
    notification_preferences JSONB,
    profile_updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Migration from separated to merged
INSERT INTO users_merged (
    user_id, email, password_hash, created_at,
    display_name, bio, avatar_url, timezone,
    notification_preferences, profile_updated_at
)
SELECT
    u.user_id, u.email, u.password_hash, u.created_at,
    p.display_name, p.bio, p.avatar_url, p.timezone,
    p.notification_preferences, p.updated_at
FROM users u
JOIN user_profiles p ON u.user_id = p.user_id;
```

Pattern 2: Partial Merge with NULLs (Optional One-to-One)
When the child table is optional (some parents have no child), merge with nullable columns:
```sql
-- Pattern 2: Partial Merge - Optional One-to-One Relationship
-- Example: Customers and Premium Membership (not all customers are premium)

-- BEFORE: Separated tables
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE premium_memberships (
    membership_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL UNIQUE REFERENCES customers(customer_id),
    tier VARCHAR(20) NOT NULL,  -- 'silver', 'gold', 'platinum'
    started_at TIMESTAMP NOT NULL,
    expires_at TIMESTAMP NOT NULL,
    benefits JSONB
);

-- AFTER: Merged with nullable premium columns
CREATE TABLE customers_merged (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    -- Premium membership fields (NULL if not premium)
    premium_tier VARCHAR(20),
    premium_started_at TIMESTAMP,
    premium_expires_at TIMESTAMP,
    premium_benefits JSONB,
    -- Generated flag: customer has a premium record at all.
    -- Expiry must be checked at query time (premium_expires_at >
    -- CURRENT_TIMESTAMP); CURRENT_TIMESTAMP is not immutable, so it
    -- cannot appear in a generated column expression.
    is_premium BOOLEAN GENERATED ALWAYS AS (premium_tier IS NOT NULL) STORED
);

-- Index for premium membership queries
CREATE INDEX idx_customers_premium
    ON customers_merged (is_premium, premium_tier)
    WHERE is_premium = TRUE;

-- Migration
INSERT INTO customers_merged (
    customer_id, name, email, created_at,
    premium_tier, premium_started_at, premium_expires_at, premium_benefits
)
SELECT
    c.customer_id, c.name, c.email, c.created_at,
    p.tier, p.started_at, p.expires_at, p.benefits
FROM customers c
LEFT JOIN premium_memberships p ON c.customer_id = p.customer_id;
```

Pattern 3: Lookup Table Embedding (Many-to-One with Small Lookup)
When many rows reference the same small set of lookup values, embed the lookup data:
```sql
-- Pattern 3: Lookup Table Embedding
-- Example: Orders with Order Status (small, static lookup table)

-- BEFORE: Separated tables
CREATE TABLE order_statuses (
    status_id INT PRIMARY KEY,
    status_code VARCHAR(20) UNIQUE NOT NULL,
    display_name VARCHAR(50) NOT NULL,
    description TEXT,
    is_terminal BOOLEAN DEFAULT FALSE
);

-- Status table has only ~10 rows, rarely changes
INSERT INTO order_statuses VALUES
    (1, 'pending', 'Pending', 'Order received, awaiting processing', FALSE),
    (2, 'processing', 'Processing', 'Order being prepared', FALSE),
    (3, 'shipped', 'Shipped', 'Order dispatched', FALSE),
    (4, 'delivered', 'Delivered', 'Order received by customer', TRUE),
    (5, 'cancelled', 'Cancelled', 'Order was cancelled', TRUE);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    status_id INT REFERENCES order_statuses(status_id),
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- AFTER: Embed lookup values directly (with ENUM or CHECK constraint)
CREATE TYPE order_status AS ENUM (
    'pending', 'processing', 'shipped', 'delivered', 'cancelled'
);

CREATE TABLE orders_merged (
    order_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    status order_status NOT NULL DEFAULT 'pending',
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Status metadata can be handled in application code or a config table
-- that's only joined for admin/reporting, not for every order query

-- Alternative: Keep status_id but embed common display data
CREATE TABLE orders_with_status_cache (
    order_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    status_id INT NOT NULL,  -- Keep for referential integrity
    -- Embedded lookup values (duplicated from order_statuses)
    status_code VARCHAR(20) NOT NULL,
    status_display_name VARCHAR(50) NOT NULL,
    status_is_terminal BOOLEAN NOT NULL,
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Trigger to maintain embedded values
CREATE OR REPLACE FUNCTION embed_order_status()
RETURNS TRIGGER AS $$
BEGIN
    SELECT status_code, display_name, is_terminal
    INTO NEW.status_code, NEW.status_display_name, NEW.status_is_terminal
    FROM order_statuses
    WHERE status_id = NEW.status_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_status_embed_trigger
BEFORE INSERT OR UPDATE OF status_id ON orders_with_status_cache
FOR EACH ROW EXECUTE FUNCTION embed_order_status();
```

For small, static lookup tables with fewer than 20-30 values, database ENUMs often eliminate both the join and the storage overhead of a lookup table. ENUMs are stored as integers internally but display as strings. Use them for status codes, categories, and similar fixed value sets.
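One caveat worth noting: PostgreSQL ENUMs are easy to extend but hard to shrink. A sketch of what evolving the `order_status` type would look like (the `'refunded'` value is a hypothetical addition, not part of the example above):

```sql
-- Adding a new status later is a one-line change
ALTER TYPE order_status ADD VALUE IF NOT EXISTS 'refunded' AFTER 'delivered';

-- Removing or reordering values has no ALTER form; you must create a
-- replacement type, cast the column over, and drop the old type:
-- CREATE TYPE order_status_v2 AS ENUM
--     ('pending', 'processing', 'shipped', 'delivered', 'cancelled');
-- ALTER TABLE orders_merged
--     ALTER COLUMN status TYPE order_status_v2
--     USING status::text::order_status_v2;
-- DROP TYPE order_status;
```

If the value set is likely to churn, a CHECK constraint on a VARCHAR column is easier to change than an ENUM.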
Merging tables requires a careful migration process. Rushing can lead to data loss, extended downtime, or application errors. Follow a systematic approach:
Phase 1: Preparation
```sql
-- Phase 1: Preparation - Create merged table alongside originals

-- 1. Create the merged table structure
CREATE TABLE users_merged (
    user_id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(128) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    -- Absorbed from user_profiles
    display_name VARCHAR(100),  -- Temporarily nullable for migration
    bio TEXT,
    avatar_url VARCHAR(500),
    timezone VARCHAR(50) DEFAULT 'UTC',
    notification_preferences JSONB,
    profile_updated_at TIMESTAMP
);

-- 2. Create indexes matching current access patterns
CREATE INDEX idx_users_merged_email ON users_merged(email);
CREATE INDEX idx_users_merged_display_name ON users_merged(display_name);

-- 3. Migrate existing data
INSERT INTO users_merged (
    user_id, email, password_hash, created_at,
    display_name, bio, avatar_url, timezone,
    notification_preferences, profile_updated_at
)
SELECT
    u.user_id, u.email, u.password_hash, u.created_at,
    p.display_name, p.bio, p.avatar_url, p.timezone,
    p.notification_preferences, p.updated_at
FROM users u
LEFT JOIN user_profiles p ON u.user_id = p.user_id;

-- 4. Verify row counts
SELECT
    (SELECT COUNT(*) FROM users) AS original_users,
    (SELECT COUNT(*) FROM user_profiles) AS original_profiles,
    (SELECT COUNT(*) FROM users_merged) AS merged_rows;
```

Phase 2: Dual-Write Period
Write to both old and new structures simultaneously while reads still use old tables:
```sql
-- Phase 2: Dual-Write - Maintain both structures during transition

-- Trigger to propagate writes to original tables into merged table
CREATE OR REPLACE FUNCTION sync_user_to_merged()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO users_merged (user_id, email, password_hash, created_at)
        VALUES (NEW.user_id, NEW.email, NEW.password_hash, NEW.created_at)
        ON CONFLICT (user_id) DO UPDATE SET
            email = EXCLUDED.email,
            password_hash = EXCLUDED.password_hash;
    ELSIF TG_OP = 'UPDATE' THEN
        UPDATE users_merged
        SET email = NEW.email,
            password_hash = NEW.password_hash
        WHERE user_id = NEW.user_id;
    ELSIF TG_OP = 'DELETE' THEN
        DELETE FROM users_merged WHERE user_id = OLD.user_id;
    END IF;
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_sync_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION sync_user_to_merged();

-- Similar trigger for user_profiles
CREATE OR REPLACE FUNCTION sync_profile_to_merged()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' OR TG_OP = 'UPDATE' THEN
        UPDATE users_merged
        SET display_name = NEW.display_name,
            bio = NEW.bio,
            avatar_url = NEW.avatar_url,
            timezone = NEW.timezone,
            notification_preferences = NEW.notification_preferences,
            profile_updated_at = NEW.updated_at
        WHERE user_id = NEW.user_id;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE users_merged
        SET display_name = NULL,
            bio = NULL,
            avatar_url = NULL,
            timezone = 'UTC',
            notification_preferences = NULL,
            profile_updated_at = NULL
        WHERE user_id = OLD.user_id;
    END IF;
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER profile_sync_trigger
AFTER INSERT OR UPDATE OR DELETE ON user_profiles
FOR EACH ROW EXECUTE FUNCTION sync_profile_to_merged();

-- Verify sync is working (run after some production traffic).
-- IS DISTINCT FROM treats NULLs as comparable, so users without
-- profiles don't silently escape the check.
SELECT
    um.user_id,
    um.email = u.email AS email_match,
    um.display_name IS NOT DISTINCT FROM p.display_name AS name_match
FROM users_merged um
JOIN users u ON um.user_id = u.user_id
LEFT JOIN user_profiles p ON um.user_id = p.user_id
WHERE um.email IS DISTINCT FROM u.email
   OR um.display_name IS DISTINCT FROM p.display_name
LIMIT 100;
```

Phase 3: Switch and Cleanup
Once merged table is verified, switch reads to it and eventually remove old tables:
```sql
-- Phase 3: Switch and Cleanup

-- 1. Update application to read from merged table (application deployment)
-- All queries now go to users_merged instead of users + user_profiles

-- 2. After stable period with reads from merged table, drop sync triggers
DROP TRIGGER user_sync_trigger ON users;
DROP TRIGGER profile_sync_trigger ON user_profiles;
DROP FUNCTION sync_user_to_merged();
DROP FUNCTION sync_profile_to_merged();

-- 3. Rename tables for safety (don't delete yet)
ALTER TABLE users RENAME TO users_deprecated;
ALTER TABLE user_profiles RENAME TO user_profiles_deprecated;
ALTER TABLE users_merged RENAME TO users;

-- 4. Add NOT NULL constraints now that migration is complete
ALTER TABLE users ALTER COLUMN display_name SET NOT NULL;

-- 5. Rename and reset sequences (renaming a table does NOT rename
--    its serial sequence)
ALTER SEQUENCE users_user_id_seq RENAME TO users_deprecated_user_id_seq;
ALTER SEQUENCE users_merged_user_id_seq RENAME TO users_user_id_seq;
SELECT setval('users_user_id_seq', (SELECT MAX(user_id) FROM users) + 1);

-- 6. After extended validation period (days/weeks), drop old tables
-- DROP TABLE users_deprecated CASCADE;
-- DROP TABLE user_profiles_deprecated CASCADE;
```

Keep deprecated tables for at least 2-4 weeks after switchover. Maintain the ability to recreate sync triggers and switch back if issues arise. Only drop old structures after thorough validation in production.
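If some readers cannot be migrated at switchover time, a compatibility view can present the old user_profiles shape on top of the merged table. This is a sketch, and the profile_id mapping is an assumption: surrogate profile IDs have no true equivalent after the merge, so user_id stands in for them.

```sql
-- Hypothetical compatibility view: unmigrated readers keep querying
-- the old user_profiles shape while actually reading the merged table
CREATE VIEW user_profiles AS
SELECT
    user_id AS profile_id,  -- assumption: callers only need a stable id
    user_id,
    display_name,
    bio,
    avatar_url,
    timezone,
    notification_preferences,
    profile_updated_at AS updated_at
FROM users
WHERE display_name IS NOT NULL;
```

The view is read-only convenience for the transition; writers should target the merged table directly.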
Table merging transforms your query patterns. Queries become simpler, but other considerations change.
Query simplification:
```sql
-- Query transformation examples after table merge

-- BEFORE: Join required
SELECT u.user_id, u.email, p.display_name, p.bio, p.timezone
FROM users u
JOIN user_profiles p ON u.user_id = p.user_id
WHERE u.email = 'jane@example.com';

-- AFTER: Simple select
SELECT user_id, email, display_name, bio, timezone
FROM users
WHERE email = 'jane@example.com';

-- BEFORE: Existence check requires subquery or join
SELECT u.*
FROM users u
WHERE EXISTS (SELECT 1 FROM user_profiles p WHERE p.user_id = u.user_id);

-- AFTER: Direct filter on merged data
SELECT *
FROM users
WHERE display_name IS NOT NULL;  -- If optional relationship was merged

-- BEFORE: Aggregation across tables
SELECT
    COUNT(DISTINCT u.user_id) AS total_users,
    COUNT(DISTINCT p.user_id) AS users_with_profiles,
    COUNT(CASE WHEN p.timezone != 'UTC' THEN 1 END) AS custom_timezone_users
FROM users u
LEFT JOIN user_profiles p ON u.user_id = p.user_id;

-- AFTER: Single-table aggregation
SELECT
    COUNT(*) AS total_users,
    COUNT(display_name) AS users_with_profiles,
    COUNT(CASE WHEN timezone != 'UTC' THEN 1 END) AS custom_timezone_users
FROM users;
```

Index strategy changes:
Merged tables often require different indexing strategies than separated tables:
```sql
-- Index strategy for merged tables

-- BEFORE: Indexes on separate tables
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_profiles_user ON user_profiles(user_id);
CREATE INDEX idx_profiles_display ON user_profiles(display_name);

-- AFTER: Consolidated indexes on merged table
CREATE UNIQUE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_display_name ON users(display_name);

-- Composite index for common combined queries
CREATE INDEX idx_users_email_display ON users(email, display_name);

-- Partial indexes for optional relationship data
CREATE INDEX idx_users_with_profiles ON users(user_id, display_name)
WHERE display_name IS NOT NULL;

-- Merged tables can have wider rows, impacting index sizes
-- Monitor index sizes post-migration:
SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS scans,
    idx_tup_read AS tuples_read
FROM pg_stat_user_indexes
WHERE schemaname = 'public' AND relname = 'users'
ORDER BY pg_relation_size(indexrelid) DESC;
```

| Aspect | Consideration |
|---|---|
| Row width | Wider rows mean fewer rows per page, affecting range scans and sequential reads |
| Composite indexes | May need new composite indexes for patterns that previously used join conditions |
| Partial indexes | Use for sparse/optional data to keep indexes compact |
| Covering indexes | More columns available for covering index strategies |
| Index maintenance | Fewer indexes total (eliminated join indexes), but each may be larger |
After merging, run EXPLAIN ANALYZE on your critical queries. The optimizer may choose different plans for the merged table. Create new indexes based on actual execution plans, not assumptions from the old schema.
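One way to find those critical queries, assuming the pg_stat_statements extension is installed (column names follow PostgreSQL 13+, where mean_exec_time replaced mean_time):

```sql
-- Find the costliest statements touching the merged table,
-- then EXPLAIN ANALYZE each one before deciding on new indexes
SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
WHERE query ILIKE '%users%'
ORDER BY mean_exec_time DESC
LIMIT 10;
```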
Table merging has significant trade-offs. There are many scenarios where maintaining separated tables is the better choice.
Avoid merging when:

- The relationship is one-to-many or many-to-many, so merging would duplicate parent data for every child row
- One table is hot and the other is cold: frequent queries would drag along data they never read
- The tables have different retention or compliance requirements
- The tables have very different write patterns, so merging would concentrate write contention on one structure
```sql
-- Example: Why NOT to merge orders with order_audit_log

-- Orders table: Compact, frequently accessed
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    status VARCHAR(20) NOT NULL,
    total DECIMAL(10, 2) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- ~100 bytes per row, 10M rows, frequently queried

-- Audit log: Large, rarely accessed
CREATE TABLE order_audit_log (
    log_id SERIAL PRIMARY KEY,
    order_id INT NOT NULL REFERENCES orders(order_id),
    action VARCHAR(50) NOT NULL,
    changed_by VARCHAR(100),
    old_values JSONB,
    new_values JSONB,
    logged_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- ~2KB per row, 100M rows, rarely queried

-- WHY NOT MERGE:
-- 1. Audit has many rows per order (one-to-many, not one-to-one)
-- 2. Order queries would load massive audit data they don't need
-- 3. Audit has different retention (compliance: 7 years) vs orders
--    (operational: 2 years)
-- 4. Audit writes constantly; orders are mostly stable after creation

-- Keep separated. If needed, add order summary fields as derived columns:
ALTER TABLE orders ADD COLUMN last_status_change TIMESTAMP;
-- This gives you the "merge benefit" for specific commonly-needed data
-- without the overhead of a full merge
```

Table merging is a powerful but aggressive optimization. In most cases, derived columns or foreign key data duplication achieve similar read performance improvements with less schema disruption. Reserve full merging for clear one-to-one relationships with consistent access patterns.
Table merging is the most aggressive denormalization technique, eliminating the normalized structure entirely. When applied to appropriate relationships, it delivers maximum query simplification and performance improvement.
What's Next:
We've covered individual denormalization techniques. The final page brings everything together with summary tables—purpose-built denormalized structures that combine multiple techniques for reporting and analytics use cases.
You now understand table merging as a denormalization technique. You can identify appropriate merge candidates, implement different merge patterns, execute safe migrations, and recognize when merging is counterproductive. Next, we explore summary tables for analytics optimization.