Normalization produces schemas with many related tables, each representing a distinct entity or relationship. This separation is elegant and reduces redundancy, but it comes at a cost: every boundary between tables must be crossed with a join, adding query complexity and execution overhead each time related data is read together.
When related data is almost always accessed together, the separation overhead may outweigh its benefits. Table merging addresses this by combining related tables into a single unified structure, eliminating the relationship overhead entirely.
This is the most aggressive form of denormalization. Unlike derived columns (which add redundant data) or data duplication (which copies specific attributes), table merging eliminates the normalized structure itself. It requires careful consideration and is appropriate only in specific scenarios.
By the end of this page, you will understand when and how to merge normalized tables. You'll learn to identify merge candidates, implement different merge patterns, handle the schema transformation, and manage the trade-offs this aggressive denormalization introduces.
Table merging combines two or more normalized tables into a single denormalized table. The relationship that previously existed between tables is absorbed into the merged structure.
Conceptual transformation:
BEFORE (Normalized):
┌──────────────────┐       ┌──────────────────┐
│ customers        │       │ customer_details │
├──────────────────┤       ├──────────────────┤
│ customer_id (PK) │◄──────│ customer_id (FK) │
│ name             │       │ preferences      │
│ email            │       │ loyalty_tier     │
│ created_at       │       │ notes            │
└──────────────────┘       └──────────────────┘
AFTER (Merged):
┌──────────────────────────────────────────┐
│ customers_merged                         │
├──────────────────────────────────────────┤
│ customer_id (PK)                         │
│ name                                     │
│ email                                    │
│ created_at                               │
│ preferences      (from customer_details) │
│ loyalty_tier     (from customer_details) │
│ notes            (from customer_details) │
└──────────────────────────────────────────┘
The one-to-one relationship between customers and customer_details is eliminated by merging all attributes into a single table.
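Concretely, a lookup that previously required a join becomes a single-table read. A minimal sketch against the hypothetical tables in the diagram above:

```sql
-- BEFORE: fetching a customer with their details requires a join
SELECT c.name, c.email, d.preferences, d.loyalty_tier
FROM customers c
JOIN customer_details d ON c.customer_id = d.customer_id
WHERE c.customer_id = 42;

-- AFTER: the same data comes from one table, no join needed
SELECT name, email, preferences, loyalty_tier
FROM customers_merged
WHERE customer_id = 42;
```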
| Aspect | Separated Tables | Merged Table |
|---|---|---|
| Join requirement | Required for every access | None - all data in one table |
| Storage efficiency | No redundancy | May introduce NULLs or sparse columns |
| Update scope | Update specific table only | Single table, but wider rows |
| Schema clarity | Clear entity boundaries | Blurred entity distinctions |
| Query simplicity | Requires JOIN syntax | Simple SELECT with all columns |
| Index strategy | Separate indexes per table | Wider table may need composite indexes |
Table merging changes your data model fundamentally. While it can be reversed, doing so requires a migration that re-extracts data into normalized forms. Plan merges carefully; they're harder to undo than adding derived columns or duplicating data.
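As a rough illustration of what reversal involves, here is a sketch of un-merging the customers_merged example from the diagram above (the table and column names are assumptions carried over from that example):

```sql
-- Re-extract the absorbed columns into a new detail table
CREATE TABLE customer_details AS
SELECT customer_id, preferences, loyalty_tier, notes
FROM customers_merged
WHERE preferences IS NOT NULL
   OR loyalty_tier IS NOT NULL
   OR notes IS NOT NULL;

-- Restore keys and referential integrity
ALTER TABLE customer_details
    ADD CONSTRAINT customer_details_pk PRIMARY KEY (customer_id),
    ADD CONSTRAINT customer_details_fk
        FOREIGN KEY (customer_id) REFERENCES customers_merged (customer_id);

-- Drop the absorbed columns from the merged table
ALTER TABLE customers_merged
    DROP COLUMN preferences,
    DROP COLUMN loyalty_tier,
    DROP COLUMN notes;
```

Even this simple case needs a data copy, new constraints, and coordinated application changes, which is why reversal is a full migration rather than a quick rollback.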
Not all related tables should be merged. The decision requires analyzing relationship patterns, access patterns, and data characteristics.
Relationship types and merge suitability:

- Mandatory one-to-one: the strongest candidate; every parent row has exactly one child row (Pattern 1 below).
- Optional one-to-one: a candidate, merged with nullable columns; watch the NULL ratio (Pattern 2 below).
- Many-to-one with a small, static lookup: a candidate for embedding the lookup values (Pattern 3 below).
- One-to-many with many children per parent: a poor candidate; merging would duplicate parent data per child.
- Many-to-many: avoid; merging cannot represent the relationship without losing structure.
```sql
-- Analysis queries to identify merge candidates

-- 1. Find foreign key relationships (potential merge candidates)
SELECT
    tc.table_name   AS referencing_table,
    ccu.table_name  AS referenced_table,
    kcu.column_name AS fk_column
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
    ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
    ON ccu.constraint_name = tc.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY';

-- For each candidate, check whether the FK is unique
-- (substitute the real table and column names):
SELECT
    COUNT(DISTINCT your_fk_column) AS unique_fk_values,
    COUNT(*) AS total_rows
FROM your_referencing_table;
-- If unique_fk_values = total_rows, it's one-to-one (or one-to-zero/one)

-- 2. Analyze access patterns: are the tables always joined together?
SELECT
    query_pattern,
    COUNT(*) AS execution_count,
    AVG(execution_time_ms) AS avg_time
FROM query_log
WHERE query_pattern LIKE '%orders%'
  AND query_pattern LIKE '%order_details%'
GROUP BY query_pattern
ORDER BY execution_count DESC;

-- 3. Measure join overhead for candidate relationships
EXPLAIN ANALYZE
SELECT o.*, od.*
FROM orders o
JOIN order_details od ON o.order_id = od.order_id
WHERE o.order_id = 12345;

-- Compare with merged table performance:
EXPLAIN ANALYZE
SELECT * FROM orders_merged
WHERE order_id = 12345;

-- 4. Check the NULL ratio in optional relationships
SELECT
    COUNT(*) AS total_rows,
    COUNT(related_table_fk) AS rows_with_relationship,
    (COUNT(*) - COUNT(related_table_fk))::FLOAT / COUNT(*) * 100 AS null_percentage
FROM main_table;
-- A high NULL percentage signals an optional relationship - merge with caution
```

The strongest indicator for merging is consistent shared access. If 95%+ of queries to table A also join table B, the separation creates overhead with minimal benefit. Conversely, if table B is often accessed independently (for updates, reports, or specific features), keep the separation.
Different relationship types require different merge strategies. Let's examine the common patterns.
Pattern 1: Full Merge (Mandatory One-to-One)
When every parent row has exactly one child row, merge all columns into a single table:
```sql
-- Pattern 1: Full Merge - Mandatory One-to-One Relationship
-- Example: Users and User Profiles (every user has exactly one profile)

-- BEFORE: Separated tables
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(128) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE user_profiles (
    profile_id SERIAL PRIMARY KEY,
    user_id INT NOT NULL UNIQUE REFERENCES users(user_id),
    display_name VARCHAR(100) NOT NULL,
    bio TEXT,
    avatar_url VARCHAR(500),
    timezone VARCHAR(50) DEFAULT 'UTC',
    notification_preferences JSONB,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- AFTER: Merged table
CREATE TABLE users_merged (
    user_id SERIAL PRIMARY KEY,
    -- From original users table
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(128) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    -- From user_profiles table (absorbed)
    display_name VARCHAR(100) NOT NULL,
    bio TEXT,
    avatar_url VARCHAR(500),
    timezone VARCHAR(50) DEFAULT 'UTC',
    notification_preferences JSONB,
    profile_updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Migration from separated to merged
INSERT INTO users_merged (
    user_id, email, password_hash, created_at,
    display_name, bio, avatar_url, timezone,
    notification_preferences, profile_updated_at
)
SELECT
    u.user_id, u.email, u.password_hash, u.created_at,
    p.display_name, p.bio, p.avatar_url, p.timezone,
    p.notification_preferences, p.updated_at
FROM users u
JOIN user_profiles p ON u.user_id = p.user_id;
```

Pattern 2: Partial Merge with NULLs (Optional One-to-One)
When the child table is optional (some parents have no child), merge with nullable columns:
```sql
-- Pattern 2: Partial Merge - Optional One-to-One Relationship
-- Example: Customers and Premium Membership (not all customers are premium)

-- BEFORE: Separated tables
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE premium_memberships (
    membership_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL UNIQUE REFERENCES customers(customer_id),
    tier VARCHAR(20) NOT NULL,  -- 'silver', 'gold', 'platinum'
    started_at TIMESTAMP NOT NULL,
    expires_at TIMESTAMP NOT NULL,
    benefits JSONB
);

-- AFTER: Merged with nullable premium columns
CREATE TABLE customers_merged (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    -- Premium membership fields (NULL if not premium)
    premium_tier VARCHAR(20),
    premium_started_at TIMESTAMP,
    premium_expires_at TIMESTAMP,
    premium_benefits JSONB,
    -- Generated flag: customer has a premium record at all.
    -- Expiry must be checked at query time (premium_expires_at >
    -- CURRENT_TIMESTAMP); CURRENT_TIMESTAMP is not immutable, so it
    -- cannot appear in a generated column expression.
    is_premium BOOLEAN GENERATED ALWAYS AS (premium_tier IS NOT NULL) STORED
);

-- Index for premium membership queries
CREATE INDEX idx_customers_premium
    ON customers_merged (is_premium, premium_tier)
    WHERE is_premium = TRUE;

-- Migration
INSERT INTO customers_merged (
    customer_id, name, email, created_at,
    premium_tier, premium_started_at, premium_expires_at, premium_benefits
)
SELECT
    c.customer_id, c.name, c.email, c.created_at,
    p.tier, p.started_at, p.expires_at, p.benefits
FROM customers c
LEFT JOIN premium_memberships p ON c.customer_id = p.customer_id;
```

Pattern 3: Lookup Table Embedding (Many-to-One with Small Lookup)
When many rows reference the same small set of lookup values, embed the lookup data:
```sql
-- Pattern 3: Lookup Table Embedding
-- Example: Orders with Order Status (small, static lookup table)

-- BEFORE: Separated tables
CREATE TABLE order_statuses (
    status_id INT PRIMARY KEY,
    status_code VARCHAR(20) UNIQUE NOT NULL,
    display_name VARCHAR(50) NOT NULL,
    description TEXT,
    is_terminal BOOLEAN DEFAULT FALSE
);

-- Status table has only ~10 rows, rarely changes
INSERT INTO order_statuses VALUES
    (1, 'pending', 'Pending', 'Order received, awaiting processing', FALSE),
    (2, 'processing', 'Processing', 'Order being prepared', FALSE),
    (3, 'shipped', 'Shipped', 'Order dispatched', FALSE),
    (4, 'delivered', 'Delivered', 'Order received by customer', TRUE),
    (5, 'cancelled', 'Cancelled', 'Order was cancelled', TRUE);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    status_id INT REFERENCES order_statuses(status_id),
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- AFTER: Embed lookup values directly (with ENUM or CHECK constraint)
CREATE TYPE order_status AS ENUM (
    'pending', 'processing', 'shipped', 'delivered', 'cancelled'
);

CREATE TABLE orders_merged (
    order_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    status order_status NOT NULL DEFAULT 'pending',
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Status metadata can be handled in application code or a config table
-- that's only joined for admin/reporting, not for every order query

-- Alternative: Keep status_id but embed common display data
CREATE TABLE orders_with_status_cache (
    order_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    status_id INT NOT NULL,  -- Keep for referential integrity
    -- Embedded lookup values (duplicated from order_statuses)
    status_code VARCHAR(20) NOT NULL,
    status_display_name VARCHAR(50) NOT NULL,
    status_is_terminal BOOLEAN NOT NULL,
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Trigger to maintain embedded values
CREATE OR REPLACE FUNCTION embed_order_status()
RETURNS TRIGGER AS $$
BEGIN
    SELECT status_code, display_name, is_terminal
    INTO NEW.status_code, NEW.status_display_name, NEW.status_is_terminal
    FROM order_statuses
    WHERE status_id = NEW.status_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_status_embed_trigger
BEFORE INSERT OR UPDATE OF status_id ON orders_with_status_cache
FOR EACH ROW EXECUTE FUNCTION embed_order_status();
```

For small, static lookup tables with fewer than 20-30 values, database ENUMs often eliminate both the join and the storage overhead of a lookup table. ENUMs are stored as integers internally but display as strings. Use them for status codes, categories, and similar fixed value sets.
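One caveat worth noting: PostgreSQL ENUMs are easy to extend but hard to shrink. A sketch of what evolving the `order_status` type would look like (the `'refunded'` value is a hypothetical addition, not part of the example above):

```sql
-- Adding a new status later is a one-line change
ALTER TYPE order_status ADD VALUE IF NOT EXISTS 'refunded' AFTER 'delivered';

-- Removing or reordering values has no ALTER form; you must create a
-- replacement type, cast the column over, and drop the old type:
-- CREATE TYPE order_status_v2 AS ENUM
--     ('pending', 'processing', 'shipped', 'delivered', 'cancelled');
-- ALTER TABLE orders_merged
--     ALTER COLUMN status TYPE order_status_v2
--     USING status::text::order_status_v2;
-- DROP TYPE order_status;
```

If the value set is likely to churn, a CHECK constraint on a VARCHAR column is easier to change than an ENUM.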
Merging tables requires a careful migration process. Rushing can lead to data loss, extended downtime, or application errors. Follow a systematic approach:
Phase 1: Preparation
```sql
-- Phase 1: Preparation - Create merged table alongside originals

-- 1. Create the merged table structure
CREATE TABLE users_merged (
    user_id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(128) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    -- Absorbed from user_profiles
    display_name VARCHAR(100),  -- Temporarily nullable for migration
    bio TEXT,
    avatar_url VARCHAR(500),
    timezone VARCHAR(50) DEFAULT 'UTC',
    notification_preferences JSONB,
    profile_updated_at TIMESTAMP
);

-- 2. Create indexes matching current access patterns
CREATE INDEX idx_users_merged_email ON users_merged(email);
CREATE INDEX idx_users_merged_display_name ON users_merged(display_name);

-- 3. Migrate existing data
INSERT INTO users_merged (
    user_id, email, password_hash, created_at,
    display_name, bio, avatar_url, timezone,
    notification_preferences, profile_updated_at
)
SELECT
    u.user_id, u.email, u.password_hash, u.created_at,
    p.display_name, p.bio, p.avatar_url, p.timezone,
    p.notification_preferences, p.updated_at
FROM users u
LEFT JOIN user_profiles p ON u.user_id = p.user_id;

-- 4. Verify row counts
SELECT
    (SELECT COUNT(*) FROM users) AS original_users,
    (SELECT COUNT(*) FROM user_profiles) AS original_profiles,
    (SELECT COUNT(*) FROM users_merged) AS merged_rows;
```

Phase 2: Dual-Write Period
Write to both old and new structures simultaneously while reads still use old tables:
```sql
-- Phase 2: Dual-Write - Maintain both structures during transition

-- Trigger to propagate writes to original tables into merged table
CREATE OR REPLACE FUNCTION sync_user_to_merged()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO users_merged (user_id, email, password_hash, created_at)
        VALUES (NEW.user_id, NEW.email, NEW.password_hash, NEW.created_at)
        ON CONFLICT (user_id) DO UPDATE SET
            email = EXCLUDED.email,
            password_hash = EXCLUDED.password_hash;
    ELSIF TG_OP = 'UPDATE' THEN
        UPDATE users_merged
        SET email = NEW.email,
            password_hash = NEW.password_hash
        WHERE user_id = NEW.user_id;
    ELSIF TG_OP = 'DELETE' THEN
        DELETE FROM users_merged WHERE user_id = OLD.user_id;
    END IF;
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_sync_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION sync_user_to_merged();

-- Similar trigger for user_profiles
CREATE OR REPLACE FUNCTION sync_profile_to_merged()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' OR TG_OP = 'UPDATE' THEN
        UPDATE users_merged
        SET display_name = NEW.display_name,
            bio = NEW.bio,
            avatar_url = NEW.avatar_url,
            timezone = NEW.timezone,
            notification_preferences = NEW.notification_preferences,
            profile_updated_at = NEW.updated_at
        WHERE user_id = NEW.user_id;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE users_merged
        SET display_name = NULL,
            bio = NULL,
            avatar_url = NULL,
            timezone = 'UTC',
            notification_preferences = NULL,
            profile_updated_at = NULL
        WHERE user_id = OLD.user_id;
    END IF;
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER profile_sync_trigger
AFTER INSERT OR UPDATE OR DELETE ON user_profiles
FOR EACH ROW EXECUTE FUNCTION sync_profile_to_merged();

-- Verify sync is working (run after some production traffic).
-- IS DISTINCT FROM treats NULLs as comparable, so users without
-- profiles don't silently escape the check.
SELECT
    um.user_id,
    um.email = u.email AS email_match,
    um.display_name IS NOT DISTINCT FROM p.display_name AS name_match
FROM users_merged um
JOIN users u ON um.user_id = u.user_id
LEFT JOIN user_profiles p ON um.user_id = p.user_id
WHERE um.email IS DISTINCT FROM u.email
   OR um.display_name IS DISTINCT FROM p.display_name
LIMIT 100;
```

Phase 3: Switch and Cleanup
Once merged table is verified, switch reads to it and eventually remove old tables:
```sql
-- Phase 3: Switch and Cleanup

-- 1. Update application to read from merged table (application deployment)
-- All queries now go to users_merged instead of users + user_profiles

-- 2. After stable period with reads from merged table, drop sync triggers
DROP TRIGGER user_sync_trigger ON users;
DROP TRIGGER profile_sync_trigger ON user_profiles;
DROP FUNCTION sync_user_to_merged();
DROP FUNCTION sync_profile_to_merged();

-- 3. Rename tables for safety (don't delete yet)
ALTER TABLE users RENAME TO users_deprecated;
ALTER TABLE user_profiles RENAME TO user_profiles_deprecated;
ALTER TABLE users_merged RENAME TO users;

-- 4. Add NOT NULL constraints now that migration is complete
ALTER TABLE users ALTER COLUMN display_name SET NOT NULL;

-- 5. Rename and reset sequences (renaming a table does NOT rename
--    its serial sequence)
ALTER SEQUENCE users_user_id_seq RENAME TO users_deprecated_user_id_seq;
ALTER SEQUENCE users_merged_user_id_seq RENAME TO users_user_id_seq;
SELECT setval('users_user_id_seq', (SELECT MAX(user_id) FROM users) + 1);

-- 6. After extended validation period (days/weeks), drop old tables
-- DROP TABLE users_deprecated CASCADE;
-- DROP TABLE user_profiles_deprecated CASCADE;
```

Keep deprecated tables for at least 2-4 weeks after switchover. Maintain the ability to recreate sync triggers and switch back if issues arise. Only drop old structures after thorough validation in production.
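If some readers cannot be migrated at switchover time, a compatibility view can present the old user_profiles shape on top of the merged table. This is a sketch, and the profile_id mapping is an assumption: surrogate profile IDs have no true equivalent after the merge, so user_id stands in for them.

```sql
-- Hypothetical compatibility view: unmigrated readers keep querying
-- the old user_profiles shape while actually reading the merged table
CREATE VIEW user_profiles AS
SELECT
    user_id AS profile_id,  -- assumption: callers only need a stable id
    user_id,
    display_name,
    bio,
    avatar_url,
    timezone,
    notification_preferences,
    profile_updated_at AS updated_at
FROM users
WHERE display_name IS NOT NULL;
```

The view is read-only convenience for the transition; writers should target the merged table directly.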
Table merging transforms your query patterns. Queries become simpler, but other considerations change.
Query simplification:
```sql
-- Query transformation examples after table merge

-- BEFORE: Join required
SELECT u.user_id, u.email, p.display_name, p.bio, p.timezone
FROM users u
JOIN user_profiles p ON u.user_id = p.user_id
WHERE u.email = 'jane@example.com';

-- AFTER: Simple select
SELECT user_id, email, display_name, bio, timezone
FROM users
WHERE email = 'jane@example.com';

-- BEFORE: Existence check requires subquery or join
SELECT u.*
FROM users u
WHERE EXISTS (SELECT 1 FROM user_profiles p WHERE p.user_id = u.user_id);

-- AFTER: Direct filter on merged data
SELECT *
FROM users
WHERE display_name IS NOT NULL;  -- If optional relationship was merged

-- BEFORE: Aggregation across tables
SELECT
    COUNT(DISTINCT u.user_id) AS total_users,
    COUNT(DISTINCT p.user_id) AS users_with_profiles,
    COUNT(CASE WHEN p.timezone != 'UTC' THEN 1 END) AS custom_timezone_users
FROM users u
LEFT JOIN user_profiles p ON u.user_id = p.user_id;

-- AFTER: Single-table aggregation
SELECT
    COUNT(*) AS total_users,
    COUNT(display_name) AS users_with_profiles,
    COUNT(CASE WHEN timezone != 'UTC' THEN 1 END) AS custom_timezone_users
FROM users;
```

Index strategy changes:
Merged tables often require different indexing strategies than separated tables:
```sql
-- Index strategy for merged tables

-- BEFORE: Indexes on separate tables
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_profiles_user ON user_profiles(user_id);
CREATE INDEX idx_profiles_display ON user_profiles(display_name);

-- AFTER: Consolidated indexes on merged table
CREATE UNIQUE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_display_name ON users(display_name);

-- Composite index for common combined queries
CREATE INDEX idx_users_email_display ON users(email, display_name);

-- Partial indexes for optional relationship data
CREATE INDEX idx_users_with_profiles ON users(user_id, display_name)
WHERE display_name IS NOT NULL;

-- Merged tables can have wider rows, impacting index sizes
-- Monitor index sizes post-migration:
SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS scans,
    idx_tup_read AS tuples_read
FROM pg_stat_user_indexes
WHERE schemaname = 'public' AND relname = 'users'
ORDER BY pg_relation_size(indexrelid) DESC;
```

| Aspect | Consideration |
|---|---|
| Row width | Wider rows mean fewer rows per page, affecting range scans and sequential reads |
| Composite indexes | May need new composite indexes for patterns that previously used join conditions |
| Partial indexes | Use for sparse/optional data to keep indexes compact |
| Covering indexes | More columns available for covering index strategies |
| Index maintenance | Fewer indexes total (eliminated join indexes), but each may be larger |
After merging, run EXPLAIN ANALYZE on your critical queries. The optimizer may choose different plans for the merged table. Create new indexes based on actual execution plans, not assumptions from the old schema.
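One way to find those critical queries, assuming the pg_stat_statements extension is installed (column names follow PostgreSQL 13+, where mean_exec_time replaced mean_time):

```sql
-- Find the costliest statements touching the merged table,
-- then EXPLAIN ANALYZE each one before deciding on new indexes
SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
WHERE query ILIKE '%users%'
ORDER BY mean_exec_time DESC
LIMIT 10;
```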
Table merging has significant trade-offs. There are many scenarios where maintaining separated tables is the better choice.
Avoid merging when:

- The relationship is one-to-many or many-to-many, so merging would duplicate parent data for every child row
- One table is hot and the other is cold: frequent queries would drag along data they never read
- The tables have different retention or compliance requirements
- The tables have very different write patterns, so merging would concentrate write contention on one structure
```sql
-- Example: Why NOT to merge orders with order_audit_log

-- Orders table: Compact, frequently accessed
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    status VARCHAR(20) NOT NULL,
    total DECIMAL(10, 2) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- ~100 bytes per row, 10M rows, frequently queried

-- Audit log: Large, rarely accessed
CREATE TABLE order_audit_log (
    log_id SERIAL PRIMARY KEY,
    order_id INT NOT NULL REFERENCES orders(order_id),
    action VARCHAR(50) NOT NULL,
    changed_by VARCHAR(100),
    old_values JSONB,
    new_values JSONB,
    logged_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- ~2KB per row, 100M rows, rarely queried

-- WHY NOT MERGE:
-- 1. Audit has many rows per order (one-to-many, not one-to-one)
-- 2. Order queries would load massive audit data they don't need
-- 3. Audit has different retention (compliance: 7 years) vs orders
--    (operational: 2 years)
-- 4. Audit writes constantly; orders are mostly stable after creation

-- Keep separated. If needed, add order summary fields as derived columns:
ALTER TABLE orders ADD COLUMN last_status_change TIMESTAMP;
-- This gives you the "merge benefit" for specific commonly-needed data
-- without the overhead of a full merge
```

Table merging is a powerful but aggressive optimization. In most cases, derived columns or foreign key data duplication achieve similar read performance improvements with less schema disruption. Reserve full merging for clear one-to-one relationships with consistent access patterns.
Table merging is the most aggressive denormalization technique, eliminating the normalized structure entirely. When applied to appropriate relationships, it delivers maximum query simplification and performance improvement.
What's Next:
We've covered individual denormalization techniques. The final page brings everything together with summary tables—purpose-built denormalized structures that combine multiple techniques for reporting and analytics use cases.
You now understand table merging as a denormalization technique. You can identify appropriate merge candidates, implement different merge patterns, execute safe migrations, and recognize when merging is counterproductive. Next, we explore summary tables for analytics optimization.