Database migrations are among the most complex and risky operations in software engineering. Unlike application deployments that can be quickly rolled back, database migrations involve persistent state that, once changed, cannot be trivially undone.
Yet migrations are often necessary.
This page provides a comprehensive framework for thinking about and executing database migrations. Whether you're moving from SQL to NoSQL, the reverse, or between databases within the same paradigm, these principles and practices will guide you through the process safely.
By the end of this page, you will understand the common motivations for database migrations, be able to plan migrations with appropriate phases and milestones, navigate the data model transformation challenges, implement migration patterns that minimize risk, and handle the organizational and operational aspects of database transitions.
Understanding why migrations occur helps you anticipate them, design for them, and evaluate whether a proposed migration is justified.
Common Migration Drivers:
| Motivation | Example Scenario | Typical Direction |
|---|---|---|
| Scale limitations | PostgreSQL can't handle 100K writes/sec | SQL → NoSQL (Cassandra) |
| Query complexity needs | Analytics require complex JOINs impossible in DynamoDB | NoSQL → SQL |
| Cost optimization | Oracle licensing too expensive; moving to PostgreSQL | Vendor → Open Source |
| Managed service adoption | Want zero-ops; moving to Aurora or DynamoDB | Self-managed → Cloud |
| Schema flexibility needs | Product catalog needs varying attributes | SQL → Document DB |
| Consistency requirements | Financial data needs ACID after NoSQL experiment | NoSQL → SQL |
| Performance problems | Wrong database choice causing latency issues | Any → More appropriate |
| Acquisition/merger | Two companies consolidating on one platform | Varies |
| Skill availability | Can't hire Cassandra experts; moving to PostgreSQL | Specialized → Common |
When NOT to Migrate:
Not every performance problem or capability gap justifies migration. Alternatives such as query optimization, better indexing, caching layers, read replicas, or vertical scaling often resolve the issue at a fraction of the cost.
A major database migration typically takes 6-18 months, involves substantial engineering effort, carries significant risk, and distracts from feature development. Ensure the benefits clearly outweigh these costs before proceeding.
A successful migration requires comprehensive planning before any data moves. The planning phase often takes 20-30% of the total migration timeline.
Phase 1: Assessment
```markdown
# Migration Assessment Checklist

## Current State Analysis
- [ ] Document all tables/collections, row counts, data sizes
- [ ] Map all data types and their target equivalents
- [ ] Identify relationships and integrity constraints
- [ ] Catalog stored procedures, triggers, views
- [ ] List all application queries and access patterns
- [ ] Document indexes and their usage frequency
- [ ] Identify sensitive data requiring special handling
- [ ] Measure current performance baseline (latency, throughput)

## Target State Definition
- [ ] Define target database schema/data model
- [ ] Plan data type mappings (e.g., SERIAL → UUID)
- [ ] Decide on denormalization strategy if applicable
- [ ] Design new indexes for target query patterns
- [ ] Plan for features not available in target (stored procs → app code)

## Dependency Mapping
- [ ] List all applications connecting to current database
- [ ] Identify third-party tools (BI, CDC, backups) requiring changes
- [ ] Map team responsibilities and skills
- [ ] Identify compliance/audit requirements
- [ ] Document SLAs that must be maintained during migration

## Risk Assessment
- [ ] Estimate data loss window acceptable to business
- [ ] Identify rollback scenarios and procedures
- [ ] Plan for extended rollback period
- [ ] Define go/no-go criteria for final cutover
```

Phase 2: Strategy Selection
There are multiple migration strategies, each with different trade-offs:
| Strategy | Description | Downtime | Risk | Complexity |
|---|---|---|---|---|
| Big Bang | Migrate everything at once during maintenance window | Hours to days | High—no rollback | Low |
| Parallel Run | Write to both databases; gradually shift reads | Zero | Medium | High |
| Strangler Pattern | New features on new DB; migrate old gradually | Zero | Low per phase | Medium |
| Trickle Migration | Move data in small batches over time | Zero | Medium | Medium |
| Blue-Green | Full copy to new DB; instant cutover | Minimal | Medium | High |
Most common approach: Parallel Run with Gradual Traffic Shift
This pattern provides the safest migration path for critical systems.
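Concretely, the gradual traffic shift amounts to a per-request routing decision driven by a phase schedule. The sketch below illustrates the idea; the phase names and percentages are illustrative assumptions, not fixed values:

```python
import random

# Illustrative read-traffic schedule for a parallel-run migration.
# Phase names and percentages are assumptions, not a standard.
PHASES = {
    "dual_write_validate": 0.0,  # all reads from old DB; shadow-compare only
    "canary": 0.01,              # 1% of reads served by the new DB
    "ramp": 0.25,
    "majority": 0.75,
    "cutover": 1.0,
}

def read_from_new_db(phase: str, rng=random.random) -> bool:
    """Decide, per request, whether to serve this read from the new database."""
    return rng() < PHASES[phase]
```

Because the decision is made per request, rolling back a phase is just a config change: drop the percentage and the new database immediately stops serving reads.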
Migrating between SQL and NoSQL (or vice versa) requires fundamental data model transformation. This is often the most intellectually challenging part of migration.
SQL to Document (e.g., PostgreSQL to MongoDB)
The key transformation is from normalized tables with joins to embedded documents:
```javascript
// SQL Schema (normalized)
/*
CREATE TABLE customers (id, name, email);
CREATE TABLE addresses (id, customer_id, street, city, country, is_primary);
CREATE TABLE orders (id, customer_id, order_date, status);
CREATE TABLE order_items (order_id, product_id, quantity, price);
CREATE TABLE products (id, name, price, category_id);
CREATE TABLE categories (id, name, parent_id);
*/

// Document Schema (denormalized for read optimization)
// Decision: What to embed vs reference?

// Customer document - embed addresses (1:few, queried together)
{
  "_id": ObjectId("..."),
  "name": "Alice Smith",
  "email": "alice@example.com",
  "addresses": [
    { "street": "123 Main St", "city": "Seattle", "country": "USA", "isPrimary": true },
    { "street": "456 Work Ave", "city": "Seattle", "country": "USA", "isPrimary": false }
  ]
}

// Order document - embed items, reference customer and products
// Items embedded (always queried with order)
// Customer referenced (avoid duplication, infrequent access)
// Products referenced with denormalized fields (name, price at time of order)
{
  "_id": ObjectId("..."),
  "customerId": ObjectId("..."),        // Reference
  "customerEmail": "alice@example.com", // Denormalized for notifications
  "orderDate": ISODate("2024-01-15"),
  "status": "shipped",
  "items": [                            // Embedded
    {
      "productId": ObjectId("..."),     // Reference
      "productName": "Widget X",        // Denormalized (snapshot at order time)
      "quantity": 2,
      "unitPrice": 29.99,               // Price at time of order
      "lineTotal": 59.98
    }
  ],
  "shippingAddress": {                  // Embedded (snapshot at order time)
    "street": "123 Main St",
    "city": "Seattle",
    "country": "USA"
  },
  "totals": {
    "subtotal": 59.98,
    "tax": 6.00,
    "shipping": 5.99,
    "total": 71.97
  }
}

// Transformation considerations:
// 1. What's queried together? → Embed
// 2. What changes independently? → Reference
// 3. What needs historical snapshot? → Embed with copy
// 4. What would create unbounded arrays? → Reference
```

Document to SQL (e.g., MongoDB to PostgreSQL)
The reverse transformation normalizes embedded data into tables:
```sql
-- MongoDB document (denormalized)
/*
{
  "_id": ObjectId("..."),
  "title": "Understanding Databases",
  "author": {
    "name": "Jane Expert",
    "bio": "Database specialist...",
    "social": { "twitter": "@janedb", "github": "janeexpert" }
  },
  "tags": ["databases", "nosql", "sql"],
  "comments": [
    { "user": "reader1", "text": "Great article!", "timestamp": ISODate("2024-01-15T10:30:00Z"), "likes": 15 },
    { "user": "reader2", "text": "Very helpful", "timestamp": ISODate("2024-01-15T11:45:00Z"), "likes": 8 }
  ]
}
*/

-- Normalized SQL Schema
CREATE TABLE authors (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    bio TEXT,
    twitter_handle VARCHAR(100),
    github_handle VARCHAR(100)
);

CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    mongo_id VARCHAR(24) UNIQUE,  -- Preserve original ID for mapping
    title VARCHAR(500) NOT NULL,
    author_id INTEGER REFERENCES authors(id),
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE tags (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) UNIQUE NOT NULL
);

CREATE TABLE article_tags (
    article_id INTEGER REFERENCES articles(id) ON DELETE CASCADE,
    tag_id INTEGER REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (article_id, tag_id)
);

CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    article_id INTEGER REFERENCES articles(id) ON DELETE CASCADE,
    username VARCHAR(100) NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL,
    likes INTEGER DEFAULT 0
);

-- Migration queries
-- 1. Extract unique authors
INSERT INTO authors (name, bio, twitter_handle, github_handle)
SELECT DISTINCT
    doc->'author'->>'name',
    doc->'author'->>'bio',
    doc->'author'->'social'->>'twitter',
    doc->'author'->'social'->>'github'
FROM mongo_import;

-- 2. Insert articles with author lookup
-- 3. Extract and insert unique tags
-- 4. Create article-tag relationships
-- 5. Unnest comments array into comments table
```

Changing the data model usually requires significant application changes. Queries must be rewritten, data access patterns adjusted, and business logic updated.
Budget for application refactoring as a major part of database migration.
Here are battle-tested patterns for implementing database migrations safely.
Pattern 1: Dual Write with Shadow Read
During migration, write to both databases but read only from the original. Compare results to validate the new database:
```typescript
// Dual Write Pattern with Shadow Read Comparison

class MigratingOrderRepository {
  private oldDb: PostgresClient;
  private newDb: MongoClient;
  private metrics: MetricsClient;
  private featureFlags: FeatureFlags;

  async save(order: Order): Promise<void> {
    // Always write to old database (source of truth during migration)
    await this.oldDb.query(
      'INSERT INTO orders (...) VALUES (...)',
      [order.id, order.customerId, ...]
    );

    // Also write to new database (async, non-blocking)
    this.writeToNewDb(order).catch(err => {
      this.metrics.increment('migration.dual_write.new_db_error');
      console.error('New DB write failed:', err);
      // Don't fail the operation - old DB is source of truth
    });
  }

  async findById(orderId: string): Promise<Order | null> {
    // Read from old database (source of truth)
    const oldResult = await this.oldDb.query(
      'SELECT * FROM orders WHERE id = $1', [orderId]
    );

    // Shadow read from new database (async, for comparison)
    if (this.featureFlags.isEnabled('migration.shadow_reads')) {
      this.shadowRead(orderId, oldResult.rows[0]).catch(err => {
        this.metrics.increment('migration.shadow_read.error');
      });
    }

    return oldResult.rows[0] || null;
  }

  private async shadowRead(orderId: string, oldResult: any): Promise<void> {
    const newResult = await this.newDb.collection('orders')
      .findOne({ _id: orderId });

    // Compare results
    const differences = this.compare(oldResult, newResult);
    if (differences.length > 0) {
      this.metrics.increment('migration.shadow_read.mismatch');
      console.warn('Data mismatch:', { orderId, differences });
      // Log for investigation, don't fail
    } else {
      this.metrics.increment('migration.shadow_read.match');
    }
  }

  private compare(old: any, new_: any): string[] {
    // Implement comparison logic accounting for schema differences
    // Return list of field differences
  }
}

// Gradually shift traffic
class MigratingOrderRepositoryV2 {
  async findById(orderId: string): Promise<Order | null> {
    const readFromNew = this.featureFlags.percentage('migration.read_new_db');

    if (Math.random() < readFromNew) {
      // Read from new database for this percentage of requests
      const result = await this.newDb.collection('orders')
        .findOne({ _id: orderId });

      if (result) {
        this.metrics.increment('migration.read.new_db.success');
        return this.mapToOrder(result);
      } else {
        // Fallback to old if not found (consistency lag)
        this.metrics.increment('migration.read.new_db.fallback');
        return this.readFromOldDb(orderId);
      }
    }

    return this.readFromOldDb(orderId);
  }
}
```

Pattern 2: Strangler Fig Migration
For large systems, migrate functionality piece by piece rather than all at once.
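As a rough sketch, strangler-fig routing amounts to a per-domain mapping that records which database currently backs each slice of functionality; the domain names and assignments below are hypothetical:

```python
# Minimal sketch of strangler-fig routing: each bounded piece of
# functionality is assigned a backing store. New features start on
# the new database while legacy areas migrate one at a time.
# The domain names and their assignments are hypothetical examples.
ROUTING = {
    "reviews": "mongo",    # new feature: built on the new DB from day one
    "catalog": "mongo",    # already migrated
    "orders": "postgres",  # migration in progress; still on the old DB
    "billing": "postgres", # highest risk, migrates last
}

def repository_for(domain: str) -> str:
    """Return which database backs a given domain during the migration."""
    try:
        return ROUTING[domain]
    except KeyError:
        raise ValueError(f"Unknown domain: {domain}")
```

Each entry flips independently, so risk is confined to one domain per phase rather than the whole system.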
Pattern 3: CDC-Based Migration
Use Change Data Capture to continuously sync data:
```yaml
# CDC-based migration pipeline

# Phase 1: Initial bulk load
initial_load:
  source: PostgreSQL
  target: MongoDB
  method: pg_dump | transform | mongoimport
  estimated_time: "4 hours for 500GB"

# Phase 2: Continuous sync via CDC
cdc_pipeline:
  source:
    type: PostgreSQL
    connector: Debezium
    tables:
      - public.orders
      - public.order_items
      - public.customers

  transform:
    type: Kafka Streams
    logic: |
      - Join order with items
      - Embed customer address
      - Convert types (SERIAL → ObjectId)
      - Transform timestamps

  target:
    type: MongoDB
    connector: MongoDB Kafka Connector
    collection: orders
    write_mode: upsert

# Phase 3: Cutover
cutover:
  steps:
    1: Stop writes to old database
    2: Wait for CDC lag to reach zero
    3: Verify row counts match
    4: Switch application to read/write new database
    5: Monitor for 24 hours
    6: Decommission old database (after 30-day safety window)

  rollback:
    if_within: "30 days"
    method: |
      - Stop writes to new database
      - Bulk export and import to old
      - Resume CDC in reverse direction
      - Switch application back
```

Database migration isn't just about data—the application must change to work with the new database paradigm. This often represents 50%+ of the total migration effort.
Query Rewriting:
```sql
-- PostgreSQL: Complex JOIN
SELECT o.id, o.total, c.name AS customer_name, c.email
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending'
  AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC
LIMIT 50;

-- Aggregation with GROUP BY
SELECT
    DATE(created_at) AS order_date,
    COUNT(*) AS order_count,
    SUM(total) AS daily_revenue
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY DATE(created_at)
ORDER BY order_date;

-- Subquery
SELECT *
FROM products
WHERE id IN (
    SELECT product_id
    FROM order_items
    GROUP BY product_id
    HAVING COUNT(*) > 100
);
```

```javascript
// MongoDB: Denormalized (no join needed)
db.orders.find({
  status: "pending",
  createdAt: { $gt: new Date(Date.now() - 7*24*60*60*1000) }
}).sort({ createdAt: -1 }).limit(50);
// Note: customer name/email embedded

// Aggregation pipeline
db.orders.aggregate([
  { $match: { createdAt: { $gte: new Date("2024-01-01") } }},
  { $group: {
      _id: { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" }},
      orderCount: { $sum: 1 },
      dailyRevenue: { $sum: "$total" }
  }},
  { $sort: { _id: 1 } }
]);

// Subquery equivalent
const popularProducts = await db.orderItems.aggregate([
  { $group: { _id: "$productId", count: { $sum: 1 }}},
  { $match: { count: { $gt: 100 }}}
]).toArray();
const ids = popularProducts.map(p => p._id);
db.products.find({ _id: { $in: ids }});
```

Repository Abstraction for Migration:
Design your application to support both databases during migration:
```typescript
// Abstract repository interface - database agnostic
interface OrderRepository {
  findById(id: string): Promise<Order | null>;
  findByCustomer(customerId: string, limit: number): Promise<Order[]>;
  save(order: Order): Promise<void>;
  findPendingOrders(olderThanDays: number): Promise<Order[]>;
}

// PostgreSQL implementation
class PostgresOrderRepository implements OrderRepository {
  async findById(id: string): Promise<Order | null> {
    const result = await this.pool.query(
      `SELECT o.*, c.name as customer_name, c.email as customer_email
       FROM orders o
       JOIN customers c ON o.customer_id = c.id
       WHERE o.id = $1`,
      [id]
    );
    return result.rows[0] ? this.mapToOrder(result.rows[0]) : null;
  }
  // ... other methods
}

// MongoDB implementation
class MongoOrderRepository implements OrderRepository {
  async findById(id: string): Promise<Order | null> {
    const doc = await this.collection.findOne({ _id: new ObjectId(id) });
    return doc ? this.mapToOrder(doc) : null;
  }
  // ... other methods
}

// Migration-aware factory
function createOrderRepository(config: Config): OrderRepository {
  switch (config.database.migration_phase) {
    case 'postgres_only':
      return new PostgresOrderRepository(config.postgres);
    case 'dual_write':
      return new DualWriteOrderRepository(
        new PostgresOrderRepository(config.postgres),
        new MongoOrderRepository(config.mongo)
      );
    case 'mongo_primary':
      return new MongoOrderRepository(config.mongo);
    default:
      throw new Error('Unknown migration phase');
  }
}

// Application code remains unchanged throughout migration
class OrderService {
  constructor(private orderRepo: OrderRepository) {}

  async getOrder(id: string): Promise<Order | null> {
    return this.orderRepo.findById(id);
  }
}
```

If you're not already using repository patterns or similar abstractions, introduce them before migration. The refactoring cost pays for itself by making the database swap significantly easier and lower risk.
Thorough testing is the difference between successful migrations and data disasters. Testing must cover data integrity, performance, and application behavior.
Data Validation Testing:
```python
# Data validation framework for migration

import hashlib
from typing import Dict, List

class MigrationValidator:
    def __init__(self, source_db, target_db):
        self.source = source_db
        self.target = target_db
        self.errors: List[str] = []

    def validate_row_counts(self) -> bool:
        """Verify all entities migrated"""
        tables = [
            ('customers', 'customers'),
            ('orders', 'orders'),
            ('products', 'products'),
        ]

        for source_table, target_collection in tables:
            source_count = self.source.execute(
                f"SELECT COUNT(*) FROM {source_table}"
            ).fetchone()[0]

            target_count = self.target[target_collection].count_documents({})

            if source_count != target_count:
                self.errors.append(
                    f"{source_table}: source={source_count}, target={target_count}"
                )

        return len(self.errors) == 0

    def validate_data_integrity(self, sample_size: int = 10000) -> bool:
        """Spot check data values match"""
        # Random sample of IDs from source
        sample_ids = self.source.execute(
            f"SELECT id FROM orders ORDER BY RANDOM() LIMIT {sample_size}"
        ).fetchall()

        mismatches = []
        for (order_id,) in sample_ids:
            source_order = self.fetch_source_order(order_id)
            target_order = self.fetch_target_order(order_id)

            diffs = self.compare_orders(source_order, target_order)
            if diffs:
                mismatches.append({'id': order_id, 'diffs': diffs})

        if mismatches:
            self.errors.extend([
                f"Data mismatch: {m['id']}: {m['diffs']}"
                for m in mismatches[:10]  # Log first 10
            ])

        mismatch_rate = len(mismatches) / sample_size
        print(f"Mismatch rate: {mismatch_rate:.4%}")
        return mismatch_rate < 0.0001  # < 0.01% acceptable

    def validate_checksums(self) -> bool:
        """Verify critical numeric fields sum correctly"""
        checks = [
            # (source_query, target_pipeline, field_name)
            (
                "SELECT SUM(total) FROM orders WHERE status = 'completed'",
                [{"$match": {"status": "completed"}},
                 {"$group": {"_id": None, "sum": {"$sum": "$total"}}}],
                "completed_order_total"
            ),
        ]

        for source_query, target_pipeline, field_name in checks:
            source_sum = self.source.execute(source_query).fetchone()[0]
            target_result = list(self.target.orders.aggregate(target_pipeline))
            target_sum = target_result[0]['sum'] if target_result else 0

            # Allow for floating point precision
            if abs(source_sum - target_sum) > 0.01:
                self.errors.append(
                    f"{field_name}: source={source_sum}, target={target_sum}"
                )

        return len(self.errors) == 0

# Run validation suite
validator = MigrationValidator(postgres_conn, mongo_client)
results = {
    'row_counts': validator.validate_row_counts(),
    'data_integrity': validator.validate_data_integrity(),
    'checksums': validator.validate_checksums(),
}

if all(results.values()):
    print("✅ Migration validation passed")
else:
    print("❌ Migration validation failed")
    print("Errors:", validator.errors)
```

Performance Testing:
Performance differences often emerge only at production scale. Testing with a 1% sample won't reveal issues that appear with full data volume. Use production-scale data (anonymized if necessary) for performance testing.
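One way to make that comparison concrete is a small harness that checks the new database's p99 latency against the old baseline plus a regression budget. The 20% budget mirrors the cutover success criteria; the helper functions and sample data are illustrative, not a real benchmark:

```python
# Sketch of a latency-regression check for migration performance testing.
# Feed it latency samples (ms) collected from identical query workloads
# run against both databases at production scale.
def p99(samples):
    """99th-percentile latency via nearest-rank on sorted samples."""
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(len(ordered) * 0.99))]

def compare_p99(old_samples_ms, new_samples_ms, budget=1.20):
    """Pass if the new database's p99 is within `budget` (e.g. 20%) of the old p99."""
    old_p99, new_p99 = p99(old_samples_ms), p99(new_samples_ms)
    return new_p99 <= old_p99 * budget, old_p99, new_p99
```

Run it against samples from full-scale (anonymized) data, since a small sample hides exactly the problems this check exists to catch.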
The final cutover is the highest-risk moment in any migration. Meticulous planning and clear rollback procedures are essential.
Cutover Checklist:
```markdown
# Migration Cutover Runbook

## Pre-Cutover (T-24 hours)
- [ ] All stakeholders notified
- [ ] Support team briefed on potential issues
- [ ] Rollback procedure tested in staging
- [ ] All validation tests passing
- [ ] CDC lag < 10 seconds consistently
- [ ] Go/no-go meeting completed

## Pre-Cutover (T-1 hour)
- [ ] Freeze all deployments
- [ ] Confirm team availability for duration
- [ ] Open communication channel (Slack, bridge call)
- [ ] Prepare status update templates

## Cutover Procedure

### Step 1: Stop Writes (T+0:00)
- [ ] Enable maintenance mode / stop application writes
- [ ] Verify no new writes in old database (check WAL position)
- Record time: _____________

### Step 2: Final Sync (T+0:05)
- [ ] Wait for CDC lag to reach zero
- [ ] Run final validation (row counts, checksums)
- [ ] If validation fails → ABORT and rollback
- Record time: _____________

### Step 3: Switch Configuration (T+0:15)
- [ ] Deploy application config pointing to new database
- [ ] Verify application connecting to new database
- [ ] Enable writes to new database
- Record time: _____________

### Step 4: Verification (T+0:20)
- [ ] Smoke test critical paths manually
- [ ] Verify metrics (error rates, latencies)
- [ ] Check all services healthy
- Record time: _____________

### Step 5: Re-enable Traffic (T+0:30)
- [ ] Disable maintenance mode
- [ ] Monitor closely for 30 minutes
- [ ] If issues → ROLLBACK
- Record time: _____________

## Success Criteria (before declaring victory)
- [ ] Error rate < 0.1% for 30 minutes
- [ ] p99 latency within 20% of baseline
- [ ] All critical transactions completing
- [ ] No data integrity alerts

## Rollback Trigger Conditions
- Error rate > 1%
- Critical transaction failures
- Data corruption detected
- Latency > 3x baseline for > 5 minutes
```

Rollback Strategy:
Always maintain the ability to roll back, even after cutover:
During parallel run phases, you can roll back instantly by switching which database handles reads. This is why the parallel run pattern, despite its complexity, is preferred for critical systems. Invest in robust dual-write and comparison infrastructure.
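The runbook's rollback trigger conditions can even be automated as a simple guard evaluated during the monitoring window. The thresholds below follow the runbook's numbers (error rate > 1%, latency > 3x baseline sustained for more than 5 minutes); the function shape and how you measure the degraded window are assumptions:

```python
# Sketch of an automated rollback check during/after cutover.
# Thresholds come from the runbook's trigger conditions; how the
# inputs are measured (metrics system, window tracking) is up to you.
def should_roll_back(error_rate: float,
                     latency_ms: float,
                     baseline_latency_ms: float,
                     minutes_degraded: float) -> bool:
    if error_rate > 0.01:  # error rate above 1%
        return True
    # latency more than 3x baseline, sustained for over 5 minutes
    if latency_ms > 3 * baseline_latency_ms and minutes_degraded > 5:
        return True
    return False
```

Transaction-failure and data-corruption triggers from the runbook would feed in as additional boolean inputs; they are omitted here for brevity.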
We've explored the challenging world of database migrations, from motivation and planning through data model transformation, migration patterns, testing, and cutover.
Module Completion:
This concludes Module 6: NoSQL vs SQL.
You're now equipped to make informed database technology decisions and guide teams through complex database evolution.
Congratulations! You've completed the comprehensive exploration of NoSQL vs SQL. You now possess the knowledge to evaluate database technologies, make appropriate selections for different workloads, design multi-database architectures, and navigate the challenges of database migrations. This expertise is essential for modern data architecture and system design.