You've decided to decompose your shared database. You understand why Database per Service is the right architecture. Now comes the hard part: actually moving the data.
Data migration in a live system is one of the most challenging operations in software engineering. Unlike code deployments, which can often be rolled back instantly, data migrations involve state changes that are much harder to reverse. A botched migration can mean hours of downtime, data corruption, or worse—permanent data loss.
The stakes are high, but the techniques are well-established. With careful planning and the right patterns, you can migrate data safely, incrementally, and with minimal disruption.
This page provides comprehensive coverage of data migration strategies. You'll learn the Parallel Run pattern for safe migrations, the Strangler Fig approach applied to data, techniques for maintaining data synchronization during migration, strategies for handling rollback, and practical considerations for executing migrations in production systems.
The single most important principle in data migration is: never do a big-bang migration if you can possibly avoid it.
A big-bang migration attempts to move all data at once, typically during a maintenance window. While conceptually simple, this approach carries extreme risk: there is no way to validate the migration incrementally, the cutover is all-or-nothing, and if anything goes wrong mid-migration, recovery means restoring from backups while the system stays down.
The Incremental Migration Mindset
Incremental migration means moving data in small, verifiable steps: keep both old and new databases running, migrate one slice at a time, validate each step before proceeding, and retain the ability to roll back at any point.
Modern users expect 24/7 availability. Your migration strategy should target zero downtime. While brief "freeze" periods might be necessary for final cutover, the goal is that users never notice the migration happening. Every technique in this page aims for this goal.
The Parallel Run Pattern is the most robust approach to database migration. It involves running both the old and new databases simultaneously, with mechanisms to keep them synchronized and compare their outputs.
How it works:
```
Phase 1: Initial Sync

  Shared Database (Primary)
  ├── users table
  ├── orders table
  └── products table
        │
        │  Initial data copy (bulk migration)
        ▼
  User Service Database (Shadow)
  └── users table (copy of users, read-only initially)

Phase 2: Dual Write

  User Service
  │
  ├── Write ──> Shared Database (Primary source of truth)
  │        └──> User Service DB (Shadow, verified for correctness)
  │
  └── Read ───> Shared Database (still primary)

Phase 3: Shadow Read Comparison

  User Service
  │
  └── Read ───> Shared Database (returns response to user)
           └──> User Service DB (compared, discrepancies logged)

  Comparison engine logs any differences
  Team investigates and resolves before cutover

Phase 4: Traffic Shift

  User Service
  │
  ├── Read ─[10%]─> User Service DB (new primary)
  │        ─[90%]─> Shared Database
  │
  └── Write ──────> Both databases (dual write continues)

Phase 5: Complete Cutover

  User Service
  │
  ├── Read  ─────> User Service DB (100%)
  └── Write ─────> User Service DB (new source of truth)

  Shared Database (read-only backup, pending decommission)
```

Implementation: Dual Write with Comparison
The critical mechanism is dual-write combined with read comparison:
```typescript
class UserRepository {
  constructor(
    private legacyDb: LegacyDatabase,
    private newDb: NewDatabase,
    private migrationConfig: MigrationConfig,
  ) {}

  async createUser(userData: CreateUserData): Promise<User> {
    // Always write to legacy (source of truth during migration)
    const user = await this.legacyDb.users.create(userData);

    // Also write to new database (shadow)
    try {
      await this.newDb.users.create({
        id: user.id, // Use same ID for correlation
        ...userData,
      });
    } catch (error) {
      // Log but don't fail - shadow write failure is not critical
      this.metrics.increment('migration.shadow_write_failure');
      this.logger.error('Shadow write failed', { error, userId: user.id });
    }

    return user;
  }

  async getUser(userId: string): Promise<User | null> {
    // Read from legacy (source of truth)
    const legacyUser = await this.legacyDb.users.findById(userId);

    // Optionally compare with new database
    if (this.migrationConfig.enableReadComparison) {
      this.compareInBackground(userId, legacyUser);
    }

    // Optionally read from new database based on traffic percentage
    if (this.shouldReadFromNew()) {
      return this.newDb.users.findById(userId);
    }

    return legacyUser;
  }

  private async compareInBackground(userId: string, legacyUser: User | null) {
    // Non-blocking comparison
    setImmediate(async () => {
      try {
        const newUser = await this.newDb.users.findById(userId);
        const discrepancies = this.findDiscrepancies(legacyUser, newUser);

        if (discrepancies.length > 0) {
          this.metrics.increment('migration.read_discrepancy');
          this.logger.warn('Data discrepancy detected', {
            userId,
            discrepancies,
          });
        }
      } catch (error) {
        this.logger.error('Comparison failed', { error, userId });
      }
    });
  }

  private shouldReadFromNew(): boolean {
    // Gradual traffic shift based on configuration
    const percentage = this.migrationConfig.newDbReadPercentage;
    return Math.random() * 100 < percentage;
  }
}
```

Read comparisons can be expensive. Consider sampling (compare 1% of reads) rather than comparing every read.
Use background jobs to periodically reconcile the entire dataset. Log discrepancies with enough context to investigate root causes.
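The sampling decision is best made deterministic: hash the record's ID so the same slice of records is compared on every read, which makes recurring discrepancies on a single record easy to spot in logs. A minimal sketch, using a small FNV-1a hash purely for illustration (any stable hash works):

```typescript
// Illustrative FNV-1a hash; any stable, well-distributed hash is fine here.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Compare only `samplePercent` of reads, but always the SAME keys,
// so a record with a persistent discrepancy shows up repeatedly.
function shouldCompare(userId: string, samplePercent: number): boolean {
  return fnv1a(userId) % 100 < samplePercent;
}
```

Unlike `Math.random()`-based sampling, this keeps the comparison set stable across requests and across service instances.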
Before parallel running can begin, you need to populate the new database with existing data. This initial synchronization is often the most time-consuming part of the migration.
Approach 1: Direct Database Copy
For smaller datasets, direct tools work well:
```bash
# PostgreSQL: pg_dump piped into psql
pg_dump -h legacy-db -d monolith -t users -t user_profiles \
  | psql -h user-service-db -d userservice

# MySQL: mysqldump
mysqldump -h legacy-db monolith users user_profiles \
  | mysql -h user-service-db userservice

# AWS DMS for cloud migrations
aws dms create-replication-task \
  --replication-task-identifier user-migration \
  --source-endpoint-arn arn:aws:dms:...:endpoint:source \
  --target-endpoint-arn arn:aws:dms:...:endpoint:target \
  --migration-type full-load-and-cdc \
  --table-mappings file://user-table-mappings.json
```

Approach 2: ETL Pipeline for Transformations
When the new schema differs from the legacy schema, an ETL pipeline handles transformation:
```typescript
class UserMigrationPipeline {
  async runInitialMigration() {
    const batchSize = 10000;
    let offset = 0;
    let hasMore = true;

    while (hasMore) {
      // Extract: Fetch batch from legacy database
      const legacyUsers = await this.legacyDb.query(`
        SELECT id, email, first_name, last_name, phone,
               street_address, city, state, zip, country,
               created_at, updated_at
        FROM users
        ORDER BY id
        LIMIT $1 OFFSET $2
      `, [batchSize, offset]);

      if (legacyUsers.length === 0) {
        hasMore = false;
        continue;
      }

      // Transform: Convert to new schema
      const transformedUsers = legacyUsers.map(legacy => ({
        id: legacy.id,
        email: legacy.email.toLowerCase(), // Normalize email
        name: {
          first: legacy.first_name,
          last: legacy.last_name,
        },
        contact: {
          phone: this.normalizePhone(legacy.phone),
        },
        address: {
          street: legacy.street_address,
          city: legacy.city,
          state: legacy.state,
          postalCode: legacy.zip,
          country: this.normalizeCountry(legacy.country),
        },
        metadata: {
          migratedAt: new Date(),
          legacyId: legacy.id,
        },
        createdAt: legacy.created_at,
        updatedAt: legacy.updated_at,
      }));

      // Load: Insert into new database
      await this.newDb.users.bulkInsert(transformedUsers);

      // Progress tracking
      offset += batchSize;
      this.logger.info(`Migrated ${offset} users`);

      // Throttle to avoid overwhelming databases
      await this.sleep(100);
    }

    this.logger.info('Initial migration complete');
  }

  private normalizePhone(phone: string): string {
    // Remove non-numeric characters, format consistently
    return phone?.replace(/\D/g, '') || null;
  }

  private normalizeCountry(country: string): string {
    // Convert to ISO country code
    return countryCodeMapping[country?.toLowerCase()] || country;
  }
}
```

Approach 3: Change Data Capture (CDC)
For large datasets where initial copy takes hours or days, CDC ensures you don't fall behind during the copy:
```
Timeline of CDC Migration:

T=0:         Start CDC capture on legacy database
             All changes are captured and queued

T=0 to T+8h: Run initial bulk copy
             - Legacy: 10 million records copied
             - CDC queue: 50,000 changes accumulated

T+8h:        Apply accumulated CDC changes
             - Process 50,000 queued changes
             - Takes 10 minutes

T+8h 10m:    Steady state
             - Bulk copy done
             - CDC queue drained
             - Real-time sync begins

CDC Workflow:

  Legacy DB ──[Change Log/WAL]──> CDC Connector
                                       │
                                       ▼
                              Kafka / Event Queue
                                       │
                                       ▼
  New DB <───────────────────── CDC Consumer
           applies changes
```

Popular CDC tools include Debezium (open source, works with Kafka), AWS DMS, Google Datastream, and Azure Data Factory. These tools capture changes at the database log level, ensuring no changes are missed even during high-load periods.
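As a concrete illustration, a Debezium connector is typically registered by POSTing a JSON configuration to Kafka Connect. The sketch below assumes the Postgres connector; hostnames, credentials, and the table list are placeholders, and exact property names vary by Debezium version (for example, `topic.prefix` replaced the older `database.server.name`):

```json
{
  "name": "legacy-users-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "legacy-db",
    "database.port": "5432",
    "database.user": "cdc_user",
    "database.password": "********",
    "database.dbname": "monolith",
    "topic.prefix": "legacy",
    "table.include.list": "public.users,public.user_profiles"
  }
}
```

A consumer on the resulting `legacy.public.users` topic would then apply each change event to the new database.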
The trickiest part of migration is handling writes while both databases are active. Several strategies exist, each with tradeoffs.
Strategy 1: Legacy-Primary Dual Write
Writes go to legacy first, then replicate to new. Legacy remains the source of truth.
```typescript
async createUser(userData: CreateUserData): Promise<User> {
  // Step 1: Write to legacy (synchronous, must succeed)
  const user = await this.legacyDb.users.create(userData);

  // Step 2: Replicate to new (async, failure tolerated)
  this.replicateToNew(user).catch(err => {
    this.logger.error('Replication failed', { err, userId: user.id });
    this.queueForRetry(user); // Will retry later
  });

  return user;
}

// Periodic job catches up any missed replications
async reconcileDatabases() {
  const legacyUsers = await this.legacyDb.users.findModifiedSince(
    this.lastReconcileTime
  );

  for (const user of legacyUsers) {
    await this.replicateToNew(user);
  }
}
```

Strategy 2: Transaction Outbox Pattern
For reliable replication without distributed transactions, use the outbox pattern:
```typescript
async createUser(userData: CreateUserData): Promise<User> {
  // Single transaction ensures atomicity
  return this.legacyDb.transaction(async (tx) => {
    // Create the user
    const user = await tx.users.create(userData);

    // Record the change in outbox table (same transaction)
    await tx.outbox.create({
      aggregateType: 'User',
      aggregateId: user.id,
      eventType: 'UserCreated',
      payload: JSON.stringify(user),
      status: 'pending',
      createdAt: new Date(),
    });

    return user;
  });
}

// Separate process polls outbox and applies to new database
class OutboxProcessor {
  async process() {
    const pendingEvents = await this.legacyDb.outbox.findByStatus('pending');

    for (const event of pendingEvents) {
      try {
        await this.applyToNewDatabase(event);
        await this.legacyDb.outbox.updateStatus(event.id, 'processed');
      } catch (error) {
        await this.legacyDb.outbox.updateStatus(event.id, 'failed');
        this.logger.error('Outbox processing failed', { event, error });
      }
    }
  }
}
```

Strategy 3: New-Primary with Backfill
In the final phase of migration, the new database becomes primary:
```typescript
class UserRepository {
  constructor(private config: MigrationConfig) {}

  async createUser(userData: CreateUserData): Promise<User> {
    if (this.config.newDatabaseIsPrimary) {
      // New database is source of truth
      const user = await this.newDb.users.create(userData);

      // Backfill to legacy for rollback safety
      this.backfillToLegacy(user).catch(err => {
        this.logger.warn('Legacy backfill failed', { err, userId: user.id });
        // Not critical - legacy is no longer primary
      });

      return user;
    } else {
      // Legacy is still primary (previous code path)
      return this.legacyPrimaryCreate(userData);
    }
  }
}
```

| Strategy | Complexity | Consistency | Rollback Safety |
|---|---|---|---|
| Legacy-Primary Dual Write | Low | Eventual (new may lag) | Excellent |
| Transaction Outbox | Medium | Guaranteed delivery | Excellent |
| CDC Streaming | High | Near real-time | Excellent |
| New-Primary with Backfill | Low | New is authoritative | Good (requires backfill) |
The cardinal rule of write handling: never lose a write. If replication fails, queue for retry. If the queue fails, log persistently. Have reconciliation jobs that catch any gaps. Audit regularly to ensure counts match between databases.
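The retry part of that rule can be sketched as a small queue that never drops a failed write. This is an in-memory illustration only; in production the queue would be durable (a database table or log), and `replicate` stands in for whatever replication call your system uses:

```typescript
// Sketch of a never-drop retry queue for failed shadow writes.
// In production, back this with durable storage, not process memory.
type PendingWrite = { payload: unknown; attempts: number };

class RetryQueue {
  private pending: PendingWrite[] = [];

  enqueue(payload: unknown): void {
    this.pending.push({ payload, attempts: 0 });
  }

  // Attempt every queued write once; failures stay queued for the next pass.
  async drain(replicate: (payload: unknown) => Promise<void>): Promise<number> {
    let replicated = 0;
    const stillFailing: PendingWrite[] = [];

    for (const item of this.pending) {
      try {
        await replicate(item.payload);
        replicated++;
      } catch {
        item.attempts++;          // track attempts for alerting/backoff
        stillFailing.push(item);  // never drop: retained for the next drain
      }
    }
    this.pending = stillFailing;
    return replicated;
  }

  get depth(): number {
    return this.pending.length;
  }
}
```

Queue depth is itself a metric worth alerting on: a steadily growing queue means replication is falling behind.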
Once the new database is synchronized and verified, you can begin shifting traffic. The goal is gradual, controllable, reversible traffic migration.
Percentage-Based Traffic Split
The most common approach: route a percentage of traffic to the new database, increasing over time.
```typescript
class UserRepository {
  async getUser(userId: string): Promise<User | null> {
    const config = await this.featureFlags.get('user_db_migration');

    // Route based on percentage
    if (this.shouldUseNewDatabase(config.readPercentage)) {
      try {
        return await this.newDb.users.findById(userId);
      } catch (error) {
        // Fallback to legacy on error
        this.metrics.increment('migration.new_db_fallback');
        return this.legacyDb.users.findById(userId);
      }
    }

    return this.legacyDb.users.findById(userId);
  }

  private shouldUseNewDatabase(percentage: number): boolean {
    // Deterministic routing based on request ID for consistency
    const requestHash = hash(this.requestContext.requestId);
    return (requestHash % 100) < percentage;
  }
}

// Progressive rollout schedule
// Day 1:  1% read traffic to new DB, monitor closely
// Day 2:  5% read traffic if Day 1 successful
// Day 3:  10% read traffic
// Day 5:  25% read traffic
// Day 7:  50% read traffic
// Day 10: 100% read traffic
// Day 14: Switch write traffic to new DB as primary
```

User/Segment-Based Migration
For more control, migrate specific user segments first:
```typescript
class UserRepository {
  async getUser(userId: string): Promise<User | null> {
    const user = await this.getRoutingInfo(userId);

    // Internal users first
    if (user.isInternalUser) {
      return this.newDb.users.findById(userId);
    }

    // Then beta users who opted in
    if (user.isBetaTester) {
      return this.newDb.users.findById(userId);
    }

    // Then users by region (easier to support during business hours)
    if (await this.isMigratedRegion(user.region)) {
      return this.newDb.users.findById(userId);
    }

    // Everyone else stays on legacy until their segment is migrated
    return this.legacyDb.users.findById(userId);
  }

  private async isMigratedRegion(region: string): Promise<boolean> {
    const migratedRegions = ['US-WEST', 'EU-WEST']; // Gradually expand
    return migratedRegions.includes(region);
  }
}
```

Sticky Routing
Ensure a user's traffic consistently goes to the same database during their session to avoid confusing experiences:
```typescript
class DatabaseRouter {
  async route(userId: string): Promise<Database> {
    // Use consistent hashing for deterministic routing
    const bucket = consistentHash(userId) % 100;
    const cutoverBucket = await this.featureFlags.get('migration_cutover_bucket');

    if (bucket < cutoverBucket) {
      return this.newDb;
    }
    return this.legacyDb;
  }
}

// The same user always routes to the same database
// Moving cutoverBucket from 0 to 100 gradually migrates all users
// User with bucket=25 moves when cutoverBucket reaches 26
// Prevents user from flip-flopping between databases
```

During traffic shifting, monitor aggressively: latency percentiles, error rates, data discrepancy counts, and business metrics. Automated alerts should trigger if any metric degrades beyond threshold. Use feature flags that can instantly route 100% traffic back to legacy if problems emerge.
A robust migration plan includes detailed rollback procedures at every stage. If anything goes wrong, you must be able to return to a known-good state quickly.
Level 1: Traffic Rollback (Instant)
The fastest rollback: redirect all traffic back to the legacy database.
```typescript
// Emergency rollback - takes effect immediately
await featureFlags.set('user_db_migration', {
  readPercentage: 0,   // All reads go to legacy
  writeToNew: false,   // Stop writing to new DB
  newIsPrimary: false, // Ensure legacy is source of truth
});

// Or via CLI/API call:
// $ curl -X POST https://api.featureflags.io/flags/user_db_migration/disable

// Alert team
await alerting.critical('Database migration rolled back', {
  reason: 'Error rate exceeded threshold',
  rollbackTime: new Date(),
  trafficPercentage: previousPercentage,
});
```

Level 2: Data Rollback (Careful)
If data in the new database has diverged incorrectly, you may need to correct it:
```typescript
// If new DB has incorrect data, re-sync from legacy
class MigrationRollback {
  async resyncFromLegacy(startTime: Date, endTime: Date) {
    // Find all records modified during the problematic period
    const affectedRecords = await this.legacyDb.users.findModifiedBetween(
      startTime,
      endTime
    );

    this.logger.warn(`Rolling back ${affectedRecords.length} records`);

    for (const legacyRecord of affectedRecords) {
      // Overwrite new DB with legacy data
      await this.newDb.users.upsert(
        this.transformForNewSchema(legacyRecord)
      );

      // Log for audit
      await this.auditLog.record({
        action: 'migration_rollback',
        recordId: legacyRecord.id,
        reason: 'data_correction',
        rolledBackAt: new Date(),
      });
    }
  }
}
```

Level 3: Post-Cutover Rollback (Complex)
If you've already cut over to the new database as primary and need to roll back, you must sync changes back to legacy:
```
Post-Cutover Rollback Procedure:

1. STOP:   Halt all writes (brief downtime may be necessary)

2. SYNC:   Apply all changes from new DB back to legacy
           - Query new DB for all records modified since cutover
           - Transform to legacy schema
           - Apply to legacy DB

3. VERIFY: Ensure legacy has all data
           - Run reconciliation queries
           - Compare record counts
           - Validate critical business data

4. SWITCH: Redirect traffic to legacy
           - Update feature flags
           - Verify traffic is flowing to legacy
           - Monitor closely

5. RESUME: Allow writes
           - Legacy is now primary again
           - New DB becomes shadow (or is paused)

Timeline: 15-60 minutes depending on data volume and verification needs
```

The ability to roll back degrades over time. Once you've been running on the new database for weeks and the legacy database is stale, rollback becomes a major undertaking. Define a "point of no return" and ensure you're confident before crossing it. Keep legacy data synchronized for as long as practically possible.
Continuous validation ensures the migration is proceeding correctly. Never assume data made it—verify.
Real-Time Comparison
Compare results from both databases in real-time:
```typescript
class DataComparator {
  async compareRead(userId: string): Promise<void> {
    const [legacyResult, newResult] = await Promise.all([
      this.legacyDb.users.findById(userId),
      this.newDb.users.findById(userId),
    ]);

    const comparison = this.compare(legacyResult, newResult);

    if (!comparison.isEqual) {
      this.metrics.increment('migration.comparison.mismatch');

      await this.discrepancyLog.record({
        entity: 'User',
        entityId: userId,
        differences: comparison.differences,
        legacyValue: this.redact(legacyResult),
        newValue: this.redact(newResult),
        timestamp: new Date(),
      });

      // Alert if mismatch rate exceeds threshold
      if (await this.mismatchRateExceedsThreshold()) {
        await this.alertMigrationTeam('Mismatch rate too high');
      }
    } else {
      this.metrics.increment('migration.comparison.match');
    }
  }

  private compare(legacy: any, newRecord: any): ComparisonResult {
    const differences: Difference[] = [];

    // Compare each field, accounting for schema transformations
    for (const field of this.comparisonFields) {
      const legacyValue = this.extractField(legacy, field.legacyPath);
      const newValue = this.extractField(newRecord, field.newPath);

      if (!field.comparator(legacyValue, newValue)) {
        differences.push({
          field: field.name,
          legacyValue,
          newValue,
        });
      }
    }

    return {
      isEqual: differences.length === 0,
      differences,
    };
  }
}
```

Batch Reconciliation Jobs
Periodic full reconciliation catches any discrepancies missed by real-time comparison:
```typescript
class ReconciliationJob {
  async runFullReconciliation(): Promise<ReconciliationReport> {
    const report: ReconciliationReport = {
      startTime: new Date(),
      totalLegacyRecords: 0,
      totalNewRecords: 0,
      missingInNew: [],
      missingInLegacy: [],
      mismatches: [],
    };

    // Get all IDs from both databases
    const legacyIds = new Set(await this.legacyDb.users.getAllIds());
    const newIds = new Set(await this.newDb.users.getAllIds());

    report.totalLegacyRecords = legacyIds.size;
    report.totalNewRecords = newIds.size;

    // Find records missing in new database
    for (const id of legacyIds) {
      if (!newIds.has(id)) {
        report.missingInNew.push(id);
      }
    }

    // Find records in new but not in legacy (shouldn't happen normally)
    for (const id of newIds) {
      if (!legacyIds.has(id)) {
        report.missingInLegacy.push(id);
      }
    }

    // Compare content of matching records (sample for large datasets)
    const sampleIds = this.sample(Array.from(legacyIds), 10000);
    for (const id of sampleIds) {
      const [legacy, newRecord] = await Promise.all([
        this.legacyDb.users.findById(id),
        this.newDb.users.findById(id),
      ]);

      if (!this.recordsMatch(legacy, newRecord)) {
        report.mismatches.push({
          id,
          differences: this.findDifferences(legacy, newRecord),
        });
      }
    }

    report.endTime = new Date();
    return report;
  }
}
```

| Checkpoint | What to Verify | Action if Failed |
|---|---|---|
| After initial sync | Record counts match; sample data matches | Re-run sync; investigate gaps |
| During dual-write | Writes appear in both DBs within SLA | Check replication; fix and resync |
| Before traffic shift | Full reconciliation passes | Do not proceed until resolved |
| During traffic shift | Error rates stable; latency acceptable | Roll back traffic percentage |
| After cutover | Business metrics normal | Keep legacy hot; prepare rollback |
Automate validation gates in your migration pipeline. Traffic shift to the next percentage level should be blocked until validation passes. Human approval should be required for major milestones (50%, 100%, write cutover).
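A gate of this kind reduces to a pure decision function, which makes it easy to unit-test. The check names, milestone percentages, and return shape below are illustrative, not prescriptive:

```typescript
// Sketch: decide whether the migration may advance to the next traffic step.
type GateCheck = { name: string; passed: boolean };
type GateDecision = { allowed: boolean; reason?: string };

function canAdvance(
  checks: GateCheck[],
  nextPercentage: number,
  hasHumanApproval: boolean,
): GateDecision {
  // Any failed validation blocks progression outright
  const failed = checks.filter(c => !c.passed);
  if (failed.length > 0) {
    return {
      allowed: false,
      reason: `failed checks: ${failed.map(c => c.name).join(', ')}`,
    };
  }

  // Major milestones additionally require explicit human sign-off
  const milestones = [50, 100];
  if (milestones.includes(nextPercentage) && !hasHumanApproval) {
    return { allowed: false, reason: 'human approval required for milestone' };
  }

  return { allowed: true };
}
```

Wiring this into the deployment pipeline means a traffic increase simply cannot ship while any validation is red.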
One of the trickiest aspects of database decomposition is handling foreign key relationships that span what will become separate databases.
The Problem: Cross-Service Foreign Keys
In a shared database, foreign keys enforce referential integrity:
```sql
-- Current state: Foreign keys across domains
CREATE TABLE orders (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL REFERENCES users(id),       -- Will be in different DB
  product_id UUID NOT NULL REFERENCES products(id), -- Will be in different DB
  created_at TIMESTAMP
);

-- Problem: When orders moves to its own database,
-- these foreign keys cannot exist across databases
```

Solution: Application-Level Referential Integrity
Replace database foreign keys with application-level validation:
```typescript
class OrderService {
  async createOrder(orderData: CreateOrderInput): Promise<Order> {
    // Validate references exist before proceeding
    await this.validateReferences(orderData);

    // Store only the ID reference, not a foreign key
    const order = await this.orderDb.orders.create({
      id: generateId(),
      userId: orderData.userId, // Just an ID, no FK constraint
      productIds: orderData.items.map(i => i.productId), // Just IDs
      ...orderData,
    });

    return order;
  }

  private async validateReferences(orderData: CreateOrderInput): Promise<void> {
    // Check user exists via User Service API
    const user = await this.userServiceClient.getUser(orderData.userId);
    if (!user) {
      throw new ValidationError(`User ${orderData.userId} not found`);
    }
    if (!user.canPlaceOrders) {
      throw new ValidationError(`User ${orderData.userId} cannot place orders`);
    }

    // Check products exist via Product Service API
    const productIds = orderData.items.map(i => i.productId);
    const products = await this.productServiceClient.getProducts(productIds);

    const foundIds = new Set(products.map(p => p.id));
    const missingIds = productIds.filter(id => !foundIds.has(id));

    if (missingIds.length > 0) {
      throw new ValidationError(`Products not found: ${missingIds.join(', ')}`);
    }
  }
}
```

Migration Sequence for FK Dependencies
The order of migration matters when foreign keys are involved:
```
Dependency graph:
  orders -> users    (orders.user_id references users.id)
  orders -> products (orders.product_id references products.id)

Migration sequence:

Step 1: Add application-level validation (while FKs still exist)
  - Order Service validates via API before creating orders
  - Both validation paths active: app + DB FK

Step 2: Drop foreign key constraints
  ALTER TABLE orders DROP CONSTRAINT orders_user_id_fkey;
  ALTER TABLE orders DROP CONSTRAINT orders_product_id_fkey;
  - Application validation is now the only enforcement
  - Test thoroughly

Step 3: Migrate tables to separate databases
  - users    -> User Service DB
  - products -> Product Service DB
  - orders   -> Order Service DB

Step 4: Handle orphaned references (cleanup)
  - Find any orders referencing non-existent users
  - Decide: soft-delete, archive, or flag for review
```

Once you remove FK constraints, orphaned references become possible. A user could be deleted while orders referencing them exist. Your application must handle this gracefully: show 'deleted user' instead of crashing, or implement soft deletes, or use event-driven cleanup to cascade deletions.
Successful migrations require robust tooling. Manual processes at scale are error-prone and exhausting for engineers.
Essential migration tooling includes synchronization pipelines, comparison and reconciliation engines, a migration dashboard, and automated runbooks wired to feature flags.
Migration Dashboard
A dedicated migration dashboard provides visibility into progress:
```
┌─────────────────────────────────────────────────────────────────────┐
│                  USER SERVICE MIGRATION DASHBOARD                   │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  SYNC STATUS                        TRAFFIC ROUTING                 │
│  ════════════                       ═══════════════                 │
│  Legacy records: 10,234,567         Read traffic:                   │
│  New DB records: 10,234,565           ├── Legacy: 60%               │
│  Sync lag: 2 records                  └── New DB: 40%               │
│  Last sync: 2 seconds ago                                           │
│                                     Write traffic:                  │
│                                       └── Both (dual write)         │
│                                                                     │
│  DATA QUALITY                       PERFORMANCE                     │
│  ════════════                       ═══════════                     │
│  Comparison checks: 1,234,567       Legacy p99: 45ms                │
│  Matches: 1,234,550                 New DB p99: 38ms                │
│  Mismatches: 17 (0.001%)            Error rate: 0.01%               │
│  Last mismatch: 12 min ago                                          │
│                                                                     │
│  ROLLBACK STATUS                    MIGRATION PHASE                 │
│  ═══════════════                    ═══════════════                 │
│  Rollback ready: ✓ Yes              Phase: Traffic Shifting         │
│  Legacy current: ✓ Yes              Progress: 40% reads to new      │
│  Backfill running: ✓ Yes            Next milestone: 50% (Day 7)     │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

Runbook Automation
Codify migration procedures as automated runbooks:
```typescript
class MigrationRunbook {
  async increaseTrafficToNewDatabase(targetPercentage: number) {
    // Pre-flight checks
    await this.validatePreConditions([
      'sync_lag_under_threshold',
      'mismatch_rate_acceptable',
      'new_db_latency_acceptable',
      'error_rate_acceptable',
    ]);

    // Record current state for rollback
    const previousPercentage = await this.getCurrentPercentage();
    await this.recordCheckpoint(previousPercentage);

    // Gradual increase (not instant jump)
    const steps = this.calculateSteps(previousPercentage, targetPercentage);
    for (const step of steps) {
      await this.setTrafficPercentage(step.percentage);
      await this.wait(step.observationPeriodMs);

      const health = await this.checkSystemHealth();
      if (!health.isHealthy) {
        await this.rollbackToPercentage(previousPercentage);
        throw new MigrationError(`Health check failed at ${step.percentage}%`);
      }
    }

    await this.notifyTeam(`Traffic increased to ${targetPercentage}%`);
  }
}
```

The time spent building migration tooling pays dividends. You'll likely decompose multiple services over time, and robust tooling can be reused. Consider migration tooling as platform investment, not one-time project cost.
Data migration is challenging but manageable with the right strategies and tooling. The key is incremental, verifiable progress with rollback capability at every step.
What's next:
The next page addresses one of the major challenges introduced by Database per Service: Handling Joins Across Services. When you can no longer JOIN tables across services, how do you handle queries that need data from multiple sources? We'll explore API composition, data denormalization, and CQRS patterns.
You now have a thorough understanding of data migration strategies for database decomposition. These patterns—parallel run, dual write, CDC, traffic shifting, and continuous validation—form the practical toolkit for safely moving data to service-specific databases.