No database schema survives first contact with users unchanged. Requirements evolve. Features are added. Data models that seemed perfect reveal limitations. The question isn't whether your schema will change—it's how you'll manage that change safely.
Schema evolution is the discipline of modifying database structures in production systems without breaking applications, losing data, or causing extended downtime. In systems serving millions of users, a poorly executed migration can mean hours of outage, corrupted data, or both.
This is one of the most operationally critical skills in database engineering. The techniques and patterns in this page will determine whether your schema changes are seamless events or midnight fire drills.
By the end of this page, you will understand migration strategies (expand-contract pattern, online schema changes), backward/forward compatibility, zero-downtime deployment techniques, data backfill strategies, and how to handle schema changes across different database systems.
Changing a schema in development is trivial. Drop the database, recreate it with the new structure, reload test data. But production systems have constraints that make the same operation catastrophically complex:
The Core Challenges:
The table already holds data that must be preserved. The database is serving live traffic that cannot be blocked for long. Old and new versions of the application run side by side during rolling deployments. And every change needs a rollback path that doesn't lose data.
The Timing Window Problem:
Consider adding a new required column to a table:
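To make the window concrete, here is a minimal sketch using a hypothetical required `users.phone` column (a similar change is walked through safely later on this page). Neither ordering works on its own:

```sql
-- Option A: ship the schema change first, as a hard requirement
ALTER TABLE users ADD COLUMN phone VARCHAR(50) NOT NULL;
-- In PostgreSQL this fails outright if the table already has rows,
-- and even with a DEFAULT, old application instances keep running
-- INSERTs that never supply a phone value.

-- Option B: ship the application change first
-- INSERT INTO users (id, email, name, phone) VALUES (...);
-- This fails with "column does not exist" until the migration
-- has been applied to every database the application talks to.
```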
This timing mismatch between schema changes and application deployment is the source of almost all migration failures.
Every engineer who has been paged at 3 AM for a migration-related outage learns the same lesson: schema evolution is not a technical problem—it's an operational problem. The SQL is usually trivial. The coordination, timing, and rollback planning are where things go wrong.
The expand-contract pattern (also called parallel change) is the foundational technique for safe schema evolution. It splits every breaking change into three non-breaking phases:
Phase 1: Expand — Add new structures alongside existing ones. Both old and new versions work.
Phase 2: Migrate — Transition all usage from old to new. Backfill data if needed.
Phase 3: Contract — Remove old structures once nothing uses them.
This pattern ensures backward compatibility at every step. If something goes wrong, you can halt without rollback—the old structure still works.
```sql
-- GOAL: Rename column 'fullname' to separate 'first_name' and 'last_name'

-- ============================================
-- PHASE 1: EXPAND (Add new columns, keep old)
-- ============================================
ALTER TABLE users ADD COLUMN first_name VARCHAR(255);
ALTER TABLE users ADD COLUMN last_name VARCHAR(255);

-- Application code v1: Writes to both old and new columns
-- INSERT INTO users (fullname, first_name, last_name, ...)
--   VALUES ('John Doe', 'John', 'Doe', ...);
-- Reads still use 'fullname' (backward compatible)
-- SELECT fullname FROM users WHERE ...

-- ============================================
-- PHASE 2: MIGRATE (Backfill and transition)
-- ============================================
-- Backfill existing data (do in batches for large tables)
UPDATE users
SET first_name = split_part(fullname, ' ', 1),
    last_name = split_part(fullname, ' ', 2)
WHERE first_name IS NULL;

-- Once backfill complete, add constraints
ALTER TABLE users ALTER COLUMN first_name SET NOT NULL;
ALTER TABLE users ALTER COLUMN last_name SET NOT NULL;

-- Application code v2: Reads from new columns
-- SELECT first_name, last_name FROM users WHERE ...
-- (still writes to 'fullname' for rollback safety)

-- ============================================
-- PHASE 3: CONTRACT (Remove old structure)
-- ============================================
-- Only after all code versions use new columns
-- Application code v3: Stops writing to 'fullname'

-- Wait for all v2 instances to be replaced by v3
-- Then drop the old column
ALTER TABLE users DROP COLUMN fullname;
```

Why This Works:
At every step, you can pause, investigate, or roll back application code without data issues. The database change is decoupled from the deployment timing.
| Phase | Database State | Application State | Risk Level |
|---|---|---|---|
| 1. Expand | Old + new structures exist | Writes to both, reads from old | Low |
| 2. Migrate | Old + new, data synced | Reads from new, writes to both | Medium |
| 3. Contract | Only new structure remains | Uses only new structure | Low (if validated) |
The expand-contract pattern is slower than a direct rename. It requires multiple deployments and migrations. But that slowness is the safety margin. A week of careful incremental changes beats a night of crisis management.
Different schema changes require different approaches. Here's how to apply expand-contract (or simpler patterns) to common scenarios:
Pattern 1: Adding a Column
The simplest case. Generally safe without expand-contract if the column is nullable or has a default.
```sql
-- SAFE: Add nullable column
ALTER TABLE users ADD COLUMN nickname VARCHAR(255);
-- Existing code ignores it; new code can use it

-- SAFE: Add column with default (PostgreSQL 11+ writes default lazily)
ALTER TABLE orders ADD COLUMN priority VARCHAR(20) DEFAULT 'normal';
-- PostgreSQL doesn't rewrite table; just records default metadata

-- DANGEROUS: Add NOT NULL without default
ALTER TABLE users ADD COLUMN required_field VARCHAR(255) NOT NULL;
-- This FAILS if table has existing rows

-- SAFE APPROACH for required column:
-- Step 1: Add nullable
ALTER TABLE users ADD COLUMN verified BOOLEAN;
-- Step 2: Backfill
UPDATE users SET verified = false WHERE verified IS NULL;
-- Step 3: Add constraint
ALTER TABLE users ALTER COLUMN verified SET NOT NULL;
ALTER TABLE users ALTER COLUMN verified SET DEFAULT false;
```

Pattern 2: Removing a Column
Never drop columns directly. Code might still reference them, causing runtime errors.
```sql
-- DANGEROUS: Immediate drop
ALTER TABLE users DROP COLUMN legacy_field;
-- Any code still referencing this column will crash

-- SAFE APPROACH:
-- Week 1: Stop writing to the column (code change)
-- Week 2: Stop reading from the column (code change)
-- Week 3: Verify no active queries reference it
SELECT * FROM pg_stat_statements WHERE query ILIKE '%legacy_field%';

-- Week 4: Drop column
ALTER TABLE users DROP COLUMN legacy_field;

-- SAFER: Use soft deprecation first
-- Rename column to signal deprecation
ALTER TABLE users RENAME COLUMN legacy_field TO _deprecated_legacy_field;
-- Code referencing 'legacy_field' fails loudly
-- After fixing all references, drop the renamed column
```

Pattern 3: Renaming a Column
Done safely, a rename is really an add followed by a drop. A direct rename breaks every query and code path that still uses the old name.
```sql
-- DANGEROUS: Direct rename
ALTER TABLE users RENAME COLUMN user_name TO username;
-- All running queries using 'user_name' will fail

-- SAFE APPROACH (expand-contract):
-- Phase 1: Add new column, write to both
ALTER TABLE users ADD COLUMN username VARCHAR(255);
UPDATE users SET username = user_name WHERE username IS NULL;
-- Code writes: INSERT INTO users (user_name, username, ...) VALUES (?, ?, ...)
-- Code reads: SELECT user_name FROM users... (unchanged)

-- Phase 2: Switch reads to new column, verify
-- Code reads: SELECT username FROM users...
-- Code writes: Still writes to both

-- Phase 3: Stop writing to old, then drop
-- Code writes: INSERT INTO users (username, ...) VALUES (?, ...)
-- After all old code instances are gone:
ALTER TABLE users DROP COLUMN user_name;
```

Pattern 4: Changing Column Type
Type changes vary widely in cost. Some are metadata-only and fast (widening a VARCHAR); others rewrite the entire table (converting to an incompatible type).
```sql
-- SAFE: Widening VARCHAR (PostgreSQL)
ALTER TABLE products ALTER COLUMN sku TYPE VARCHAR(200);
-- No table rewrite, just metadata change (if new length > old)

-- POTENTIALLY SLOW: Narrowing VARCHAR
ALTER TABLE products ALTER COLUMN sku TYPE VARCHAR(50);
-- Checks every row to ensure data fits; blocks writes

-- SLOW: Changing type entirely (e.g., string to integer)
-- This rewrites entire table and blocks all access
ALTER TABLE events ALTER COLUMN event_id TYPE BIGINT USING event_id::BIGINT;

-- SAFE APPROACH for type change (expand-contract):
-- Step 1: Add new column with target type
ALTER TABLE events ADD COLUMN new_event_id BIGINT;

-- Step 2: Backfill (in batches for large tables)
UPDATE events SET new_event_id = event_id::BIGINT
WHERE new_event_id IS NULL AND event_id <= 100000;
-- Repeat for batches...

-- Step 3: Update application to use new column
-- Step 4: Drop old column, rename new
ALTER TABLE events DROP COLUMN event_id;
ALTER TABLE events RENAME COLUMN new_event_id TO event_id;
```

Different databases handle DDL differently. PostgreSQL's ALTER TABLE behaviors changed significantly in version 11 (instant defaults). MySQL 8.0+ has instant DDL for many operations. Always test migrations against your specific database version and table size.
For very large tables, even 'fast' operations can be problematic. Adding an index on a 100-million row table might lock writes for minutes. Online schema change tools solve this by performing changes in the background without blocking operations.
How Online Schema Change Tools Work:
The tools below share a common recipe: create a shadow copy of the table with the new structure, copy rows across in small throttled batches, keep the copy in sync with ongoing writes (via triggers or by tailing the replication log), and finally swap the two tables with a near-instant atomic rename. A simplified sketch follows the table.
| Tool | Database | Key Features |
|---|---|---|
| gh-ost (GitHub) | MySQL | Triggerless, built-in throttling, pause/resume |
| pt-online-schema-change | MySQL | Trigger-based, mature, Percona maintained |
| pg_repack | PostgreSQL | Repack tables without locks, removes bloat |
| Online DDL (native) | PostgreSQL 11+/MySQL 8+ | Some operations are instant or lock-free |
| pgroll (Xata) | PostgreSQL | Multi-version DDL for expand-contract |
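Under the hood, both MySQL tools follow roughly the same steps. Here is a simplified sketch of the shadow-table approach; the table and column names are illustrative, and the tools automate every step, including throttling and cleanup:

```sql
-- 1. Create a shadow table with the desired new structure
CREATE TABLE _orders_shadow LIKE orders;
ALTER TABLE _orders_shadow ADD COLUMN priority VARCHAR(20) DEFAULT 'normal';

-- 2. Copy existing rows into the shadow table in small, throttled chunks
--    (column list abbreviated; each chunk is bounded by primary-key range)
INSERT INTO _orders_shadow (order_id, customer_id, total)
SELECT order_id, customer_id, total FROM orders
WHERE order_id BETWEEN 1 AND 1000;
-- ... repeat chunk by chunk until caught up ...

-- 3. Keep the shadow table in sync with writes that happen during the copy:
--    pt-online-schema-change installs triggers on `orders`;
--    gh-ost replays row changes from the binary log instead (triggerless).

-- 4. Swap the tables in one atomic rename, keeping the old one as a fallback
RENAME TABLE orders TO _orders_old, _orders_shadow TO orders;
-- DROP TABLE _orders_old;  -- only after verifying the change
```

In practice you run the tool rather than these statements yourself; typical invocations look like this: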
```bash
# gh-ost example: Add column to large MySQL table
gh-ost \
  --host=prod-db-1 \
  --database=ecommerce \
  --table=orders \
  --alter="ADD COLUMN priority VARCHAR(20) DEFAULT 'normal'" \
  --execute \
  --chunk-size=1000 \
  --max-lag-millis=1500 \
  --throttle-additional-flag-file=/tmp/gh-ost-throttle

# Options explained:
# --chunk-size: Number of rows to copy per batch
# --max-lag-millis: Pause if replica lag exceeds this
# --throttle-additional-flag-file: Touch file to pause operation

# pg_repack example: Rebuild table and indexes without lock
pg_repack \
  --host=prod-db-1 \
  --dbname=ecommerce \
  --table=orders \
  --wait-timeout=30
```

```sql
-- PostgreSQL native: Many operations are lock-free in modern versions

-- Adding column with default (PostgreSQL 11+)
ALTER TABLE orders ADD COLUMN priority VARCHAR(20) DEFAULT 'normal';
-- Instant operation, no table rewrite

-- Creating index concurrently (any PostgreSQL version)
CREATE INDEX CONCURRENTLY idx_orders_priority ON orders(priority);
-- Builds index without blocking writes (takes longer, but safe)
```

Modern PostgreSQL (11+) and MySQL (8.0+) have excellent native online DDL support for common operations. Check if your specific operation is supported as an instant or online DDL before reaching for external tools. External tools add operational complexity and failure modes.
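On MySQL 8.0 with InnoDB, one way to verify support up front (a sketch reusing the `orders.priority` change from above) is to request the algorithm explicitly. If the engine can't honor the request, the statement errors immediately instead of quietly falling back to a copying, locking ALTER:

```sql
-- Ask MySQL for the cheapest path explicitly; unsupported requests fail fast
-- rather than silently degrading to a table-copying, write-blocking ALTER.
ALTER TABLE orders
  ADD COLUMN priority VARCHAR(20) DEFAULT 'normal',
  ALGORITHM=INSTANT;

-- Secondary index builds can't be instant, but can run online
-- (in place, with concurrent DML allowed)
ALTER TABLE orders
  ADD INDEX idx_orders_priority (priority),
  ALGORITHM=INPLACE, LOCK=NONE;
```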
When you add a new column that needs to be populated from existing data, you face the backfill problem. A simple UPDATE ... SET on a billion-row table will run for hours inside a single transaction, hold locks that block other writes, flood replicas with change volume, and quite possibly time out before it finishes.
Batch Processing:
The solution is always the same: break the update into small batches with pauses between them.
```sql
-- DANGEROUS: Single UPDATE on large table
UPDATE orders SET normalized_email = LOWER(email);
-- On 100M rows: Runs for hours, locks everything, may timeout

-- SAFE: Batched backfill with PostgreSQL
DO $$
DECLARE
  batch_size INTEGER := 10000;
  affected_rows INTEGER;
BEGIN
  LOOP
    UPDATE orders
    SET normalized_email = LOWER(email)
    WHERE order_id IN (
      SELECT order_id FROM orders
      WHERE normalized_email IS NULL
      LIMIT batch_size
    );

    GET DIAGNOSTICS affected_rows = ROW_COUNT;

    -- Exit when no more rows to update
    EXIT WHEN affected_rows = 0;

    -- Commit this batch (important!)
    COMMIT;

    -- Pause to reduce load
    PERFORM pg_sleep(0.5);
  END LOOP;
END $$;
```

Application-Level Backfill:
For more control, run backfills from application code rather than SQL. This allows progress checkpoints so an interrupted job can resume where it left off, metrics and logging for visibility, and dynamic throttling based on replica lag or other load signals.
```typescript
// Application-level backfill with monitoring and throttling
interface BackfillConfig {
  batchSize: number;
  delayBetweenBatches: number; // milliseconds
  checkpointInterval: number;  // Save progress every N batches
}

async function backfillNormalizedEmail(config: BackfillConfig) {
  const { batchSize, delayBetweenBatches, checkpointInterval } = config;

  // Resume from last checkpoint if exists
  let lastProcessedId = await getCheckpoint('email-normalization') || '';
  let totalProcessed = 0;
  let batchCount = 0;

  console.log(`Starting backfill from ID: ${lastProcessedId || 'beginning'}`);

  while (true) {
    const startTime = Date.now();

    // Fetch batch ordered by primary key
    const rows = await db.query(`
      SELECT order_id, email
      FROM orders
      WHERE order_id > $1
        AND normalized_email IS NULL
      ORDER BY order_id
      LIMIT $2
    `, [lastProcessedId, batchSize]);

    if (rows.length === 0) {
      console.log('Backfill complete!');
      break;
    }

    // Process batch
    const updates = rows.map(row => ({
      order_id: row.order_id,
      normalized_email: row.email.toLowerCase().trim(),
    }));

    await db.batchUpdate('orders', updates, 'order_id');

    // Track progress
    lastProcessedId = rows[rows.length - 1].order_id;
    totalProcessed += rows.length;
    batchCount++;

    // Log progress
    const duration = Date.now() - startTime;
    console.log(`Batch ${batchCount}: Processed ${rows.length} rows in ${duration}ms`);
    metrics.gauge('backfill.total_processed', totalProcessed);

    // Save checkpoint periodically
    if (batchCount % checkpointInterval === 0) {
      await saveCheckpoint('email-normalization', lastProcessedId);
      console.log(`Checkpoint saved at ID: ${lastProcessedId}`);
    }

    // Throttle based on database load
    const replicaLag = await getReplicaLag();
    if (replicaLag > 5000) {
      console.log(`High replica lag (${replicaLag}ms), pausing...`);
      await sleep(10000);
    } else {
      await sleep(delayBetweenBatches);
    }
  }

  await clearCheckpoint('email-normalization');
}

// Usage
await backfillNormalizedEmail({
  batchSize: 5000,
  delayBetweenBatches: 200,
  checkpointInterval: 100,
});
```

Backfills are extended operations that can impact production. Monitor database metrics (CPU, I/O, connections, replica lag), application latency, and error rates. Be prepared to pause or slow down if metrics degrade. Backfills that run 'invisibly' are the goal.
During rolling deployments, old and new code versions run simultaneously. Your schema must be compatible with both. This requires understanding backward compatibility and forward compatibility.
Backward Compatibility: New schema works with old code
Forward Compatibility: Old schema works with new code
The Two-Version Rule:
Every schema version must be compatible with the application version currently running in production and the application version being rolled out (or rolled back to).
This means you cannot deploy schema changes and code changes atomically. The migration must go first (or last), with the other version being compatible.
```typescript
// DEPLOYMENT SEQUENCE for adding a required column

// Initial state:
// - Schema v1: users(id, email, name)
// - App v1: Uses users.email, users.name

// Step 1: Deploy schema v2 (add nullable column)
// ALTER TABLE users ADD COLUMN phone VARCHAR(50);
// Schema v2 is backward compatible with App v1
// App v1 still works (ignores phone column)

// Step 2: Deploy App v2 (write to new column)
// App v2 INSERT: (id, email, name, phone)
// App v2 SELECT: Works with phone=NULL for old rows
// App v2 must handle phone=NULL until backfill complete

// Step 3: Backfill existing data
// UPDATE users SET phone = 'unknown' WHERE phone IS NULL;

// Step 4: Add NOT NULL constraint
// ALTER TABLE users ALTER COLUMN phone SET NOT NULL;
// Now phone is required at database level

// Step 5: (Optional) Deploy App v3
// App v3 assumes phone is always present

// At every step:
// - App v1 works with Schema v1 and v2
// - App v2 works with Schema v2
// - App v3 works with Schema v2 (after constraint)

interface User_v1 {
  id: string;
  email: string;
  name: string;
}

interface User_v2 {
  id: string;
  email: string;
  name: string;
  phone: string | null; // Nullable during transition
}

interface User_v3 {
  id: string;
  email: string;
  name: string;
  phone: string; // Always present after backfill + constraint
}
```

For every migration, ask: 'Can I roll back the application without rolling back this migration?' If not, you've coupled them dangerously. The expand-contract pattern ensures migrations are independently reversible.
Beyond the patterns and techniques, successful schema evolution requires disciplined practices.
```markdown
# Migration Checklist

## Before Migration
- [ ] Migration tested on production-sized dataset
- [ ] Backward compatibility verified (old app works with new schema)
- [ ] Rollback procedure documented and tested
- [ ] Affected teams notified
- [ ] Migration window scheduled during low-traffic period
- [ ] Monitoring dashboards prepared

## During Migration
- [ ] Database metrics normal (CPU, I/O, connections)
- [ ] Replica lag within acceptable limits
- [ ] Application error rates stable
- [ ] Migration progress logged

## After Migration
- [ ] Schema verified (columns exist, constraints applied)
- [ ] Application functionality tested
- [ ] Performance metrics stable for 15+ minutes
- [ ] Rollback materials retained for 24 hours

## Post-Migration Cleanup (days/weeks later)
- [ ] Old columns dropped (if expand-contract)
- [ ] Unused indexes removed
- [ ] Documentation updated
- [ ] Analytics/reporting teams notified of changes
```

Teams that fear migrations tend to batch them into large, risky releases. Teams that practice continuous migration deploy small changes frequently with minimal ceremony. The latter approach is safer and develops organizational muscle memory for smooth schema evolution.
NoSQL databases handle schema evolution differently. The 'schemaless' nature provides flexibility but shifts complexity to the application layer.
MongoDB: Flexible but Discipline Required
MongoDB documents can have different structures in the same collection. This allows gradual migrations but requires careful application-level handling of multiple document versions.
```typescript
// MongoDB: Handling multiple document versions

interface UserV1 {
  _id: ObjectId;
  name: string;   // Full name in one field
  email: string;
}

interface UserV2 {
  _id: ObjectId;
  firstName: string;  // Split names
  lastName: string;
  email: string;
  schemaVersion: 2;
}

type User = UserV1 | UserV2;

// Application code handles both versions
function getDisplayName(user: User): string {
  if ('schemaVersion' in user && user.schemaVersion === 2) {
    return `${user.firstName} ${user.lastName}`;
  }
  // Legacy v1 document
  return (user as UserV1).name;
}

// Migration can be lazy (update on read) or eager (background job)
async function migrateUserV1ToV2(user: UserV1): Promise<UserV2> {
  const [firstName, ...lastParts] = user.name.split(' ');
  const lastName = lastParts.join(' ') || 'Unknown';

  const updatedUser: UserV2 = {
    _id: user._id,
    firstName,
    lastName,
    email: user.email,
    schemaVersion: 2,
  };

  await collection.updateOne(
    { _id: user._id },
    { $set: updatedUser }
  );

  return updatedUser;
}

// Lazy migration: Update document when read
async function getUser(userId: ObjectId): Promise<UserV2> {
  const user = await collection.findOne({ _id: userId });
  if (!user) throw new NotFoundError();

  if (!('schemaVersion' in user)) {
    // Migrate on read
    return await migrateUserV1ToV2(user as UserV1);
  }

  return user as UserV2;
}
```

DynamoDB: Limited Evolution Options
DynamoDB's schema is effectively defined by its key structure. Changing partition or sort keys requires creating new tables and migrating data. Adding attributes is trivial; changing key structure is painful.
```typescript
// DynamoDB: Adding GSI for new access pattern

// Original table: Users with PK = USER#<userId>
// New requirement: Look up users by email

// Step 1: Add email attribute to items (if not present)
// New items already have email; backfill old items

// Step 2: Create Global Secondary Index
const createGSIParams = {
  TableName: 'Users',
  AttributeDefinitions: [
    { AttributeName: 'email', AttributeType: 'S' },
  ],
  GlobalSecondaryIndexUpdates: [
    {
      Create: {
        IndexName: 'email-index',
        KeySchema: [
          { AttributeName: 'email', KeyType: 'HASH' },
        ],
        Projection: { ProjectionType: 'ALL' },
        ProvisionedThroughput: {
          ReadCapacityUnits: 10,
          WriteCapacityUnits: 5,
        },
      },
    },
  ],
};

// GSI creation is online (table remains usable)
// But GSI won't include items without 'email' attribute
// Backfill must add email to old items for them to appear in GSI

// Step 3: Use new access pattern once GSI is active
async function getUserByEmail(email: string) {
  return await dynamodb.query({
    TableName: 'Users',
    IndexName: 'email-index',
    KeyConditionExpression: 'email = :email',
    ExpressionAttributeValues: { ':email': email },
  });
}
```

NoSQL databases aren't really schemaless—they just move the schema to your application code. You still need versioning, migration strategies, and compatibility planning. The database just won't enforce anything for you.
Schema evolution is an operational discipline as much as a technical one. The patterns are straightforward; the challenge is applying them consistently under pressure.
What's Next:
With schema evolution mastered, we'll conclude with Modeling for Scale—the patterns and techniques for designing data models that can grow from thousands to billions of records while maintaining performance.
You now understand how to evolve database schemas safely in production. In the final page, we'll explore how to design data models that scale gracefully from startup to enterprise.