No database schema survives first contact with users unchanged. Requirements evolve. Features are added. Data models that seemed perfect reveal limitations. The question isn't whether your schema will change—it's how you'll manage that change safely.
Schema evolution is the discipline of modifying database structures in production systems without breaking applications, losing data, or causing extended downtime. In systems serving millions of users, a poorly executed migration can mean hours of outage, corrupted data, or both.
This is one of the most operationally critical skills in database engineering. The techniques and patterns in this page will determine whether your schema changes are seamless events or midnight fire drills.
By the end of this page, you will understand migration strategies (expand-contract pattern, online schema changes), backward/forward compatibility, zero-downtime deployment techniques, data backfill strategies, and how to handle schema changes across different database systems.
Changing a schema in development is trivial. Drop the database, recreate it with the new structure, reload test data. But production systems have constraints that make the same operation catastrophically complex:
The Core Challenges:
The table already holds data that must be preserved. The database is serving live traffic that cannot be blocked for long. Old and new versions of the application run side by side during rolling deployments. And every change needs a rollback path that doesn't lose data.
The Timing Window Problem:
Consider adding a new required column to a table:
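To make the window concrete, here is a minimal sketch using a hypothetical required `users.phone` column (a similar change is walked through safely later on this page). Neither ordering works on its own:

```sql
-- Option A: ship the schema change first, as a hard requirement
ALTER TABLE users ADD COLUMN phone VARCHAR(50) NOT NULL;
-- In PostgreSQL this fails outright if the table already has rows,
-- and even with a DEFAULT, old application instances keep running
-- INSERTs that never supply a phone value.

-- Option B: ship the application change first
-- INSERT INTO users (id, email, name, phone) VALUES (...);
-- This fails with "column does not exist" until the migration
-- has been applied to every database the application talks to.
```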
This timing mismatch between schema changes and application deployment is the source of almost all migration failures.
Every engineer who has been paged at 3 AM for a migration-related outage learns the same lesson: schema evolution is not a technical problem—it's an operational problem. The SQL is usually trivial. The coordination, timing, and rollback planning are where things go wrong.
The expand-contract pattern (also called parallel change) is the foundational technique for safe schema evolution. It splits every breaking change into three non-breaking phases:
Phase 1: Expand — Add new structures alongside existing ones. Both old and new versions work.
Phase 2: Migrate — Transition all usage from old to new. Backfill data if needed.
Phase 3: Contract — Remove old structures once nothing uses them.
This pattern ensures backward compatibility at every step. If something goes wrong, you can halt without rollback—the old structure still works.
```sql
-- GOAL: Rename column 'fullname' to separate 'first_name' and 'last_name'

-- ============================================
-- PHASE 1: EXPAND (Add new columns, keep old)
-- ============================================
ALTER TABLE users ADD COLUMN first_name VARCHAR(255);
ALTER TABLE users ADD COLUMN last_name VARCHAR(255);

-- Application code v1: Writes to both old and new columns
-- INSERT INTO users (fullname, first_name, last_name, ...)
--   VALUES ('John Doe', 'John', 'Doe', ...);
-- Reads still use 'fullname' (backward compatible)
-- SELECT fullname FROM users WHERE ...

-- ============================================
-- PHASE 2: MIGRATE (Backfill and transition)
-- ============================================
-- Backfill existing data (do in batches for large tables)
UPDATE users
SET first_name = split_part(fullname, ' ', 1),
    last_name = split_part(fullname, ' ', 2)
WHERE first_name IS NULL;

-- Once backfill complete, add constraints
ALTER TABLE users ALTER COLUMN first_name SET NOT NULL;
ALTER TABLE users ALTER COLUMN last_name SET NOT NULL;

-- Application code v2: Reads from new columns
-- SELECT first_name, last_name FROM users WHERE ...
-- (still writes to 'fullname' for rollback safety)

-- ============================================
-- PHASE 3: CONTRACT (Remove old structure)
-- ============================================
-- Only after all code versions use new columns
-- Application code v3: Stops writing to 'fullname'

-- Wait for all v2 instances to be replaced by v3
-- Then drop the old column
ALTER TABLE users DROP COLUMN fullname;
```

Why This Works:
At every step, you can pause, investigate, or roll back application code without data issues. The database change is decoupled from the deployment timing.
| Phase | Database State | Application State | Risk Level |
|---|---|---|---|
| 1. Expand | Old + new structures exist | Writes to both, reads from old | Low |
| 2. Migrate | Old + new, data synced | Reads from new, writes to both | Medium |
| 3. Contract | Only new structure remains | Uses only new structure | Low (if validated) |
The expand-contract pattern is slower than a direct rename. It requires multiple deployments and migrations. But that slowness is the safety margin. A week of careful incremental changes beats a night of crisis management.
Different schema changes require different approaches. Here's how to apply expand-contract (or simpler patterns) to common scenarios:
Pattern 1: Adding a Column
The simplest case. Generally safe without expand-contract if the column is nullable or has a default.
```sql
-- SAFE: Add nullable column
ALTER TABLE users ADD COLUMN nickname VARCHAR(255);
-- Existing code ignores it; new code can use it

-- SAFE: Add column with default (PostgreSQL 11+ writes default lazily)
ALTER TABLE orders ADD COLUMN priority VARCHAR(20) DEFAULT 'normal';
-- PostgreSQL doesn't rewrite table; just records default metadata

-- DANGEROUS: Add NOT NULL without default
ALTER TABLE users ADD COLUMN required_field VARCHAR(255) NOT NULL;
-- This FAILS if table has existing rows

-- SAFE APPROACH for required column:
-- Step 1: Add nullable
ALTER TABLE users ADD COLUMN verified BOOLEAN;
-- Step 2: Backfill
UPDATE users SET verified = false WHERE verified IS NULL;
-- Step 3: Add constraint
ALTER TABLE users ALTER COLUMN verified SET NOT NULL;
ALTER TABLE users ALTER COLUMN verified SET DEFAULT false;
```

Pattern 2: Removing a Column
Never drop columns directly. Code might still reference them, causing runtime errors.
```sql
-- DANGEROUS: Immediate drop
ALTER TABLE users DROP COLUMN legacy_field;
-- Any code still referencing this column will crash

-- SAFE APPROACH:
-- Week 1: Stop writing to the column (code change)
-- Week 2: Stop reading from the column (code change)
-- Week 3: Verify no active queries reference it
SELECT * FROM pg_stat_statements WHERE query ILIKE '%legacy_field%';

-- Week 4: Drop column
ALTER TABLE users DROP COLUMN legacy_field;

-- SAFER: Use soft deprecation first
-- Rename column to signal deprecation
ALTER TABLE users RENAME COLUMN legacy_field TO _deprecated_legacy_field;
-- Code referencing 'legacy_field' fails loudly
-- After fixing all references, drop the renamed column
```

Pattern 3: Renaming a Column
Done safely, a rename is really an add followed by a drop. A direct rename breaks every query and code path that still uses the old name.
```sql
-- DANGEROUS: Direct rename
ALTER TABLE users RENAME COLUMN user_name TO username;
-- All running queries using 'user_name' will fail

-- SAFE APPROACH (expand-contract):
-- Phase 1: Add new column, write to both
ALTER TABLE users ADD COLUMN username VARCHAR(255);
UPDATE users SET username = user_name WHERE username IS NULL;
-- Code writes: INSERT INTO users (user_name, username, ...) VALUES (?, ?, ...)
-- Code reads: SELECT user_name FROM users... (unchanged)

-- Phase 2: Switch reads to new column, verify
-- Code reads: SELECT username FROM users...
-- Code writes: Still writes to both

-- Phase 3: Stop writing to old, then drop
-- Code writes: INSERT INTO users (username, ...) VALUES (?, ...)
-- After all old code instances are gone:
ALTER TABLE users DROP COLUMN user_name;
```

Pattern 4: Changing Column Type
Type changes vary widely in cost. Some are metadata-only and fast (widening a VARCHAR); others rewrite the entire table (converting to an incompatible type).
```sql
-- SAFE: Widening VARCHAR (PostgreSQL)
ALTER TABLE products ALTER COLUMN sku TYPE VARCHAR(200);
-- No table rewrite, just metadata change (if new length > old)

-- POTENTIALLY SLOW: Narrowing VARCHAR
ALTER TABLE products ALTER COLUMN sku TYPE VARCHAR(50);
-- Checks every row to ensure data fits; blocks writes

-- SLOW: Changing type entirely (e.g., string to integer)
-- This rewrites entire table and blocks all access
ALTER TABLE events ALTER COLUMN event_id TYPE BIGINT USING event_id::BIGINT;

-- SAFE APPROACH for type change (expand-contract):
-- Step 1: Add new column with target type
ALTER TABLE events ADD COLUMN new_event_id BIGINT;

-- Step 2: Backfill (in batches for large tables)
UPDATE events SET new_event_id = event_id::BIGINT
WHERE new_event_id IS NULL AND event_id <= 100000;
-- Repeat for batches...

-- Step 3: Update application to use new column
-- Step 4: Drop old column, rename new
ALTER TABLE events DROP COLUMN event_id;
ALTER TABLE events RENAME COLUMN new_event_id TO event_id;
```

Different databases handle DDL differently. PostgreSQL's ALTER TABLE behaviors changed significantly in version 11 (instant defaults). MySQL 8.0+ has instant DDL for many operations. Always test migrations against your specific database version and table size.
For very large tables, even 'fast' operations can be problematic. Adding an index on a 100-million row table might lock writes for minutes. Online schema change tools solve this by performing changes in the background without blocking operations.
How Online Schema Change Tools Work:
The tools below share a common recipe: create a shadow copy of the table with the new structure, copy rows across in small throttled batches, keep the copy in sync with ongoing writes (via triggers or by tailing the replication log), and finally swap the two tables with a near-instant atomic rename. A simplified sketch follows the table.
| Tool | Database | Key Features |
|---|---|---|
| gh-ost (GitHub) | MySQL | Triggerless, built-in throttling, pause/resume |
| pt-online-schema-change | MySQL | Trigger-based, mature, Percona maintained |
| pg_repack | PostgreSQL | Repack tables without locks, removes bloat |
| Online DDL (native) | PostgreSQL 11+/MySQL 8+ | Some operations are instant or lock-free |
| pgroll (Xata) | PostgreSQL | Multi-version DDL for expand-contract |
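Under the hood, both MySQL tools follow roughly the same steps. Here is a simplified sketch of the shadow-table approach; the table and column names are illustrative, and the tools automate every step, including throttling and cleanup:

```sql
-- 1. Create a shadow table with the desired new structure
CREATE TABLE _orders_shadow LIKE orders;
ALTER TABLE _orders_shadow ADD COLUMN priority VARCHAR(20) DEFAULT 'normal';

-- 2. Copy existing rows into the shadow table in small, throttled chunks
--    (column list abbreviated; each chunk is bounded by primary-key range)
INSERT INTO _orders_shadow (order_id, customer_id, total)
SELECT order_id, customer_id, total FROM orders
WHERE order_id BETWEEN 1 AND 1000;
-- ... repeat chunk by chunk until caught up ...

-- 3. Keep the shadow table in sync with writes that happen during the copy:
--    pt-online-schema-change installs triggers on `orders`;
--    gh-ost replays row changes from the binary log instead (triggerless).

-- 4. Swap the tables in one atomic rename, keeping the old one as a fallback
RENAME TABLE orders TO _orders_old, _orders_shadow TO orders;
-- DROP TABLE _orders_old;  -- only after verifying the change
```

In practice you run the tool rather than these statements yourself; typical invocations look like this: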
```bash
# gh-ost example: Add column to large MySQL table
gh-ost \
  --host=prod-db-1 \
  --database=ecommerce \
  --table=orders \
  --alter="ADD COLUMN priority VARCHAR(20) DEFAULT 'normal'" \
  --execute \
  --chunk-size=1000 \
  --max-lag-millis=1500 \
  --throttle-additional-flag-file=/tmp/gh-ost-throttle

# Options explained:
# --chunk-size: Number of rows to copy per batch
# --max-lag-millis: Pause if replica lag exceeds this
# --throttle-additional-flag-file: Touch file to pause operation

# pg_repack example: Rebuild table and indexes without lock
pg_repack \
  --host=prod-db-1 \
  --dbname=ecommerce \
  --table=orders \
  --wait-timeout=30
```

```sql
-- PostgreSQL native: Many operations are lock-free in modern versions

-- Adding column with default (PostgreSQL 11+)
ALTER TABLE orders ADD COLUMN priority VARCHAR(20) DEFAULT 'normal';
-- Instant operation, no table rewrite

-- Creating index concurrently (any PostgreSQL version)
CREATE INDEX CONCURRENTLY idx_orders_priority ON orders(priority);
-- Builds index without blocking writes (takes longer, but safe)
```

Modern PostgreSQL (11+) and MySQL (8.0+) have excellent native online DDL support for common operations. Check if your specific operation is supported as an instant or online DDL before reaching for external tools. External tools add operational complexity and failure modes.
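On MySQL 8.0 with InnoDB, one way to verify support up front (a sketch reusing the `orders.priority` change from above) is to request the algorithm explicitly. If the engine can't honor the request, the statement errors immediately instead of quietly falling back to a copying, locking ALTER:

```sql
-- Ask MySQL for the cheapest path explicitly; unsupported requests fail fast
-- rather than silently degrading to a table-copying, write-blocking ALTER.
ALTER TABLE orders
  ADD COLUMN priority VARCHAR(20) DEFAULT 'normal',
  ALGORITHM=INSTANT;

-- Secondary index builds can't be instant, but can run online
-- (in place, with concurrent DML allowed)
ALTER TABLE orders
  ADD INDEX idx_orders_priority (priority),
  ALGORITHM=INPLACE, LOCK=NONE;
```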
When you add a new column that needs to be populated from existing data, you face the backfill problem. A simple UPDATE ... SET on a billion-row table will run for hours inside a single transaction, hold locks that block other writes, flood replicas with change volume, and quite possibly time out before it finishes.
Batch Processing:
The solution is always the same: break the update into small batches with pauses between them.
```sql
-- DANGEROUS: Single UPDATE on large table
UPDATE orders SET normalized_email = LOWER(email);
-- On 100M rows: Runs for hours, locks everything, may timeout

-- SAFE: Batched backfill with PostgreSQL
DO $$
DECLARE
  batch_size INTEGER := 10000;
  affected_rows INTEGER;
BEGIN
  LOOP
    UPDATE orders
    SET normalized_email = LOWER(email)
    WHERE order_id IN (
      SELECT order_id FROM orders
      WHERE normalized_email IS NULL
      LIMIT batch_size
    );

    GET DIAGNOSTICS affected_rows = ROW_COUNT;

    -- Exit when no more rows to update
    EXIT WHEN affected_rows = 0;

    -- Commit this batch (important!)
    COMMIT;

    -- Pause to reduce load
    PERFORM pg_sleep(0.5);
  END LOOP;
END $$;
```

Application-Level Backfill:
For more control, run backfills from application code rather than SQL. This allows progress checkpoints so an interrupted job can resume where it left off, metrics and logging for visibility, and dynamic throttling based on replica lag or other load signals.
```typescript
// Application-level backfill with monitoring and throttling
interface BackfillConfig {
  batchSize: number;
  delayBetweenBatches: number; // milliseconds
  checkpointInterval: number;  // Save progress every N batches
}

async function backfillNormalizedEmail(config: BackfillConfig) {
  const { batchSize, delayBetweenBatches, checkpointInterval } = config;

  // Resume from last checkpoint if exists
  let lastProcessedId = await getCheckpoint('email-normalization') || '';
  let totalProcessed = 0;
  let batchCount = 0;

  console.log(`Starting backfill from ID: ${lastProcessedId || 'beginning'}`);

  while (true) {
    const startTime = Date.now();

    // Fetch batch ordered by primary key
    const rows = await db.query(`
      SELECT order_id, email
      FROM orders
      WHERE order_id > $1
        AND normalized_email IS NULL
      ORDER BY order_id
      LIMIT $2
    `, [lastProcessedId, batchSize]);

    if (rows.length === 0) {
      console.log('Backfill complete!');
      break;
    }

    // Process batch
    const updates = rows.map(row => ({
      order_id: row.order_id,
      normalized_email: row.email.toLowerCase().trim(),
    }));

    await db.batchUpdate('orders', updates, 'order_id');

    // Track progress
    lastProcessedId = rows[rows.length - 1].order_id;
    totalProcessed += rows.length;
    batchCount++;

    // Log progress
    const duration = Date.now() - startTime;
    console.log(`Batch ${batchCount}: Processed ${rows.length} rows in ${duration}ms`);
    metrics.gauge('backfill.total_processed', totalProcessed);

    // Save checkpoint periodically
    if (batchCount % checkpointInterval === 0) {
      await saveCheckpoint('email-normalization', lastProcessedId);
      console.log(`Checkpoint saved at ID: ${lastProcessedId}`);
    }

    // Throttle based on database load
    const replicaLag = await getReplicaLag();
    if (replicaLag > 5000) {
      console.log(`High replica lag (${replicaLag}ms), pausing...`);
      await sleep(10000);
    } else {
      await sleep(delayBetweenBatches);
    }
  }

  await clearCheckpoint('email-normalization');
}

// Usage
await backfillNormalizedEmail({
  batchSize: 5000,
  delayBetweenBatches: 200,
  checkpointInterval: 100,
});
```

Backfills are extended operations that can impact production. Monitor database metrics (CPU, I/O, connections, replica lag), application latency, and error rates. Be prepared to pause or slow down if metrics degrade. Backfills that run 'invisibly' are the goal.
During rolling deployments, old and new code versions run simultaneously. Your schema must be compatible with both. This requires understanding backward compatibility and forward compatibility.
Backward Compatibility: New schema works with old code
Forward Compatibility: Old schema works with new code
The Two-Version Rule:
Every schema version must be compatible with the application version currently running in production and the application version being rolled out (or rolled back to).
This means you cannot deploy schema changes and code changes atomically. The migration must go first (or last), with the other version being compatible.
```typescript
// DEPLOYMENT SEQUENCE for adding a required column

// Initial state:
// - Schema v1: users(id, email, name)
// - App v1: Uses users.email, users.name

// Step 1: Deploy schema v2 (add nullable column)
// ALTER TABLE users ADD COLUMN phone VARCHAR(50);
// Schema v2 is backward compatible with App v1
// App v1 still works (ignores phone column)

// Step 2: Deploy App v2 (write to new column)
// App v2 INSERT: (id, email, name, phone)
// App v2 SELECT: Works with phone=NULL for old rows
// App v2 must handle phone=NULL until backfill complete

// Step 3: Backfill existing data
// UPDATE users SET phone = 'unknown' WHERE phone IS NULL;

// Step 4: Add NOT NULL constraint
// ALTER TABLE users ALTER COLUMN phone SET NOT NULL;
// Now phone is required at database level

// Step 5: (Optional) Deploy App v3
// App v3 assumes phone is always present

// At every step:
// - App v1 works with Schema v1 and v2
// - App v2 works with Schema v2
// - App v3 works with Schema v2 (after constraint)

interface User_v1 {
  id: string;
  email: string;
  name: string;
}

interface User_v2 {
  id: string;
  email: string;
  name: string;
  phone: string | null; // Nullable during transition
}

interface User_v3 {
  id: string;
  email: string;
  name: string;
  phone: string; // Always present after backfill + constraint
}
```

For every migration, ask: 'Can I roll back the application without rolling back this migration?' If not, you've coupled them dangerously. The expand-contract pattern ensures migrations are independently reversible.
Beyond the patterns and techniques, successful schema evolution requires disciplined practices.
```markdown
# Migration Checklist

## Before Migration
- [ ] Migration tested on production-sized dataset
- [ ] Backward compatibility verified (old app works with new schema)
- [ ] Rollback procedure documented and tested
- [ ] Affected teams notified
- [ ] Migration window scheduled during low-traffic period
- [ ] Monitoring dashboards prepared

## During Migration
- [ ] Database metrics normal (CPU, I/O, connections)
- [ ] Replica lag within acceptable limits
- [ ] Application error rates stable
- [ ] Migration progress logged

## After Migration
- [ ] Schema verified (columns exist, constraints applied)
- [ ] Application functionality tested
- [ ] Performance metrics stable for 15+ minutes
- [ ] Rollback materials retained for 24 hours

## Post-Migration Cleanup (days/weeks later)
- [ ] Old columns dropped (if expand-contract)
- [ ] Unused indexes removed
- [ ] Documentation updated
- [ ] Analytics/reporting teams notified of changes
```

Teams that fear migrations tend to batch them into large, risky releases. Teams that practice continuous migration deploy small changes frequently with minimal ceremony. The latter approach is safer and develops organizational muscle memory for smooth schema evolution.
NoSQL databases handle schema evolution differently. The 'schemaless' nature provides flexibility but shifts complexity to the application layer.
MongoDB: Flexible but Discipline Required
MongoDB documents can have different structures in the same collection. This allows gradual migrations but requires careful application-level handling of multiple document versions.
```typescript
// MongoDB: Handling multiple document versions

interface UserV1 {
  _id: ObjectId;
  name: string;   // Full name in one field
  email: string;
}

interface UserV2 {
  _id: ObjectId;
  firstName: string;  // Split names
  lastName: string;
  email: string;
  schemaVersion: 2;
}

type User = UserV1 | UserV2;

// Application code handles both versions
function getDisplayName(user: User): string {
  if ('schemaVersion' in user && user.schemaVersion === 2) {
    return `${user.firstName} ${user.lastName}`;
  }
  // Legacy v1 document
  return (user as UserV1).name;
}

// Migration can be lazy (update on read) or eager (background job)
async function migrateUserV1ToV2(user: UserV1): Promise<UserV2> {
  const [firstName, ...lastParts] = user.name.split(' ');
  const lastName = lastParts.join(' ') || 'Unknown';

  const updatedUser: UserV2 = {
    _id: user._id,
    firstName,
    lastName,
    email: user.email,
    schemaVersion: 2,
  };

  await collection.updateOne(
    { _id: user._id },
    { $set: updatedUser }
  );

  return updatedUser;
}

// Lazy migration: Update document when read
async function getUser(userId: ObjectId): Promise<UserV2> {
  const user = await collection.findOne({ _id: userId });
  if (!user) throw new NotFoundError();

  if (!('schemaVersion' in user)) {
    // Migrate on read
    return await migrateUserV1ToV2(user as UserV1);
  }

  return user as UserV2;
}
```

DynamoDB: Limited Evolution Options
DynamoDB's schema is effectively defined by its key structure. Changing partition or sort keys requires creating new tables and migrating data. Adding attributes is trivial; changing key structure is painful.
```typescript
// DynamoDB: Adding GSI for new access pattern

// Original table: Users with PK = USER#<userId>
// New requirement: Look up users by email

// Step 1: Add email attribute to items (if not present)
// New items already have email; backfill old items

// Step 2: Create Global Secondary Index
const createGSIParams = {
  TableName: 'Users',
  AttributeDefinitions: [
    { AttributeName: 'email', AttributeType: 'S' },
  ],
  GlobalSecondaryIndexUpdates: [
    {
      Create: {
        IndexName: 'email-index',
        KeySchema: [
          { AttributeName: 'email', KeyType: 'HASH' },
        ],
        Projection: { ProjectionType: 'ALL' },
        ProvisionedThroughput: {
          ReadCapacityUnits: 10,
          WriteCapacityUnits: 5,
        },
      },
    },
  ],
};

// GSI creation is online (table remains usable)
// But GSI won't include items without 'email' attribute
// Backfill must add email to old items for them to appear in GSI

// Step 3: Use new access pattern once GSI is active
async function getUserByEmail(email: string) {
  return await dynamodb.query({
    TableName: 'Users',
    IndexName: 'email-index',
    KeyConditionExpression: 'email = :email',
    ExpressionAttributeValues: { ':email': email },
  });
}
```

NoSQL databases aren't really schemaless—they just move the schema to your application code. You still need versioning, migration strategies, and compatibility planning. The database just won't enforce anything for you.
Schema evolution is an operational discipline as much as a technical one. The patterns are straightforward; the challenge is applying them consistently under pressure.
What's Next:
With schema evolution mastered, we'll conclude with Modeling for Scale—the patterns and techniques for designing data models that can grow from thousands to billions of records while maintaining performance.
You now understand how to evolve database schemas safely in production. In the final page, we'll explore how to design data models that scale gracefully from startup to enterprise.