Loading learning content...
Despite all our redundancy measures—RAID, mirroring, remote backup—storage failures still occur. Sometimes all copies of a database file are lost or corrupted. Sometimes a disaster destroys the primary site and we must recover from backups. Sometimes human error deletes critical data and we need to roll back to an earlier state.
Media recovery is the process of reconstructing a consistent database from backups and archived transaction logs after storage media has failed. Unlike the crash recovery we studied earlier (which works from intact logs), media recovery must work with data that may be hours, days, or weeks old, bringing it forward to a consistent state.
This is the ultimate test of a database's durability guarantees—can we actually recover the data when we need it most?
By the end of this page, you will understand how media recovery differs from crash recovery, the process of restoring from backup and applying archived logs, point-in-time recovery (PITR) techniques, and practical media recovery procedures for major database systems.
Database recovery addresses two fundamentally different failure scenarios with different recovery techniques:
System (Crash) Recovery:
Media Recovery:
| Aspect | Crash Recovery | Media Recovery |
|---|---|---|
| Trigger | Automatic on restart | Manual initiation required |
| Input: Data Files | Current (possibly inconsistent) | Old backup |
| Input: Logs | Online redo log | Archived logs + online log |
| Forward recovery span | Since last checkpoint | Since backup |
| REDO scope | Recent operations only | Potentially millions of operations |
| UNDO requirement | Yes (uncommitted txns) | Only at end of recovery |
Crash recovery is routine—it happens every time a database restarts after an unclean shutdown. Media recovery is exceptional—it's invoked only when data is lost. Crash recovery is automatic and tested constantly. Media recovery is manual and tested rarely. This is why media recovery procedures must be documented and practiced.
Media Recovery Prerequisites:
Media recovery is only possible with proper preparation:
Media recovery follows a systematic process that mirrors how the database originally built up its state:
Phase 1: Restore Backup
First, we restore a backup to get the database to a known historical state. This might be:
Phase 2: Apply Archived Logs (Forward Recovery)
Next, we apply all transaction log segments that were archived since the backup was taken. Each log segment contains the changes made during that time period. Applying them in sequence brings the database forward in time.
Phase 3: Apply Current Log (Final Recovery)
If the current/online transaction log is available (not lost in the media failure), we apply it to recover the most recent transactions.
Phase 4: Open Database
Finally, the database is opened, undo recovery runs for any incomplete transactions, and the system resumes normal operation.
12345678910111213141516171819
Timeline of Media Recovery: Backup Archived Logs Current Failure │ │ Log │ ▼ ▼ ▼ ▼ ┌───────────────────────────────────────────────────────────┐ │ Day 0 │ Day 1 │ Day 2 │ Day 3 │ Day 4 │ Day 5 │ Now │ │ │ │ │ │ │ │ (crash) │ │ Full │ Log │ Log │ Log │ Log │ Log │ Online │ │Backup │ Arch1 │ Arch2 │ Arch3 │ Arch4 │ Arch5 │ Log │ └───────────────────────────────────────────────────────────┘ Recovery Steps: 1. Restore Full Backup (Day 0 state) 2. Apply Arch1 through Arch5 (forward to Day 5) 3. Apply Online Log if available (forward to crash point) 4. Open database (undo incomplete transactions) Result: Database restored to moment before failureRecovery can only proceed as far as you have continuous logs. If archived log #3 is missing, you can only recover to log #2—losing all transactions from log #3 onwards, even if logs #4 and #5 are available. Archive log storage must be treated with the same care as the database itself.
Complete vs. Incomplete Recovery:
Complete Recovery: Apply all available logs to recover up to the moment of failure. No data loss (assuming logs are complete).
Incomplete Recovery (Point-in-Time): Stop recovery at a specific point before the latest available log. Used to recover from logical errors (like accidental table drops) rather than media failures.
Point-in-Time Recovery (PITR) allows restoring a database to any specific moment in the past, not just to a backup point. This capability is essential for recovering from:
DROP TABLE or DELETE without WHERE clauseHow PITR Works:
PITR leverages the same mechanism as complete media recovery, but stops applying logs at a specified target:
Restore Backup Apply Logs Up To Target Point Stop
│ │ │
▼ ▼ ▼
┌───────────────────────────────────────────────────────┐
│ Backup │ Log1 │ Log2 │ Log3 │ Log4 │ Log5 │ Log6 │ │
│ (Tues) │ │ │ │ ▲ │ │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
└────────────────────────────────│──────────────────────┘
│
Target: Thursday 14:30
(Just before accidental DELETE)
Result: Database restored to Thursday 14:30 state
Transactions after 14:30 are NOT applied (by design)
12345678910111213141516171819202122232425262728293031323334353637383940
#!/bin/bash# PostgreSQL: Point-in-Time Recovery # Scenario: Accidental DELETE occurred at 2024-01-15 14:35:00# We want to recover to 14:30:00 (before the DELETE) # Step 1: Stop PostgreSQL (if running)systemctl stop postgresql # Step 2: Move current data directory asidemv /var/lib/postgresql/15/main /var/lib/postgresql/15/main.failed # Step 3: Restore base backuppg_basebackup --restore-base-backup /backup/basebackup_20240114.tartar -xf /backup/basebackup_20240114.tar -C /var/lib/postgresql/15/main # Step 4: Configure recovery targetcat > /var/lib/postgresql/15/main/postgresql.auto.conf << EOFrestore_command = 'cp /archive/%f %p'recovery_target_time = '2024-01-15 14:30:00'recovery_target_action = 'pause' # or 'promote'EOF # Step 5: Create recovery signal filetouch /var/lib/postgresql/15/main/recovery.signal # Step 6: Start PostgreSQLsystemctl start postgresql # Step 7: Verify recovery target reachedpsql -c "SELECT pg_last_xact_replay_timestamp();" # Step 8: When satisfied, promote to master# If recovery_target_action = 'pause':psql -c "SELECT pg_wal_replay_resume();"# Then:psql -c "SELECT pg_promote();" # Note: All transactions after 14:30:00 are LOST - this is intentional# You're trading newer data to recover from the DELETERecovery Target Options:
| Target Type | PostgreSQL | Oracle | Use Case |
|---|---|---|---|
| Timestamp | recovery_target_time | UNTIL TIME | Recover to specific moment |
| Transaction ID | recovery_target_xid | UNTIL CHANGE | Recover to specific txn |
| Named Point | recovery_target_name | UNTIL SEQUENCE | Recover to restore point |
| Log Position | recovery_target_lsn | UNTIL SCN | Recover to specific log location |
| Immediate | recovery_target = 'immediate' | N/A | Stop at consistent point |
Before risky operations (schema changes, large updates), create a named restore point. This gives you a precise recovery target without needing to guess timestamps. In PostgreSQL: SELECT pg_create_restore_point('before_migration'); In Oracle: CREATE RESTORE POINT before_migration;
Oracle Database provides sophisticated media recovery capabilities through RMAN (Recovery Manager) and its integrated backup/restore architecture.
Oracle Recovery Architecture:
Oracle distinguishes between:
123456789101112131415161718192021222324252627282930313233343536373839404142
-- Oracle: Complete Database Media Recovery with RMAN -- Scenario: All data files lost, need complete recovery -- Step 1: Connect to RMAN-- $ rman target / -- Step 2: If control file is lost, restore it firstRMAN> STARTUP NOMOUNT;RMAN> RESTORE CONTROLFILE FROM AUTOBACKUP;RMAN> ALTER DATABASE MOUNT; -- Step 3: Restore database filesRMAN> RESTORE DATABASE; -- Step 4: Recover (apply archived and online logs)RMAN> RECOVER DATABASE; -- Step 5: Open database with resetlogs (after incomplete recovery)-- or just OPEN (after complete recovery)RMAN> ALTER DATABASE OPEN RESETLOGS; -- ========================================-- Point-in-Time Recovery Example-- ======================================== -- Recover to specific time (before accidental DELETE)RMAN> STARTUP MOUNT;RMAN> SET UNTIL TIME "TO_DATE('2024-01-15 14:30:00','YYYY-MM-DD HH24:MI:SS')";RMAN> RESTORE DATABASE;RMAN> RECOVER DATABASE;RMAN> ALTER DATABASE OPEN RESETLOGS; -- Recover to SCN (System Change Number)RMAN> SET UNTIL SCN 123456789;RMAN> RESTORE DATABASE;RMAN> RECOVER DATABASE; -- Recover to restore pointRMAN> SET UNTIL RESTORE POINT before_upgrade;RMAN> RESTORE DATABASE;RMAN> RECOVER DATABASE;Oracle Block Media Recovery:
Oracle can recover individual corrupted blocks without restoring the entire datafile—a powerful feature for minimizing downtime:
12345678910111213141516171819
-- Oracle: Block-Level Media Recovery -- Scenario: DBVERIFY reports a few corrupted blocks-- No need to restore entire datafile! -- Step 1: Identify corrupted blocksSELECT * FROM v$database_block_corruption;-- orSELECT * FROM v$backup_corruption; -- Step 2: Recover just those blocks (RMAN)RMAN> BLOCKRECOVER DATAFILE 7 BLOCK 123; -- Or recover all known corrupted blocksRMAN> BLOCKRECOVER CORRUPTION LIST; -- This restores only the specific blocks from backup-- and applies redo to bring them current-- Much faster than full datafile recovery!Oracle offers Flashback Database as an alternative to traditional PITR for some scenarios. Flashback uses 'before images' stored in the flashback logs to quickly reverse changes without full restore. It's faster but requires pre-configuration and additional storage. Flashback is ideal for logical errors; traditional recovery is still needed for media failures.
PostgreSQL's media recovery is based on its Write-Ahead Log (WAL) architecture. The process involves restoring a base backup and replaying WAL segments.
PostgreSQL Recovery Architecture:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849
-- PostgreSQL: Complete Media Recovery Procedure -- PREPARATION (before any failure occurs): -- Enable WAL archiving in postgresql.conf:-- archive_mode = on-- archive_command = 'cp %p /archive/%f'-- wal_level = replica -- Create base backups regularly:-- pg_basebackup -D /backup/$(date +%Y%m%d) -Ft -Xs -P -- ========================================-- RECOVERY PROCEDURE-- ======================================== -- Step 1: Stop PostgreSQL if running-- $ systemctl stop postgresql -- Step 2: Secure failed data directory-- $ mv $PGDATA $PGDATA.failed -- Step 3: Restore base backup-- $ tar -xf /backup/20240114.tar -C $PGDATA -- Step 4: Configure recovery-- Create $PGDATA/postgresql.auto.conf (or edit postgresql.conf):-- -- restore_command = 'cp /archive/%f %p'---- For complete recovery, that's all you need-- For PITR, also add:-- recovery_target_time = '2024-01-15 14:30:00' -- Step 5: Create recovery signal-- $ touch $PGDATA/recovery.signal -- Step 6: Start PostgreSQL-- $ systemctl start postgresql -- Step 7: Check recovery progressSELECT pg_is_in_recovery();SELECT pg_last_wal_receive_lsn();SELECT pg_last_wal_replay_lsn();SELECT pg_last_xact_replay_timestamp(); -- Step 8: After PITR, promote when readySELECT pg_wal_replay_resume(); -- If pausedSELECT pg_promote(); -- Become primaryUsing pgBackRest for Enterprise Recovery:
pgBackRest is a popular backup tool for PostgreSQL that provides advanced features:
1234567891011121314151617181920212223242526272829303132333435363738
#!/bin/bash# pgBackRest: Enterprise PostgreSQL Media Recovery # Configuration: /etc/pgbackrest/pgbackrest.conf# [global]# repo1-path=/backup/pgbackrest# repo1-retention-full=2# # [main]# pg1-path=/var/lib/postgresql/15/main # Stop PostgreSQLsystemctl stop postgresql # List available backupspgbackrest --stanza=main info # Restore to latest (complete recovery)pgbackrest --stanza=main --delta restore # Restore to specific time (PITR)pgbackrest --stanza=main \ --type=time \ --target="2024-01-15 14:30:00" \ --target-action=promote \ restore # Restore with different target options# --type=xid --target="1234" # To specific transaction ID# --type=name --target="before_upgrade" # To restore point# --type=lsn --target="0/12345678" # To specific LSN# --type=immediate # To first consistent point # Start PostgreSQLsystemctl start postgresql # Verify recoverypsql -c "SELECT pg_last_xact_replay_timestamp();"pgBackRest's --delta option compares the current data directory to the backup and only restores files that differ. This is much faster than full restore when most files are intact (e.g., single tablespace corruption). Always use --delta when some of the data directory is still valid.
MySQL's media recovery options depend on the storage engine and backup method. For InnoDB (the default engine), recovery typically uses physical backups (Percona XtraBackup or MySQL Enterprise Backup) or logical backups (mysqldump) combined with binary log application.
MySQL Recovery Architecture:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748
#!/bin/bash# MySQL/InnoDB: Media Recovery with Percona XtraBackup # PREPARATION: Create backup with XtraBackup# xtrabackup --backup --target-dir=/backup/$(date +%Y%m%d)# xtrabackup --prepare --target-dir=/backup/20240114 # ========================================# COMPLETE RECOVERY PROCEDURE# ======================================== # Step 1: Stop MySQLsystemctl stop mysql # Step 2: Secure current data directorymv /var/lib/mysql /var/lib/mysql.failed # Step 3: Restore backupxtrabackup --copy-back --target-dir=/backup/20240114 # Step 4: Fix permissionschown -R mysql:mysql /var/lib/mysql # Step 5: Start MySQLsystemctl start mysql # Step 6: Apply binary logs for forward recovery# Find the binlog position from xtrabackup_binlog_infocat /backup/20240114/xtrabackup_binlog_info# Output: mysql-bin.000042 1234567 # Apply all binlogs from that position forwardmysqlbinlog --start-position=1234567 \ /var/log/mysql/mysql-bin.000042 \ /var/log/mysql/mysql-bin.000043 \ /var/log/mysql/mysql-bin.000044 \ | mysql -u root -p # ========================================# POINT-IN-TIME RECOVERY# ======================================== # Apply binlogs up to specific timemysqlbinlog --start-position=1234567 \ --stop-datetime="2024-01-15 14:30:00" \ /var/log/mysql/mysql-bin.000042 \ /var/log/mysql/mysql-bin.000043 \ | mysql -u root -pLogical Recovery (from mysqldump):
1234567891011121314151617181920212223
#!/bin/bash# MySQL: Logical Recovery from mysqldump # Restore the logical backupmysql -u root -p < /backup/full_dump_20240114.sql # Find the binlog position recorded in dump file header# Look for: -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000042', MASTER_LOG_POS=1234567 # Apply binary logs from that pointmysqlbinlog --start-position=1234567 \ /var/log/mysql/mysql-bin.000042 \ /var/log/mysql/mysql-bin.000043 \ | mysql -u root -p # For PITR, stop at specific timemysqlbinlog --start-position=1234567 \ --stop-datetime="2024-01-15 14:30:00" \ /var/log/mysql/mysql-bin.000042 \ | mysql -u root -p # Note: Logical restore is MUCH slower than physical# Use mysqldump mainly for small databases or cross-version migrationMySQL's binlog files are essential for PITR. Configure adequate retention with binlog_expire_logs_seconds (MySQL 8.0+) or expire_logs_days (older versions). If binlog files are deleted before being applied, you cannot recover transactions after the last available binlog. Back up binlog files to remote storage separately from MySQL's automatic purging.
Media recovery is a high-stress, high-stakes operation. Following best practices significantly improves success rates and reduces recovery time.
123456789101112131415161718192021222324252627282930313233
# Database Media Recovery Checklist ## Pre-Recovery Assessment- [ ] Identify scope of failure (single file, whole database, entire site)- [ ] Verify backup availability and integrity- [ ] Verify archived log availability (no gaps)- [ ] Calculate expected recovery time- [ ] Notify stakeholders of outage and ETA ## Secure Failed System- [ ] Stop database if running- [ ] Copy/preserve failed data directory- [ ] Collect error logs for analysis ## Restore Process- [ ] Restore backup to recovery location- [ ] Verify checksum/integrity of restored files- [ ] Configure recovery parameters- [ ] Start recovery process- [ ] Monitor recovery progress ## Validation- [ ] Verify database opens successfully- [ ] Check data integrity (critical tables, recent transactions)- [ ] Verify application connectivity- [ ] Run application health checks ## Post-Recovery- [ ] Take fresh backup immediately- [ ] Resume normal backup schedule- [ ] Document lessons learned- [ ] Update runbook with any improvements- [ ] Schedule post-mortem meetingNever, ever overwrite your backup during recovery. Always restore to a new location. If the recovery fails or you make a mistake, you still have the backup to try again. The backup is the last line of defense—protect it absolutely.
Media recovery is the ultimate validation of a database's durability guarantees—the ability to recover data when primary storage has failed. Let's consolidate the key concepts:
Module Complete:
This completes our exploration of Stable Storage. We've journeyed from the theoretical concept of perfectly reliable storage, through the practical technologies that approximate it (RAID, mirroring, remote backup), to the recovery procedures that restore data when failures inevitably occur.
The key insight is that durability is not a property of any single component—it's an emergent property of system design. By combining redundancy, verification, and tested recovery procedures, we achieve practical durability that approaches the theoretical ideal of stable storage.
You have mastered the concepts and techniques of stable storage—from theoretical foundations through RAID, mirroring, remote backup, and media recovery. This knowledge is essential for designing and operating database systems that deliver on the durability promise of ACID.