
Stable Storage: Foundations of Durability

Level: Intermediate

Duration: 60 mins

Topic: Stable Storage

Media Recovery

When Storage Fails

Despite all our redundancy measures—RAID, mirroring, remote backup—storage failures still occur. Sometimes all copies of a database file are lost or corrupted. Sometimes a disaster destroys the primary site and we must recover from backups. Sometimes human error deletes critical data and we need to roll back to an earlier state.

Media recovery is the process of reconstructing a consistent database from backups and archived transaction logs after storage media has failed. Unlike the crash recovery we studied earlier (which starts from the current data files and intact online logs), media recovery starts from a backup that may be hours, days, or weeks old and rolls it forward through the archived logs to a consistent state.

This is the ultimate test of a database's durability guarantees—can we actually recover the data when we need it most?

What You Will Learn

By the end of this page, you will understand how media recovery differs from crash recovery, the process of restoring from backup and applying archived logs, point-in-time recovery (PITR) techniques, and practical media recovery procedures for major database systems.

Media Failure vs. System Failure

Database recovery addresses two fundamentally different failure scenarios with different recovery techniques:

System (Crash) Recovery:

Cause: Power failure, OS crash, database process crash
Data state: Data files exist but may be inconsistent; logs intact
Recovery: Automatic using online redo logs (ARIES redo/undo)
Data loss: Zero (for committed transactions)
Recovery time: Seconds to minutes

Media Recovery:

Cause: Disk failure, storage corruption, accidental deletion, disaster
Data state: Data files lost, corrupted, or intentionally restored from backup
Recovery: Manual process using backups plus archived logs
Data loss: Depends on backup age and log availability
Recovery time: Minutes to hours (or longer for large databases)

Recovery Type Comparison
Aspect	Crash Recovery	Media Recovery
Trigger	Automatic on restart	Manual initiation required
Input: Data Files	Current (possibly inconsistent)	Old backup
Input: Logs	Online redo log	Archived logs + online log
Forward recovery span	Since last checkpoint	Since backup
REDO scope	Recent operations only	Potentially millions of operations
UNDO requirement	Yes (uncommitted txns)	Only at end of recovery

Why the Distinction Matters

Crash recovery is routine—it happens every time a database restarts after an unclean shutdown. Media recovery is exceptional—it's invoked only when data is lost. Crash recovery is automatic and tested constantly. Media recovery is manual and tested rarely. This is why media recovery procedures must be documented and practiced.

Media Recovery Prerequisites:

Media recovery is only possible with proper preparation:

Valid Backup — A restorable copy of the database from some point in the past
Archived Logs — All transaction log segments from the backup time to now
Continuous Archive — No gaps in the log archive; missing logs mean lost transactions
Tested Procedures — Documented and verified recovery steps

The Media Recovery Process

Media recovery follows a systematic process that mirrors how the database originally built up its state:

Phase 1: Restore Backup

First, we restore a backup to get the database to a known historical state. This might be:

A full backup (complete database copy)
A full backup plus incremental backups (to reduce the amount of log that must be replayed)
A base backup plus differential backups

Phase 2: Apply Archived Logs (Forward Recovery)

Next, we apply all transaction log segments that were archived since the backup was taken. Each log segment contains the changes made during that time period. Applying them in sequence brings the database forward in time.

Phase 3: Apply Current Log (Final Recovery)

If the current/online transaction log is available (not lost in the media failure), we apply it to recover the most recent transactions.

Phase 4: Open Database

Finally, the database is opened, undo recovery runs for any incomplete transactions, and the system resumes normal operation.

media_recovery_timeline.text
Timeline of Media Recovery:
 
Backup          Archived Logs               Current       Failure
  │                   │                       Log            │
  ▼                   ▼                        ▼             ▼
  ┌───────────────────────────────────────────────────────────┐
  │ Day 0 │ Day 1 │ Day 2 │ Day 3 │ Day 4 │ Day 5 │ Now      │
  │       │       │       │       │       │       │ (crash)  │
  │ Full  │ Log   │ Log   │ Log   │ Log   │ Log   │ Online   │
  │Backup │ Arch1 │ Arch2 │ Arch3 │ Arch4 │ Arch5 │  Log     │
  └───────────────────────────────────────────────────────────┘
  
  Recovery Steps:
  1. Restore Full Backup (Day 0 state)
  2. Apply Arch1 through Arch5 (forward to Day 5)
  3. Apply Online Log if available (forward to crash point)
  4. Open database (undo incomplete transactions)
  
  Result: Database restored to moment before failure

Log Continuity is Critical

Recovery can only proceed as far as you have continuous logs. If archived log #3 is missing, you can only recover to log #2—losing all transactions from log #3 onwards, even if logs #4 and #5 are available. Archive log storage must be treated with the same care as the database itself.
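
The continuity rule can be checked mechanically before a restore begins. The following is a minimal sketch, assuming archived logs carry consecutive integer sequence numbers; the function name `last_recoverable` and its input format are illustrative and do not belong to any database's tooling.

```shell
#!/bin/sh
# last_recoverable FIRST SEQ...
# FIRST is the first log needed after the backup; SEQ... are the archived
# sequence numbers that are actually available (hypothetical input format).
# Prints the highest sequence reachable from FIRST without crossing a gap,
# or FIRST-1 when FIRST itself is missing.
last_recoverable() {
    expected=$1
    shift
    # Sort numerically and drop duplicates so input order does not matter.
    for seq in $(printf '%s\n' "$@" | sort -n -u); do
        if [ "$seq" -lt "$expected" ]; then
            continue                      # older than the backup, not needed
        elif [ "$seq" -eq "$expected" ]; then
            expected=$((expected + 1))    # contiguous, keep going
        else
            break                         # gap: later logs are unusable
        fi
    done
    echo $((expected - 1))
}

# Logs 1, 2, 4 and 5 are archived; log 3 is missing.
last_recoverable 1 1 2 4 5    # prints 2
```

Running a check like this on a schedule, rather than only during an incident, turns a silent archive gap into an alert while there is still time to take a fresh backup.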

Complete vs. Incomplete Recovery:

Complete Recovery: Apply all available logs to recover up to the moment of failure. No data loss (assuming logs are complete).
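Incomplete Recovery (Point-in-Time): Stop recovery at a specific point before the latest available log. Used to recover from logical errors (like accidental table drops) rather than media failures. It is also the only option when a gap in the archived logs prevents recovery from going any further.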

Point-in-Time Recovery (PITR)

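Point-in-Time Recovery (PITR) allows restoring a database to any specific moment in the past, not just to a backup point, provided a backup taken before that moment and a continuous run of logs from that backup up to the target are both available. This capability is essential for recovering from: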

Accidental DROP TABLE or DELETE without WHERE clause
Application bugs that corrupted data
Ransomware encryption (restore to before infection)
Regulatory requirements to reproduce historical data

How PITR Works:

PITR leverages the same mechanism as complete media recovery, but stops applying logs at a specified target:

Restore Backup    Apply Logs Up To Target Point    Stop
      │                      │                       │
      ▼                      ▼                       ▼
┌───────────────────────────────────────────────────────┐
│ Backup │ Log1 │ Log2 │ Log3 │ Log4 │ Log5 │ Log6 │   │
│ (Tues) │      │      │      │  ▲   │      │      │   │
│        │      │      │      │  │   │      │      │   │
│        │      │      │      │  │   │      │      │   │
└────────────────────────────────│──────────────────────┘
                                 │
                    Target: Thursday 14:30
                    (Just before accidental DELETE)

Result: Database restored to Thursday 14:30 state
Transactions after 14:30 are NOT applied (by design)
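
Because logs can only be applied forward, PITR has to start from a backup that finished before the target time. The sketch below picks the newest such backup. It assumes backup completion times are written as `YYYYMMDDHHMMSS` strings, so numeric order equals time order; the function name `pick_backup` and the timestamps are illustrative.

```shell
#!/bin/sh
# pick_backup TARGET BACKUP...
# TARGET and BACKUP values are YYYYMMDDHHMMSS timestamps (assumed format).
# Prints the newest backup completed at or before TARGET.
# Returns 1 and prints nothing when no backup is old enough.
pick_backup() {
    target=$1
    shift
    best=""
    for ts in "$@"; do
        if [ "$ts" -le "$target" ]; then
            if [ -z "$best" ] || [ "$ts" -gt "$best" ]; then
                best=$ts
            fi
        fi
    done
    [ -n "$best" ] && echo "$best"
}

# Target: 2024-01-15 14:30:00. The backup from the 16th is too new to use.
pick_backup 20240115143000 20240107020000 20240114020000 20240116020000
# prints 20240114020000
```

Choosing the newest eligible backup keeps the amount of log to replay, and therefore the recovery time, as small as possible.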

postgresql_pitr.sh
#!/bin/bash
# PostgreSQL: Point-in-Time Recovery
 
# Scenario: Accidental DELETE occurred at 2024-01-15 14:35:00
# We want to recover to 14:30:00 (before the DELETE)
 
# Step 1: Stop PostgreSQL (if running)
systemctl stop postgresql
 
# Step 2: Move current data directory aside
mv /var/lib/postgresql/15/main /var/lib/postgresql/15/main.failed
 
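# Step 3: Restore base backup into a fresh data directory
# (pg_basebackup only takes backups; a tar-format backup is restored with tar)
mkdir -m 700 /var/lib/postgresql/15/main
tar -xf /backup/basebackup_20240114.tar -C /var/lib/postgresql/15/main
chown -R postgres:postgres /var/lib/postgresql/15/main
 
# Step 4: Configure recovery target
# Append (>>) so settings already stored in postgresql.auto.conf are kept
cat >> /var/lib/postgresql/15/main/postgresql.auto.conf << EOF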
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2024-01-15 14:30:00'
recovery_target_action = 'pause'  # or 'promote'
EOF
 
# Step 5: Create recovery signal file
touch /var/lib/postgresql/15/main/recovery.signal
 
# Step 6: Start PostgreSQL
systemctl start postgresql
 
# Step 7: Verify recovery target reached
psql -c "SELECT pg_last_xact_replay_timestamp();"
 
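# Step 8: When satisfied, end recovery and open the database for writes
# With recovery_target_action = 'pause', resuming replay at the target
# ends recovery and promotes the server to primary:
psql -c "SELECT pg_wal_replay_resume();"
# With recovery_target_action = 'promote', this happens automatically
# and no manual step is needed.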
 
# Note: All transactions after 14:30:00 are LOST - this is intentional
# You're trading newer data to recover from the DELETE

Recovery Target Options:

PITR Target Specifications
Target Type	PostgreSQL	Oracle	Use Case
Timestamp	recovery_target_time	UNTIL TIME	Recover to specific moment
Transaction ID	recovery_target_xid	N/A (use UNTIL SCN)	Recover to specific txn
Named Point	recovery_target_name	TO RESTORE POINT	Recover to restore point
Log Position	recovery_target_lsn	UNTIL SCN or UNTIL SEQUENCE	Recover to specific log location
Immediate	recovery_target = 'immediate'	N/A	Stop at consistent point

Creating Restore Points

Before risky operations (schema changes, large updates), create a named restore point. This gives you a precise recovery target without needing to guess timestamps. In PostgreSQL: SELECT pg_create_restore_point('before_migration'); In Oracle: CREATE RESTORE POINT before_migration;

Oracle Media Recovery

Oracle Database provides sophisticated media recovery capabilities through RMAN (Recovery Manager) and its integrated backup/restore architecture.

Oracle Recovery Architecture:

Oracle distinguishes between:

Redo Logs: Online circular logs used for crash recovery
Archived Redo Logs: Completed log files archived to permanent storage
Control File: Contains database structure and backup information
RMAN Catalog: Optional repository of backup metadata

oracle_media_recovery.sql
-- Oracle: Complete Database Media Recovery with RMAN
 
-- Scenario: All data files lost, need complete recovery
 
-- Step 1: Connect to RMAN
-- $ rman target /
 
-- Step 2: If control file is lost, restore it first
-- (if the control file is intact, a single STARTUP MOUNT replaces these three commands)
RMAN> STARTUP NOMOUNT;
RMAN> RESTORE CONTROLFILE FROM AUTOBACKUP;
RMAN> ALTER DATABASE MOUNT;
 
-- Step 3: Restore database files
RMAN> RESTORE DATABASE;
 
-- Step 4: Recover (apply archived and online logs)
RMAN> RECOVER DATABASE;
 
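-- Step 5: Open database
-- RESETLOGS is required after restoring a backup control file (Step 2)
-- and after any incomplete recovery. If the current control file survived
-- and recovery was complete, a plain ALTER DATABASE OPEN is enough.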
RMAN> ALTER DATABASE OPEN RESETLOGS;
 
-- ========================================
-- Point-in-Time Recovery Example
-- ========================================
 
-- Recover to specific time (before accidental DELETE)
-- SET UNTIL applies to the RESTORE and RECOVER commands in the same RUN block
RMAN> STARTUP MOUNT;
RMAN> RUN {
        SET UNTIL TIME "TO_DATE('2024-01-15 14:30:00','YYYY-MM-DD HH24:MI:SS')";
        RESTORE DATABASE;
        RECOVER DATABASE;
      }
RMAN> ALTER DATABASE OPEN RESETLOGS;
 
-- Recover to SCN (System Change Number)
RMAN> RUN {
        SET UNTIL SCN 123456789;
        RESTORE DATABASE;
        RECOVER DATABASE;
      }
 
-- Recover to restore point
RMAN> RUN {
        SET TO RESTORE POINT before_upgrade;
        RESTORE DATABASE;
        RECOVER DATABASE;
      }
 
-- Every incomplete recovery ends with ALTER DATABASE OPEN RESETLOGS

Oracle Block Media Recovery:

Oracle can recover individual corrupted blocks without restoring the entire datafile—a powerful feature for minimizing downtime:

oracle_block_recovery.sql
-- Oracle: Block-Level Media Recovery
 
-- Scenario: DBVERIFY or RMAN VALIDATE reports a few corrupted blocks
-- No need to restore entire datafile!
 
-- Step 1: Identify corrupted blocks
-- (this view is populated by RMAN BACKUP and VALIDATE runs;
--  v$backup_corruption describes corruption inside backups, not the live database)
SELECT * FROM v$database_block_corruption;
 
-- Step 2: Recover just those blocks (RMAN)
-- (BLOCKRECOVER is the older, deprecated form of these commands)
RMAN> RECOVER DATAFILE 7 BLOCK 123;
 
-- Or recover all known corrupted blocks
RMAN> RECOVER CORRUPTION LIST;
 
-- This restores only the specific blocks from backup
-- and applies redo to bring them current
-- Much faster than full datafile recovery!

Oracle Flashback Technology

Oracle offers Flashback Database as an alternative to traditional PITR for some scenarios. Flashback uses 'before images' stored in the flashback logs to quickly reverse changes without full restore. It's faster but requires pre-configuration and additional storage. Flashback is ideal for logical errors; traditional recovery is still needed for media failures.

PostgreSQL Media Recovery

PostgreSQL's media recovery is based on its Write-Ahead Log (WAL) architecture. The process involves restoring a base backup and replaying WAL segments.

PostgreSQL Recovery Architecture:

Base Backup: Physical copy of the data directory (pg_basebackup)
WAL Segments: Files containing all changes (16 MB each by default)
WAL Archiving: Mechanism to copy completed WAL files to archive storage
recovery.signal: File that triggers recovery mode on startup
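
When checking an archive for gaps, it helps to know which segment file holds a given log position. The sketch below derives the 24-character WAL file name from a timeline ID and an LSN written as `high/low` in hexadecimal. It assumes the default 16 MB segment size; the function name `wal_file_name` is illustrative, and on a running server the built-in `pg_walfile_name()` does the same job.

```shell
#!/bin/sh
# wal_file_name TIMELINE LSN
# LSN is written as HIGH/LOW in hexadecimal, e.g. 0/12345678.
# Assumes the default segment size of 16 MB (16777216 bytes).
wal_file_name() {
    tli=$1
    hi=${2%/*}    # text before the slash: high 32 bits of the LSN
    lo=${2#*/}    # text after the slash: low 32 bits of the LSN
    seg=$(( 0x$lo / 16777216 ))    # segment index within the high word
    printf '%08X%08X%08X\n' "$tli" "$(( 0x$hi ))" "$seg"
}

wal_file_name 1 0/12345678    # prints 000000010000000000000012
```

The three 8-digit groups are the timeline, the high word of the LSN, and the segment number, which is why archived segments sort correctly by name within one timeline.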

postgresql_media_recovery.sql
-- PostgreSQL: Complete Media Recovery Procedure
 
-- PREPARATION (before any failure occurs):
 
-- Enable WAL archiving in postgresql.conf:
-- archive_mode = on
-- archive_command = 'cp %p /archive/%f'
-- wal_level = replica
 
-- Create base backups regularly:
-- pg_basebackup -D /backup/$(date +%Y%m%d) -Ft -Xs -P
 
-- ========================================
-- RECOVERY PROCEDURE
-- ========================================
 
-- Step 1: Stop PostgreSQL if running
-- $ systemctl stop postgresql
 
-- Step 2: Secure failed data directory
-- $ mv $PGDATA $PGDATA.failed
 
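-- Step 3: Restore base backup
-- (pg_basebackup -Ft -Xs writes base.tar and pg_wal.tar into the backup directory)
-- $ mkdir -m 700 $PGDATA
-- $ tar -xf /backup/20240114/base.tar -C $PGDATA
-- $ tar -xf /backup/20240114/pg_wal.tar -C $PGDATA/pg_wal
 
-- Step 4: Configure recovery
-- Append to $PGDATA/postgresql.auto.conf (or edit postgresql.conf):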
-- 
-- restore_command = 'cp /archive/%f %p'
--
-- For complete recovery, that's all you need
-- For PITR, also add:
-- recovery_target_time = '2024-01-15 14:30:00'
 
-- Step 5: Create recovery signal
-- $ touch $PGDATA/recovery.signal
 
-- Step 6: Start PostgreSQL
-- $ systemctl start postgresql
 
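-- Step 7: Check recovery progress
SELECT pg_is_in_recovery();
SELECT pg_last_wal_replay_lsn();
SELECT pg_last_xact_replay_timestamp();
-- (pg_last_wal_receive_lsn() reports streaming replication only;
--  it returns NULL during recovery from an archive)
 
-- Step 8: Finish recovery
-- Complete recovery: the server promotes itself when it runs out of WAL.
-- PITR: replay pauses at the target by default; resuming ends recovery
-- and opens the database for writes.
SELECT pg_wal_replay_resume();  -- If paused at the recovery target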

Using pgBackRest for Enterprise Recovery:

pgBackRest is a popular backup tool for PostgreSQL that provides advanced features:

pgbackrest_recovery.sh
#!/bin/bash
# pgBackRest: Enterprise PostgreSQL Media Recovery
 
# Configuration: /etc/pgbackrest/pgbackrest.conf
# [global]
# repo1-path=/backup/pgbackrest
# repo1-retention-full=2
# 
# [main]
# pg1-path=/var/lib/postgresql/15/main
 
# Stop PostgreSQL
systemctl stop postgresql
 
# List available backups
pgbackrest --stanza=main info
 
# Option A: Restore to latest (complete recovery)
pgbackrest --stanza=main --delta restore
 
# Option B: Restore to specific time (PITR)
# Run this INSTEAD of Option A, not after it
# pgbackrest --stanza=main --delta \
#     --type=time \
#     --target="2024-01-15 14:30:00" \
#     --target-action=promote \
#     restore
 
# Restore with different target options
# --type=xid --target="1234"           # To specific transaction ID
# --type=name --target="before_upgrade" # To restore point
# --type=lsn --target="0/12345678"      # To specific LSN
# --type=immediate                      # To first consistent point
 
# Start PostgreSQL
systemctl start postgresql
 
# Verify recovery
psql -c "SELECT pg_last_xact_replay_timestamp();"

Delta Restore

pgBackRest's --delta option compares the current data directory to the backup and only restores files that differ. This is much faster than full restore when most files are intact (e.g., single tablespace corruption). Always use --delta when some of the data directory is still valid.
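
The idea behind a delta restore can be illustrated in a few lines of shell. This is a conceptual sketch only, not pgBackRest's implementation: it compares file contents with `cmp`, and unlike a real delta restore it does not remove files that are absent from the backup. The function name `delta_copy` and the directory layout are hypothetical.

```shell
#!/bin/sh
# delta_copy SRC DST
# Copies a file from SRC to DST only when it is missing from DST or its
# contents differ. Prints the relative path of every file it copies.
delta_copy() {
    src=$1
    dst=$2
    (cd "$src" && find . -type f) | while IFS= read -r rel; do
        # cmp -s exits non-zero when the files differ or DST lacks the file.
        if ! cmp -s "$src/$rel" "$dst/$rel" 2>/dev/null; then
            mkdir -p "$(dirname "$dst/$rel")"
            cp "$src/$rel" "$dst/$rel"
            echo "${rel#./}"
        fi
    done
}

# Usage (paths are placeholders):
#   delta_copy /restore/staging /var/lib/postgresql/15/main
```

Files that already match are never rewritten, which is where the time saving comes from when only a small part of the data directory is damaged.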

MySQL/InnoDB Media Recovery

MySQL's media recovery options depend on the storage engine and backup method. For InnoDB (the default engine), recovery typically uses physical backups (Percona XtraBackup or MySQL Enterprise Backup) or logical backups (mysqldump) combined with binary log application.

MySQL Recovery Architecture:

InnoDB Redo Log: For crash recovery within InnoDB
Binary Log (binlog): Complete history of all changes for replication and PITR
Physical Backup: ibdata files, tablespace files, InnoDB logs
Logical Backup: SQL statements (mysqldump, mysqlpump)

mysql_media_recovery.sh
#!/bin/bash
# MySQL/InnoDB: Media Recovery with Percona XtraBackup
 
# PREPARATION: Create backup with XtraBackup
# xtrabackup --backup --target-dir=/backup/$(date +%Y%m%d)
# xtrabackup --prepare --target-dir=/backup/20240114
 
# ========================================
# COMPLETE RECOVERY PROCEDURE
# ========================================
 
# Step 1: Stop MySQL
systemctl stop mysql
 
# Step 2: Secure current data directory
mv /var/lib/mysql /var/lib/mysql.failed
 
# Step 3: Restore backup
xtrabackup --copy-back --target-dir=/backup/20240114
 
# Step 4: Fix permissions
chown -R mysql:mysql /var/lib/mysql
 
# Step 5: Start MySQL
systemctl start mysql
 
# Step 6: Apply binary logs for forward recovery
# Find the binlog position from xtrabackup_binlog_info
cat /backup/20240114/xtrabackup_binlog_info
# Output: mysql-bin.000042  1234567
 
# Apply all binlogs from that position forward
mysqlbinlog --start-position=1234567 \
    /var/log/mysql/mysql-bin.000042 \
    /var/log/mysql/mysql-bin.000043 \
    /var/log/mysql/mysql-bin.000044 \
    | mysql -u root -p
 
# ========================================
# POINT-IN-TIME RECOVERY
# ========================================
 
# Alternative to Step 6: run this INSTEAD of the full replay above.
# Replaying the same binlog events twice would damage the restored data.
# Apply binlogs up to specific time
# mysqlbinlog --start-position=1234567 \
#     --stop-datetime="2024-01-15 14:30:00" \
#     /var/log/mysql/mysql-bin.000042 \
#     /var/log/mysql/mysql-bin.000043 \
#     | mysql -u root -p

Logical Recovery (from mysqldump):

mysql_logical_recovery.sh
#!/bin/bash
# MySQL: Logical Recovery from mysqldump
 
# Restore the logical backup
mysql -u root -p < /backup/full_dump_20240114.sql
 
# Find the binlog position recorded in dump file header
# (only present if the dump was taken with --master-data=2,
#  or --source-data=2 on MySQL 8.0.26+)
# Look for: -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000042', MASTER_LOG_POS=1234567
 
# Option A: Apply binary logs from that point (complete recovery)
mysqlbinlog --start-position=1234567 \
    /var/log/mysql/mysql-bin.000042 \
    /var/log/mysql/mysql-bin.000043 \
    | mysql -u root -p
 
# Option B: For PITR, stop at specific time (run INSTEAD of Option A)
# mysqlbinlog --start-position=1234567 \
#     --stop-datetime="2024-01-15 14:30:00" \
#     /var/log/mysql/mysql-bin.000042 \
#     | mysql -u root -p
 
# Note: Logical restore is MUCH slower than physical
# Use mysqldump mainly for small databases or cross-version migration

Binary Log Retention

MySQL's binlog files are essential for PITR. Configure adequate retention with binlog_expire_logs_seconds (MySQL 8.0+) or expire_logs_days (older versions). If binlog files are purged before they have been copied elsewhere, you cannot recover any transaction recorded after the last binlog you still hold. Copy binlog files to remote storage on their own schedule so that recovery does not depend on MySQL's automatic purging.
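
A quick arithmetic check shows whether a retention setting leaves a hole between backups. The sketch below assumes the only requirement is that binlogs outlive the interval between two backups; the function name `retention_covers` is illustrative, and a real policy would usually add a safety margin.

```shell
#!/bin/sh
# retention_covers BACKUP_INTERVAL_DAYS EXPIRE_SECONDS
# Succeeds when binlogs are kept at least as long as the gap between
# two consecutive backups.
retention_covers() {
    needed=$(( $1 * 86400 ))    # 86400 seconds per day
    [ "$2" -ge "$needed" ]
}

# Weekly backups with the MySQL 8.0 default of 2592000 seconds (30 days).
retention_covers 7 2592000 && echo "retention ok"

# Weekly backups with a 3-day retention (259200 seconds) leave a 4-day hole.
retention_covers 7 259200 || echo "retention too short"
```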

Recovery Best Practices

Media recovery is a high-stress, high-stakes operation. Following best practices significantly improves success rates and reduces recovery time.

Pre-Failure Preparation

•Test backups regularly — Restore to isolated environment weekly; perform full recovery drill quarterly.
•Monitor log archiving — Alert immediately if archiving fails or falls behind.
•Document recovery procedures — Step-by-step runbooks that can be followed under stress.
•Maintain recovery environment — Test servers with same version, configuration, and sufficient space.
•Store recovery tools separately — Recovery scripts and documentation accessible even if primary systems are down.
•Practice, practice, practice — Run recovery drills so the procedure becomes routine.

During Recovery

•Don't panic — Follow documented procedures; rushing causes mistakes.
•Preserve evidence — Copy failed data directory before destroying it; you might need it for forensics.
•Verify at each step — Check each phase succeeded before proceeding to the next.
•Log everything — Document what you're doing in real-time; this helps troubleshooting and post-mortems.
•Have a second pair of eyes — Another person reviewing each step catches errors.
•Communicate status — Keep stakeholders informed of progress and ETA.
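
"Verify at each step" and "Log everything" can be combined in a small wrapper that records every command and stops the runbook on the first failure. This is a minimal sketch; the function name `run_step`, the log format, and the example paths are assumptions, not part of any recovery tool.

```shell
#!/bin/sh
# run_step LOGFILE DESCRIPTION COMMAND [ARG...]
# Appends START and OK/FAIL lines with UTC timestamps to LOGFILE,
# runs COMMAND, and returns COMMAND's exit status.
run_step() {
    log=$1
    desc=$2
    shift 2
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) START $desc" >> "$log"
    if "$@"; then
        echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) OK    $desc" >> "$log"
    else
        status=$?
        echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) FAIL  $desc (exit $status)" >> "$log"
        return "$status"
    fi
}

# Usage (commands and paths are placeholders):
#   run_step recovery.log "stop database"  systemctl stop postgresql || exit 1
#   run_step recovery.log "restore backup" tar -xf /backup/base.tar -C "$PGDATA" || exit 1
```

The resulting log doubles as the timeline for the post-mortem, since every step has a start time, an end time, and an outcome.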

recovery_checklist.md
# Database Media Recovery Checklist
 
## Pre-Recovery Assessment
- [ ] Identify scope of failure (single file, whole database, entire site)
- [ ] Verify backup availability and integrity
- [ ] Verify archived log availability (no gaps)
- [ ] Calculate expected recovery time
- [ ] Notify stakeholders of outage and ETA
 
## Secure Failed System
- [ ] Stop database if running
- [ ] Copy/preserve failed data directory
- [ ] Collect error logs for analysis
 
## Restore Process
- [ ] Restore backup to recovery location
- [ ] Verify checksum/integrity of restored files
- [ ] Configure recovery parameters
- [ ] Start recovery process
- [ ] Monitor recovery progress
 
## Validation
- [ ] Verify database opens successfully
- [ ] Check data integrity (critical tables, recent transactions)
- [ ] Verify application connectivity
- [ ] Run application health checks
 
## Post-Recovery
- [ ] Take fresh backup immediately
- [ ] Resume normal backup schedule
- [ ] Document lessons learned
- [ ] Update runbook with any improvements
- [ ] Schedule post-mortem meeting
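
The "Calculate expected recovery time" item is simple arithmetic: time to copy the backup plus time to replay the logs. The sketch below uses whole gigabytes and whole minutes with ceiling division. All figures in the example are hypothetical; real throughput should come from measurements taken during recovery drills.

```shell
#!/bin/sh
# estimate_recovery_minutes BACKUP_GB RESTORE_GB_PER_MIN LOG_GB APPLY_GB_PER_MIN
# Prints the estimated total minutes, rounding each phase up.
estimate_recovery_minutes() {
    # Integer ceiling division: (a + b - 1) / b
    restore=$(( ($1 + $2 - 1) / $2 ))
    replay=$(( ($3 + $4 - 1) / $4 ))
    echo $(( restore + replay ))
}

# 500 GB backup restored at 5 GB/min, then 40 GB of logs applied at 1 GB/min:
# 100 minutes of restore plus 40 minutes of replay.
estimate_recovery_minutes 500 5 40 1    # prints 140
```

Log replay is often the slower phase per gigabyte, which is one reason frequent backups shorten recovery even when the backup itself is large.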

The Golden Rule of Recovery

Never, ever overwrite your backup during recovery. Always restore to a new location. If the recovery fails or you make a mistake, you still have the backup to try again. The backup is the last line of defense—protect it absolutely.
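
A restore script can enforce this rule with a guard that runs before any file is written. The following is a minimal sketch, assuming both directories already exist; the function name `safe_restore_target` and its messages are illustrative.

```shell
#!/bin/sh
# safe_restore_target BACKUP_DIR TARGET_DIR
# Succeeds only when TARGET_DIR exists, is empty, and is neither the
# backup directory itself nor a directory inside it.
safe_restore_target() {
    # Resolve both paths physically so symlinks cannot hide an overlap.
    backup=$(cd "$1" && pwd -P) || return 1
    target=$(cd "$2" && pwd -P) || return 1
    case "$target/" in
        "$backup"/*)
            echo "refusing: target overlaps the backup" >&2
            return 1
            ;;
    esac
    if [ -n "$(ls -A "$target")" ]; then
        echo "refusing: target is not empty" >&2
        return 1
    fi
}

# Usage (paths are placeholders):
#   safe_restore_target /backup/20240114 /restore/new || exit 1
```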

Summary: Recovering from the Unrecoverable

Media recovery is the ultimate validation of a database's durability guarantees—the ability to recover data when primary storage has failed. Let's consolidate the key concepts:

Key Takeaways

•Media recovery differs from crash recovery — It works from old backups rather than current data files.
•The process is: restore backup, apply archived logs, apply current log — Each phase brings the database further forward in time.
•Log continuity is essential — A gap in archived logs limits recovery to before the gap.
•Point-in-time recovery enables rollback to any moment — Essential for recovering from logical errors, not just media failures.
•Each database system has specific tools and procedures — RMAN for Oracle, pg_basebackup/pgBackRest for PostgreSQL, XtraBackup for MySQL.
•Preparation determines success — Tested backups, documented procedures, and practice are essential.
•Never overwrite your backup during recovery — The backup must remain available for retry if recovery fails.

Module Complete:

This completes our exploration of Stable Storage. We've journeyed from the theoretical concept of perfectly reliable storage, through the practical technologies that approximate it (RAID, mirroring, remote backup), to the recovery procedures that restore data when failures inevitably occur.

The key insight is that durability is not a property of any single component—it's an emergent property of system design. By combining redundancy, verification, and tested recovery procedures, we achieve practical durability that approaches the theoretical ideal of stable storage.

Module Complete: Stable Storage

You have mastered the concepts and techniques of stable storage—from theoretical foundations through RAID, mirroring, remote backup, and media recovery. This knowledge is essential for designing and operating database systems that deliver on the durability promise of ACID.

5 / 5

Loading learning content...

Database Management SystemsStable Storage

Stable Storage: Foundations of Durability

LevelIntermediate

Duration60 mins

TopicStable Storage

5 / 5

Media Recovery

When Storage Fails

This is the ultimate test of a database's durability guarantees—can we actually recover the data when we need it most?

What You Will Learn

Media Failure vs. System Failure

Database recovery addresses two fundamentally different failure scenarios with different recovery techniques:

System (Crash) Recovery:

Cause: Power failure, OS crash, database process crash
Data state: Data files exist but may be inconsistent; logs intact
Recovery: Automatic using online redo logs (ARIES redo/undo)
Data loss: Zero (for committed transactions)
Recovery time: Seconds to minutes

Media Recovery:

Cause: Disk failure, storage corruption, accidental deletion, disaster
Data state: Data files lost, corrupted, or intentionally restored from backup
Recovery: Manual process using backups plus archived logs
Data loss: Depends on backup age and log availability
Recovery time: Minutes to hours (or longer for large databases)

Recovery Type Comparison
Aspect	Crash Recovery	Media Recovery
Trigger	Automatic on restart	Manual initiation required
Input: Data Files	Current (possibly inconsistent)	Old backup
Input: Logs	Online redo log	Archived logs + online log
Forward recovery span	Since last checkpoint	Since backup
REDO scope	Recent operations only	Potentially millions of operations
UNDO requirement	Yes (uncommitted txns)	Only at end of recovery

Why the Distinction Matters

Media Recovery Prerequisites:

Media recovery is only possible with proper preparation:

Valid Backup — A restorable copy of the database from some point in the past
Archived Logs — All transaction log segments from the backup time to now
Continuous Archive — No gaps in the log archive; missing logs mean lost transactions
Tested Procedures — Documented and verified recovery steps

The Media Recovery Process

Media recovery follows a systematic process that mirrors how the database originally built up its state:

Phase 1: Restore Backup

First, we restore a backup to get the database to a known historical state. This might be:

A full backup (complete database copy)
A full backup plus incremental backups (to reduce restore time)
A base backup plus differential backups

Phase 2: Apply Archived Logs (Forward Recovery)

Phase 3: Apply Current Log (Final Recovery)

If the current/online transaction log is available (not lost in the media failure), we apply it to recover the most recent transactions.

Phase 4: Open Database

Finally, the database is opened, undo recovery runs for any incomplete transactions, and the system resumes normal operation.

media_recovery_timeline.text
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Timeline of Media Recovery:
 
Backup          Archived Logs               Current       Failure
  │                   │                       Log            │
  ▼                   ▼                        ▼             ▼
  ┌───────────────────────────────────────────────────────────┐
  │ Day 0 │ Day 1 │ Day 2 │ Day 3 │ Day 4 │ Day 5 │ Now      │
  │       │       │       │       │       │       │ (crash)  │
  │ Full  │ Log   │ Log   │ Log   │ Log   │ Log   │ Online   │
  │Backup │ Arch1 │ Arch2 │ Arch3 │ Arch4 │ Arch5 │  Log     │
  └───────────────────────────────────────────────────────────┘
  
  Recovery Steps:
  1. Restore Full Backup (Day 0 state)
  2. Apply Arch1 through Arch5 (forward to Day 5)
  3. Apply Online Log if available (forward to crash point)
  4. Open database (undo incomplete transactions)
  
  Result: Database restored to moment before failure

Log Continuity is Critical

Complete vs. Incomplete Recovery:

Complete Recovery: Apply all available logs to recover up to the moment of failure. No data loss (assuming logs are complete).
Incomplete Recovery (Point-in-Time): Stop recovery at a specific point before the latest available log. Used to recover from logical errors (like accidental table drops) rather than media failures.

Point-in-Time Recovery (PITR)

Point-in-Time Recovery (PITR) allows restoring a database to any specific moment in the past, not just to a backup point. This capability is essential for recovering from:

Accidental DROP TABLE or DELETE without WHERE clause
Application bugs that corrupted data
Ransomware encryption (restore to before infection)
Regulatory requirements to reproduce historical data

How PITR Works:

PITR leverages the same mechanism as complete media recovery, but stops applying logs at a specified target:

Restore Backup    Apply Logs Up To Target Point    Stop
      │                      │                       │
      ▼                      ▼                       ▼
┌───────────────────────────────────────────────────────┐
│ Backup │ Log1 │ Log2 │ Log3 │ Log4 │ Log5 │ Log6 │   │
│ (Tues) │      │      │      │  ▲   │      │      │   │
│        │      │      │      │  │   │      │      │   │
│        │      │      │      │  │   │      │      │   │
└────────────────────────────────│──────────────────────┘
                                 │
                    Target: Thursday 14:30
                    (Just before accidental DELETE)

Result: Database restored to Thursday 14:30 state
Transactions after 14:30 are NOT applied (by design)

postgresql_pitr.sh
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
#!/bin/bash
# PostgreSQL: Point-in-Time Recovery
 
# Scenario: Accidental DELETE occurred at 2024-01-15 14:35:00
# We want to recover to 14:30:00 (before the DELETE)
 
# Step 1: Stop PostgreSQL (if running)
systemctl stop postgresql
 
# Step 2: Move current data directory aside
mv /var/lib/postgresql/15/main /var/lib/postgresql/15/main.failed
 
# Step 3: Restore base backup
pg_basebackup --restore-base-backup /backup/basebackup_20240114.tar
tar -xf /backup/basebackup_20240114.tar -C /var/lib/postgresql/15/main
 
# Step 4: Configure recovery target
cat > /var/lib/postgresql/15/main/postgresql.auto.conf << EOF
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2024-01-15 14:30:00'
recovery_target_action = 'pause'  # or 'promote'
EOF
 
# Step 5: Create recovery signal file
touch /var/lib/postgresql/15/main/recovery.signal
 
# Step 6: Start PostgreSQL
systemctl start postgresql
 
# Step 7: Verify recovery target reached
psql -c "SELECT pg_last_xact_replay_timestamp();"
 
# Step 8: When satisfied, promote to master
# If recovery_target_action = 'pause':
psql -c "SELECT pg_wal_replay_resume();"
# Then:
psql -c "SELECT pg_promote();"
 
# Note: All transactions after 14:30:00 are LOST - this is intentional
# You're trading newer data to recover from the DELETE

Recovery Target Options:

PITR Target Specifications
Target Type	PostgreSQL	Oracle	Use Case
Timestamp	recovery_target_time	UNTIL TIME	Recover to specific moment
Transaction ID	recovery_target_xid	UNTIL CHANGE	Recover to specific txn
Named Point	recovery_target_name	UNTIL SEQUENCE	Recover to restore point
Log Position	recovery_target_lsn	UNTIL SCN	Recover to specific log location
Immediate	recovery_target = 'immediate'	N/A	Stop at consistent point

Creating Restore Points

Oracle Media Recovery

Oracle Database provides sophisticated media recovery capabilities through RMAN (Recovery Manager) and its integrated backup/restore architecture.

Oracle Recovery Architecture:

Oracle distinguishes between:

Redo Logs: Online circular logs used for crash recovery
Archived Redo Logs: Completed log files archived to permanent storage
Control File: Contains database structure and backup information
RMAN Catalog: Optional repository of backup metadata

oracle_media_recovery.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
-- Oracle: Complete Database Media Recovery with RMAN
 
-- Scenario: All data files lost, need complete recovery
 
-- Step 1: Connect to RMAN
-- $ rman target /
 
-- Step 2: If control file is lost, restore it first
RMAN> STARTUP NOMOUNT;
RMAN> RESTORE CONTROLFILE FROM AUTOBACKUP;
RMAN> ALTER DATABASE MOUNT;
 
-- Step 3: Restore database files
RMAN> RESTORE DATABASE;
 
-- Step 4: Recover (apply archived and online logs)
RMAN> RECOVER DATABASE;
 
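-- Step 5: Open the database
-- RESETLOGS is required after incomplete recovery or when a backup
-- control file was restored (as in Step 2). After complete recovery with
-- the current control file, a plain ALTER DATABASE OPEN is enough.
RMAN> ALTER DATABASE OPEN RESETLOGS;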
 
-- ========================================
-- Point-in-Time Recovery Example
-- ========================================
 
-- Recover to specific time (before accidental DELETE)
-- SET UNTIL is issued inside a RUN block so that the same target
-- applies to both the RESTORE and the RECOVER that follow it
RMAN> STARTUP MOUNT;
RMAN> RUN {
        SET UNTIL TIME "TO_DATE('2024-01-15 14:30:00','YYYY-MM-DD HH24:MI:SS')";
        RESTORE DATABASE;
        RECOVER DATABASE;
      }
RMAN> ALTER DATABASE OPEN RESETLOGS;
 
-- Recover to SCN (System Change Number)
RMAN> RUN {
        SET UNTIL SCN 123456789;
        RESTORE DATABASE;
        RECOVER DATABASE;
      }
RMAN> ALTER DATABASE OPEN RESETLOGS;
 
-- Recover to restore point
RMAN> RUN {
        SET UNTIL RESTORE POINT before_upgrade;
        RESTORE DATABASE;
        RECOVER DATABASE;
      }
RMAN> ALTER DATABASE OPEN RESETLOGS;

Oracle Block Media Recovery:

Oracle can recover individual corrupted blocks without restoring the entire datafile—a powerful feature for minimizing downtime:

oracle_block_recovery.sql
-- Oracle: Block-Level Media Recovery
 
-- Scenario: DBVERIFY reports a few corrupted blocks
-- No need to restore entire datafile!
 
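-- Step 1: Identify corrupted blocks in the live database
-- (this view is populated when RMAN BACKUP or VALIDATE encounters corruption)
SELECT * FROM v$database_block_corruption;
-- v$backup_corruption, by contrast, lists corrupt blocks recorded inside backups
SELECT * FROM v$backup_corruption;
 
-- Step 2: Recover just those blocks (RMAN)
-- Oracle 11g and later: RECOVER ... BLOCK replaces the deprecated BLOCKRECOVER command
RMAN> RECOVER DATAFILE 7 BLOCK 123;
 
-- Or recover all blocks listed in v$database_block_corruption
RMAN> RECOVER CORRUPTION LIST;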
 
-- This restores only the specific blocks from backup
-- and applies redo to bring them current
-- Much faster than full datafile recovery!

Oracle Flashback Technology

PostgreSQL Media Recovery

PostgreSQL's media recovery is based on its Write-Ahead Log (WAL) architecture. The process involves restoring a base backup and replaying WAL segments.

PostgreSQL Recovery Architecture:

Base Backup: Physical copy of the data directory (pg_basebackup)
WAL Segments: Files of 16 MB each (the default segment size) containing all changes
WAL Archiving: Mechanism to copy completed WAL files to archive storage
recovery.signal: File that triggers recovery mode on startup
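
To connect LSNs with the segment files that must be present in the archive, here is a small sketch that maps a timeline ID and an LSN to the name of the WAL segment containing it. It assumes the default 16 MB segment size; the function name `wal_segment_name` is made up for this example. On a running server, the built-in `pg_walfile_name()` function gives the same answer for the current timeline.

```bash
#!/bin/bash
# Map a timeline ID and an LSN such as "0/12345678" to the name of the WAL
# segment file that contains it. Assumes the default 16 MB segment size.
wal_segment_name() {
    local timeline="$1" lsn="$2"
    local hi="${lsn%%/*}"   # upper 32 bits of the LSN, in hex
    local lo="${lsn##*/}"   # lower 32 bits of the LSN, in hex
    # File name = timeline, upper half, segment number within that 4 GB range.
    # With 16 MB (2^24 byte) segments, the segment number is the low half >> 24.
    printf '%08X%08X%08X\n' "$timeline" "$((16#$hi))" "$((16#$lo >> 24))"
}
```

For instance, `wal_segment_name 1 0/12345678` yields `000000010000000000000012`, which is the file a `restore_command` would be asked to fetch when replay reaches that position.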

postgresql_media_recovery.sql
-- PostgreSQL: Complete Media Recovery Procedure
 
-- PREPARATION (before any failure occurs):
 
-- Enable WAL archiving in postgresql.conf:
-- archive_mode = on
-- archive_command = 'cp %p /archive/%f'
-- wal_level = replica
 
-- Create base backups regularly:
-- pg_basebackup -D /backup/$(date +%Y%m%d) -Ft -Xs -P
 
-- ========================================
-- RECOVERY PROCEDURE
-- ========================================
 
-- Step 1: Stop PostgreSQL if running
-- $ systemctl stop postgresql
 
-- Step 2: Secure failed data directory
-- $ mv $PGDATA $PGDATA.failed
 
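-- Step 3: Restore base backup
-- (pg_basebackup -Ft -Xs writes base.tar and pg_wal.tar into the backup directory)
-- $ mkdir -m 700 $PGDATA
-- $ tar -xf /backup/20240114/base.tar -C $PGDATA
-- $ tar -xf /backup/20240114/pg_wal.tar -C $PGDATA/pg_wal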
 
-- Step 4: Configure recovery
-- Add to $PGDATA/postgresql.conf (or append to postgresql.auto.conf):
-- 
-- restore_command = 'cp /archive/%f %p'
--
-- For complete recovery, that's all you need
-- For PITR, also add:
-- recovery_target_time = '2024-01-15 14:30:00'
 
-- Step 5: Create recovery signal
-- $ touch $PGDATA/recovery.signal
 
-- Step 6: Start PostgreSQL
-- $ systemctl start postgresql
 
-- Step 7: Check recovery progress
SELECT pg_is_in_recovery();
SELECT pg_last_wal_receive_lsn();   -- streaming replication only; NULL during archive recovery
SELECT pg_last_wal_replay_lsn();
SELECT pg_last_xact_replay_timestamp();
 
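-- Step 8: After PITR with recovery_target_action = 'pause', inspect the data, then end recovery.
-- Run ONE of the following (resuming at the target also ends recovery):
SELECT pg_wal_replay_resume();  -- If paused at the recovery target
-- or: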
SELECT pg_promote();            -- Become primary
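
The `archive_command` shown in the preparation step uses a plain `cp`, which silently overwrites an existing archive file. The following is a sketch of a more defensive wrapper, under stated assumptions: the script path, the `ARCHIVE_DIR` variable, and the function name `archive_wal` are all hypothetical, and the sketch does not fsync the copied file. It refuses to overwrite a segment with different content, and reports success if an identical copy is already archived.

```bash
#!/bin/bash
# Hypothetical archive helper, intended to be configured as:
#   archive_command = '/usr/local/bin/archive_wal.sh %p %f'
# ARCHIVE_DIR is an assumed environment variable; it defaults to /archive.
archive_wal() {
    local src="$1" name="$2" dest_dir="${ARCHIVE_DIR:-/archive}"
    local dest="$dest_dir/$name"
    if [ -e "$dest" ]; then
        # An identical file means an earlier attempt already succeeded.
        cmp -s "$src" "$dest" && return 0
        echo "refusing to overwrite $dest with different content" >&2
        return 1
    fi
    # Copy under a temporary name, then rename, so an interrupted copy
    # never leaves a partial file under the final segment name.
    cp "$src" "$dest.tmp" && mv "$dest.tmp" "$dest"
}

# When invoked as a script with the two arguments PostgreSQL passes (%p and %f),
# archive the segment and return the result as the exit status.
if [ "$#" -eq 2 ]; then
    archive_wal "$1" "$2"
    exit $?
fi
```

A nonzero exit status tells PostgreSQL that archiving failed, so it keeps the segment and retries later rather than recycling it.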

Using pgBackRest for Enterprise Recovery:

pgBackRest is a popular backup tool for PostgreSQL. The script below uses its delta restore and recovery target options:

pgbackrest_recovery.sh
#!/bin/bash
# pgBackRest: Enterprise PostgreSQL Media Recovery
 
# Configuration: /etc/pgbackrest/pgbackrest.conf
# [global]
# repo1-path=/backup/pgbackrest
# repo1-retention-full=2
# 
# [main]
# pg1-path=/var/lib/postgresql/15/main
 
# Stop PostgreSQL
systemctl stop postgresql
 
# List available backups
pgbackrest --stanza=main info
 
# Option A: Restore to latest (complete recovery)
# --delta lets the restore run into a non-empty data directory
pgbackrest --stanza=main --delta restore
 
# Option B (instead of A): Restore to specific time (PITR)
pgbackrest --stanza=main \
    --delta \
    --type=time \
    --target="2024-01-15 14:30:00" \
    --target-action=promote \
    restore
 
# Restore with different target options
# --type=xid --target="1234"           # To specific transaction ID
# --type=name --target="before_upgrade" # To restore point
# --type=lsn --target="0/12345678"      # To specific LSN
# --type=immediate                      # To first consistent point
 
# Start PostgreSQL
systemctl start postgresql
 
# Verify recovery
psql -c "SELECT pg_last_xact_replay_timestamp();"

Delta Restore

MySQL/InnoDB Media Recovery

MySQL Recovery Architecture:

InnoDB Redo Log: For crash recovery within InnoDB
Binary Log (binlog): Complete history of all changes for replication and PITR
Physical Backup: System tablespace (ibdata) files, per-table tablespace files, InnoDB redo logs
Logical Backup: SQL statements (mysqldump, mysqlpump)

mysql_media_recovery.sh
#!/bin/bash
# MySQL/InnoDB: Media Recovery with Percona XtraBackup
 
# PREPARATION: Create backup with XtraBackup
# xtrabackup --backup --target-dir=/backup/$(date +%Y%m%d)
# xtrabackup --prepare --target-dir=/backup/20240114
 
# ========================================
# COMPLETE RECOVERY PROCEDURE
# ========================================
 
# Step 1: Stop MySQL
systemctl stop mysql
 
# Step 2: Secure current data directory
mv /var/lib/mysql /var/lib/mysql.failed
 
# Step 3: Restore backup
xtrabackup --copy-back --target-dir=/backup/20240114
 
# Step 4: Fix permissions
chown -R mysql:mysql /var/lib/mysql
 
# Step 5: Start MySQL
systemctl start mysql
 
# Step 6: Apply binary logs for forward recovery
# Find the binlog position from xtrabackup_binlog_info
cat /backup/20240114/xtrabackup_binlog_info
# Output: mysql-bin.000042  1234567
 
# Apply all binlogs from that position forward
mysqlbinlog --start-position=1234567 \
    /var/log/mysql/mysql-bin.000042 \
    /var/log/mysql/mysql-bin.000043 \
    /var/log/mysql/mysql-bin.000044 \
    | mysql -u root -p
 
# ========================================
# POINT-IN-TIME RECOVERY
# ========================================
 
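# Alternative to Step 6: apply binlogs only up to a specific time
# (run one or the other; replaying the same events twice would reapply changes)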
mysqlbinlog --start-position=1234567 \
    --stop-datetime="2024-01-15 14:30:00" \
    /var/log/mysql/mysql-bin.000042 \
    /var/log/mysql/mysql-bin.000043 \
    | mysql -u root -p
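
Step 6 above reads the starting position by eye and lists the binlog files by hand. As a rough sketch, the hypothetical function below (`binlogs_to_replay` is an illustrative name, not an XtraBackup or MySQL tool) parses `xtrabackup_binlog_info` and prints the starting position followed by every binlog file from the starting file onward. It assumes the zero-padded numeric suffixes MySQL gives binlog files, so that lexical order matches replay order.

```bash
#!/bin/bash
# Hypothetical helper: work out which binlogs to replay after an XtraBackup restore.
# Usage: binlogs_to_replay <backup_dir> <binlog_dir>
binlogs_to_replay() {
    local backup_dir="$1" binlog_dir="$2" start_file start_pos f
    # The first two whitespace-separated fields are the binlog file and position.
    read -r start_file start_pos _ < "$backup_dir/xtrabackup_binlog_info" \
        || [ -n "$start_file" ] || return 1
    echo "start-position=$start_pos"
    # Match only numbered binlogs (this skips the .index file).
    for f in "$binlog_dir"/"${start_file%.*}".[0-9]*; do
        [ -e "$f" ] || continue
        # Skip files older than the one the backup ended in.
        [[ "${f##*/}" < "$start_file" ]] || echo "$f"
    done
}
```

The printed file list can then be passed to `mysqlbinlog` together with `--start-position`, which applies only to the first file named on the command line.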

Logical Recovery (from mysqldump):

mysql_logical_recovery.sh
#!/bin/bash
# MySQL: Logical Recovery from mysqldump
 
# Restore the logical backup
mysql -u root -p < /backup/full_dump_20240114.sql
 
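# Find the binlog position recorded in dump file header
# (recorded only if the dump was taken with --master-data; newer releases name the
# option --source-data and write CHANGE REPLICATION SOURCE TO instead)
# Look for: -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000042', MASTER_LOG_POS=1234567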
 
# Apply binary logs from that point
mysqlbinlog --start-position=1234567 \
    /var/log/mysql/mysql-bin.000042 \
    /var/log/mysql/mysql-bin.000043 \
    | mysql -u root -p
 
# For PITR, run this instead of the command above and stop at a specific time
mysqlbinlog --start-position=1234567 \
    --stop-datetime="2024-01-15 14:30:00" \
    /var/log/mysql/mysql-bin.000042 \
    | mysql -u root -p
 
# Note: Logical restore is MUCH slower than physical
# Use mysqldump mainly for small databases or cross-version migration

Binary Log Retention

Recovery Best Practices

Media recovery is a high-stress, high-stakes operation. Following best practices significantly improves success rates and reduces recovery time.

Pre-Failure Preparation

•Test backups regularly — Restore to isolated environment weekly; perform full recovery drill quarterly.
•Monitor log archiving — Alert immediately if archiving fails or falls behind.
•Document recovery procedures — Step-by-step runbooks that can be followed under stress.
•Maintain recovery environment — Test servers with same version, configuration, and sufficient space.
•Store recovery tools separately — Recovery scripts and documentation accessible even if primary systems are down.
•Practice, practice, practice — Run recovery drills so the procedure becomes routine.

During Recovery

•Don't panic — Follow documented procedures; rushing causes mistakes.
•Preserve evidence — Copy failed data directory before destroying it; you might need it for forensics.
•Verify at each step — Check each phase succeeded before proceeding to the next.
•Log everything — Document what you're doing in real-time; this helps troubleshooting and post-mortems.
•Have a second pair of eyes — Another person reviewing each step catches errors.
•Communicate status — Keep stakeholders informed of progress and ETA.

recovery_checklist.md
# Database Media Recovery Checklist
 
## Pre-Recovery Assessment
- [ ] Identify scope of failure (single file, whole database, entire site)
- [ ] Verify backup availability and integrity
- [ ] Verify archived log availability (no gaps)
- [ ] Calculate expected recovery time
- [ ] Notify stakeholders of outage and ETA
 
## Secure Failed System
- [ ] Stop database if running
- [ ] Copy/preserve failed data directory
- [ ] Collect error logs for analysis
 
## Restore Process
- [ ] Restore backup to recovery location
- [ ] Verify checksum/integrity of restored files
- [ ] Configure recovery parameters
- [ ] Start recovery process
- [ ] Monitor recovery progress
 
## Validation
- [ ] Verify database opens successfully
- [ ] Check data integrity (critical tables, recent transactions)
- [ ] Verify application connectivity
- [ ] Run application health checks
 
## Post-Recovery
- [ ] Take fresh backup immediately
- [ ] Resume normal backup schedule
- [ ] Document lessons learned
- [ ] Update runbook with any improvements
- [ ] Schedule post-mortem meeting
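
The checklist item "Verify checksum/integrity of restored files" can be supported by a checksum manifest written when the backup is taken. The sketch below assumes GNU coreutils (`sha256sum`, `sort -z`, `xargs -r`); the function names and the `SHA256SUMS` file name are illustrative choices, not a feature of any particular backup tool.

```bash
#!/bin/bash
# Write a checksum manifest next to the backup files at backup time.
# Usage: write_manifest <backup_dir>
write_manifest() {
    (
        cd "$1" || exit 1
        # Hash every file except the manifest itself, in a stable order.
        find . -type f ! -name SHA256SUMS -print0 \
            | sort -z \
            | xargs -0 -r sha256sum > SHA256SUMS
    )
}

# Re-check the files against the manifest before trusting a backup or a restore.
# Usage: verify_manifest <dir>   (exit status 0 only if every file matches)
verify_manifest() {
    (
        cd "$1" || exit 1
        sha256sum --quiet -c SHA256SUMS
    )
}
```

Running `verify_manifest` on the backup before restoring, and again on the restored copy, separates a damaged backup from a damaged transfer.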

The Golden Rule of Recovery

Summary: Recovering from the Unrecoverable

Media recovery is the ultimate validation of a database's durability guarantees—the ability to recover data when primary storage has failed. Let's consolidate the key concepts:

Key Takeaways

•Media recovery differs from crash recovery — It works from old backups rather than current data files.
•The process is: restore backup, apply archived logs, apply current log — Each phase brings the database further forward in time.
•Log continuity is essential — A gap in archived logs limits recovery to before the gap.
•Point-in-time recovery enables rollback to any moment — Essential for recovering from logical errors, not just media failures.
•Each database system has specific tools and procedures — RMAN for Oracle, pg_basebackup/pgBackRest for PostgreSQL, XtraBackup for MySQL.
•Preparation determines success — Tested backups, documented procedures, and practice are essential.
•Never overwrite your backup during recovery — The backup must remain available for retry if recovery fails.
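
Because a gap in the archive caps how far forward recovery can go, it is worth checking continuity before a failure forces the question. The following is a minimal sketch for a PostgreSQL WAL archive, assuming a single timeline and the default 16 MB segment size; `find_wal_gaps` is a hypothetical name.

```bash
#!/bin/bash
# Report missing WAL segments in an archive directory.
# Assumes one timeline and the default 16 MB segment size.
# Usage: find_wal_gaps <archive_dir> <timeline>
find_wal_gaps() {
    local dir="$1" tli prev="" cur f name
    tli=$(printf '%08X' "$2")
    # Segment names are exactly 24 hex characters: timeline + 16 more.
    for f in "$dir/$tli"????????????????; do
        [ -e "$f" ] || continue
        name="${f##*/}"
        [[ $name =~ ^[0-9A-F]{24}$ ]] || continue
        # Linear segment number: 256 segments per 4 GB range at 16 MB each.
        cur=$(( 16#${name:8:8} * 256 + 16#${name:16:8} ))
        if [ -n "$prev" ] && [ "$cur" -gt $((prev + 1)) ]; then
            echo "gap: $((cur - prev - 1)) segment(s) missing before $name"
        fi
        prev=$cur
    done
}
```

An empty result means the segments present form an unbroken sequence; any reported gap marks the furthest point that recovery from this archive can reach.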

Module Complete: Stable Storage