Backup files can become corrupted silently—during creation, transfer, or storage. Integrity checks are systematic validations that detect corruption before you attempt recovery, ensuring that when disaster strikes, you're not compounding the crisis with unusable backups.
The integrity challenge: corruption can occur at multiple levels:

- Physical (bit-level): flipped bits or truncated files introduced during creation, transfer, or storage
- Structural: backup files that no longer conform to their format specification (missing headers, broken segment boundaries, inconsistent metadata)
- Block/page level: individual database pages whose embedded checksums no longer match their contents
- Logical: structurally valid data that violates referential integrity, constraints, or application-level rules
Each corruption type requires different detection methods. Comprehensive integrity checking addresses all levels.
By the end of this page, you will understand how to implement multi-layer integrity checking that catches corruption at every level—from physical bit-level validation to logical data consistency verification.
Checksums are the foundation of integrity verification—mathematical fingerprints that detect any modification to backup files. By comparing calculated checksums against stored values, you can identify corruption with high confidence.
Checksum algorithm selection:
| Algorithm | Output Size | Speed | Collision Resistance | Use Case |
|---|---|---|---|---|
| CRC32 | 32 bits | Very Fast | Low | Quick sanity checks, not security |
| MD5 | 128 bits | Fast | Broken | Legacy systems only, not recommended |
| SHA-1 | 160 bits | Fast | Weakened | Transitional, avoid for new systems |
| SHA-256 | 256 bits | Moderate | Strong | Recommended for integrity verification |
| SHA-512 | 512 bits | Moderate | Very Strong | High-security environments |
| BLAKE3 | 256+ bits | Very Fast | Strong | Modern high-performance option |
```python
#!/usr/bin/env python3
"""
Backup Integrity Verification using Checksums
Implements multi-algorithm checksum validation with streaming support
"""

import hashlib
import json
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Dict


@dataclass
class ChecksumResult:
    algorithm: str
    expected: str
    calculated: str
    match: bool
    file_size: int


def calculate_checksum(
    file_path: Path,
    algorithm: str = 'sha256',
    chunk_size: int = 8192 * 1024  # 8MB chunks for large files
) -> str:
    """
    Calculate checksum using streaming to handle large backup files.
    Memory-efficient: never loads entire file into RAM.
    """
    hash_func = hashlib.new(algorithm)
    with open(file_path, 'rb') as f:
        while chunk := f.read(chunk_size):
            hash_func.update(chunk)
    return hash_func.hexdigest()


def verify_backup_checksum(
    backup_path: Path,
    manifest_path: Path
) -> ChecksumResult:
    """
    Verify backup file against stored checksum in manifest.
    """
    # Load manifest with expected checksums
    with open(manifest_path) as f:
        manifest = json.load(f)

    expected = manifest['checksums']['sha256']
    calculated = calculate_checksum(backup_path, 'sha256')

    return ChecksumResult(
        algorithm='sha256',
        expected=expected,
        calculated=calculated,
        match=(expected == calculated),
        file_size=backup_path.stat().st_size
    )


def create_backup_manifest(backup_path: Path) -> Dict:
    """
    Create integrity manifest for a backup file.
    Includes multiple checksums for defense in depth.
    """
    return {
        'backup_file': backup_path.name,
        'file_size': backup_path.stat().st_size,
        'checksums': {
            'sha256': calculate_checksum(backup_path, 'sha256'),
            'sha512': calculate_checksum(backup_path, 'sha512'),
        },
        'created_at': datetime.now().isoformat()
    }
```

Beyond bit-level integrity, backups must have valid internal structure. Structural validation verifies that backup files conform to expected format specifications—headers present, segments properly delimited, metadata consistent.
Database-specific structural validation:
- PostgreSQL: `pg_restore --list` validates backup structure without restoring; `pg_verifybackup` (v13+) validates base backups against a manifest
- MySQL: `mysqlbinlog --verify-binlog-checksum` for binary logs; `myisamchk` for MyISAM table files
- SQL Server: `RESTORE VERIFYONLY` validates backup sets; `DBCC CHECKDB` for logical consistency
- Oracle: `RMAN VALIDATE` checks backup file structure; `DBVERIFY` validates datafile blocks
- MongoDB: `mongorestore --dryRun` validates without restoring; oplog validation for replica consistency
```bash
#!/bin/bash
# Structural validation for PostgreSQL backups

BACKUP_FILE="$1"
LOG_FILE="/var/log/backup-validation.log"

log() { echo "[$(date -Iseconds)] $*" | tee -a "$LOG_FILE"; }

log "Validating structure: $BACKUP_FILE"

# Method 1: List backup contents (validates format)
if pg_restore --list "$BACKUP_FILE" > /dev/null 2>&1; then
    OBJECT_COUNT=$(pg_restore --list "$BACKUP_FILE" | wc -l)
    log "PASS: Valid format, contains $OBJECT_COUNT objects"
else
    log "FAIL: Backup format invalid or corrupted"
    exit 1
fi

# Method 2: Verify against manifest (PostgreSQL 13+)
MANIFEST="${BACKUP_FILE%.backup}.manifest"
if [[ -f "$MANIFEST" ]]; then
    if pg_verifybackup -m "$MANIFEST" "$BACKUP_FILE"; then
        log "PASS: Manifest verification successful"
    else
        log "FAIL: Manifest verification failed"
        exit 1
    fi
fi

log "Structural validation complete: PASS"
```

Enterprise databases store page/block-level checksums that enable fine-grained corruption detection. Block verification validates each data page independently, identifying corruption even when file-level checksums pass.
How block-level checksums work:
Database pages (typically 8KB or 16KB) include embedded checksums calculated when the page is written. During verification, each page's content is re-checksummed and compared against the embedded value. A mismatch indicates corruption.
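To make the mechanism concrete, here is a conceptual sketch in Python. The 8 KB page size and 4-byte CRC footer are illustrative assumptions, not any specific engine's on-disk page format:

```python
#!/usr/bin/env python3
"""Conceptual sketch of page-level checksum verification.
The 8 KB page size and 4-byte CRC footer are illustrative assumptions,
not any specific database engine's on-disk page format."""

import zlib
from pathlib import Path
from typing import List

PAGE_SIZE = 8192       # typical database page size
CHECKSUM_BYTES = 4     # assume each page ends with a 4-byte CRC of its payload


def find_corrupt_pages(data_file: Path) -> List[int]:
    """Return the page numbers whose embedded checksum does not match the payload."""
    corrupt_pages = []
    with open(data_file, 'rb') as f:
        page_no = 0
        while page := f.read(PAGE_SIZE):
            payload, stored = page[:-CHECKSUM_BYTES], page[-CHECKSUM_BYTES:]
            calculated = zlib.crc32(payload).to_bytes(4, 'big')
            if calculated != stored:
                corrupt_pages.append(page_no)  # pinpoints exactly which page is damaged
            page_no += 1
    return corrupt_pages
```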
Advantages over file-level checksums: block verification pinpoints exactly which pages are damaged instead of flagging the entire file, and it can surface corruption that a file-level checksum misses, such as pages that were already corrupt in the source database when the backup (and its file checksum) was created.
Block-level checksums must usually be enabled when the database is created (e.g., PostgreSQL's data_checksums, set via initdb --data-checksums); turning them on later typically requires downtime or a rebuild. Always enable this feature for production databases: the small write-time overhead is far outweighed by the corruption detection it provides.
Even structurally valid backups can contain logically inconsistent data—referential integrity violations, constraint failures, or application-level anomalies. Consistency verification validates data correctness beyond physical integrity.
Consistency check categories:
| Check Type | What It Validates | Failure Indicates | Verification Method |
|---|---|---|---|
| Primary Key Uniqueness | No duplicate primary keys | Backup during concurrent modification | SELECT with GROUP BY HAVING COUNT > 1 |
| Foreign Key Integrity | All references valid | Incomplete backup or truncation | LEFT JOIN WHERE parent IS NULL |
| Check Constraints | Domain constraints satisfied | Data corruption or invalid backup | COUNT(*) WHERE NOT constraint |
| Index Consistency | Index entries match table data | Index corruption in source | REINDEX and compare counts |
| Sequence Alignment | Sequences ahead of max values | Sequence not captured in backup | Compare sequence value vs MAX(column) |
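Each verification method in the table translates into a query that returns rows only when the rule is violated. The sketch below illustrates that pattern with sqlite3 so it runs standalone; the orders/customers schema and column names are hypothetical, and in practice you would run equivalent queries against the restored database:

```python
#!/usr/bin/env python3
"""Sketch of consistency verification against a restored copy.
Uses sqlite3 so the example runs standalone; the schema is hypothetical."""

import sqlite3

# Each query returns rows only when the corresponding rule is violated.
CONSISTENCY_CHECKS = {
    'primary_key_uniqueness': """
        SELECT id, COUNT(*) FROM orders GROUP BY id HAVING COUNT(*) > 1
    """,
    'foreign_key_integrity': """
        SELECT o.id FROM orders o
        LEFT JOIN customers c ON o.customer_id = c.id
        WHERE c.id IS NULL
    """,
}


def run_consistency_checks(conn: sqlite3.Connection) -> dict:
    """Run every check; any non-empty result set marks the restored data inconsistent."""
    results = {}
    for name, query in CONSISTENCY_CHECKS.items():
        violations = conn.execute(query).fetchall()
        results[name] = {'passed': not violations, 'violations': len(violations)}
    return results


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY);
        CREATE TABLE orders (id INTEGER, customer_id INTEGER);
        INSERT INTO customers VALUES (1);
        INSERT INTO orders VALUES (1, 1), (2, 99);  -- order 2 references a missing customer
    """)
    for name, outcome in run_consistency_checks(conn).items():
        status = 'PASS' if outcome['passed'] else 'FAIL'
        print(f"{name}: {status} ({outcome['violations']} violations)")
```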
Application-level consistency:
Beyond database constraints, applications often have implicit consistency rules that the database itself does not enforce, such as derived totals that must match their underlying detail rows, or status fields that imply the presence of certain timestamps. Include application-specific consistency queries in your verification suite to catch this kind of domain-level corruption.
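Application-level rules slot into the same pattern as the sketch above. The queries below are hypothetical examples of domain invariants the database itself does not enforce:

```python
# Hypothetical application-level checks: each query returns rows only on violation.
# These plug into the same run_consistency_checks() runner shown above.
APPLICATION_CHECKS = {
    # Derived totals must match their underlying detail rows.
    'order_totals_match_line_items': """
        SELECT o.id
        FROM orders o
        JOIN line_items li ON li.order_id = o.id
        GROUP BY o.id, o.total
        HAVING o.total <> SUM(li.amount)
    """,
    # Status transitions imply timestamps: shipped orders must have a shipped_at value.
    'shipped_orders_have_timestamps': """
        SELECT id FROM orders WHERE status = 'shipped' AND shipped_at IS NULL
    """,
}
```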
Integrity checking should be automated and continuous—running after every backup, during storage, and periodically throughout retention. An automated pipeline ensures no backup goes unverified.
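A minimal sketch of such a pipeline is shown below, assuming the calculate_checksum() helper from the earlier example and a PostgreSQL custom-format backup; the module name and messages are placeholders. The stages run in order and stop at the first failure:

```python
#!/usr/bin/env python3
"""Sketch of an automated post-backup verification pipeline.
Runs bit-level and structural checks in sequence and stops at the first failure."""

import json
import subprocess
from pathlib import Path

# Helper from the earlier checksum example (module name is hypothetical).
from backup_checksums import calculate_checksum


def verify_pipeline(backup_path: Path, manifest_path: Path) -> bool:
    manifest = json.loads(manifest_path.read_text())

    # Stage 1: bit-level integrity -- recalculate and compare against the manifest.
    if calculate_checksum(backup_path, 'sha256') != manifest['checksums']['sha256']:
        print(f"FAIL checksum mismatch: {backup_path.name}")
        return False

    # Stage 2: structural validation -- the backup must be readable without restoring.
    listing = subprocess.run(
        ['pg_restore', '--list', str(backup_path)],
        capture_output=True,
    )
    if listing.returncode != 0:
        print(f"FAIL invalid structure: {backup_path.name}")
        return False

    print(f"PASS all integrity stages: {backup_path.name}")
    return True
```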
Combine integrity checks with immutable storage (WORM, object lock). Immutability prevents malicious modification, while checksums detect corruption. Together, they provide defense-in-depth against both intentional attacks and accidental damage.
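As one possible implementation, the sketch below uploads a backup to S3-compatible object storage under a compliance-mode retention lock and stores its checksum as object metadata so downloads can be re-verified. It assumes boto3 is installed and the target bucket was created with Object Lock enabled; the bucket name and 30-day retention period are placeholders:

```python
#!/usr/bin/env python3
"""Sketch: pair checksums with immutable object storage.
Assumes boto3 and a bucket created with Object Lock enabled; retention is a placeholder."""

from datetime import datetime, timedelta, timezone
from pathlib import Path

import boto3


def upload_immutable_backup(backup_path: Path, bucket: str, sha256_digest: str) -> None:
    s3 = boto3.client('s3')
    with open(backup_path, 'rb') as f:
        s3.put_object(
            Bucket=bucket,
            Key=backup_path.name,
            Body=f,
            # The checksum travels with the object so later downloads can be re-verified.
            Metadata={'sha256': sha256_digest},
            # Compliance mode: the object cannot be overwritten or deleted
            # until the retention date passes.
            ObjectLockMode='COMPLIANCE',
            ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=30),
        )
```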
When integrity checks fail, immediate response is critical. A failed check means your backup safety net has a hole that must be addressed before the next disaster.
Integrity failure response protocol: quarantine the affected backup so nothing attempts to restore from it, trigger a replacement backup immediately, verify other backups from the same source and time window, and investigate the root cause before closing the incident.
An integrity failure you ignore today becomes a data loss event tomorrow. Treat every failed check as a production incident requiring tracking, resolution, and post-mortem analysis.
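One way to make that response automatic is sketched below; the quarantine directory is hypothetical and the print() calls stand in for your own paging or ticketing integration:

```python
#!/usr/bin/env python3
"""Sketch of an integrity-failure response hook: quarantine the suspect backup
and raise an incident. Paths and alerting are placeholders."""

import shutil
from pathlib import Path

QUARANTINE_DIR = Path('/backups/quarantine')  # hypothetical location


def handle_integrity_failure(backup_path: Path, reason: str) -> None:
    # 1. Remove the backup from the restore rotation so nothing attempts to use it.
    QUARANTINE_DIR.mkdir(parents=True, exist_ok=True)
    quarantined = QUARANTINE_DIR / backup_path.name
    shutil.move(str(backup_path), str(quarantined))

    # 2. Open an incident and trigger a replacement backup (placeholder alerting).
    print(f"INCIDENT: integrity check failed for {backup_path.name}: {reason}")
    print(f"Quarantined at {quarantined}; schedule a fresh backup and investigate the root cause.")
```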
You now understand integrity checking principles—the systematic detection of corruption before it impacts recovery. Next, we'll examine documentation practices that ensure backup and recovery procedures are well-defined and accessible.