Backup storage is where data protection becomes tangible—the physical and logical systems that store your backup data and determine whether recovery is possible when disaster strikes. A sophisticated backup strategy with world-class tools means nothing if the storage infrastructure fails, becomes inaccessible, or cannot scale with data growth.
Backup storage decisions have far-reaching implications: they affect recovery time objectives (RTO), determine operational costs, influence data retention capabilities, and establish your resilience against various disaster scenarios. This page provides comprehensive coverage of backup storage technologies, architectures, and best practices for designing robust backup infrastructure.
By the end of this page, you will understand the spectrum of backup storage technologies, master storage architecture patterns for different requirements, design retention and tiering strategies, leverage cloud storage effectively, and build storage infrastructure that supports your recovery objectives.
Backup storage spans a range of technologies, each with distinct performance characteristics, cost profiles, and use cases.
Hard Disk Drives (HDD)
Traditional spinning disk remains the workhorse of backup storage:
Characteristics:
Best for: Primary backup landing zone, mid-tier storage, high-capacity requirements
Solid State Drives (SSD)
Flash storage for performance-sensitive backup needs:
Characteristics:
Best for: Backup landing zones requiring fast ingestion, instant recovery staging, deduplication indexes
| Media Type | Cost/TB | Write Speed | Read Speed | Durability | Best Use Case |
|---|---|---|---|---|---|
| HDD (Enterprise) | $15-25 | 200 MB/s | 250 MB/s | Excellent | Primary backup storage |
| SSD (NVMe) | $80-150 | 5,000 MB/s | 7,000 MB/s | Very Good | Fast recovery staging |
| Tape (LTO-9) | $5-8 | 400 MB/s | 400 MB/s | Excellent | Long-term archive |
| Object Storage | $2-23/mo | Variable | Variable | Excellent | Cloud backup, archive |
| Optical (BDXL) | $10-15 | 36 MB/s | 72 MB/s | Excellent | Compliance archive |
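The throughput numbers above translate directly into backup windows. As a rough, illustrative calculation (assuming a hypothetical 10 TB full backup written as a single sequential stream at the table's rates):

```bash
# Rough backup-window estimate from the write speeds above (a sketch; real
# throughput also depends on source read speed, network, and compression).
BACKUP_SIZE_GB=10240   # 10 TB, hypothetical dataset

for media in "HDD:200" "SSD-NVMe:5000" "LTO-9:400"; do
  name=${media%%:*}
  mbps=${media##*:}
  # seconds = (GB * 1024 MB/GB) / (MB/s); hours = seconds / 3600
  hours=$(echo "scale=1; $BACKUP_SIZE_GB * 1024 / $mbps / 3600" | bc)
  echo "$name: ~${hours} hours to write ${BACKUP_SIZE_GB} GB"
done
```

At these rates a 10 TB full backup ties up a single HDD stream for more than half a day but finishes in well under an hour on NVMe, which is why fast landing zones are commonly paired with cheaper retention tiers.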
Tape Storage
Magnetic tape remains relevant for large-scale archival:
Modern Tape (LTO-9):
Advantages:
Considerations:
Object Storage
Cloud and on-premises object storage is increasingly popular:
Public Cloud (S3, Azure Blob, GCS):
On-Premises (MinIO, Ceph, Dell ECS):
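Because on-premises object stores speak the same S3 API, existing backup tooling usually needs only a custom endpoint. A minimal sketch, assuming a hypothetical MinIO endpoint and bucket name:

```bash
# Point standard S3 tooling at an on-premises S3-compatible store.
# The endpoint URL and bucket name below are placeholders for illustration.
aws s3 cp /backup/daily_backup.tar.gz \
  s3://onprem-backups/daily/ \
  --endpoint-url https://minio.example.com:9000

# pgBackRest can target the same endpoint via repo1-s3-endpoint and
# repo1-s3-uri-style=path (path-style addressing is common for MinIO/Ceph).
```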
Backup storage architecture determines performance, resilience, and operational characteristics. Several patterns address different requirements.
Direct-Attached Storage (DAS)
Storage directly connected to the backup server:
Configuration:
Advantages:
Limitations:
Network-Attached Storage (NAS)
File-based storage accessed over the network:
Configuration:
Advantages:
Limitations:
```bash
#!/bin/bash
# Backup Storage Configuration Examples

# =====================================
# DAS: RAID Configuration with mdadm
# =====================================

# Create RAID 6 array for backup storage
# 6 data disks + 2 parity = tolerates 2 disk failures
sudo mdadm --create /dev/md0 \
  --level=6 \
  --raid-devices=8 \
  /dev/sd[b-i]

# Create filesystem
sudo mkfs.xfs -L backup_storage /dev/md0

# Mount for backup use
sudo mkdir -p /backup
sudo mount /dev/md0 /backup

# Add to fstab for persistence
echo "LABEL=backup_storage /backup xfs defaults,noatime 0 2" | sudo tee -a /etc/fstab

# =====================================
# NAS: NFS Mount for Backup
# =====================================

# Install NFS client
sudo apt-get install nfs-common

# Mount NFS share for backup
sudo mkdir -p /backup/nfs
sudo mount -t nfs -o hard,intr,rsize=1048576,wsize=1048576 \
  nas.example.com:/vol/backups /backup/nfs

# Persistent mount in fstab
echo "nas.example.com:/vol/backups /backup/nfs nfs hard,intr,rsize=1048576,wsize=1048576 0 0" | sudo tee -a /etc/fstab

# =====================================
# SAN: iSCSI Target Mount
# =====================================

# Discover iSCSI targets
sudo iscsiadm -m discovery -t sendtargets -p san.example.com:3260

# Login to target
sudo iscsiadm -m node -T iqn.2024-01.com.example:backup001 -p san.example.com:3260 -l

# Create filesystem on iSCSI LUN
sudo mkfs.xfs /dev/sdc  # Newly discovered iSCSI disk

# Mount iSCSI volume
sudo mkdir -p /backup/san
sudo mount /dev/sdc /backup/san

# =====================================
# Object Storage: S3-Compatible Mount
# =====================================

# Install s3fs for S3 mounting (not recommended for primary backup)
sudo apt-get install s3fs

# Create credentials file
echo "ACCESS_KEY:SECRET_KEY" > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs

# Mount S3 bucket
s3fs mybucket /backup/s3 -o passwd_file=~/.passwd-s3fs,url=https://s3.amazonaws.com

# Better approach: Use backup tool's native S3 support
# pgBackRest with S3
cat >> /etc/pgbackrest/pgbackrest.conf <<EOF
[global]
repo1-type=s3
repo1-s3-bucket=backup-bucket
repo1-s3-endpoint=s3.amazonaws.com
repo1-s3-region=us-east-1
repo1-path=/pgbackrest
repo1-s3-key=ACCESS_KEY
repo1-s3-key-secret=SECRET_KEY
EOF
```

Storage Area Network (SAN)
Block-level storage over a dedicated network:
Configuration:
Advantages:
Considerations:
Deduplication Appliances
Purpose-built backup storage with inline deduplication:
Products:
Advantages:
Considerations:
Most production environments benefit from hybrid architectures: fast DAS or SAN for backup landing zones, NAS or deduplication appliances for retention, and cloud or tape for archive. Data automatically moves through tiers based on age and access patterns, optimizing both performance and cost.
Cloud storage has transformed backup architecture, offering scalability, durability, and geographic distribution that would be prohibitively expensive on-premises.
Cloud Storage Classes
Cloud providers offer multiple storage tiers optimized for different access patterns:
| Tier (AWS) | Cost/GB/Month | Retrieval Time | Retrieval Cost | Best For |
|---|---|---|---|---|
| S3 Standard | $0.023 | Immediate | None | Frequently accessed backups |
| S3 Standard-IA | $0.0125 | Immediate | $0.01/GB | Infrequent restore, 30+ day retention |
| S3 Glacier Instant | $0.004 | Milliseconds | $0.03/GB | Archive with instant access needs |
| S3 Glacier Flexible | $0.0036 | 1-12 hours | $0.03/GB | Archive, flexible retrieval OK |
| S3 Glacier Deep Archive | $0.00099 | 12-48 hours | $0.02/GB | Long-term archive, rare access |
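At scale, the per-GB differences above dominate the storage bill. A quick sketch using the table's rates for a hypothetical 50 TB backup set (prices vary by region and change over time):

```bash
# Monthly storage cost for 50 TB at each tier's per-GB rate from the table.
SIZE_GB=51200   # 50 TB, illustrative

for tier in "Standard:0.023" "Standard-IA:0.0125" "Glacier-Instant:0.004" \
            "Glacier-Flexible:0.0036" "Deep-Archive:0.00099"; do
  name=${tier%%:*}
  rate=${tier##*:}
  cost=$(echo "scale=2; $SIZE_GB * $rate" | bc)
  echo "$name: \$${cost}/month"
done
# Roughly $1,177/month on Standard vs about $51/month on Deep Archive --
# retrieval time and retrieval fees are the trade-off.
```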
Cloud Storage Integration Strategies
1. Direct Cloud Backup
Backup directly to cloud storage:
Advantages:
Considerations:
```bash
#!/bin/bash
# Cloud Storage Integration Examples

# =====================================
# AWS S3 Lifecycle Policy
# =====================================

# Create lifecycle policy for backup tiering
cat > lifecycle-policy.json <<EOF
{
  "Rules": [
    {
      "ID": "BackupLifecycle",
      "Status": "Enabled",
      "Filter": { "Prefix": "backups/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER_IR" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ],
      "Expiration": { "Days": 2555 },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
  --bucket my-backup-bucket \
  --lifecycle-configuration file://lifecycle-policy.json

# =====================================
# Backup to S3 with Intelligent-Tiering
# =====================================

# Upload backup with intelligent tiering
aws s3 cp /backup/daily_backup.tar.gz \
  s3://my-backup-bucket/daily/ \
  --storage-class INTELLIGENT_TIERING

# Sync entire backup directory
aws s3 sync /backup/postgresql s3://my-backup-bucket/postgresql/ \
  --storage-class INTELLIGENT_TIERING \
  --only-show-errors

# =====================================
# Cross-Region Replication for DR
# =====================================

# Enable versioning (required for replication)
aws s3api put-bucket-versioning \
  --bucket my-backup-bucket \
  --versioning-configuration Status=Enabled

# Create replication configuration
cat > replication-config.json <<EOF
{
  "Role": "arn:aws:iam::123456789:role/S3ReplicationRole",
  "Rules": [
    {
      "ID": "DR-Replication",
      "Status": "Enabled",
      "Priority": 1,
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Filter": { "Prefix": "" },
      "Destination": {
        "Bucket": "arn:aws:s3:::my-backup-bucket-dr",
        "StorageClass": "STANDARD_IA"
      }
    }
  ]
}
EOF

aws s3api put-bucket-replication \
  --bucket my-backup-bucket \
  --replication-configuration file://replication-config.json

# =====================================
# Azure Blob Storage Integration
# =====================================

# Create storage account with geo-redundancy
az storage account create \
  --name mybackupstorage \
  --resource-group backups \
  --location eastus \
  --sku Standard_GRS \
  --kind StorageV2

# Create container with blob-level tiering
az storage container create \
  --name backups \
  --account-name mybackupstorage

# Upload with access tier
az storage blob upload \
  --account-name mybackupstorage \
  --container-name backups \
  --name daily/backup.tar.gz \
  --file /backup/daily_backup.tar.gz \
  --tier Cool

# Set lifecycle management policy
# (Azure uses its own policy JSON schema, distinct from the S3 policy above)
az storage account management-policy create \
  --account-name mybackupstorage \
  --resource-group backups \
  --policy @azure-lifecycle-policy.json
```

2. Hybrid Cloud Backup
Local backup with cloud replication:
Pattern:
Advantages:
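A minimal sketch of this local-plus-cloud pattern, with placeholder paths, bucket, and database name: back up to the local landing zone first (fast restores for the common case), then copy the finished artifact off-site.

```bash
#!/bin/bash
# Hybrid backup sketch: local landing zone + cloud copy (illustrative names).
set -euo pipefail

LOCAL_DIR="/backup/daily"
BUCKET="s3://my-backup-bucket/offsite"
STAMP=$(date +%Y%m%d)

# 1. Local backup (fast restore path)
pg_dump mydb | gzip > "$LOCAL_DIR/${STAMP}_mydb.sql.gz"

# 2. Replicate to cloud only after the local backup completed successfully
aws s3 cp "$LOCAL_DIR/${STAMP}_mydb.sql.gz" "$BUCKET/" \
  --storage-class STANDARD_IA
```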
3. Cloud-Native Database Backup
For cloud databases (RDS, Cloud SQL, etc.):
Built-in features:
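For managed databases the provider's snapshot facility is usually the primary mechanism. The sketch below shows the general shape with the AWS CLI for RDS; the instance and snapshot identifiers, account number, and DR region are placeholders:

```bash
# Take a manual RDS snapshot in addition to the automated backups
aws rds create-db-snapshot \
  --db-instance-identifier prod-db \
  --db-snapshot-identifier prod-db-manual-20240115

# Wait until the snapshot is available
aws rds wait db-snapshot-available \
  --db-snapshot-identifier prod-db-manual-20240115

# Copy the snapshot to a second region for DR (run against the target region)
aws rds copy-db-snapshot \
  --source-db-snapshot-identifier arn:aws:rds:us-east-1:123456789012:snapshot:prod-db-manual-20240115 \
  --target-db-snapshot-identifier prod-db-manual-20240115-dr \
  --region us-west-2
```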
Cloud storage is inexpensive but cloud egress (downloading data) is not. Restoring 10 TB from S3 Standard costs ~$900 in egress fees. Factor egress costs into disaster recovery planning. Strategies to mitigate: (1) choose cloud regions with lower egress costs, (2) use Direct Connect/ExpressRoute for large restores, (3) test restores periodically to understand actual costs, (4) consider keeping recent backups locally for common restore scenarios.
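It is worth running the egress math before an incident rather than during one. A sketch assuming a typical internet egress rate of about $0.09/GB (actual rates vary by provider, region, and volume tier):

```bash
# Rough egress cost for restoring a backup set over the internet.
RESTORE_GB=10240      # 10 TB restore
EGRESS_PER_GB=0.09    # approximate S3 internet egress rate; check current pricing

echo "scale=2; $RESTORE_GB * $EGRESS_PER_GB" | bc
# ~921.60 USD -- close to the ~$900 figure above; Direct Connect or keeping
# recent backups local avoids this cost during routine restores.
```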
Backup storage must be protected against both hardware failures and malicious attacks (especially ransomware). Multiple protection mechanisms work together.
RAID Protection
RAID provides first-line protection against disk failures:
Replication
Copying backup data to secondary locations:
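For file-based backup repositories, replication can be as simple as a scheduled copy to a second host or site; object-storage replication was shown earlier with S3 cross-region replication. A minimal sketch using rsync over SSH, with placeholder hostnames and paths:

```bash
# Replicate the local backup repository to a secondary site (e.g. nightly cron).
# Add --delete if the replica should mirror retention-driven deletions;
# omit it if the replica must retain copies longer than the primary.
rsync -az --partial --log-file=/var/log/backup_replication.log \
  /backup/ \
  backup-replica.example.com:/backup-mirror/
```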
Immutability and WORM (Write Once Read Many)
Critical protection against ransomware and accidental deletion:
Why immutability matters:
Implementation options:
```bash
#!/bin/bash
# Backup Immutability Configuration Examples

# =====================================
# AWS S3 Object Lock
# =====================================

# Create bucket with object lock enabled
# (us-east-1 is the default region; other regions also need
#  --create-bucket-configuration LocationConstraint=<region>)
aws s3api create-bucket \
  --bucket my-immutable-backup \
  --object-lock-enabled-for-bucket

# Set default retention policy (Governance mode)
aws s3api put-object-lock-configuration \
  --bucket my-immutable-backup \
  --object-lock-configuration '{
    "ObjectLockEnabled": "Enabled",
    "Rule": {
      "DefaultRetention": {
        "Mode": "GOVERNANCE",
        "Days": 30
      }
    }
  }'

# Upload with compliance retention (cannot be deleted even by root)
aws s3api put-object \
  --bucket my-immutable-backup \
  --key backups/critical-backup.tar.gz \
  --body /backup/critical-backup.tar.gz \
  --object-lock-mode COMPLIANCE \
  --object-lock-retain-until-date "2024-12-31T00:00:00Z"

# =====================================
# Azure Immutable Blob Storage
# =====================================

# Create container with legal hold
az storage container create \
  --name immutable-backups \
  --account-name mybackupstorage

# Set time-based retention policy (locked after confirmation)
az storage container immutability-policy create \
  --resource-group backups \
  --account-name mybackupstorage \
  --container-name immutable-backups \
  --period 90

# Lock the policy (irreversible!)
az storage container immutability-policy lock \
  --resource-group backups \
  --account-name mybackupstorage \
  --container-name immutable-backups

# =====================================
# Linux: Immutable File Attribute
# =====================================

# Make backup file immutable (cannot be modified/deleted while the flag is set)
sudo chattr +i /backup/critical_backup.tar.gz

# Check immutable status
lsattr /backup/critical_backup.tar.gz
# Output: ----i--------e-- /backup/critical_backup.tar.gz

# Remove immutable (required for deletion/rotation)
sudo chattr -i /backup/critical_backup.tar.gz

# Automated immutable backup script
backup_file="/backup/$(date +%Y%m%d)_backup.tar.gz"
tar -czf "$backup_file" /data
sudo chattr +i "$backup_file"

# Schedule immutability removal for retention
echo "sudo chattr -i $backup_file && rm -f $backup_file" | \
  at "now + 30 days"

# =====================================
# ZFS: Snapshot Holds
# =====================================

# Create hold on snapshot (prevents destruction)
zfs hold compliance_hold zpool/backups@2024-01-15

# List holds
zfs holds zpool/backups@2024-01-15

# Release hold (when retention period expires)
zfs release compliance_hold zpool/backups@2024-01-15
```

Backup retention determines how long backups are kept and how storage is managed over time. Well-designed retention policies balance recovery requirements, compliance obligations, and storage costs.
Retention Strategies
Grandfather-Father-Son (GFS)
Classic rotation scheme maintaining multiple recovery points:
Total backups retained: 7 + 4 + 12 + 7 = 30 backups
Maximum recovery granularity: depends on the age of the data
| Backup Type | Frequency | Retention | Count | Purpose |
|---|---|---|---|---|
| Daily Full | Every day | 7 days | 7 | Recent point-in-time recovery |
| Weekly Full | Every Sunday | 4 weeks | 4 | Last month recovery |
| Monthly Full | 1st of month | 12 months | 12 | Historical recovery |
| Yearly Full | Jan 1st | 7 years | 7 | Compliance, audit |
Tiered Retention with Storage Optimization
Move backups through storage tiers as they age:
```bash
#!/bin/bash
# Backup Retention and Lifecycle Management

# =====================================
# GFS Retention Implementation
# =====================================

BACKUP_DIR="/backup/postgresql"
TODAY=$(date +%u)            # Day of week (1-7, Monday=1)
DAY_OF_MONTH=$(date +%d)
MONTH=$(date +%m)

# Daily backup (keep 7 days) -- compress on the way out
pg_dump mydb | gzip > "$BACKUP_DIR/daily/$(date +%Y%m%d).sql.gz"
find "$BACKUP_DIR/daily" -name "*.sql.gz" -mtime +7 -delete

# Weekly backup on Sunday (keep 4 weeks)
if [ "$TODAY" -eq 7 ]; then
  cp "$BACKUP_DIR/daily/$(date +%Y%m%d).sql.gz" "$BACKUP_DIR/weekly/"
  find "$BACKUP_DIR/weekly" -name "*.sql.gz" -mtime +28 -delete
fi

# Monthly backup on 1st (keep 12 months)
if [ "$DAY_OF_MONTH" -eq "01" ]; then
  cp "$BACKUP_DIR/daily/$(date +%Y%m%d).sql.gz" "$BACKUP_DIR/monthly/"
  find "$BACKUP_DIR/monthly" -name "*.sql.gz" -mtime +365 -delete
fi

# Yearly backup on Jan 1st (keep 7 years)
if [ "$DAY_OF_MONTH" -eq "01" ] && [ "$MONTH" -eq "01" ]; then
  cp "$BACKUP_DIR/daily/$(date +%Y%m%d).sql.gz" "$BACKUP_DIR/yearly/"
  find "$BACKUP_DIR/yearly" -name "*.sql.gz" -mtime +2555 -delete
fi

# =====================================
# Tiered Storage Migration
# =====================================

FAST_TIER="/backup/ssd"       # 0-7 days
CAPACITY_TIER="/backup/hdd"   # 7-30 days
ARCHIVE_TIER="/backup/s3"     # 30+ days

# Move from fast to capacity tier (7+ days old)
find "$FAST_TIER" -name "*.backup" -mtime +7 -exec mv {} "$CAPACITY_TIER/" \;

# Archive to cloud (30+ days old)
find "$CAPACITY_TIER" -name "*.backup" -mtime +30 | while read -r file; do
  # Compress before archive
  gzip "$file"
  # Upload to S3 with lifecycle; remove local copy only after a successful upload
  aws s3 cp "${file}.gz" s3://backup-bucket/archive/ \
    --storage-class STANDARD_IA \
  && rm "${file}.gz"
done

# =====================================
# pgBackRest Retention Configuration
# =====================================

cat >> /etc/pgbackrest/pgbackrest.conf <<EOF
[global]
# Full backup retention (keep 4 full backups)
repo1-retention-full=4
repo1-retention-full-type=count

# Differential retention (keep 2 per full)
repo1-retention-diff=2

# Archive retention (match full backup retention)
repo1-retention-archive=4
repo1-retention-archive-type=full
EOF

# Expire old backups based on retention
pgbackrest --stanza=main expire

# =====================================
# Database-Level Retention Tracking
# =====================================

# Create backup catalog table
psql <<EOF
CREATE TABLE IF NOT EXISTS backup_catalog (
    backup_id SERIAL PRIMARY KEY,
    backup_date TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    backup_type VARCHAR(20),        -- full, diff, incr, log
    backup_path TEXT,
    backup_size BIGINT,
    retention_class VARCHAR(20),    -- daily, weekly, monthly, yearly
    expire_date DATE,
    status VARCHAR(20) DEFAULT 'active'
);

-- Insert today's backup
INSERT INTO backup_catalog (backup_type, backup_path, backup_size, retention_class, expire_date)
VALUES ('full', '/backup/daily/20240115.sql.gz', 1073741824, 'daily', CURRENT_DATE + 7);

-- Find expired backups
SELECT backup_path FROM backup_catalog
WHERE expire_date < CURRENT_DATE AND status = 'active';

-- Mark as expired
UPDATE backup_catalog SET status = 'expired'
WHERE expire_date < CURRENT_DATE;
EOF
```

Retention policies are often driven by compliance requirements: HIPAA (6 years), SOX (7 years), GDPR (varies by data type), PCI-DSS (1 year typically). Document your retention policies, ensure they meet regulatory requirements, and maintain audit trails of backup creation and deletion.
Backup storage must scale with data growth while maintaining performance. Proactive capacity planning prevents backup failures.
Capacity Estimation
Calculate required storage for your retention policy:
| Backup Type | Size | Count | Total | Notes |
|---|---|---|---|---|
| Daily Full (compressed) | 300 GB | 7 | 2.1 TB | 30% compression typical |
| Weekly Full | 300 GB | 4 | 1.2 TB | Separate from daily |
| Monthly Full | 300 GB | 12 | 3.6 TB | Full year history |
| Yearly Full | 300 GB | 7 | 2.1 TB | Compliance retention |
| Transaction Logs (daily) | 50 GB | 30 | 1.5 TB | For PITR |
| Total Required | | | 10.5 TB | Plus 20% headroom |
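Making the estimate a script keeps it repeatable as retention settings change. The sketch below mirrors the table, assuming a 300 GB compressed full backup and 50 GB/day of transaction logs:

```bash
# Capacity estimate mirroring the table above, plus 20% headroom.
FULL_GB=300; LOG_GB=50

# daily(7) + weekly(4) + monthly(12) + yearly(7) fulls, plus 30 days of logs
total_gb=$(( FULL_GB*7 + FULL_GB*4 + FULL_GB*12 + FULL_GB*7 + LOG_GB*30 ))

echo "Raw requirement: ${total_gb} GB (~$(echo "scale=1; $total_gb/1000" | bc) TB)"
echo "With 20% headroom: ~$(echo "scale=1; $total_gb*1.2/1000" | bc) TB"
# 2100 + 1200 + 3600 + 2100 + 1500 = 10500 GB, i.e. 10.5 TB raw, ~12.6 TB with headroom
```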
Growth Projection
Account for data growth:
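Growth compounds, so a plan sized exactly for today's requirement fills up quickly. A sketch of the projection, assuming an illustrative 25% annual growth rate (substitute your own measured trend):

```bash
# Project backup storage needs assuming a constant annual growth rate.
CURRENT_TB=12.6      # today's requirement including headroom
GROWTH_RATE=0.25     # 25%/year, illustrative assumption

projected=$CURRENT_TB
for year in 1 2 3; do
  projected=$(echo "scale=2; $projected * (1 + $GROWTH_RATE)" | bc)
  echo "Year $year: ~${projected} TB"
done
# At 25% annual growth the requirement nearly doubles within three years,
# so plan procurement or cloud tiering ahead of the curve.
```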
```bash
#!/bin/bash
# Backup Storage Monitoring and Alerting

# =====================================
# Storage Capacity Monitoring
# =====================================

BACKUP_PATH="/backup"
THRESHOLD_WARNING=80
THRESHOLD_CRITICAL=90

# Check usage percentage
usage=$(df "$BACKUP_PATH" | awk 'NR==2 {print $5}' | tr -d '%')

if [ "$usage" -ge "$THRESHOLD_CRITICAL" ]; then
  echo "CRITICAL: Backup storage at ${usage}%" | mail -s "CRITICAL: Backup Storage Space" dba@example.com
  exit 2
elif [ "$usage" -ge "$THRESHOLD_WARNING" ]; then
  echo "WARNING: Backup storage at ${usage}%" | mail -s "WARNING: Backup Storage Space" dba@example.com
  exit 1
fi

# =====================================
# Backup Size Trending
# =====================================

# Log daily backup sizes
cat >> /var/log/backup_sizes.log <<EOF
$(date +%Y-%m-%d),$(du -sb /backup/daily/$(date +%Y%m%d)* 2>/dev/null | awk '{sum+=$1} END {print sum}')
EOF

# Detect abnormal backup size (>50% deviation from average)
recent_avg=$(tail -7 /var/log/backup_sizes.log | awk -F, '{sum+=$2; count++} END {print sum/count}')
today_size=$(tail -1 /var/log/backup_sizes.log | cut -d, -f2)
deviation=$(echo "scale=2; ($today_size - $recent_avg) / $recent_avg * 100" | bc)

if [ "$(echo "$deviation > 50 || $deviation < -50" | bc)" -eq 1 ]; then
  echo "Unusual backup size detected: ${deviation}% deviation" | mail -s "Backup Size Anomaly" dba@example.com
fi

# =====================================
# Backup Success Monitoring
# =====================================

# Check for today's backup
expected_backup="/backup/daily/$(date +%Y%m%d)_backup.sql.gz"
if [ ! -f "$expected_backup" ]; then
  echo "CRITICAL: Missing backup for $(date +%Y-%m-%d)" | mail -s "MISSING BACKUP" dba@example.com
  exit 2
fi

# Verify backup is not empty/corrupt
min_size=$((100 * 1024 * 1024))  # 100 MB minimum
actual_size=$(stat -c%s "$expected_backup")
if [ "$actual_size" -lt "$min_size" ]; then
  echo "CRITICAL: Backup too small (${actual_size} bytes)" | mail -s "BACKUP SIZE CRITICAL" dba@example.com
  exit 2
fi

# =====================================
# Prometheus/Grafana Metrics
# =====================================

# Expose metrics for Prometheus scraping
cat > /var/lib/node_exporter/textfile_collector/backup.prom <<EOF
# HELP backup_storage_bytes Total backup storage used
# TYPE backup_storage_bytes gauge
backup_storage_bytes $(du -sb /backup | awk '{print $1}')

# HELP backup_storage_available_bytes Available backup storage
# TYPE backup_storage_available_bytes gauge
backup_storage_available_bytes $(df -B1 /backup | awk 'NR==2 {print $4}')

# HELP backup_last_success_timestamp Last successful backup timestamp
# TYPE backup_last_success_timestamp gauge
backup_last_success_timestamp $(stat -c%Y "$expected_backup" 2>/dev/null || echo 0)

# HELP backup_size_bytes Size of most recent backup
# TYPE backup_size_bytes gauge
backup_size_bytes $actual_size
EOF
```

Backup storage is the foundation of data protection. Let's consolidate the key principles:
| Requirement | Recommended Architecture |
|---|---|
| Fast local recovery | SSD/NVMe landing zone + HDD retention |
| Ransomware protection | Immutable cloud + air-gapped tape |
| Compliance archive | WORM storage (cloud or tape) |
| Geographic DR | Cross-region cloud replication |
| Cost optimization | Tiered storage with lifecycle policies |
| Large-scale (PB) | Deduplication + tape archive |
Congratulations! You've completed the Backup Implementation module. You now understand how to implement online and offline backups, ensure backup consistency, select appropriate tools, and design robust backup storage infrastructure. These skills form the foundation for protecting enterprise data against loss and enabling business continuity in the face of disasters.