Your data is under constant attack from an invisible enemy. Bit rot—the gradual corruption of data through cosmic rays, electrical interference, firmware bugs, controller errors, and media degradation—silently flips bits in your files. A single bit flip in a photograph might cause a barely noticeable color change. A bit flip in a database could corrupt critical records. A bit flip in an executable could cause crashes or security vulnerabilities.
Traditional file systems are blind to this corruption. They trust that data read from disk is the same data that was written. By the time you discover the corruption—if you ever do—it may have propagated to your backups, leaving you with no clean copy.
Btrfs data scrubbing is the antidote. By storing checksums for every block of data and metadata, Btrfs can actively verify data integrity and, when configured with redundancy, automatically repair corruption.
By the end of this page, you will understand Btrfs's data integrity architecture: how checksums work, the scrub operation that verifies entire file systems, self-healing with redundant storage, monitoring and scheduling scrubs, and best practices for data integrity maintenance.
Before exploring Btrfs's solutions, we need to understand the scope of the problem.
Sources of Silent Data Corruption:
The Scale of the Problem:
Studies of large storage systems have revealed alarming statistics:
Why Traditional File Systems Can't Help:
File systems like ext4, XFS, and NTFS store data but don't independently verify it:
These file systems trust the hardware, but as we've seen, hardware is not always trustworthy.
Traditional RAID protects against drive failure, not data corruption. If a disk silently returns bad data, RAID can't detect it. Worse, RAID rebuild operations might propagate the corruption or even choose the wrong data during reconstruction. This is called 'silent data corruption'—RAID's blind spot.
Btrfs solves the corruption problem by computing and storing checksums for every block of data and every tree node.
How Checksums Work:
Checksum Algorithms:
| Algorithm | Size | Speed | Security | Notes |
|---|---|---|---|---|
| CRC32C | 4 bytes | Very fast | Low | Default, hardware accelerated on modern CPUs |
| xxHash | 8 bytes | Fast | Low | Faster than CRC32C in software, good for large datasets |
| SHA256 | 32 bytes | Slow | High | Cryptographically secure, protects against malicious attacks |
| BLAKE2b | 32 bytes | Medium | High | Fast cryptographic hash, good balance |
Choosing a Checksum Algorithm:
# Create file system with specific checksum algorithm
$ mkfs.btrfs --csum sha256 /dev/sda1
$ mkfs.btrfs --csum xxhash /dev/sda1
$ mkfs.btrfs --csum blake2 /dev/sda1
# Check current algorithm
$ btrfs inspect-internal dump-super /dev/sda1 | grep csum_type
csum_type		2 (sha256)
Algorithm Selection Guidelines:
Checksum Overhead:
Checksums add storage and CPU overhead:
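As a rough sketch of the storage side: Btrfs stores one checksum per data block (sectorsize, typically 4 KiB), so overhead scales with checksum size. The helper below is illustrative arithmetic, not a Btrfs API:

```python
# Back-of-the-envelope checksum storage overhead:
# one checksum per 4 KiB data block.
BLOCK_SIZE = 4096

def csum_overhead_pct(csum_bytes: int) -> float:
    """Checksum storage overhead as a percentage of data size."""
    return 100 * csum_bytes / BLOCK_SIZE

print(f"CRC32C: {csum_overhead_pct(4):.2f}%")   # 4-byte checksums -> 0.10%
print(f"SHA256: {csum_overhead_pct(32):.2f}%")  # 32-byte checksums -> 0.78%
```

Even the largest checksum option costs well under 1% of storage; the more significant cost of SHA256 is CPU time on every read and write.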
Data Write Flow:
═══════════════════════════════════════════════════════════════

User Data: [Block A: "Hello World...4KB of content..."]
        │
        ▼
Compute Checksum: crc32c("Hello World...") = 0xABCD1234
        │
        ├──────────────────────────┐
        ▼                          ▼
Data Tree:                  Checksum Tree:
[Extent pointer to Block A] [Offset: 0, Checksum: 0xABCD1234]

Data Read Flow:
═══════════════════════════════════════════════════════════════

Read Request for offset 0
        │
        ├──────────────────────────┐
        ▼                          ▼
Fetch from disk:            Fetch expected checksum:
Block A content             0xABCD1234
        │                          │
        ▼                          ▼
Compute: crc32c(content) ── Compare ──→ Match?
        │                          │
       Yes                         No
        │                          │
        ▼                          ▼
Return data to user         Return -EIO error
                            (or attempt repair if redundancy exists)

Btrfs tree nodes (metadata) always have checksums stored in the node header, regardless of data checksum settings. This protects file system structure integrity. The 'nodatasum' mount option only affects file data checksums.
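The checksum-on-write, verify-on-read behavior can be sketched as a toy model in Python. This is an illustration, not Btrfs code: `ChecksummedStore` is an invented name, and Python's `zlib.crc32` uses the plain CRC32 polynomial rather than Btrfs's CRC32C, but the logic is the same:

```python
import zlib

class ChecksummedStore:
    """Toy model of checksum-on-write / verify-on-read."""
    def __init__(self):
        self.blocks = {}  # offset -> block bytes (the "data tree" side)
        self.csums = {}   # offset -> stored checksum (the "checksum tree" side)

    def write(self, offset, block):
        self.blocks[offset] = block
        self.csums[offset] = zlib.crc32(block)  # checksum stored separately from data

    def read(self, offset):
        block = self.blocks[offset]
        if zlib.crc32(block) != self.csums[offset]:
            # Btrfs would return -EIO here, or try another copy if one exists
            raise IOError(f"checksum mismatch at offset {offset}")
        return block

store = ChecksummedStore()
store.write(0, b"Hello World" + b"\x00" * 4085)   # one 4 KiB block
assert store.read(0)[:11] == b"Hello World"       # clean read verifies OK

# Simulate bit rot: flip a single bit in the stored block
rotted = bytearray(store.blocks[0])
rotted[5] ^= 0x01
store.blocks[0] = bytes(rotted)

try:
    store.read(0)
except IOError as err:
    print("corruption detected:", err)
```

The key property: the checksum is stored apart from the data it protects, so a corrupted block cannot "vouch for itself."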
Checksums verify data when it's read, but some data may not be read for months or years. Scrub is a proactive operation that reads and verifies the entire file system, ensuring all data is checked regularly.
Running a Scrub:
# Start a scrub (runs in background)
$ sudo btrfs scrub start /mnt
scrub started on /mnt, fsid a1b2c3d4-...
# Check scrub status
$ sudo btrfs scrub status /mnt
Scrub started: Fri Jan 15 10:00:00 2024
Status: running
Duration: 0:05:32
Total to scrub: 500.00GiB
Rate: 1.5GiB/s
Error summary: no errors found
# Cancel a running scrub
$ sudo btrfs scrub cancel /mnt
# Resume a cancelled/interrupted scrub
$ sudo btrfs scrub resume /mnt
Scrub Output Explained:
$ sudo btrfs scrub status /mnt
Scrub started: Fri Jan 15 10:00:00 2024
Status:        finished
Duration:      1:23:45
Total to scrub: 2.00TiB
Rate:          410.5MiB/s
Error summary:
  read_errors: 0     # Disk couldn't read the block at all
  csum_errors: 3     # Checksum mismatch (corruption detected!)
  verify_errors: 0   # Repair verification failed
  super_errors: 0    # Superblock corruption
  malloc_errors: 0   # System ran out of memory
  uncorrectable: 1   # Corruption found but no redundancy to fix
  corrected: 2       # Corruption found AND automatically repaired

Key Interpretation:
- csum_errors = 3: Three corrupted blocks were detected
- corrected = 2: Two were automatically fixed from redundant copies
- uncorrectable = 1: One corruption couldn't be fixed (no good copy)

What Scrub Does:
Scrub I/O Impact:
Scrub is I/O intensive—it reads everything:
# Run scrub with idle I/O priority
$ sudo btrfs scrub start -c 3 /mnt
# -c 3 = idle class: scrub I/O yields to normal workload
# (scrub I/O runs in kernel threads, so wrapping the command in
#  ionice does not throttle it; use scrub's own -c/-n options)
# Alternatively, newer kernels expose a per-device sysfs limit
# in bytes/sec (availability varies by kernel version):
$ echo $((100 * 1024 * 1024)) | sudo tee \
    /sys/fs/btrfs/<fsid>/devinfo/<devid>/scrub_speed_max
# Limits that device's scrub rate to 100 MiB/s (0 = unlimited)
Schedule scrubs during low-activity periods. On a 2TB drive at 100 MB/s, a full scrub takes ~5.5 hours. SSDs can complete much faster, but even they benefit from off-peak scheduling. Monthly scrubs are a reasonable starting point for most systems.
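The duration estimate above is simple arithmetic: time equals data to read divided by read rate. A quick sanity check (illustrative helper, using decimal units as drive vendors do; scrub only reads allocated data, so a full drive is the worst case):

```python
# Estimate full-scrub duration: scrub must read every allocated byte once.
def scrub_hours(capacity_bytes: float, rate_bytes_per_sec: float) -> float:
    return capacity_bytes / rate_bytes_per_sec / 3600

# 2 TB (decimal) at a sustained 100 MB/s:
print(round(scrub_hours(2e12, 100e6), 1))  # -> 5.6, matching "~5.5 hours"
```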
Checksums detect corruption, but redundancy enables automatic repair. Btrfs can self-heal when configured with data redundancy profiles.
Redundancy Profiles:
| Profile | Copies | Can Self-Heal | Use Case |
|---|---|---|---|
| single | 1 | ❌ No | Maximum capacity, no redundancy |
| dup | 2 on same device | ✅ Yes (same disk) | Single disk with some protection |
| raid1 | 2 on different devices | ✅ Yes | Mirror across two+ devices |
| raid1c3 | 3 on different devices | ✅ Yes | Triple mirror, maximum redundancy |
| raid1c4 | 4 on different devices | ✅ Yes | Quadruple mirror |
| raid10 | 2 (mirrored stripes) | ✅ Yes | Performance + redundancy |
| raid5 | Parity (1 disk loss) | ⚠️ Limited (write hole) | Not recommended yet |
| raid6 | Double parity | ⚠️ Limited (write hole) | Not recommended yet |
Self-Healing Flow:
Scenario: RAID1 with 2 devices, Block X is corrupted on Device A

   Device A                Device B
 ┌──────────────┐        ┌──────────────┐
 │ Block X      │        │ Block X      │
 │ (CORRUPTED)  │        │ (GOOD)       │
 │ csum: FAIL   │        │ csum: OK     │
 └──────────────┘        └──────────────┘

Step 1: Read Request for Block X
  ├─→ Read from Device A
  │   Compute checksum ─→ MISMATCH! Corruption detected
  │
Step 2: Try Alternate Copy
  ├─→ Read from Device B
  │   Compute checksum ─→ MATCH! Good copy found
  │
Step 3: Repair
  ├─→ Write good copy from B to A (overwrite corruption)
  ├─→ Verify repair succeeded
  │
Step 4: Return Data
  └─→ Return good data to application (no error visible!)

Result:
- Application gets correct data
- Corruption is repaired automatically
- Logged in dmesg/journal for monitoring
- Scrub reports as "corrected" errors

Setting Up Redundancy:
# Create file system with specific data/metadata profiles
$ mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb
# Common configurations:
# Single disk, DUP for metadata (default for single device)
$ mkfs.btrfs -d single -m dup /dev/sda
# Two disks, RAID1 for both data and metadata
$ mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb
# Four disks, RAID10 for data, RAID1C3 for metadata
$ mkfs.btrfs -d raid10 -m raid1c3 /dev/sd{a,b,c,d}
# Check current allocation profile
$ btrfs filesystem df /mnt
Data, RAID1: total=100.00GiB, used=80.00GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=2.00GiB, used=500.00MiB
GlobalReserve, single: total=256.00MiB, used=0.00B
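The self-healing sequence can be sketched as a toy two-mirror store in Python. This is illustrative only, not Btrfs internals; `Raid1Store` is an invented name for the sketch:

```python
import zlib

class Raid1Store:
    """Toy RAID1: every block is written to two 'devices'; reads verify
    against a stored checksum and repair a bad copy from the good one."""
    def __init__(self):
        self.mirrors = [{}, {}]   # device A and device B: offset -> bytes
        self.csums = {}           # offset -> expected checksum

    def write(self, offset, block):
        for dev in self.mirrors:
            dev[offset] = block
        self.csums[offset] = zlib.crc32(block)

    def read(self, offset):
        good, bad_devices = None, []
        for dev in self.mirrors:
            if zlib.crc32(dev[offset]) == self.csums[offset]:
                good = dev[offset]           # checksum matches: good copy
            else:
                bad_devices.append(dev)      # corrupted copy, fix below
        if good is None:
            raise IOError("uncorrectable: no valid copy exists")
        for dev in bad_devices:
            dev[offset] = good               # self-heal: overwrite corruption
        return good                          # caller never sees an error

store = Raid1Store()
store.write(0, b"important data")
store.mirrors[0][0] = b"corrupted!data"          # silent corruption on device A
assert store.read(0) == b"important data"        # read still succeeds...
assert store.mirrors[0][0] == b"important data"  # ...and device A was repaired
```

Note the failure mode at the end of `read`: if every copy fails its checksum, there is nothing to heal from, which is exactly the "uncorrectable" counter in scrub output.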
DUP on Single Disk:
Even on a single disk, Btrfs can provide some protection:
# DUP stores two copies on the same device (different locations)
$ mkfs.btrfs -d dup -m dup /dev/sda
# Protects against:
# - Localized media failures
# - Some types of bad sector errors
# Does NOT protect against:
# - Full drive failure
# - Controller-level corruption
Some SSDs have internal management that may place 'different' blocks in the same physical location, defeating DUP's purpose. For SSDs, prefer RAID1 across multiple devices when possible, or understand that DUP provides limited protection.
When scrub finds corruption without available redundancy, you face an uncorrectable error. This is a serious situation requiring careful handling.
Identifying Affected Files:
Scrub reports block locations, but you need to find which files are affected:
# After scrub with errors, check kernel log
$ dmesg | grep BTRFS
[12345.67] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[12345.68] BTRFS error (device sda1): unable to fixup (regular) error at logical 12345678 on dev /dev/sda1
# Find which file corresponds to that logical address
$ btrfs inspect-internal logical-resolve 12345678 /mnt
/mnt/path/to/corrupted/file.data
# Now you know which file to restore from backup
Recovery Options:
Preventing Future Uncorrectable Errors:
# 1. Add redundancy to existing file system
$ btrfs device add /dev/sdb /mnt
$ btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt
# Now data has two copies for future self-healing
# 2. Implement regular backups
$ btrfs subvolume snapshot -r /mnt/@ /mnt/@-daily
$ btrfs send /mnt/@-daily | ssh backup-server btrfs receive /backup/
# 3. Schedule regular scrubs
$ sudo systemctl enable btrfs-scrub@mnt.timer
When Corruption Persists:
If scrub keeps finding corruption:
- Check drive health: smartctl -a /dev/sda — failing drive?
- Test system memory (memtest86+); bad RAM can corrupt data before it ever reaches disk

Uncorrectable errors mean your data is damaged and more damage may be occurring. Treat any scrub error as urgent: identify affected files, restore from backups, and investigate the root cause. Continuing to use a system with unaddressed corruption risks further data loss.
Regular scrubbing is essential for proactive data integrity. Most distributions provide built-in systemd timers; custom cron jobs are also an option.
Using Systemd Timers:
# Many distributions include btrfs-scrub timers
$ systemctl list-timers | grep scrub
# Enable the scrub timer for a mount point (interval is set by the distro's timer unit)
$ sudo systemctl enable --now btrfs-scrub@mnt.timer
$ sudo systemctl enable --now btrfs-scrub@home.timer
# Check timer status
$ systemctl status btrfs-scrub@mnt.timer
● btrfs-scrub@mnt.timer - Monthly Btrfs Scrub for /mnt
Loaded: loaded (/lib/systemd/system/btrfs-scrub@.timer; enabled)
Active: active (waiting) since Mon 2024-01-15 10:00:00 UTC
Custom Scrub Script:
#!/bin/bash
# /usr/local/bin/btrfs-scrub-all.sh
LOGFILE="/var/log/btrfs-scrub.log"
echo "=== Btrfs Scrub Run: $(date) ===" >> "$LOGFILE"
# Get all Btrfs mount points (note: subvolumes of the same file system
# mounted at different targets will each trigger a full scrub)
for mp in $(findmnt -t btrfs -no TARGET | sort -u); do
echo "Starting scrub on $mp" >> "$LOGFILE"
# Run scrub, wait for completion
btrfs scrub start -B "$mp" >> "$LOGFILE" 2>&1
# Get and log status
btrfs scrub status "$mp" >> "$LOGFILE"
    # Check for errors (counter name varies across btrfs-progs versions)
    if btrfs scrub status "$mp" | grep -Eq "uncorrectable(_errors)?:[[:space:]]*[1-9]"; then
echo "ALERT: Uncorrectable errors found on $mp!" |
mail -s "Btrfs Scrub Alert" admin@example.com
fi
echo "---" >> "$LOGFILE"
done
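For monitoring beyond a simple grep, the error counters in `btrfs scrub status` output can be parsed programmatically. A small sketch (`parse_scrub_errors` is a hypothetical helper; field names follow the sample output shown earlier and vary slightly across btrfs-progs versions, which the regex tolerates):

```python
import re

# Extract error counters from `btrfs scrub status` text output.
def parse_scrub_errors(status_text: str) -> dict:
    counters = {}
    for key in ("read_errors", "csum_errors", "verify_errors",
                "super_errors", "uncorrectable", "corrected"):
        # \w* tolerates variants like "uncorrectable_errors"
        m = re.search(rf"{key}\w*:\s*(\d+)", status_text)
        if m:
            counters[key] = int(m.group(1))
    return counters

sample = """Error summary:
  read_errors: 0
  csum_errors: 3
  uncorrectable: 1
  corrected: 2"""
assert parse_scrub_errors(sample)["csum_errors"] == 3
assert parse_scrub_errors(sample)["uncorrectable"] == 1
```

A wrapper like this makes it straightforward to feed scrub results into the alerting or metrics systems discussed later, rather than grepping ad hoc.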
Cron Job Configuration:
# /etc/cron.d/btrfs-scrub
# Run scrub monthly at 3 AM on the 1st
0 3 1 * * root /usr/local/bin/btrfs-scrub-all.sh
# Or weekly on Sunday at 2 AM
0 2 * * 0 root /usr/local/bin/btrfs-scrub-all.sh
Recommended Scrub Frequency:
| Environment | Frequency | Rationale |
|---|---|---|
| Desktop/Laptop | Monthly | Personal data, moderate change rate |
| Workstation | Weekly-Monthly | Development data, more frequent changes |
| Server (general) | Weekly | Production data, proactive detection |
| NAS/Archive | Weekly | Archival focus, catch rot early |
| High-value data | Weekly + after events | Mission critical, scrub after any hardware events |
| After hardware changes | Immediately | New disks, controller changes, power events |
Track how long scrubs take. A steadily increasing scrub duration might indicate a failing drive needing to retry reads. Sudden jumps in duration warrant investigation.
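That trend-watching advice can be automated over recorded durations. A minimal sketch (`duration_trend` and its 1.5x threshold are invented starting points, not established tooling):

```python
# Flag worrying trends in scrub durations (e.g. hours per run).
def duration_trend(durations, jump_factor=1.5):
    """Return 'jump' if the last run was much slower than the previous,
    'rising' if durations increase monotonically, else 'ok'."""
    if len(durations) >= 2 and durations[-1] > durations[-2] * jump_factor:
        return "jump"      # sudden slowdown: investigate now
    if len(durations) >= 3 and all(a < b for a, b in zip(durations, durations[1:])):
        return "rising"    # steady slowdown: possible read retries on a failing drive
    return "ok"

assert duration_trend([5.5, 5.6, 5.5]) == "ok"
assert duration_trend([5.0, 5.5, 6.1]) == "rising"
assert duration_trend([5.5, 5.6, 9.0]) == "jump"
```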
Scrubbing is one component of a comprehensive data integrity strategy. The complete picture includes multiple layers of protection.
The Defense-in-Depth Approach:
Layer 1: HARDWARE
════════════════════════════════════════════════════════
✓ Use quality drives with good reliability records
✓ ECC RAM to prevent memory-induced corruption
✓ Quality power supply and surge protection
✓ Monitor SMART data for early warning signs
✓ Replace aging drives proactively

Layer 2: FILE SYSTEM CONFIGURATION
════════════════════════════════════════════════════════
✓ Btrfs with checksums enabled (default)
✓ Redundant data profile (raid1, dup)
✓ Redundant metadata (raid1c3 for maximum protection)
✓ Appropriate checksum algorithm for threat model

Layer 3: ACTIVE MONITORING
════════════════════════════════════════════════════════
✓ Scheduled scrubs (weekly/monthly)
✓ SMART monitoring and alerts
✓ Scrub error notifications
✓ File system usage and health checks

Layer 4: BACKUP STRATEGY
════════════════════════════════════════════════════════
✓ Regular snapshots (hourly/daily)
✓ Off-site backup via send/receive
✓ Multiple backup generations
✓ Periodic backup verification (test restores!)

Layer 5: DISASTER RECOVERY
════════════════════════════════════════════════════════
✓ Documented recovery procedures
✓ Tested recovery process
✓ Off-site/cloud backup for catastrophic failure
✓ RAID doesn't replace backups!

Integration with System Monitoring:
# Monitor Btrfs health with Prometheus/Grafana
# Example metrics to collect:
# Scrub statistics (extract from btrfs scrub status)
btrfs_scrub_corrected_errors{mountpoint="/"}
btrfs_scrub_uncorrectable_errors{mountpoint="/"}
btrfs_scrub_last_run_timestamp{mountpoint="/"}
# Device statistics
btrfs_device_read_errors{device="/dev/sda1"}
btrfs_device_write_errors{device="/dev/sda1"}
btrfs_device_corruption_errors{device="/dev/sda1"}
# Alert rule examples (Prometheus alerting rules, YAML):
- alert: BtrfsUncorrectableErrors
  expr: btrfs_scrub_uncorrectable_errors > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Btrfs uncorrectable errors detected"

- alert: BtrfsScrubOverdue
  expr: time() - btrfs_scrub_last_run_timestamp > 2592000  # 30 days
  labels:
    severity: warning
The 3-2-1 backup rule: keep 3 copies of your data, on 2 different media types, with 1 offsite. Btrfs features complement this—snapshots count as copies for point-in-time recovery, send/receive enables efficient off-site replication, and scrubbing ensures those copies remain valid.
Data scrubbing and checksums form the foundation of Btrfs's data integrity guarantees.
Module Complete:
This concludes our comprehensive exploration of Btrfs.
Btrfs represents the state of the art in Linux file system technology, providing enterprise-grade features with the flexibility and integration that Linux users expect.
You have completed the comprehensive Btrfs module. You now possess deep knowledge of this cutting-edge file system—from its B-tree foundations through COW mechanisms, subvolumes, snapshots, and data integrity features. This knowledge enables you to design, deploy, and maintain robust Btrfs storage systems.