Your data is under constant attack from an invisible enemy. Bit rot—the gradual corruption of data through cosmic rays, electrical interference, firmware bugs, controller errors, and media degradation—silently flips bits in your files. A single bit flip in a photograph might cause a barely noticeable color change. A bit flip in a database could corrupt critical records. A bit flip in an executable could cause crashes or security vulnerabilities.
Traditional file systems are blind to this corruption. They trust that data read from disk is the same data that was written. By the time you discover the corruption—if you ever do—it may have propagated to your backups, leaving you with no clean copy.
Btrfs data scrubbing is the antidote. By storing checksums for every block of data and metadata, Btrfs can actively verify data integrity and, when configured with redundancy, automatically repair corruption.
By the end of this page, you will understand Btrfs's data integrity architecture: how checksums work, the scrub operation that verifies entire file systems, self-healing with redundant storage, monitoring and scheduling scrubs, and best practices for data integrity maintenance.
Before exploring Btrfs's solutions, we need to understand the scope of the problem.
Sources of Silent Data Corruption:
The Scale of the Problem:
Studies of large storage systems have revealed alarming statistics:
Why Traditional File Systems Can't Help:
File systems like ext4, XFS, and NTFS store data but don't independently verify it:
These file systems trust the hardware, but as we've seen, hardware is not always trustworthy.
Traditional RAID protects against drive failure, not data corruption. If a disk silently returns bad data, RAID can't detect it. Worse, RAID rebuild operations might propagate the corruption or even choose the wrong data during reconstruction. This is called 'silent data corruption'—RAID's blind spot.
Btrfs solves the corruption problem by computing and storing checksums for every block of data and every tree node.
How Checksums Work:
Checksum Algorithms:
| Algorithm | Size | Speed | Security | Notes |
|---|---|---|---|---|
| CRC32C | 4 bytes | Very fast | Low | Default, hardware accelerated on modern CPUs |
| xxHash | 8 bytes | Fast | Low | Faster than CRC32C in software, good for large datasets |
| SHA256 | 32 bytes | Slow | High | Cryptographically secure, protects against malicious attacks |
| BLAKE2b | 32 bytes | Medium | High | Fast cryptographic hash, good balance |
Choosing a Checksum Algorithm:
# Create file system with specific checksum algorithm
$ mkfs.btrfs --csum sha256 /dev/sda1
$ mkfs.btrfs --csum xxhash /dev/sda1
$ mkfs.btrfs --csum blake2 /dev/sda1
# Check current algorithm
$ btrfs inspect-internal dump-super /dev/sda1 | grep csum_type
csum_type		2 (sha256)
Algorithm Selection Guidelines:
Checksum Overhead:
Checksums add storage and CPU overhead:
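As a rough sketch of the storage side: Btrfs stores one checksum per data block (sectorsize, typically 4 KiB), so overhead scales with checksum size. The helper below is illustrative arithmetic, not a Btrfs API:

```python
# Back-of-the-envelope checksum storage overhead:
# one checksum per 4 KiB data block.
BLOCK_SIZE = 4096

def csum_overhead_pct(csum_bytes: int) -> float:
    """Checksum storage overhead as a percentage of data size."""
    return 100 * csum_bytes / BLOCK_SIZE

print(f"CRC32C: {csum_overhead_pct(4):.2f}%")   # 4-byte checksums -> 0.10%
print(f"SHA256: {csum_overhead_pct(32):.2f}%")  # 32-byte checksums -> 0.78%
```

Even the largest checksum option costs well under 1% of storage; the more significant cost of SHA256 is CPU time on every read and write.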
Data Write Flow:
═══════════════════════════════════════════════════════════════

User Data: [Block A: "Hello World...4KB of content..."]
        │
        ▼
Compute Checksum: crc32c("Hello World...") = 0xABCD1234
        │
        ├──────────────────────────┐
        ▼                          ▼
Data Tree:                  Checksum Tree:
[Extent pointer to Block A] [Offset: 0, Checksum: 0xABCD1234]

Data Read Flow:
═══════════════════════════════════════════════════════════════

Read Request for offset 0
        │
        ├──────────────────────────┐
        ▼                          ▼
Fetch from disk:            Fetch expected checksum:
Block A content             0xABCD1234
        │                          │
        ▼                          ▼
Compute: crc32c(content) ── Compare ──→ Match?
        │                          │
       Yes                         No
        │                          │
        ▼                          ▼
Return data to user         Return -EIO error
                            (or attempt repair if redundancy exists)

Btrfs tree nodes (metadata) always have checksums stored in the node header, regardless of data checksum settings. This protects file system structure integrity. The 'nodatasum' mount option only affects file data checksums.
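The checksum-on-write, verify-on-read behavior can be sketched as a toy model in Python. This is an illustration, not Btrfs code: `ChecksummedStore` is an invented name, and Python's `zlib.crc32` uses the plain CRC32 polynomial rather than Btrfs's CRC32C, but the logic is the same:

```python
import zlib

class ChecksummedStore:
    """Toy model of checksum-on-write / verify-on-read."""
    def __init__(self):
        self.blocks = {}  # offset -> block bytes (the "data tree" side)
        self.csums = {}   # offset -> stored checksum (the "checksum tree" side)

    def write(self, offset, block):
        self.blocks[offset] = block
        self.csums[offset] = zlib.crc32(block)  # checksum stored separately from data

    def read(self, offset):
        block = self.blocks[offset]
        if zlib.crc32(block) != self.csums[offset]:
            # Btrfs would return -EIO here, or try another copy if one exists
            raise IOError(f"checksum mismatch at offset {offset}")
        return block

store = ChecksummedStore()
store.write(0, b"Hello World" + b"\x00" * 4085)   # one 4 KiB block
assert store.read(0)[:11] == b"Hello World"       # clean read verifies OK

# Simulate bit rot: flip a single bit in the stored block
rotted = bytearray(store.blocks[0])
rotted[5] ^= 0x01
store.blocks[0] = bytes(rotted)

try:
    store.read(0)
except IOError as err:
    print("corruption detected:", err)
```

The key property: the checksum is stored apart from the data it protects, so a corrupted block cannot "vouch for itself."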
Checksums verify data when it's read, but some data may not be read for months or years. Scrub is a proactive operation that reads and verifies the entire file system, ensuring all data is checked regularly.
Running a Scrub:
# Start a scrub (runs in background)
$ sudo btrfs scrub start /mnt
scrub started on /mnt, fsid a1b2c3d4-...
# Check scrub status
$ sudo btrfs scrub status /mnt
Scrub started: Fri Jan 15 10:00:00 2024
Status: running
Duration: 0:05:32
Total to scrub: 500.00GiB
Rate: 1.5GiB/s
Error summary: no errors found
# Cancel a running scrub
$ sudo btrfs scrub cancel /mnt
# Resume a cancelled/interrupted scrub
$ sudo btrfs scrub resume /mnt
Scrub Output Explained:
$ sudo btrfs scrub status /mnt
Scrub started: Fri Jan 15 10:00:00 2024
Status:        finished
Duration:      1:23:45
Total to scrub: 2.00TiB
Rate:          410.5MiB/s
Error summary:
  read_errors: 0     # Disk couldn't read the block at all
  csum_errors: 3     # Checksum mismatch (corruption detected!)
  verify_errors: 0   # Repair verification failed
  super_errors: 0    # Superblock corruption
  malloc_errors: 0   # System ran out of memory
  uncorrectable: 1   # Corruption found but no redundancy to fix
  corrected: 2       # Corruption found AND automatically repaired

Key Interpretation:
- csum_errors = 3: Three corrupted blocks were detected
- corrected = 2: Two were automatically fixed from redundant copies
- uncorrectable = 1: One corruption couldn't be fixed (no good copy)

What Scrub Does:
Scrub I/O Impact:
Scrub is I/O intensive—it reads everything:
# Run scrub with idle I/O priority
$ sudo btrfs scrub start -c 3 /mnt
# -c 3 = idle class: scrub I/O yields to normal workload
# (scrub I/O runs in kernel threads, so wrapping the command in
#  ionice does not throttle it; use scrub's own -c/-n options)
# Alternatively, newer kernels expose a per-device sysfs limit
# in bytes/sec (availability varies by kernel version):
$ echo $((100 * 1024 * 1024)) | sudo tee \
    /sys/fs/btrfs/<fsid>/devinfo/<devid>/scrub_speed_max
# Limits that device's scrub rate to 100 MiB/s (0 = unlimited)
Schedule scrubs during low-activity periods. On a 2TB drive at 100 MB/s, a full scrub takes ~5.5 hours. SSDs can complete much faster, but even they benefit from off-peak scheduling. Monthly scrubs are a reasonable starting point for most systems.
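The duration estimate above is simple arithmetic: time equals data to read divided by read rate. A quick sanity check (illustrative helper, using decimal units as drive vendors do; scrub only reads allocated data, so a full drive is the worst case):

```python
# Estimate full-scrub duration: scrub must read every allocated byte once.
def scrub_hours(capacity_bytes: float, rate_bytes_per_sec: float) -> float:
    return capacity_bytes / rate_bytes_per_sec / 3600

# 2 TB (decimal) at a sustained 100 MB/s:
print(round(scrub_hours(2e12, 100e6), 1))  # -> 5.6, matching "~5.5 hours"
```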
Checksums detect corruption, but redundancy enables automatic repair. Btrfs can self-heal when configured with data redundancy profiles.
Redundancy Profiles:
| Profile | Copies | Can Self-Heal | Use Case |
|---|---|---|---|
| single | 1 | ❌ No | Maximum capacity, no redundancy |
| dup | 2 on same device | ✅ Yes (same disk) | Single disk with some protection |
| raid1 | 2 on different devices | ✅ Yes | Mirror across two+ devices |
| raid1c3 | 3 on different devices | ✅ Yes | Triple mirror, maximum redundancy |
| raid1c4 | 4 on different devices | ✅ Yes | Quadruple mirror |
| raid10 | 2 (mirrored stripes) | ✅ Yes | Performance + redundancy |
| raid5 | Parity (1 disk loss) | ⚠️ Limited (write hole) | Not recommended yet |
| raid6 | Double parity | ⚠️ Limited (write hole) | Not recommended yet |
Self-Healing Flow:
Scenario: RAID1 with 2 devices, Block X is corrupted on Device A

   Device A                Device B
 ┌──────────────┐        ┌──────────────┐
 │ Block X      │        │ Block X      │
 │ (CORRUPTED)  │        │ (GOOD)       │
 │ csum: FAIL   │        │ csum: OK     │
 └──────────────┘        └──────────────┘

Step 1: Read Request for Block X
  ├─→ Read from Device A
  │   Compute checksum ─→ MISMATCH! Corruption detected
  │
Step 2: Try Alternate Copy
  ├─→ Read from Device B
  │   Compute checksum ─→ MATCH! Good copy found
  │
Step 3: Repair
  ├─→ Write good copy from B to A (overwrite corruption)
  ├─→ Verify repair succeeded
  │
Step 4: Return Data
  └─→ Return good data to application (no error visible!)

Result:
- Application gets correct data
- Corruption is repaired automatically
- Logged in dmesg/journal for monitoring
- Scrub reports as "corrected" errors

Setting Up Redundancy:
# Create file system with specific data/metadata profiles
$ mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb
# Common configurations:
# Single disk, DUP for metadata (default for single device)
$ mkfs.btrfs -d single -m dup /dev/sda
# Two disks, RAID1 for both data and metadata
$ mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb
# Four disks, RAID10 for data, RAID1C3 for metadata
$ mkfs.btrfs -d raid10 -m raid1c3 /dev/sd{a,b,c,d}
# Check current allocation profile
$ btrfs filesystem df /mnt
Data, RAID1: total=100.00GiB, used=80.00GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=2.00GiB, used=500.00MiB
GlobalReserve, single: total=256.00MiB, used=0.00B
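The self-healing sequence can be sketched as a toy two-mirror store in Python. This is illustrative only, not Btrfs internals; `Raid1Store` is an invented name for the sketch:

```python
import zlib

class Raid1Store:
    """Toy RAID1: every block is written to two 'devices'; reads verify
    against a stored checksum and repair a bad copy from the good one."""
    def __init__(self):
        self.mirrors = [{}, {}]   # device A and device B: offset -> bytes
        self.csums = {}           # offset -> expected checksum

    def write(self, offset, block):
        for dev in self.mirrors:
            dev[offset] = block
        self.csums[offset] = zlib.crc32(block)

    def read(self, offset):
        good, bad_devices = None, []
        for dev in self.mirrors:
            if zlib.crc32(dev[offset]) == self.csums[offset]:
                good = dev[offset]           # checksum matches: good copy
            else:
                bad_devices.append(dev)      # corrupted copy, fix below
        if good is None:
            raise IOError("uncorrectable: no valid copy exists")
        for dev in bad_devices:
            dev[offset] = good               # self-heal: overwrite corruption
        return good                          # caller never sees an error

store = Raid1Store()
store.write(0, b"important data")
store.mirrors[0][0] = b"corrupted!data"          # silent corruption on device A
assert store.read(0) == b"important data"        # read still succeeds...
assert store.mirrors[0][0] == b"important data"  # ...and device A was repaired
```

Note the failure mode at the end of `read`: if every copy fails its checksum, there is nothing to heal from, which is exactly the "uncorrectable" counter in scrub output.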
DUP on Single Disk:
Even on a single disk, Btrfs can provide some protection:
# DUP stores two copies on the same device (different locations)
$ mkfs.btrfs -d dup -m dup /dev/sda
# Protects against:
# - Localized media failures
# - Some types of bad sector errors
# Does NOT protect against:
# - Full drive failure
# - Controller-level corruption
Some SSDs have internal management that may place 'different' blocks in the same physical location, defeating DUP's purpose. For SSDs, prefer RAID1 across multiple devices when possible, or understand that DUP provides limited protection.
When scrub finds corruption without available redundancy, you face an uncorrectable error. This is a serious situation requiring careful handling.
Identifying Affected Files:
Scrub reports block locations, but you need to find which files are affected:
# After scrub with errors, check kernel log
$ dmesg | grep BTRFS
[12345.67] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[12345.68] BTRFS error (device sda1): unable to fixup (regular) error at logical 12345678 on dev /dev/sda1
# Find which file corresponds to that logical address
$ btrfs inspect-internal logical-resolve 12345678 /mnt
/mnt/path/to/corrupted/file.data
# Now you know which file to restore from backup
Recovery Options:
Preventing Future Uncorrectable Errors:
# 1. Add redundancy to existing file system
$ btrfs device add /dev/sdb /mnt
$ btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt
# Now data has two copies for future self-healing
# 2. Implement regular backups
$ btrfs subvolume snapshot -r /mnt/@ /mnt/@-daily
$ btrfs send /mnt/@-daily | ssh backup-server btrfs receive /backup/
# 3. Schedule regular scrubs
$ sudo systemctl enable btrfs-scrub@mnt.timer
When Corruption Persists:
If scrub keeps finding corruption:
- Check drive health: smartctl -a /dev/sda — failing drive?
- Test system memory (memtest86+); bad RAM can corrupt data before it ever reaches disk

Uncorrectable errors mean your data is damaged and more damage may be occurring. Treat any scrub error as urgent: identify affected files, restore from backups, and investigate the root cause. Continuing to use a system with unaddressed corruption risks further data loss.
Regular scrubbing is essential for proactive data integrity. Most distributions provide built-in systemd timers; custom cron jobs are also an option.
Using Systemd Timers:
# Many distributions include btrfs-scrub timers
$ systemctl list-timers | grep scrub
# Enable the scrub timer for a mount point (interval is set by the distro's timer unit)
$ sudo systemctl enable --now btrfs-scrub@mnt.timer
$ sudo systemctl enable --now btrfs-scrub@home.timer
# Check timer status
$ systemctl status btrfs-scrub@mnt.timer
● btrfs-scrub@mnt.timer - Monthly Btrfs Scrub for /mnt
Loaded: loaded (/lib/systemd/system/btrfs-scrub@.timer; enabled)
Active: active (waiting) since Mon 2024-01-15 10:00:00 UTC
Custom Scrub Script:
#!/bin/bash
# /usr/local/bin/btrfs-scrub-all.sh
LOGFILE="/var/log/btrfs-scrub.log"
echo "=== Btrfs Scrub Run: $(date) ===" >> "$LOGFILE"
# Get all Btrfs mount points (note: subvolumes of the same file system
# mounted at different targets will each trigger a full scrub)
for mp in $(findmnt -t btrfs -no TARGET | sort -u); do
echo "Starting scrub on $mp" >> "$LOGFILE"
# Run scrub, wait for completion
btrfs scrub start -B "$mp" >> "$LOGFILE" 2>&1
# Get and log status
btrfs scrub status "$mp" >> "$LOGFILE"
    # Check for errors (counter name varies across btrfs-progs versions)
    if btrfs scrub status "$mp" | grep -Eq "uncorrectable(_errors)?:[[:space:]]*[1-9]"; then
echo "ALERT: Uncorrectable errors found on $mp!" |
mail -s "Btrfs Scrub Alert" admin@example.com
fi
echo "---" >> "$LOGFILE"
done
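For monitoring beyond a simple grep, the error counters in `btrfs scrub status` output can be parsed programmatically. A small sketch (`parse_scrub_errors` is a hypothetical helper; field names follow the sample output shown earlier and vary slightly across btrfs-progs versions, which the regex tolerates):

```python
import re

# Extract error counters from `btrfs scrub status` text output.
def parse_scrub_errors(status_text: str) -> dict:
    counters = {}
    for key in ("read_errors", "csum_errors", "verify_errors",
                "super_errors", "uncorrectable", "corrected"):
        # \w* tolerates variants like "uncorrectable_errors"
        m = re.search(rf"{key}\w*:\s*(\d+)", status_text)
        if m:
            counters[key] = int(m.group(1))
    return counters

sample = """Error summary:
  read_errors: 0
  csum_errors: 3
  uncorrectable: 1
  corrected: 2"""
assert parse_scrub_errors(sample)["csum_errors"] == 3
assert parse_scrub_errors(sample)["uncorrectable"] == 1
```

A wrapper like this makes it straightforward to feed scrub results into the alerting or metrics systems discussed later, rather than grepping ad hoc.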
Cron Job Configuration:
# /etc/cron.d/btrfs-scrub
# Run scrub monthly at 3 AM on the 1st
0 3 1 * * root /usr/local/bin/btrfs-scrub-all.sh
# Or weekly on Sunday at 2 AM
0 2 * * 0 root /usr/local/bin/btrfs-scrub-all.sh
Recommended Scrub Frequency:
| Environment | Frequency | Rationale |
|---|---|---|
| Desktop/Laptop | Monthly | Personal data, moderate change rate |
| Workstation | Weekly-Monthly | Development data, more frequent changes |
| Server (general) | Weekly | Production data, proactive detection |
| NAS/Archive | Weekly | Archival focus, catch rot early |
| High-value data | Weekly + after events | Mission critical, scrub after any hardware events |
| After hardware changes | Immediately | New disks, controller changes, power events |
Track how long scrubs take. A steadily increasing scrub duration might indicate a failing drive needing to retry reads. Sudden jumps in duration warrant investigation.
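That trend-watching advice can be automated over recorded durations. A minimal sketch (`duration_trend` and its 1.5x threshold are invented starting points, not established tooling):

```python
# Flag worrying trends in scrub durations (e.g. hours per run).
def duration_trend(durations, jump_factor=1.5):
    """Return 'jump' if the last run was much slower than the previous,
    'rising' if durations increase monotonically, else 'ok'."""
    if len(durations) >= 2 and durations[-1] > durations[-2] * jump_factor:
        return "jump"      # sudden slowdown: investigate now
    if len(durations) >= 3 and all(a < b for a, b in zip(durations, durations[1:])):
        return "rising"    # steady slowdown: possible read retries on a failing drive
    return "ok"

assert duration_trend([5.5, 5.6, 5.5]) == "ok"
assert duration_trend([5.0, 5.5, 6.1]) == "rising"
assert duration_trend([5.5, 5.6, 9.0]) == "jump"
```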
Scrubbing is one component of a comprehensive data integrity strategy. The complete picture includes multiple layers of protection.
The Defense-in-Depth Approach:
Layer 1: HARDWARE
════════════════════════════════════════════════════════
✓ Use quality drives with good reliability records
✓ ECC RAM to prevent memory-induced corruption
✓ Quality power supply and surge protection
✓ Monitor SMART data for early warning signs
✓ Replace aging drives proactively

Layer 2: FILE SYSTEM CONFIGURATION
════════════════════════════════════════════════════════
✓ Btrfs with checksums enabled (default)
✓ Redundant data profile (raid1, dup)
✓ Redundant metadata (raid1c3 for maximum protection)
✓ Appropriate checksum algorithm for threat model

Layer 3: ACTIVE MONITORING
════════════════════════════════════════════════════════
✓ Scheduled scrubs (weekly/monthly)
✓ SMART monitoring and alerts
✓ Scrub error notifications
✓ File system usage and health checks

Layer 4: BACKUP STRATEGY
════════════════════════════════════════════════════════
✓ Regular snapshots (hourly/daily)
✓ Off-site backup via send/receive
✓ Multiple backup generations
✓ Periodic backup verification (test restores!)

Layer 5: DISASTER RECOVERY
════════════════════════════════════════════════════════
✓ Documented recovery procedures
✓ Tested recovery process
✓ Off-site/cloud backup for catastrophic failure
✓ RAID doesn't replace backups!

Integration with System Monitoring:
# Monitor Btrfs health with Prometheus/Grafana
# Example metrics to collect:
# Scrub statistics (extract from btrfs scrub status)
btrfs_scrub_corrected_errors{mountpoint="/"}
btrfs_scrub_uncorrectable_errors{mountpoint="/"}
btrfs_scrub_last_run_timestamp{mountpoint="/"}
# Device statistics
btrfs_device_read_errors{device="/dev/sda1"}
btrfs_device_write_errors{device="/dev/sda1"}
btrfs_device_corruption_errors{device="/dev/sda1"}
# Alert rule examples (Prometheus alerting rules, YAML):
- alert: BtrfsUncorrectableErrors
  expr: btrfs_scrub_uncorrectable_errors > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Btrfs uncorrectable errors detected"

- alert: BtrfsScrubOverdue
  expr: time() - btrfs_scrub_last_run_timestamp > 2592000  # 30 days
  labels:
    severity: warning
The 3-2-1 backup rule: keep 3 copies of your data, on 2 different media types, with 1 offsite. Btrfs features complement this—snapshots count as copies for point-in-time recovery, send/receive enables efficient off-site replication, and scrubbing ensures those copies remain valid.
Data scrubbing and checksums form the foundation of Btrfs's data integrity guarantees.
Module Complete:
This concludes our comprehensive exploration of Btrfs.
Btrfs represents the state of the art in Linux file system technology, providing enterprise-grade features with the flexibility and integration that Linux users expect.
You have completed the comprehensive Btrfs module. You now possess deep knowledge of this cutting-edge file system—from its B-tree foundations through COW mechanisms, subvolumes, snapshots, and data integrity features. This knowledge enables you to design, deploy, and maintain robust Btrfs storage systems.