In the world of Copy-on-Write file systems, two names dominate: ZFS and btrfs. Both deliver the COW promise—snapshots, checksums, and data integrity—but they emerged from different origins, serve different communities, and make different architectural choices.
ZFS, born at Sun Microsystems in 2005, was designed from the ground up as a complete storage solution: file system, volume manager, and RAID implementation combined. It prioritizes data integrity above all else and has earned a reputation for bulletproof reliability in enterprise deployments.
btrfs, initiated by Oracle in 2007, aimed to bring ZFS-like features to the Linux kernel with native integration. It emphasizes flexibility—subvolumes, snapshots, and online operations—while maintaining Linux's modular philosophy of separate tools working together.
Choosing between them isn't about which is "better"—it's about which fits your requirements, constraints, and operational model.
By the end of this page, you will understand the architectural differences between btrfs and ZFS, their unique features and capabilities, the licensing and integration implications, and practical guidance for choosing between them based on your specific needs.
ZFS was designed with an ambitious goal: eliminate all known data corruption vectors. Its creators at Sun Microsystems approached storage as a single, integrated problem rather than a stack of independent layers.
Core architectural principles:
Pooled storage: Disks are combined into pools; file systems (datasets) draw from shared pool space. No fixed partition sizes.
Transactional object model: Everything is an object with a checksum. Transactions are atomic across the entire pool.
End-to-end integrity: Data is checksummed at write, verified at read, with checksums stored in parent blocks.
Integrated RAID: RAID-Z levels are checksum-aware; can repair silent corruption, not just failed disks.
Immutable data: Copy-on-write means data is never overwritten; older versions are preserved until freed.
Key ZFS features:
| Feature | Description |
|---|---|
| Storage pools | Combine multiple devices; datasets share capacity automatically |
| RAID-Z/Z2/Z3 | Single/double/triple parity with per-block checksum verification |
| Snapshots & clones | Instant, space-efficient point-in-time copies |
| Send/receive | Efficient serialization for backup and replication |
| Compression | LZ4, ZSTD, gzip, lzjb - transparent, per-dataset |
| Deduplication | Optional block-level deduplication (RAM-intensive) |
| Encryption | Native encryption (OpenZFS 2.0+) with key management |
| ARC/L2ARC | Adaptive Replacement Cache with optional SSD extension |
| Special vdev | Separate device class for metadata/small blocks |
| Quotas/reservations | Per-dataset space management |
```bash
# === ZFS Pool Management ===

# Create a mirrored pool
zpool create tank mirror /dev/sda /dev/sdb

# Create a RAID-Z2 pool (double parity)
zpool create tank raidz2 /dev/sdc /dev/sdd /dev/sde /dev/sdf

# Add a SLOG for sync write acceleration
zpool add tank log /dev/nvme0n1

# Add L2ARC for read cache
zpool add tank cache /dev/nvme1n1

# View pool status
zpool status tank
zpool list tank

# === Dataset Management ===

# Create nested datasets (automatic mounting)
zfs create tank/home
zfs create tank/home/alice
zfs create tank/databases

# Set properties
zfs set compression=lz4 tank          # Enable compression pool-wide
zfs set quota=100G tank/home/alice    # Limit space usage
zfs set mountpoint=/mnt/data tank/data

# List datasets with space usage
zfs list -r tank

# === Snapshots and Clones ===

# Create snapshot
zfs snapshot tank/databases@before-migration

# Create recursive snapshot
zfs snapshot -r tank/home@daily-$(date +%Y%m%d)

# Clone from snapshot (writeable copy)
zfs clone tank/databases@before-migration tank/databases-test

# Rollback to snapshot
zfs rollback tank/databases@before-migration

# === Send/Receive Replication ===

# Initial full send
zfs send tank/databases@snap1 | ssh backup zfs receive backuppool/databases

# Incremental send (much faster)
zfs send -i tank/databases@snap1 tank/databases@snap2 | \
  ssh backup zfs receive backuppool/databases

# Encrypted, compressed send
zfs send -wc tank/encrypted@snap | gzip | \
  ssh backup "gunzip | zfs receive backuppool/encrypted"
```

ZFS uses the CDDL license, incompatible with the Linux kernel's GPL. This means ZFS cannot be included in the mainline kernel. Users install via OpenZFS (zfs-dkms or pre-built modules). While legally complex, it's widely used in production without practical issues.
btrfs (pronounced "butter-FS" or "B-tree-FS") was designed as a native Linux file system to bring ZFS-like features with GPL licensing and tight kernel integration.
Core architectural principles:
B-tree everything: All metadata is stored in B-tree structures, enabling efficient lookups and modifications.
Subvolumes: Lightweight, independent namespace trees within a single file system. More flexible than ZFS datasets.
Extents: Large contiguous file allocations (vs. fixed block pointers) for efficiency with large files.
Inline data: Small files stored directly in metadata, eliminating separate data blocks.
Native Linux integration: Standard VFS interface, works with all Linux tools, kernel module included.
Key btrfs features:
| Feature | Description |
|---|---|
| Subvolumes | Independent directory trees, each snapshot-able |
| Snapshots | Instant, space-efficient, including read-write snapshots |
| Send/receive | File-level streaming for backup/replication |
| Compression | zlib, lzo, zstd - transparent, can be per-file |
| Deduplication | Offline dedup via duperemove, no RAM overhead |
| Built-in RAID | RAID 0/1/10/5/6 (5/6 have write hole issues) |
| Checksums | CRC32C default, xxhash, sha256, blake2b available |
| Online operations | Resize, balance, convert, defrag while mounted |
| Reflinks | Instant copy-on-write file copies (cp --reflink) |
| Quotas | Per-subvolume quota groups (qgroups) |
```bash
# === btrfs File System Creation ===

# Create btrfs on a single device
mkfs.btrfs /dev/sda

# Create btrfs with RAID1 metadata, RAID0 data
mkfs.btrfs -m raid1 -d raid0 /dev/sda /dev/sdb

# Create with specific label and checksums
mkfs.btrfs -L mydata --checksum sha256 /dev/sda

# Mount the file system
mount /dev/sda /mnt/mydata

# === Subvolume Management ===

# Create subvolumes
btrfs subvolume create /mnt/mydata/@root
btrfs subvolume create /mnt/mydata/@home
btrfs subvolume create /mnt/mydata/@snapshots

# List subvolumes
btrfs subvolume list /mnt/mydata

# Mount a specific subvolume
mount -o subvol=@home /dev/sda /home

# Set default subvolume (for boot without subvol= option)
btrfs subvolume set-default 256 /mnt/mydata

# === Snapshots ===

# Create read-only snapshot
btrfs subvolume snapshot -r /mnt/mydata/@root \
  /mnt/mydata/@snapshots/root-$(date +%Y%m%d)

# Create read-write snapshot (for testing)
btrfs subvolume snapshot /mnt/mydata/@home \
  /mnt/mydata/@home-test

# Delete snapshot
btrfs subvolume delete /mnt/mydata/@snapshots/root-old

# === Send/Receive ===

# Send snapshot to file
btrfs send /mnt/mydata/@snapshots/root-20250116 > /backup/root.btrfs

# Incremental send (requires parent snapshot at destination)
btrfs send -p /mnt/mydata/@snapshots/root-20250115 \
  /mnt/mydata/@snapshots/root-20250116 | \
  ssh backup btrfs receive /backup/snapshots/

# === Maintenance ===

# Check file system (must be unmounted or read-only)
btrfs check /dev/sda

# Scrub for corruption detection
btrfs scrub start /mnt/mydata
btrfs scrub status /mnt/mydata

# Balance to redistribute data (after adding drives)
btrfs balance start /mnt/mydata

# Online resize
btrfs filesystem resize +10G /mnt/mydata
btrfs filesystem resize max /mnt/mydata   # Use all available

# === Reflinks (instant copy) ===

# Copy file instantly with copy-on-write
cp --reflink=always large_file.iso large_file_copy.iso

# Check if files share blocks
filefrag -v large_file.iso large_file_copy.iso
```

btrfs RAID5 and RAID6 have a known write hole issue and are not recommended for production use. Use RAID1 or RAID10 for redundancy, or use btrfs on top of mdadm or LVM for RAID5/6 semantics. The development community continues work on this, but treat RAID5/6 as experimental.
Understanding the architectural differences helps explain why each file system behaves as it does:
Block allocation strategies:
| Aspect | ZFS | btrfs |
|---|---|---|
| Allocation unit | Variable block size (512B - 16MB) | Fixed 4KB blocks, variable extents |
| Default record size | 128KB (tunable per dataset) | 4KB blocks in contiguous extents |
| Small file handling | Stored in one block (up to recordsize) | Inline in metadata (< ~2KB) |
| Large file handling | One block pointer per recordsize chunk | Extents describe contiguous ranges |
| Fragmentation tendency | Moderate (sequential allocation hints) | Lower (extent-based allocation) |
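A quick way to observe these allocation behaviors on a live system is to query the ZFS record size and inspect how btrfs lays a file out as extents. A minimal sketch; the pool, dataset, and file names are placeholders:

```bash
# ZFS: show the per-dataset record size (tunable, 128K by default)
zfs get recordsize tank/data

# Use a larger record size for a dataset holding big sequential files
zfs set recordsize=1M tank/media

# btrfs: show how a file is laid out as extents
filefrag -v /mnt/mydata/large_file.iso

# btrfs: report per-file space usage, including shared extents
btrfs filesystem du /mnt/mydata/large_file.iso
```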
Checksum and integrity:
Both file systems use parent-pointer checksums (Merkle tree style), but differ in implementation:
| Aspect | ZFS | btrfs |
|---|---|---|
| Default algorithm | Fletcher4 | CRC32C |
| Algorithm options | Fletcher, SHA-256/512, Skein, Edon-R, BLAKE3 | CRC32C, xxhash, SHA-256, BLAKE2b |
| Metadata copies | 2-3 copies of critical metadata | Configurable DUP for metadata |
| Checksum location | In block pointer (parent block) | In extent item (parent block) |
| Self-healing | Yes, with mirror/RAID-Z | Yes, with RAID1/10 |
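The checksum algorithm is selectable on both sides, but it is configured in different places: a per-dataset property on ZFS, an mkfs-time option on btrfs. The sketch below also shows how to trigger a verification pass; pool, dataset, and device names are placeholders:

```bash
# ZFS: checksums are a per-dataset property and can be changed at any time
# (existing blocks keep their old checksum until they are rewritten)
zfs set checksum=sha256 tank/important
zfs get checksum tank/important

# Verify every block in the pool against its checksum
zpool scrub tank
zpool status tank            # shows scrub progress and any repairs

# btrfs: the checksum algorithm is chosen at mkfs time for the whole file system
mkfs.btrfs --checksum xxhash /dev/sdb

# Verify all data and metadata checksums
btrfs scrub start /mnt/mydata
btrfs scrub status /mnt/mydata
```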
Memory and resource usage:
ZFS and btrfs have different resource profiles:
| Resource | ZFS | btrfs |
|---|---|---|
| Minimum RAM | 2-4GB (more for dedup) | 512MB-1GB |
| Recommended RAM | 1GB per TB (more for production) | 1GB typical |
| Deduplication RAM | 5GB per TB of deduped data | Offline tool, no RAM overhead |
| Metadata caching | ARC, configurable size | Page cache (standard Linux) |
| CPU overhead | Moderate (checksums, compression) | Similar |
ZFS's "1GB RAM per TB of storage" rule is for optimal ARC performance, not a hard requirement. ZFS will run with less RAM but with reduced caching effectiveness. For deduplication, the rule becomes much more aggressive—plan for 5GB per TB of deduplicated data.
Let's compare specific features in detail:
Snapshots:
Both excel at snapshots, but with different characteristics:
| Aspect | ZFS | btrfs |
|---|---|---|
| Creation time | O(1), instantaneous | O(1), instantaneous |
| Space accounting | Per-dataset, clear breakdown | Quota groups (complex) |
| Writeable snapshots | Via clones | Native read-write snapshots |
| Snapshot visibility | Hidden .zfs directory | Normal directory (subvolume) |
| Cross-dataset | Independent per dataset | Subvolumes are independent |
| Max snapshots | Thousands practical, no hard limit | Thousands practical |
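The "writeable snapshots" row is the main workflow difference: ZFS snapshots are read-only and become writeable through a clone, while btrfs snapshots can be created read-write directly. A minimal sketch with placeholder dataset and subvolume names:

```bash
# ZFS: snapshot is read-only; clone it to get a writeable copy
zfs snapshot tank/app@v1
zfs clone tank/app@v1 tank/app-experiment

# Snapshots are also browsable under the hidden .zfs directory
ls /tank/app/.zfs/snapshot/v1/

# btrfs: a snapshot is itself a subvolume; omit -r for a read-write copy
btrfs subvolume snapshot /mnt/mydata/@app /mnt/mydata/@app-experiment

# The snapshot appears as an ordinary directory tree
ls /mnt/mydata/@app-experiment/
```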
Broader feature comparison:
| Feature | ZFS | btrfs |
|---|---|---|
| Compression algorithms | lz4, zstd, gzip, lzjb, zle | lzo, zlib, zstd |
| Deduplication | Inline (RAM-intensive) | Offline (duperemove tool) |
| Encryption | Native (OpenZFS 2.0+) | Via dm-crypt/LUKS layer |
| RAID levels | Mirror, RAID-Z/Z2/Z3, dRAID | 0, 1, 10, 5*, 6* (*unstable) |
| Device replacement | Resilver to new device | Replace command, balance |
| Growing pool | Add vdevs, expand vdev | Add devices, balance |
| Shrinking pool | Not supported | Supported with balance |
| Quotas | Per-dataset, simple | Quota groups, complex |
| ACL support | NFSv4 ACLs, POSIX ACLs | POSIX ACLs |
| Special characters | UTF-8, case sensitivity options | UTF-8 only |
| Boot support | Well-supported with GRUB | Well-supported with GRUB |
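The growing and shrinking rows are worth a concrete illustration: both file systems grow by adding devices, but only btrfs can later remove one. A minimal sketch with placeholder device names:

```bash
# ZFS: grow a pool by adding another mirror vdev
# (top-level vdev removal is limited, so plan the layout up front)
zpool add tank mirror /dev/sdg /dev/sdh

# btrfs: grow by adding a device, then rebalance data across all devices
btrfs device add /dev/sdg /mnt/mydata
btrfs balance start /mnt/mydata

# btrfs: shrink by removing a device; data is migrated off it automatically
btrfs device remove /dev/sdg /mnt/mydata
```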
Send/receive comparison:
Both support efficient replication, but the mechanisms differ:
ZFS send streams are block-level:
- Raw sends (`-w`) preserve encryption without access to keys
- Compressed sends (`-c`) preserve compression decisions
- Transfers are resumable (receive `-s` flag)

Advantages: exact block-level replication, encrypted send without decryption.
Limitations: Destination must be ZFS; block structure determines size.
btrfs's reflink feature (cp --reflink) enables instant file copies with COW semantics at the file level. This is powerful for container layers, package caching, and backup tools. ZFS achieves similar results via clones but at the dataset level, not individual files.
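The granularity difference is easiest to see side by side: btrfs reflinks operate on individual files, while ZFS cloning operates on whole datasets. Names below are placeholders:

```bash
# btrfs: instant COW copy of a single file; blocks are shared until modified
cp --reflink=always image-base.qcow2 vm42.qcow2

# ZFS: the equivalent granularity is a whole dataset, via snapshot + clone
zfs snapshot tank/images@base
zfs clone tank/images@base tank/vm42
```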
ZFS is the right choice when your priorities align with its strengths:
1. Enterprise storage servers:
ZFS shines in file servers, NAS appliances, and storage arrays, where pooled storage, RAID-Z redundancy, and send/receive replication map directly onto typical requirements.
2. Databases with large storage:
For database servers requiring robust storage, tune recordsize and compression per dataset. Common starting points:
| Use Case | Recordsize | Compression | Special Settings |
|---|---|---|---|
| General file server | 128K (default) | lz4 | atime=off, xattr=sa |
| MySQL/MariaDB | 16K | off or lz4 | primarycache=metadata |
| PostgreSQL | 8K or 16K | lz4 | logbias=throughput |
| MongoDB | 8K or 16K | lz4 | recordsize matching page size |
| Virtualization | 64K or 128K | lz4 or off | sync=disabled (for non-critical) |
| Backup storage | 1M | zstd | copies=2 for extra safety |
123456789101112131415161718192021222324252627282930313233343536
```bash
# Example: Configure ZFS for PostgreSQL production database

# Create pool with redundancy
zpool create -o ashift=12 dbpool mirror /dev/sda /dev/sdb

# Create dataset optimized for PostgreSQL
zfs create dbpool/postgres

# PostgreSQL uses 8K blocks
zfs set recordsize=16K dbpool/postgres

# Light compression - lz4 is fast
zfs set compression=lz4 dbpool/postgres

# Disable access time updates
zfs set atime=off dbpool/postgres

# Prioritize metadata in ARC for random I/O
zfs set primarycache=metadata dbpool/postgres

# Optimize for throughput (PostgreSQL handles its own logging)
zfs set logbias=throughput dbpool/postgres

# Set mount point
zfs set mountpoint=/var/lib/postgresql dbpool/postgres

# Create dataset for WAL logs (different characteristics)
zfs create dbpool/postgres/wal
zfs set recordsize=128K dbpool/postgres/wal
zfs set logbias=latency dbpool/postgres/wal   # WAL needs low latency

# Add SLOG for sync write performance if using sync=standard
# zpool add dbpool log mirror /dev/nvme0n1 /dev/nvme1n1

# Verify configuration
zfs get recordsize,compression,atime,primarycache dbpool/postgres
```

ZFS prioritizes data safety over features. New features are added slowly after extensive testing. If you need the most reliable, proven COW file system and can accept the CDDL licensing constraints, ZFS is the safe choice.
btrfs is the right choice when your priorities favor native Linux integration and flexibility:
1. Desktop and laptop systems:
btrfs has become the default root file system for several major distributions, including Fedora Workstation and openSUSE.
2. Container and development environments:
Reflinks and subvolumes excel for container image layers, package caches, and instant working copies of large projects during development.
3. System snapshot and rollback:
btrfs's integration with tools like Snapper and Timeshift provides automatic snapshots around system changes and straightforward rollback:
```bash
# Snapper configuration for root subvolume
snapper -c root create-config /

# Automatic snapshots before/after package operations
# (via pacman hooks or zypper plugins)

# Boot into previous snapshot if update breaks system
# GRUB menu shows snapshot entries

# Rollback to previous snapshot
snapper rollback 42
```
This workflow is particularly powerful for rolling-release distributions where updates occasionally cause issues.
```bash
# Example: Configure btrfs desktop with Timeshift integration

# Create btrfs with optimal settings
mkfs.btrfs -L system /dev/sda2

# Mount temporarily to create subvolume layout
mount /dev/sda2 /mnt

# Create subvolume layout suitable for snapshots
btrfs subvolume create /mnt/@
btrfs subvolume create /mnt/@home
btrfs subvolume create /mnt/@snapshots
btrfs subvolume create /mnt/@log
btrfs subvolume create /mnt/@cache

# Unmount and remount with subvolumes
umount /mnt

# /etc/fstab entries:
# UUID=xxx /           btrfs subvol=@,compress=zstd,noatime          0 0
# UUID=xxx /home       btrfs subvol=@home,compress=zstd,noatime      0 0
# UUID=xxx /.snapshots btrfs subvol=@snapshots,noatime               0 0
# UUID=xxx /var/log    btrfs subvol=@log,compress=zstd,noatime       0 0
# UUID=xxx /var/cache  btrfs subvol=@cache,compress=zstd,noatime     0 0

# Install and configure Timeshift (after system installation)
# apt install timeshift   # Debian/Ubuntu
# dnf install timeshift   # Fedora

# Configure Timeshift to snapshot @ and @home
# Set schedule: daily, keep 7

# Manual snapshot before risky operation
timeshift --create --comments "Before kernel update"

# List snapshots
timeshift --list

# Restore if needed (from live USB for root)
timeshift --restore

# === Reflink usage for development ===

# Clone a large project instantly
cp -r --reflink=always myproject myproject-experiment

# Both share data until divergence
du -sh myproject myproject-experiment               # Same apparent size
btrfs filesystem du myproject myproject-experiment  # Shows shared data
```

If you need RAID5/6 reliability with btrfs features, layer btrfs on top of mdadm. mdadm provides stable RAID5/6, and btrfs adds COW benefits. You lose btrfs self-healing (mdadm doesn't know about checksums), but gain stable parity RAID.
Both file systems have sharp edges that can surprise administrators, and maturity varies by feature. The comparison below summarizes where each is production-ready and where extra caution is warranted:
| Feature | ZFS Maturity | btrfs Maturity |
|---|---|---|
| Basic operations | ⭐⭐⭐⭐⭐ Production-ready | ⭐⭐⭐⭐⭐ Production-ready |
| RAID redundancy | ⭐⭐⭐⭐⭐ RAID-Z is rock-solid | ⭐⭐⭐⭐ RAID1/10 solid; 5/6 unstable |
| Snapshots | ⭐⭐⭐⭐⭐ Mature | ⭐⭐⭐⭐⭐ Mature |
| Send/receive | ⭐⭐⭐⭐⭐ Mature with resume | ⭐⭐⭐⭐ Good, no resume |
| Compression | ⭐⭐⭐⭐⭐ Multiple algorithms | ⭐⭐⭐⭐⭐ Good algorithm support |
| Encryption | ⭐⭐⭐⭐ Native (newer) | ⭐⭐⭐ Via LUKS layer |
| Self-healing | ⭐⭐⭐⭐⭐ Proven | ⭐⭐⭐⭐ Works with RAID1/10 |
| Tooling | ⭐⭐⭐⭐⭐ Comprehensive | ⭐⭐⭐⭐ Good, some gaps |
Both file systems have complex recovery scenarios. Before relying on either in production, practice: recovering from drive failures, rolling back snapshots, and restoring from send/receive backups. Knowing the recovery process before you need it is essential.
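A sketch of the drive-replacement drill on each system, worth rehearsing on scratch hardware or loop devices before you need it in production (device names are placeholders):

```bash
# ZFS: replace a failing disk in a redundant pool and watch the resilver
zpool status tank                       # identify the degraded device
zpool replace tank /dev/sdc /dev/sdx    # swap in the new disk
zpool status tank                       # monitor resilver progress

# btrfs: replace a device in a RAID1/10 file system while mounted
btrfs replace start /dev/sdc /dev/sdx /mnt/mydata
btrfs replace status /mnt/mydata

# Verify integrity afterwards on both systems
zpool scrub tank
btrfs scrub start /mnt/mydata
```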
We've explored both major COW file systems in depth. Here's a consolidated decision framework:
| If you need... | Choose | Because |
|---|---|---|
| Rock-solid RAID5/6 equivalent | ZFS | RAID-Z is proven and reliable |
| Native Linux kernel integration | btrfs | GPL licensed, in mainline kernel |
| Enterprise storage appliances | ZFS | More mature enterprise tooling |
| Desktop snapshots (Timeshift) | btrfs | Better tool integration |
| Container storage (reflinks) | btrfs | Reflinks are powerful for containers |
| FreeBSD/illumos | ZFS | Native, first-class support |
| Lower memory systems | btrfs | More RAM-efficient |
| Block-level encrypted backup | ZFS | Native encrypted send support |
Neither choice is wrong
Organizations successfully deploy both file systems in production. The "right" choice depends on your specific requirements, existing infrastructure, and operational expertise. Many teams use both—ZFS for storage servers and btrfs for desktop systems.
Looking ahead:
With a solid understanding of COW file systems—concept, snapshots, integrity, and implementations—our final page examines the performance tradeoffs. COW isn't free; understanding the costs helps you optimize for your workload.
You now understand both btrfs and ZFS at an architectural level, their strengths and weaknesses, and when to choose each. You can explain the key differences to stakeholders and make informed decisions for your storage needs. Next, we'll explore COW performance tradeoffs and optimization strategies.