Traditional swap moves memory pages to disk—slow but unlimited. RAM disks keep everything in memory—fast but limited by physical RAM size. zram offers a compelling middle ground: a compressed block device that lives entirely in RAM, effectively multiplying available memory through compression.
Unlike zswap, which intercepts the swap path to an existing swap device, zram is the swap device. When you configure swap on a zram device, all swap I/O goes directly to compressed pages in memory—no disk involved (unless you enable optional writeback).
This makes zram particularly valuable for diskless systems, embedded devices, container hosts, and other deployments where disk-based swap is unavailable or undesirable.
By the end of this page, you will understand zram architecture, device creation and configuration, the compression data path, memory accounting, optional writeback capabilities, and advanced deployment strategies for various use cases.
Before diving into zram internals, it's essential to understand how it differs from zswap—both provide memory compression, but in fundamentally different ways:
zswap: A Cache in Front of Swap
zram: A Compressed Block Device
zram creates standalone block devices (/dev/zram0, etc.) that hold their data compressed entirely in RAM; used as swap, they require no backing storage at all.

| Aspect | zswap | zram |
|---|---|---|
| Architecture | Cache/frontend to swap | Block device |
| Requires backing swap | Yes | No |
| Multiple instances | One global pool | Multiple devices |
| Pool allocator | zbud, z3fold, zsmalloc | zsmalloc only |
| Statistics | Debugfs | Sysfs per-device |
| Writeback to disk | Automatic | Optional, explicit |
| Configuration | Kernel parameters | Sysfs per-device |
| Use cases | Systems with disk swap | Diskless, containers, embedded |
Can the two be combined? Yes, but it's unusual. You could use zswap in front of a disk swap device while also having zram swap devices, but this adds complexity. Most systems choose one approach based on their requirements: zswap for systems with disk swap that need compression, zram for systems wanting compressed swap entirely in RAM.
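A minimal sketch of such a mixed setup, assuming /dev/sda2 is the existing disk swap partition and that the chosen compressors are built into your kernel; adjust names and sizes for your system.

```bash
# zswap in front of the disk swap partition
echo 1    > /sys/module/zswap/parameters/enabled
echo zstd > /sys/module/zswap/parameters/compressor
swapon -p 10 /dev/sda2          # low priority: used last

# zram as the preferred, RAM-only swap tier
echo lz4 > /sys/block/zram0/comp_algorithm
echo 2G  > /sys/block/zram0/disksize
mkswap /dev/zram0
swapon -p 100 /dev/zram0        # high priority: used first
```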
zram implements a block device driver that stores data in compressed form in RAM. Understanding its architecture requires examining several key components:
1. Block Device Layer:
zram registers as a block device driver with the kernel. Each zram device (/dev/zram0, /dev/zram1, etc.) behaves like a regular block device—you can format it, mount filesystems on it, or use it as swap (see the example after this list).
2. Request Processing: When the block layer sends I/O requests to zram, writes are compressed and stored in the zsmalloc pool, while reads locate the page's table entry and decompress (or copy) the data back out.
3. Memory Pool (zsmalloc): zram uses zsmalloc—a special-purpose memory allocator designed for storing compressed pages. Unlike zbud/z3fold used by zswap, zsmalloc handles arbitrary sizes efficiently with sophisticated class-based allocation.
4. Compression Engine: zram leverages the kernel's crypto API for compression, supporting multiple algorithms (lzo, lz4, zstd, etc.).
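To make items 1 and 4 concrete, here is a small, hedged sketch: it formats a second zram device as an ext4 scratch filesystem and lists the compression algorithms exposed through the crypto API. Device names, sizes, and the sample output are illustrative, and /dev/zram1 is assumed to already exist (e.g., via num_devices=2 or hot_add).

```bash
# Use a zram device as a compressed, RAM-backed scratch filesystem
echo lz4 > /sys/block/zram1/comp_algorithm
echo 1G  > /sys/block/zram1/disksize
mkfs.ext4 /dev/zram1
mkdir -p /mnt/zram-scratch
mount /dev/zram1 /mnt/zram-scratch

# List available compression algorithms; the active one appears in brackets
cat /sys/block/zram0/comp_algorithm
# Illustrative output: lzo lzo-rle [lz4] lz4hc zstd
```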
```c
/* Core zram data structures (simplified from kernel) */

/* Per-device state */
struct zram {
    struct zram_table_entry *table;   /* Mapping table */
    struct zs_pool *mem_pool;         /* zsmalloc pool */
    struct zcomp *comp;               /* Compression driver */
    gfp_t table_flags;                /* Allocation flags */
    struct rw_semaphore init_lock;    /* Init/reset protection */
    struct gendisk *disk;             /* gendisk representation */
    struct block_device *bdev;        /* Block device */

    /* Device size */
    u64 disksize;                     /* Logical size (uncompressed) */

    /* Memory limit */
    u64 mem_limit;                    /* Maximum memory to use */
    u64 mem_used;                     /* Current memory used */

    /* Optional backing device */
    struct file *backing_dev;         /* Writeback target */
    spinlock_t wb_lock;               /* Writeback lock */

    /* Per-CPU compression contexts */
    struct zcomp_strm __percpu *streams;

    /* Statistics */
    atomic64_t stats[NR_ZRAM_STAT_ITEMS];
};

/* Per-page entry in the mapping table */
struct zram_table_entry {
    union {
        unsigned long handle;         /* zsmalloc handle */
        unsigned long element;        /* Same-element value */
    };
    unsigned long flags;              /* ZRAM_SAME, ZRAM_WB, etc. */
#ifdef CONFIG_ZRAM_MEMORY_TRACKING
    ktime_t ac_time;                  /* Access time */
#endif
};

/* Entry flags */
#define ZRAM_SAME (1 << 0)   /* All bytes are same value */
#define ZRAM_WB   (1 << 1)   /* Written back to backing dev */
#define ZRAM_HUGE (1 << 2)   /* Incompressible, stored raw */
#define ZRAM_IDLE (1 << 3)   /* Not accessed recently */

/* Statistics */
enum zram_stat_item {
    ZRAM_STAT_COMPR_DATA_SIZE,  /* Compressed data size */
    ZRAM_STAT_NUM_READS,        /* Read operations */
    ZRAM_STAT_NUM_WRITES,       /* Write operations */
    ZRAM_STAT_FAILED_READS,     /* Failed read operations */
    ZRAM_STAT_FAILED_WRITES,    /* Failed write operations */
    ZRAM_STAT_INVALID_IO,       /* Invalid I/O requests */
    ZRAM_STAT_NOTIFY_FREE,      /* Discard notifications */
    ZRAM_STAT_SAME_PAGES,       /* Same-filled pages */
    ZRAM_STAT_HUGE_PAGES,       /* Huge (incompressible) pages */
    ZRAM_STAT_PAGES_STORED,     /* Total pages stored */
    ZRAM_STAT_WB_PAGES,         /* Pages written back */
    NR_ZRAM_STAT_ITEMS,
};
```

zsmalloc (zs memory allocator) groups objects of similar compressed sizes into 'size classes,' allocating them in contiguous physical pages. This minimizes fragmentation while allowing variable-size compressed data. It's more complex than zbud/z3fold but achieves better memory utilization for zram's workload.
zram devices are created and configured through the sysfs interface. The kernel module loads with a configurable number of devices, each independently configurable.
Loading the Module:
```bash
# Load with default 1 device
modprobe zram

# Load with multiple devices
modprobe zram num_devices=4

# Or dynamically add devices later (reading hot_add creates a new device
# and prints its id, e.g. 2 -> /dev/zram2)
cat /sys/class/zram-control/hot_add
```
Basic Configuration Sequence:
```bash
#!/bin/bash
# Complete zram setup for swap usage

DEVICE="/dev/zram0"
SYSFS="/sys/block/zram0"
SIZE="4G"        # Logical (uncompressed) size
ALGO="lz4"       # Compression algorithm
PRIORITY=100     # Swap priority

# Step 1: Reset device if previously configured
echo 1 > $SYSFS/reset 2>/dev/null || true

# Step 2: Set compression algorithm (before setting size)
echo $ALGO > $SYSFS/comp_algorithm
echo "Compression algorithm: $(cat $SYSFS/comp_algorithm)"

# Step 3: Set disk size (this allocates internal structures)
echo $SIZE > $SYSFS/disksize
echo "Disk size: $(cat $SYSFS/disksize)"

# Step 4: Create swap filesystem
mkswap $DEVICE

# Step 5: Enable swap with priority
swapon -p $PRIORITY $DEVICE

# Step 6: Verify
echo ""
echo "=== zram Status ==="
swapon --show | grep zram
echo ""
# mm_stat fields: orig_data_size compr_data_size mem_used_total mem_limit
#                 mem_used_max same_pages pages_compacted huge_pages
cat $SYSFS/mm_stat | awk '{
    print "Original size:   " $1 " bytes"
    print "Compressed:      " $2 " bytes"
    print "Memory used:     " $3 " bytes"
    print "Mem limit:       " $4 " bytes"
    print "Max used:        " $5 " bytes"
    print "Same pages:      " $6
    print "Pages compacted: " $7
    print "Huge pages:      " $8
}'
```

| File | Mode | Description |
|---|---|---|
| disksize | RW | Logical size of device (uncompressed) |
| comp_algorithm | RW | Compression algorithm to use |
| mem_limit | RW | Maximum memory device can use |
| mem_used | RO | Current memory usage |
| max_comp_streams | RW | Max concurrent compression streams (deprecated) |
| backing_dev | RW | Path to backing device for writeback |
| writeback | WO | Trigger writeback (idle/huge/incompressible) |
| reset | WO | Reset device (write 1) |
| mm_stat | RO | Memory statistics |
| io_stat | RO | I/O statistics |
| bd_stat | RO | Backing device statistics |
| debug_stat | RO | Debug statistics |
Set comp_algorithm BEFORE setting disksize. Once disksize is configured, changing the algorithm requires resetting the device first. Similarly, backing_dev should be configured before writeback is used.
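As a concrete illustration of that ordering, here is a hedged sketch of reconfiguring an already-active zram swap device from lz4 to zstd; the device name, size, and priority are examples.

```bash
# Reconfigure /dev/zram0: algorithm changes require a full reset
swapoff /dev/zram0                    # stop using the device
echo 1 > /sys/block/zram0/reset       # drop contents and settings

echo zstd > /sys/block/zram0/comp_algorithm       # 1. algorithm first
# echo /dev/loop0 > /sys/block/zram0/backing_dev  # 2. optional backing device
echo 4G > /sys/block/zram0/disksize               # 3. disksize last

mkswap /dev/zram0
swapon -p 100 /dev/zram0
```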
When data is written to a zram device, it follows a specific path through compression and storage. Understanding this path is essential for performance tuning.
Write Path (Compression):
```c
/* Simplified zram write path */

static int zram_write_page(struct zram *zram, struct page *page, u32 index)
{
    struct zram_table_entry *entry = &zram->table[index];
    struct zcomp_strm *zstrm;
    unsigned long handle;
    unsigned long element = 0;
    size_t comp_len;
    void *src, *dst;
    bool same;
    int ret;

    /* Check for same-filled page first */
    src = kmap_atomic(page);
    same = page_same_filled(src, &element);
    kunmap_atomic(src);

    if (same) {
        /* Store only the element value, not the data */
        zram_set_flag(entry, ZRAM_SAME);
        entry->element = element;
        atomic64_inc(&zram->stats[ZRAM_STAT_SAME_PAGES]);
        return 0;
    }

    /* Get compression context for this CPU */
    zstrm = zcomp_stream_get(zram->comp);

    /* Compress the page */
    src = kmap_atomic(page);
    comp_len = PAGE_SIZE;
    ret = zcomp_compress(zstrm, src, &comp_len);
    kunmap_atomic(src);

    if (ret) {
        zcomp_stream_put(zram->comp);
        return ret;
    }

    /* Check if compression was worthwhile */
    if (comp_len >= PAGE_SIZE) {
        /* Store uncompressed as "huge" page */
        zram_set_flag(entry, ZRAM_HUGE);
        comp_len = PAGE_SIZE;
        /* The original uncompressed data is copied below */
    }

    /* Allocate space in zsmalloc */
    handle = zs_malloc(zram->mem_pool, comp_len, GFP_NOIO);
    if (!handle) {
        /* Try writeback to free space if configured */
        if (zram->backing_dev) {
            zram_writeback_slot(zram);
            handle = zs_malloc(zram->mem_pool, comp_len, GFP_NOIO);
        }
        if (!handle) {
            zcomp_stream_put(zram->comp);
            return -ENOMEM;
        }
    }

    /* Copy data to zsmalloc buffer */
    dst = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);
    if (entry->flags & ZRAM_HUGE) {
        src = kmap_atomic(page);
        memcpy(dst, src, PAGE_SIZE);
        kunmap_atomic(src);
    } else {
        memcpy(dst, zstrm->buffer, comp_len);
    }
    zs_unmap_object(zram->mem_pool, handle);
    zcomp_stream_put(zram->comp);

    /* Update table entry */
    entry->handle = handle;
    atomic64_add(comp_len, &zram->stats[ZRAM_STAT_COMPR_DATA_SIZE]);
    atomic64_inc(&zram->stats[ZRAM_STAT_PAGES_STORED]);

    return 0;
}
```

Read Path (Decompression):
The read path is the inverse:
- ZRAM_SAME: Reconstruct the page from the single stored element value
- ZRAM_WB: Read from the backing device (if writeback occurred)
- ZRAM_HUGE: Copy directly (no decompression needed)
- Otherwise: map the zsmalloc handle and decompress into the target page

Performance Considerations:
| Operation | Typical Latency | Depends On |
|---|---|---|
| Same-filled read | ~50 ns | Single value expansion |
| Compressed read | 300-1000 ns | Algorithm, compressed size |
| Huge page read | ~100 ns | Memory copy only |
| Writeback read | 10-100 μs | Backing device speed |
When data doesn't compress (encrypted content, already-compressed media), zram stores it uncompressed with the HUGE flag. This avoids wasting CPU on futile compression attempts. Monitor the 'huge_pages' statistic—high values suggest workload may not benefit from zram.
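A small sketch for spot-checking that statistic from mm_stat (the field layout is described in the next section); it assumes 4 KiB pages and the stock mm_stat field order, and the 30% threshold is an arbitrary example.

```bash
#!/bin/bash
# Estimate the share of incompressible ("huge") pages on zram0
read -r orig compr used limit max same compact huge rest \
    <<< "$(cat /sys/block/zram0/mm_stat)"

pages_stored=$(( orig / 4096 ))          # approximate pages currently stored
if [ "$pages_stored" -gt 0 ]; then
    pct=$(( huge * 100 / pages_stored ))
    echo "Huge (incompressible) pages: $huge of ~$pages_stored (${pct}%)"
    if [ "$pct" -gt 30 ]; then
        echo "Warning: much of this data does not compress; zram may help little"
    fi
fi
```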
Understanding zram's memory usage is critical for capacity planning. Several interrelated values control and describe memory consumption:
Key Memory Metrics:
| Metric | Description | Accessed Via |
|---|---|---|
| disksize | Logical size (uncompressed capacity) | cat /sys/block/zram0/disksize |
| mem_used_total | Actual memory consumed | mm_stat field 3 |
| mem_limit | Maximum memory allowed | cat /sys/block/zram0/mem_limit |
| orig_data_size | Uncompressed data stored | mm_stat field 1 |
| compr_data_size | Compressed data size | mm_stat field 2 |
Calculating Compression Ratio:
Ratio = orig_data_size / compr_data_size
Calculating Effective Utilization:
Utilization = mem_used_total / disksize
With 3:1 compression, a 4GB disksize zram device might use only ~1.5GB actual RAM.
```bash
#!/bin/bash
# Comprehensive zram memory analysis

DEVICE="zram0"
SYSFS="/sys/block/$DEVICE"

# Parse mm_stat: orig_data_size compr_data_size mem_used_total mem_limit
#                max_mem_used same_pages compacted_pages huge_pages
# ('rest' absorbs extra trailing fields on newer kernels)
read -r orig compr used limit max same compact huge rest <<< "$(cat $SYSFS/mm_stat)"

# Calculate metrics
disksize=$(cat $SYSFS/disksize)

echo "=== zram Memory Analysis: $DEVICE ==="
echo ""
echo "Configuration:"
echo "  Disk size (logical):   $(numfmt --to=iec $disksize)"
echo "  Memory limit:          $(numfmt --to=iec $limit)"
echo ""
echo "Current State:"
echo "  Original data size:    $(numfmt --to=iec $orig)"
echo "  Compressed data size:  $(numfmt --to=iec $compr)"
echo "  Total memory used:     $(numfmt --to=iec $used)"
echo "  Max memory used:       $(numfmt --to=iec $max)"
echo ""

# Calculate compression ratio
if [ "$compr" -gt 0 ]; then
    ratio=$(echo "scale=2; $orig / $compr" | bc)
    echo "Compression ratio: ${ratio}:1"
fi

# Calculate fill percentage
if [ "$disksize" -gt 0 ]; then
    fill=$(echo "scale=1; $orig * 100 / $disksize" | bc)
    echo "Logical fill: ${fill}%"
fi

# Calculate memory efficiency
if [ "$disksize" -gt 0 ]; then
    efficiency=$(echo "scale=1; $used * 100 / $disksize" | bc)
    echo "Memory efficiency: ${efficiency}% of disksize"
fi

echo ""
echo "Page Statistics:"
echo "  Same-filled pages:     $same"
echo "  Compacted pages:       $compact"
echo "  Huge (incompressible): $huge"

# Parse io_stat: failed_reads failed_writes invalid_io notify_free
read -r fail_r fail_w invalid notify <<< "$(cat $SYSFS/io_stat)"
echo ""
echo "I/O Statistics:"
echo "  Failed reads:          $fail_r"
echo "  Failed writes:         $fail_w"
echo "  Invalid I/O:           $invalid"
echo "  Discard notifications: $notify"
```
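For interactive checks, the zramctl utility from util-linux summarizes much of the same information and can also configure devices; the sample output below is illustrative, not from a real system.

```bash
# One line per device: algorithm, logical size, stored data, compressed size, RAM used
zramctl
# NAME       ALGORITHM DISKSIZE  DATA COMPR TOTAL STREAMS MOUNTPOINT
# /dev/zram0 lz4             4G  1.2G  410M  430M       8 [SWAP]

# zramctl can also set up a device in one step (alternative to raw sysfs writes)
zramctl --find --size 4G --algorithm lz4
```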
Memory Limit Enforcement:

The mem_limit parameter caps memory consumption. When a write would exceed this limit, the zsmalloc allocation is refused and the write fails; for swap, the page simply stays in RAM, and if a backing device is configured zram can write older pages back to free space.
Setting an appropriate mem_limit prevents zram from consuming too much RAM under heavy load:
```bash
# Set memory limit to 2GB
echo $((2*1024*1024*1024)) > /sys/block/zram0/mem_limit

# Or use human-readable format (if supported)
echo 2G > /sys/block/zram0/mem_limit
```
disksize is the logical capacity (what swap sees). mem_limit is the physical memory cap. With 3:1 compression, 4G disksize needs only ~1.5G mem_limit. But compression ratios vary! Set mem_limit conservatively—if compression is worse than expected, you'll hit the limit.
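A hedged sizing sketch following that advice: the device is sized at half of RAM and mem_limit assumes only a 2:1 compression ratio; both fractions are assumptions to adjust for your workload.

```bash
#!/bin/bash
# Conservative zram sizing: disksize = 50% of RAM, mem_limit assumes 2:1 compression
MEM_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)

DISKSIZE_KB=$(( MEM_KB / 2 ))        # logical capacity seen by swap
MEM_LIMIT_KB=$(( DISKSIZE_KB / 2 ))  # physical cap if compression is only 2:1

echo lz4 > /sys/block/zram0/comp_algorithm
echo "${DISKSIZE_KB}K"  > /sys/block/zram0/disksize
echo "${MEM_LIMIT_KB}K" > /sys/block/zram0/mem_limit

mkswap /dev/zram0
swapon -p 100 /dev/zram0
```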
While zram's primary mode is fully in-RAM operation, it supports optional writeback to a backing device. This enables hybrid operation: compress pages in RAM when possible, write to disk when necessary.
Use Cases for Writeback: reclaiming RAM from cold (idle) pages, offloading incompressible pages that would otherwise sit uncompressed in memory, and extending effective swap capacity beyond what RAM alone can hold.
Configuring a Backing Device:
```bash
# Create backing file (e.g., 8GB)
truncate -s 8G /var/tmp/zram-backing

# Create loopback device
LOOP=$(losetup -f --show /var/tmp/zram-backing)

# Assign to zram (must be done before disksize or after reset)
echo $LOOP > /sys/block/zram0/backing_dev
```
```c
/* Triggering writeback via sysfs */

/*
 * Writing to /sys/block/zram0/writeback triggers writeback
 * of pages matching specified criteria:
 *
 *   "idle"           - Pages not accessed recently
 *   "huge"           - Incompressible pages (stored uncompressed)
 *   "incompressible" - Synonym for huge
 *   "all"            - All pages (rarely used)
 *   PAGE_INDEX       - Specific page by index (internal testing)
 */

/* Example: Writeback idle pages */
// echo idle > /sys/block/zram0/writeback

/* Writeback internals (simplified) */
static int zram_writeback_slot(struct zram *zram, u32 index,
                               struct bio *parent_bio)
{
    struct zram_table_entry *entry = &zram->table[index];
    struct page *page;
    struct bio *bio;
    int ret;

    /* Skip if already written back or empty */
    if (zram_test_flag(entry, ZRAM_WB) || !entry->handle)
        return 0;

    /* Allocate page for decompression/copying */
    page = alloc_page(GFP_NOIO);
    if (!page)
        return -ENOMEM;

    /* Decompress/copy data to temporary page */
    ret = zram_slot_read(zram, index, page);
    if (ret) {
        __free_page(page);
        return ret;
    }

    /* Write to backing device */
    bio = bio_alloc(GFP_NOIO, 1);
    bio_set_dev(bio, zram->bdev);
    bio->bi_iter.bi_sector = index * (PAGE_SIZE >> SECTOR_SHIFT);
    bio_add_page(bio, page, PAGE_SIZE, 0);
    bio->bi_opf = REQ_OP_WRITE | REQ_SYNC;

    submit_bio_wait(bio);
    if (bio->bi_status) {
        bio_put(bio);
        __free_page(page);
        return -EIO;
    }
    bio_put(bio);

    /* Free the zsmalloc entry */
    zs_free(zram->mem_pool, entry->handle);
    entry->handle = 0;

    /* Mark as written back; store backing device index */
    zram_set_flag(entry, ZRAM_WB);
    entry->element = index;   /* Backing device offset */
    atomic64_inc(&zram->stats[ZRAM_STAT_WB_PAGES]);

    __free_page(page);
    return 0;
}
```

Writeback Strategies:
| Strategy | Command | Pages Affected | Use Case |
|---|---|---|---|
| Idle | echo idle > writeback | Not accessed recently | Free RAM from cold pages |
| Huge | echo huge > writeback | Incompressible | Avoid RAM waste on incompressible data |
| All | echo all > writeback | Everything | Emergency RAM recovery |
Automatic Writeback via Cron:
```bash
# Crontab entry: writeback idle pages every hour
0 * * * * echo idle > /sys/block/zram0/writeback
```
Write 'all' to /sys/block/zram0/idle to mark all currently stored pages as idle. Any later access clears a page's idle flag. Writing 'idle' to 'writeback' then affects only pages that have remained idle since the marking. This enables precise 'not used in the last X hours' policies.
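A sketch of such an age-based policy, assuming CONFIG_ZRAM_WRITEBACK, a configured backing_dev, and a kernel where 'echo all > idle' is available; the 4-hour window is an arbitrary example.

```bash
#!/bin/bash
# "Not used in the last 4 hours" writeback policy for zram0 (sketch)
SYSFS=/sys/block/zram0

echo all > $SYSFS/idle          # mark every stored page as idle
sleep $(( 4 * 3600 ))           # pages accessed during this window lose the flag

echo idle > $SYSFS/writeback    # write back only pages still marked idle
cat $SYSFS/bd_stat              # bd_count bd_reads bd_writes (in 4K-page units)
```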
zram supports multiple independent devices, enabling sophisticated configurations:
Use Cases for Multiple zram Devices: one swap device per NUMA node, tiered swap with different algorithms or priorities, and separating swap from zram-backed filesystems. The script below sketches a NUMA-oriented setup.
```bash
#!/bin/bash
# Multi-zram NUMA-aware configuration
# Creates one zram device per NUMA node with local memory allocation

set -e

# Determine number of NUMA nodes
NUM_NODES=$(ls -d /sys/devices/system/node/node* 2>/dev/null | wc -l)
if [ "$NUM_NODES" -eq 0 ]; then
    echo "No NUMA nodes detected, using single device"
    NUM_NODES=1
fi

echo "Configuring $NUM_NODES zram devices for NUMA..."

# Ensure sufficient zram devices exist
EXISTING=$(ls /dev/zram* 2>/dev/null | wc -l)
while [ "$EXISTING" -lt "$NUM_NODES" ]; do
    cat /sys/class/zram-control/hot_add > /dev/null
    EXISTING=$((EXISTING + 1))
done

# Configure each device
SIZE="2G"     # Per-device size
ALGO="lz4"

for node in $(seq 0 $((NUM_NODES - 1))); do
    DEVICE="/dev/zram$node"
    SYSFS="/sys/block/zram$node"

    echo "Configuring $DEVICE for NUMA node $node..."

    # Reset if needed
    echo 1 > $SYSFS/reset 2>/dev/null || true

    # Set algorithm
    echo $ALGO > $SYSFS/comp_algorithm

    # Set size
    echo $SIZE > $SYSFS/disksize

    # Create swap
    mkswap $DEVICE

    # Enable with NUMA-aware priority
    # Higher node number = lower priority (preference for local access)
    PRIORITY=$((100 - node))
    swapon -p $PRIORITY $DEVICE

    echo "  $DEVICE: size=$SIZE, algo=$ALGO, priority=$PRIORITY"
done

echo ""
echo "=== Current Swap Configuration ==="
swapon --show

# Note: True NUMA memory binding requires Memory Policy control,
# which is handled by the kernel memory allocator, not zram directly.
# For production NUMA optimization, consider:
#   1. CPU affinity for zram worker threads
#   2. memcg-based cgroup memory policy
#   3. numactl for process binding
```

Swap Priority Mechanics:
Linux's swap subsystem uses priorities to determine which swap device to use:
```bash
# Example priorities
swapon -p 100 /dev/zram0   # Used first (fast, compressed RAM)
swapon -p 50  /dev/zram1   # Used second
swapon -p 10  /dev/sda2    # Last resort (slow disk)
```
This creates a tiered swap hierarchy: pages swap to the fast, compressed in-RAM devices first and spill to the slow disk partition only when those fill up.
zram devices can be added dynamically: reading /sys/class/zram-control/hot_add creates a new device and returns its id. Devices can be removed with echo X > /sys/class/zram-control/hot_remove (after swapoff). This enables runtime adaptation to changing workloads.
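A short sketch of that lifecycle; the algorithm, size, and priority are examples, and the device number depends on what hot_add returns.

```bash
# Create a new zram device; reading hot_add returns its id (e.g. 2 -> /dev/zram2)
ID=$(cat /sys/class/zram-control/hot_add)
echo zstd > /sys/block/zram$ID/comp_algorithm
echo 1G   > /sys/block/zram$ID/disksize
mkswap /dev/zram$ID && swapon -p 90 /dev/zram$ID

# Later, retire it: stop swapping, then remove the device by id
swapoff /dev/zram$ID
echo $ID > /sys/class/zram-control/hot_remove
```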
Most modern Linux systems use systemd for service management. zram can be integrated through systemd-zram-generator or custom units:
Option 1: systemd-zram-generator
A dedicated generator that creates zram swap devices at boot:
```bash
# Install (varies by distro)
sudo dnf install zram-generator            # Fedora
sudo apt install systemd-zram-generator    # Debian/Ubuntu
```
Configuration file: /etc/systemd/zram-generator.conf
```ini
# /etc/systemd/zram-generator.conf
# systemd-zram-generator configuration

[zram0]
# Size: can be absolute (500M) or relative to RAM
zram-size = ram / 2

# Compression algorithm
compression-algorithm = lz4

# Filesystem type (swap or filesystem)
fs-type = swap

# Swap priority
swap-priority = 100

# Optional: cap the device size (in MiB)
# max-zram-size = 4096

[zram1]
# Second device with different settings
zram-size = 1024
compression-algorithm = zstd
fs-type = swap
swap-priority = 90
```

Option 2: Custom systemd Service
For more control, create a custom service:
```bash
# /etc/systemd/system/zram-swap.service
[Unit]
Description=Configure zram swap
After=local-fs.target
Before=swap.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/zram-init.sh start
ExecStop=/usr/local/bin/zram-init.sh stop

[Install]
WantedBy=swap.target

# ----------------------------------------
# /usr/local/bin/zram-init.sh
#!/bin/bash
# zram initialization script

DEVICE=/dev/zram0
SYSFS=/sys/block/zram0

case "$1" in
  start)
    modprobe zram num_devices=1

    # Wait for device
    while [ ! -e $DEVICE ]; do sleep 0.1; done

    # Calculate size (50% of RAM)
    MEM_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
    SIZE_KB=$((MEM_KB / 2))

    # Configure
    echo lz4 > $SYSFS/comp_algorithm
    echo ${SIZE_KB}K > $SYSFS/disksize

    # Enable swap
    mkswap $DEVICE
    swapon -p 100 $DEVICE

    logger "zram-swap: Enabled ${SIZE_KB}KB swap on $DEVICE"
    ;;
  stop)
    swapoff $DEVICE 2>/dev/null || true
    echo 1 > $SYSFS/reset 2>/dev/null || true
    rmmod zram 2>/dev/null || true
    logger "zram-swap: Disabled"
    ;;
esac
```

Many distributions now enable zram swap by default: Fedora (since Fedora 33) and Android ship it as standard, and a growing number of other distributions and derivatives offer it out of the box. Check your distribution's documentation; custom configuration may conflict with defaults.
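Whichever option you use, a quick post-boot check might look like the following; the unit name applies to systemd-zram-generator setups and will differ for custom units.

```bash
# Verify that zram swap came up
systemctl status systemd-zram-setup@zram0.service   # generator-created unit
swapon --show                                       # zram device should be listed
zramctl                                             # per-device compression stats
```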
We've explored zram in depth, from its fundamental architecture through advanced multi-device configurations. Key concepts to retain:

- zram is a self-contained compressed block device: zsmalloc for storage, the kernel crypto API for compression, no backing swap required.
- Configuration is per device through sysfs; set comp_algorithm (and backing_dev) before disksize.
- Same-filled pages are stored as a single value, and incompressible pages are stored raw as "huge" pages.
- disksize is the logical capacity, mem_limit caps physical RAM, and mm_stat reports the actual compression ratio.
- Optional writeback to a backing device adds a disk tier for idle or incompressible pages.
- Multiple devices, swap priorities, and systemd integration cover NUMA, tiered-swap, and boot-time deployments.
What's Next:
The next page explores compression algorithms—the engines that power both zswap and zram. We'll examine LZ4, LZO, Zstd, and others, understanding their compression ratios, speed characteristics, and how to choose the right algorithm for specific workloads.
You now have comprehensive knowledge of zram—architecture, configuration, data paths, memory management, writeback, and multi-device deployments. This enables you to effectively deploy zram for RAM-based compressed swap across diverse use cases from embedded systems to high-performance servers.