When the Linux kernel faces memory pressure, it traditionally has one escape route: swap—writing anonymous pages to disk and freeing their memory for reuse. This works, but at tremendous cost. Even modern NVMe SSDs introduce latencies measured in microseconds, while HDDs impose latencies in milliseconds.
zswap changes this calculus by inserting a compressed cache layer between the reclaim path and the swap device. Instead of immediately writing pages to disk, zswap compresses them and keeps them in a pool of RAM, falling back to the swap device only when a page compresses poorly or the pool is full.
The result: under memory pressure, systems with zswap maintain dramatically better responsiveness than those relying purely on swap.
By the end of this page, you will master zswap internals—the frontend interception mechanism, backend pool management, writeback to swap, configuration tuning, and production deployment strategies. You'll understand when zswap helps, when it hurts, and how to optimize it for specific workloads.
zswap is implemented as a frontswap backend—a Linux kernel mechanism that allows interception of swap operations. When a page is about to be written to swap, zswap gets first opportunity to handle it.
Core Components:
| Component | Purpose | Implementation |
|---|---|---|
| Frontend | Intercepts swap-out requests | Frontswap ops registration |
| Compressor | Compresses/decompresses pages | Crypto API (lz4, lzo, zstd, etc.) |
| zpool | Stores compressed pages | zbud, z3fold, or zsmalloc |
| Same-filled check | Optimizes zero/same-filled pages | Stores only the repeated fill value |
| Writeback | Evicts cold pages to actual swap | kthread-based background worker |
| Entry Tree | Maps (swap type, offset) to compressed entries | Red-black tree |
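To make the Entry Tree component concrete, here is a simplified sketch of the index structures, with fields condensed from the kernel's struct zswap_entry and struct zswap_tree (the real structures carry a few extra members, such as LRU linkage used by writeback):

struct zswap_entry {
    struct rb_node rbnode;      /* node in the per-swap-area red-black tree */
    pgoff_t offset;             /* swap offset used as the lookup key */
    int refcount;               /* pins the entry while a load is in flight */
    unsigned int length;        /* compressed length; 0 marks a same-filled page */
    struct zswap_pool *pool;    /* pool holding the compressed data */
    union {
        unsigned long handle;   /* zpool handle when length > 0 */
        unsigned long value;    /* repeated fill value when length == 0 */
    };
};

struct zswap_tree {
    struct rb_root rbroot;      /* entries for one swap area, keyed by offset */
    spinlock_t lock;            /* protects lookup, insert, and erase */
};

Each swap area gets its own tree, so lookups are keyed only by the page's offset within that area.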
The Data Flow:
When zswap intercepts a page:
1. Check same-filled: Is the page entirely filled with the same byte value?
2. Compress page: Apply the configured algorithm
3. Check compression ratio: Did compression achieve sufficient savings?
4. Allocate zpool space: Get space in the compressed pool
5. Store and index: Copy compressed data, create index entry, mark success
6. Free original page: The uncompressed page frame is now available for reuse
A significant percentage of pages (often 10-30%) are 'same-filled'—entirely filled with zeros or another repeated byte. zswap detects these without compression, storing only the fill value. A 4KB zero page becomes a few bytes of metadata. This optimization alone can dramatically increase effective memory.
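The check itself is a cheap linear scan. Here is a userspace sketch of the same idea; the kernel compares machine words rather than bytes, and the function name here is illustrative:

#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/* Return true (and report the fill word) if the page is one repeated value. */
static bool page_same_filled(const void *page, unsigned long *value)
{
    const unsigned long *word = page;
    size_t nwords = PAGE_SIZE / sizeof(unsigned long);

    for (size_t i = 1; i < nwords; i++) {
        if (word[i] != word[0])
            return false;
    }
    *value = word[0];   /* only this word needs to be stored */
    return true;
}

On a hit, zswap records just the fill word in the entry's metadata and skips compression and zpool allocation entirely.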
zswap hooks into the kernel's swap path via the frontswap API—a clean abstraction that allows backend implementations to intercept swap operations without modifying the core VM code.
Frontswap Operations:
struct frontswap_ops {
void (*init)(unsigned type); /* Swap area initialized */
int (*store)(unsigned type, /* Store a page */
pgoff_t offset,
struct page *page);
int (*load)(unsigned type, /* Load (decompress) a page */
pgoff_t offset,
struct page *page);
void (*invalidate_page)(unsigned type, /* Page no longer needed */
pgoff_t offset);
void (*invalidate_area)(unsigned type); /* Swap area deactivated */
};
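For context, this is roughly how zswap wires its callbacks into that interface at module initialization. This is a sketch: the handler names follow the pattern used in the listings on this page, and the exact signature of the registration call has varied across kernel versions.

static struct frontswap_ops zswap_frontswap_ops = {
    .init            = zswap_frontswap_init,
    .store           = zswap_frontswap_store,
    .load            = zswap_frontswap_load,
    .invalidate_page = zswap_frontswap_invalidate_page,
    .invalidate_area = zswap_frontswap_invalidate_area,
};

/* Called from zswap's init code once the pool and compressor are ready. */
frontswap_register_ops(&zswap_frontswap_ops);

From that point on, every swap-out and swap-in passes through these callbacks before any block I/O is issued.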
Store Operation (Compression Path):
When the kernel calls swap_writepage() to write a page to swap, frontswap intercepts and calls zswap's store operation:
/* Simplified zswap store path (Linux kernel) */
static int zswap_frontswap_store(unsigned type, pgoff_t offset,
                                 struct page *page)
{
    struct zswap_tree *tree = zswap_trees[type];
    struct zswap_entry *entry, *dupentry;
    struct crypto_acomp_ctx *acomp_ctx;
    struct scatterlist input, output;
    int ret, dlen = PAGE_SIZE;
    unsigned long handle;
    char *buf;
    u8 *src;

    /* Check if zswap is enabled and pool is available */
    if (!zswap_enabled || !tree)
        return -ENODEV;

    /* Allocate entry metadata */
    entry = zswap_entry_cache_alloc(GFP_KERNEL);
    if (!entry)
        return -ENOMEM;

    /* Check for same-filled pages first (optimization) */
    src = kmap_atomic(page);
    if (zswap_same_filled_pages_enabled &&
        zswap_is_page_same_filled(src, &entry->value)) {
        kunmap_atomic(src);
        entry->length = 0;    /* Marker for same-filled */
        goto insert;
    }

    /* Get compression context for this CPU */
    acomp_ctx = raw_cpu_ptr(zswap_comp->acomp_ctx);
    mutex_lock(&acomp_ctx->mutex);

    /* Setup source scatter-gather */
    sg_init_table(&input, 1);
    sg_set_page(&input, page, PAGE_SIZE, 0);

    /* Compress into temporary buffer */
    buf = acomp_ctx->dstmem;
    sg_init_one(&output, buf, PAGE_SIZE);
    acomp_request_set_params(acomp_ctx->req, &input, &output,
                             PAGE_SIZE, dlen);
    ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req),
                          &acomp_ctx->wait);
    dlen = acomp_ctx->req->dlen;
    kunmap_atomic(src);

    /* Check if compression was worthwhile */
    if (ret || dlen >= PAGE_SIZE) {
        mutex_unlock(&acomp_ctx->mutex);
        ret = -EINVAL;    /* Reject: compression didn't help enough */
        goto freepage;
    }

    /* Allocate space in zpool */
    ret = zpool_malloc(zswap_pool->zpool,
                       dlen + sizeof(struct zswap_header), &handle);
    if (ret) {
        mutex_unlock(&acomp_ctx->mutex);
        ret = -ENOMEM;
        goto freepage;
    }

    /* Copy compressed data to zpool */
    char *dst = zpool_map_handle(zswap_pool->zpool, handle, ZPOOL_MM_WO);
    memcpy(dst, buf, dlen);
    zpool_unmap_handle(zswap_pool->zpool, handle);
    mutex_unlock(&acomp_ctx->mutex);

    /* Setup entry */
    entry->handle = handle;
    entry->length = dlen;

insert:
    entry->offset = offset;
    entry->refcount = 1;
    entry->pool = zswap_pool;

    /* Insert into tree, replacing any duplicate for this offset */
    spin_lock(&tree->lock);
    dupentry = zswap_rb_search(&tree->rbroot, offset);
    if (dupentry) {
        zswap_rb_erase(&tree->rbroot, dupentry);
        zswap_entry_put(tree, dupentry);
    }
    zswap_rb_insert(&tree->rbroot, entry);
    spin_unlock(&tree->lock);

    /* Update statistics */
    atomic_inc(&zswap_stored_pages);
    zswap_pool_total_size = zpool_get_total_size(zswap_pool->zpool);

    return 0;    /* Success: page is now in zswap */

freepage:
    zswap_entry_cache_free(entry);
    return ret;    /* Failure: page should go to regular swap */
}

When zswap_frontswap_store() returns 0, the page is successfully stored in the compressed cache and won't be written to disk. When it returns non-zero (rejection or failure), the kernel falls through to normal swap I/O. This graceful fallback ensures reliability.
When a process accesses a page that was compressed into zswap, the kernel must decompress and return the original data. This happens through the frontswap load operation.
The Load Path:
The load path begins in swap_readpage(): frontswap gives zswap a chance to satisfy the fault from the compressed pool before any disk I/O is issued.
/* Simplified zswap load path */
static int zswap_frontswap_load(unsigned type, pgoff_t offset,
                                struct page *page)
{
    struct zswap_tree *tree = zswap_trees[type];
    struct zswap_entry *entry;
    struct crypto_acomp_ctx *acomp_ctx;
    u8 *src, *dst;
    unsigned int dlen;
    int ret;

    /* Lookup entry in tree */
    spin_lock(&tree->lock);
    entry = zswap_rb_search(&tree->rbroot, offset);
    if (!entry) {
        spin_unlock(&tree->lock);
        return -ENOENT;    /* Not in zswap, try regular swap */
    }
    zswap_entry_get(entry);    /* Take reference */
    spin_unlock(&tree->lock);

    /* Handle same-filled pages */
    if (entry->length == 0) {
        dst = kmap_atomic(page);
        zswap_fill_page(dst, entry->value);
        kunmap_atomic(dst);
        goto stats;
    }

    /* Map compressed data from zpool */
    src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);

    /* Get decompression context */
    acomp_ctx = raw_cpu_ptr(zswap_comp->acomp_ctx);
    mutex_lock(&acomp_ctx->mutex);

    /* Setup decompression */
    struct scatterlist input, output;
    sg_init_one(&input, src, entry->length);
    sg_init_table(&output, 1);
    sg_set_page(&output, page, PAGE_SIZE, 0);
    acomp_request_set_params(acomp_ctx->req, &input, &output,
                             entry->length, PAGE_SIZE);

    /* Decompress */
    ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req),
                          &acomp_ctx->wait);
    dlen = acomp_ctx->req->dlen;
    mutex_unlock(&acomp_ctx->mutex);
    zpool_unmap_handle(entry->pool->zpool, entry->handle);

    /* Verify decompression succeeded */
    if (ret || dlen != PAGE_SIZE) {
        zswap_entry_put(tree, entry);
        return -EIO;
    }

stats:
    atomic_dec(&zswap_stored_pages);
    zswap_entry_put(tree, entry);
    return 0;    /* Success: page decompressed into target */
}

/* Invalidate removes the entry entirely */
static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset)
{
    struct zswap_tree *tree = zswap_trees[type];
    struct zswap_entry *entry;

    spin_lock(&tree->lock);
    entry = zswap_rb_search(&tree->rbroot, offset);
    if (entry) {
        zswap_rb_erase(&tree->rbroot, entry);
        zswap_entry_put(tree, entry);
    }
    spin_unlock(&tree->lock);
}

Critical Performance Considerations:
| Operation | Typical Latency | Key Factors |
|---|---|---|
| Tree lookup | ~100 ns | Tree depth, cache locality |
| zpool map | ~50 ns | Pool type, memory access |
| Decompression | 300-1000 ns | Algorithm, data size |
| Page mapping | ~100 ns | TLB state, NUMA effects |
| Total load | 500-1500 ns | Sum of above |
Compare to: an NVMe SSD 4K read at roughly 10-100 µs, or an HDD read at several milliseconds once seek time is included.
zswap therefore provides roughly a 10-10,000x latency improvement over disk-based swap for pages that hit the compressed cache.
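The wide range follows directly from the arithmetic. A small self-contained calculation using the latencies above and the device figures just quoted (illustrative values, not measurements):

#include <stdio.h>

int main(void)
{
    /* Illustrative latencies in nanoseconds. */
    double zswap_load = 1000.0;      /* ~1 us: lookup + decompress */
    double nvme_read  = 100000.0;    /* ~100 us: NVMe 4K random read */
    double hdd_read   = 10000000.0;  /* ~10 ms: HDD seek + read */

    printf("speedup vs NVMe: %.0fx\n", nvme_read / zswap_load);
    printf("speedup vs HDD:  %.0fx\n", hdd_read / zswap_load);

    /* Effective refault latency when 90% of refaults hit the zswap pool
     * and the remaining 10% fall through to NVMe swap. */
    double hit = 0.9;
    double effective = hit * zswap_load + (1.0 - hit) * nvme_read;
    printf("effective latency at 90%% hit rate: %.1f us\n",
           effective / 1000.0);
    return 0;
}

Even at a modest hit rate, the average cost of a refault drops by roughly an order of magnitude.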
zswap entries are invalidated on load—once decompressed, the page returns to regular memory and the compressed copy is freed. This differs from some caching schemes that retain copies. The rationale: if the page is faulted in, it's likely to be used soon; keeping both copies wastes memory.
The zpool is the storage backend where compressed page data lives. zswap uses the kernel's zpool abstraction, which can be backed by different allocation strategies:
zbud (buddy-based): packs up to two compressed pages ("buddies") into each physical page; simple, predictable, very low overhead.
z3fold: packs up to three compressed pages per physical page; better density for a modest increase in complexity.
zsmalloc: a size-class allocator (also used by zram) that packs objects of many sizes tightly; highest density, highest complexity.
| Feature | zbud | z3fold | zsmalloc |
|---|---|---|---|
| Pages per physical page | 2 | 3 | Variable |
| Fragmentation | Low | Medium | Minimal |
| CPU overhead | Very low | Low | Medium |
| Memory efficiency | ~75% | ~85% | ~95% |
| Complexity | Simple | Moderate | Complex |
| Compaction support | No | Limited | Yes |
| Best for | Low overhead | Balanced | Max savings |
zbud Internal Structure:
┌─────────────────────────────────┐
│ Physical Page │
├───────────────┬─────────────────┤
│ Entry 1 │ Entry 2 │
│ (< 2KB) │ (< 2KB) │
│ │ │
│ Compressed │ Compressed │
│ Data │ Data │
├───────────────┴─────────────────┤
│ Metadata Header │
└─────────────────────────────────┘
zbud divides each 4KB page into two "buddies" that can each hold a compressed page up to ~2KB. If a compressed page exceeds 2KB, it gets the whole physical page, wasting the other half.
z3fold Internal Structure:
┌─────────────────────────────────┐
│ Physical Page │
├──────────┬──────────┬───────────┤
│ Entry 1 │ Entry 2 │ Entry 3 │
│ (≤1.3KB)│ (≤1.3KB) │ (≤1.3KB) │
│ │ │ │
│ Data 1 │ Data 2 │ Data 3 │
├──────────┴──────────┴───────────┤
│ Metadata + Padding │
└─────────────────────────────────┘
z3fold allows up to 3 entries, improving density when pages compress to < 1.3KB.
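A quick way to compare the two folding allocators is to ask how many compressed pages fit into one 4 KB physical page at a given compressed size. The thresholds below are simplified from the diagrams above; real allocators reserve some space for headers, so the actual cutoffs are slightly lower:

#include <stdio.h>

#define PAGE_SIZE 4096

int main(void)
{
    /* Compressed sizes to test, in bytes (roughly 2:1, 3:1, 4:1 ratios). */
    int sizes[] = { 2000, 1300, 1024 };

    for (int i = 0; i < 3; i++) {
        int s = sizes[i];
        int zbud   = (s <= PAGE_SIZE / 2) ? 2 : 1;    /* two ~2 KB buddies */
        int z3fold = (s <= PAGE_SIZE / 3) ? 3 :
                     (s <= PAGE_SIZE / 2) ? 2 : 1;    /* up to three slots */
        printf("%4d bytes -> zbud: %d per page, z3fold: %d per page\n",
               s, zbud, z3fold);
    }
    return 0;
}

zsmalloc is omitted here because it packs objects into size classes that can span pages, so its density depends on the class layout rather than a simple per-page cutoff.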
The default (z3fold) is a good starting point. Switch to zbud if CPU overhead is critical, or zsmalloc if memory savings are paramount and CPU is plentiful. Monitor pool statistics to validate your choice for specific workloads.
zswap's pool has a configurable maximum size, expressed as a percentage of total RAM. When the pool fills, zswap can either reject new stores (those pages go straight to the swap device) or write existing entries back to disk to free room for new ones.
Writeback Mechanism:
zswap implements a background writeback worker (zswap_shrink_worker) that proactively evicts cold pages from the compressed pool to the swap device when pool usage approaches or exceeds the configured maximum (max_pool_percent), so that incoming stores do not have to be rejected outright:
/* Simplified zswap writeback mechanism */

/* Writeback worker - runs when pool is getting full */
static void zswap_shrink_worker(struct work_struct *work)
{
    struct zswap_pool *pool = container_of(work, struct zswap_pool,
                                           shrink_work);
    unsigned long pool_size, target_size;
    int ret;

    /* Calculate target: we want to free 10% of pool */
    pool_size = zpool_get_total_size(pool->zpool);
    target_size = pool_size - (pool_size / 10);

    while (zpool_get_total_size(pool->zpool) > target_size) {
        /* Select oldest entry via LRU */
        struct zswap_entry *entry = get_lru_entry(pool);
        if (!entry)
            break;

        /* Write entry to actual swap device */
        ret = zswap_writeback_entry(pool, entry);
        if (ret) {
            /* Writeback failed, return entry to pool */
            put_entry_back(pool, entry);
            break;
        }

        /* Free pool space */
        zpool_free(pool->zpool, entry->handle);
        zswap_entry_cache_free(entry);
        atomic_dec(&zswap_stored_pages);
        atomic_inc(&zswap_written_back_pages);

        /* Yield to prevent monopolizing CPU */
        cond_resched();
    }
}

/* Write a compressed entry to actual swap */
static int zswap_writeback_entry(struct zswap_pool *pool,
                                 struct zswap_entry *entry)
{
    struct page *page;
    swp_entry_t swpentry;
    struct bio *bio;
    u8 *src, *dst;
    int ret;

    /* Allocate temporary page */
    page = alloc_page(GFP_NOIO);
    if (!page)
        return -ENOMEM;

    /* For same-filled pages, just fill */
    if (entry->length == 0) {
        dst = kmap_atomic(page);
        zswap_fill_page(dst, entry->value);
        kunmap_atomic(dst);
    } else {
        /* Decompress into temporary page */
        src = zpool_map_handle(pool->zpool, entry->handle, ZPOOL_MM_RO);
        dst = kmap_atomic(page);
        ret = decompress(entry->algo, src, entry->length, dst, PAGE_SIZE);
        kunmap_atomic(dst);
        zpool_unmap_handle(pool->zpool, entry->handle);
        if (ret != PAGE_SIZE) {
            __free_page(page);
            return -EIO;
        }
    }

    /* Construct swap entry */
    swpentry = entry_to_swp_entry(entry);

    /* Write to swap device */
    bio = bio_alloc(GFP_NOIO, 1);
    bio_set_dev(bio, get_swap_bdev(swpentry));
    bio->bi_iter.bi_sector = map_swap_page(swpentry);
    bio_add_page(bio, page, PAGE_SIZE, 0);
    bio->bi_opf = REQ_OP_WRITE;
    submit_bio_wait(bio);
    bio_put(bio);

    __free_page(page);
    return 0;
}

Pool Sizing Considerations:
| Pool Size | Behavior | Trade-off |
|---|---|---|
| Small (5-10%) | Frequent writeback, low memory use | Higher I/O, less benefit |
| Medium (20-30%) | Balanced operation | Good starting point |
| Large (50%+) | Rare writeback, maximum caching | May starve applications |
Recommendation: Start with 20% (max_pool_percent=20) and adjust based on:
- Workload compressibility: watch the effective compression ratio reported in the monitoring section later on this page
- Pool pressure: rising pool_limit_hit and written_back_pages counts suggest the pool is too small
- Application headroom: RAM pinned in the compressed pool is RAM your applications cannot use
A quick sizing sanity check follows below.
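The sizing sanity check, as a small calculation. The host size is hypothetical and the compression ratio is an assumption in line with the LZ4/zstd figures quoted later on this page:

#include <stdio.h>

int main(void)
{
    double ram_gib  = 64.0;   /* hypothetical host RAM */
    double pool_pct = 20.0;   /* max_pool_percent */
    double ratio    = 3.0;    /* assumed compression ratio */

    double pool_gib = ram_gib * pool_pct / 100.0;
    printf("pool cap:          %.1f GiB of RAM\n", pool_gib);
    printf("uncompressed data: ~%.1f GiB held in the pool\n", pool_gib * ratio);
    printf("net gain:          ~%.1f GiB of effective extra memory\n",
           pool_gib * (ratio - 1.0));
    return 0;
}

At 20% of a 64 GiB host and a 3:1 ratio, roughly 12.8 GiB of RAM holds about 38 GiB of would-be-swapped data, a net gain of around 25 GiB of effective memory.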
Writeback involves decompression (CPU cost) followed by disk I/O. If the page is accessed again after writeback, it must be read from disk: the worst outcome. Good LRU ordering is critical: evict pages unlikely to be accessed soon.
zswap is highly configurable through sysfs parameters. Understanding these parameters is essential for optimal deployment:
Enable/Disable (enabled):
echo Y > /sys/module/zswap/parameters/enabled # Enable
echo N > /sys/module/zswap/parameters/enabled # Disable
Compression Algorithm (compressor):
# Available algorithms (depends on kernel config)
cat /proc/crypto | grep -E 'name.*lz[o4]|zstd'
# Set compressor
echo lz4 > /sys/module/zswap/parameters/compressor
Pool Allocator (zpool):
echo z3fold > /sys/module/zswap/parameters/zpool
echo zbud > /sys/module/zswap/parameters/zpool
echo zsmalloc > /sys/module/zswap/parameters/zpool
Maximum Pool Size (max_pool_percent):
echo 25 > /sys/module/zswap/parameters/max_pool_percent
Same-Filled Pages (same_filled_pages_enabled):
echo Y > /sys/module/zswap/parameters/same_filled_pages_enabled
| Parameter | Default | Range | Description |
|---|---|---|---|
| enabled | N | Y/N | Master enable switch |
| compressor | lzo-rle | lz4, lzo, zstd, etc. | Compression algorithm |
| zpool | z3fold | zbud, z3fold, zsmalloc | Pool allocator |
| max_pool_percent | 20 | 1-100 | Max pool as % of RAM |
| accept_threshold_percent | 90 | 1-100 | Pool fill % below which stores resume after hitting the limit |
| same_filled_pages_enabled | Y | Y/N | Same-filled page optimization |
| non_same_filled_pages_enabled | Y | Y/N | Enable regular compression |
#!/bin/bash
# zswap configuration script for production servers

# Enable zswap
echo Y > /sys/module/zswap/parameters/enabled

# Use LZ4 for best speed (or zstd for best ratio)
echo lz4 > /sys/module/zswap/parameters/compressor

# Use z3fold for good balance
echo z3fold > /sys/module/zswap/parameters/zpool

# Set pool to 25% of RAM
echo 25 > /sys/module/zswap/parameters/max_pool_percent

# Enable same-filled page optimization
echo Y > /sys/module/zswap/parameters/same_filled_pages_enabled

# Verify configuration
echo "=== zswap Configuration ==="
for param in enabled compressor zpool max_pool_percent; do
    val=$(cat /sys/module/zswap/parameters/$param)
    echo "$param: $val"
done

# For persistent configuration, add to kernel command line:
# zswap.enabled=1 zswap.compressor=lz4 zswap.zpool=z3fold zswap.max_pool_percent=25

LZ4: Fastest speed, moderate compression (~2.5:1). Best for latency-sensitive workloads.
LZO: Good balance of speed and compression. Legacy default.
Zstd: Best compression (~3-4:1) but slower. Good when CPU is plentiful.
Choose based on your CPU/memory tradeoff preferences.
Effective zswap operation requires monitoring key metrics to ensure the system is behaving as expected.
Primary Statistics Location:
/sys/kernel/debug/zswap/* # Detailed statistics (debugfs)
Key Metrics to Monitor:
#!/bin/bash
# zswap monitoring script

DEBUGFS="/sys/kernel/debug/zswap"

# Check if debugfs is mounted and zswap is active
if [ ! -d "$DEBUGFS" ]; then
    echo "zswap debugfs not available"
    exit 1
fi

echo "=== zswap Statistics ==="
echo ""

# Core metrics
stored=$(cat $DEBUGFS/stored_pages 2>/dev/null || echo 0)
same_filled=$(cat $DEBUGFS/same_filled_pages 2>/dev/null || echo 0)
pool_size=$(cat $DEBUGFS/pool_total_size 2>/dev/null || echo 0)
written_back=$(cat $DEBUGFS/written_back_pages 2>/dev/null || echo 0)

# Calculate effective size
page_size=4096
stored_bytes=$((stored * page_size))
ratio="N/A"
if [ "$pool_size" -gt 0 ]; then
    ratio=$(echo "scale=2; $stored_bytes / $pool_size" | bc)
fi

echo "Stored pages: $stored ($(numfmt --to=iec $stored_bytes))"
echo "Same-filled pages: $same_filled"
echo "Pool size: $(numfmt --to=iec $pool_size)"
echo "Effective ratio: ${ratio}:1"
echo "Written back: $written_back"
echo ""

# Rejection stats
echo "=== Rejection Statistics ==="
echo "Compress poor: $(cat $DEBUGFS/reject_compress_poor 2>/dev/null || echo 0)"
echo "Alloc fail: $(cat $DEBUGFS/reject_alloc_fail 2>/dev/null || echo 0)"
echo "Pool limit hit: $(cat $DEBUGFS/pool_limit_hit 2>/dev/null || echo 0)"
echo ""

# Configuration
echo "=== Configuration ==="
for param in enabled compressor zpool max_pool_percent; do
    val=$(cat /sys/module/zswap/parameters/$param 2>/dev/null || echo "N/A")
    printf "%-20s %s\n" "$param:" "$val"
done

High reject_compress_poor? Workload has incompressible data (encrypted, media); consider disabling zswap for these systems.
High pool_limit_hit? Pool is too small; increase max_pool_percent.
High written_back_pages? Pool is cycling; consider a larger pool or a faster compressor.
Low stored_pages? Check that zswap is enabled and swap is configured.
Deploying zswap in production requires careful consideration of workload characteristics and system constraints. Here are battle-tested recommendations:
Track pool_total_size against stored_pages (the monitoring script above computes the effective ratio) to confirm compression is paying off. Properly configured zswap typically reduces swap I/O by 50-90%, improving system responsiveness dramatically under memory pressure. Desktop users see fewer 'freezes' during heavy multitasking. Servers maintain lower latency during memory spikes. The main cost is modest CPU usage during compression.
We've explored zswap in depth: its architecture and frontswap interception mechanism, the compression (store) and decompression (load) paths, zpool backends, writeback to the swap device, configuration parameters, and monitoring for production deployment.
What's Next:
The next page explores zram—a complementary technology that creates a compressed block device in RAM. While zswap intercepts the swap path to existing swap devices, zram creates an entirely new compressed swap device. Understanding both enables optimal memory compression strategies.
You now have deep knowledge of zswap internals—interception mechanisms, compression paths, pool management, and production deployment. This prepares you to effectively deploy, monitor, and troubleshoot zswap in production Linux systems.