Traditional huge pages require explicit configuration: reserving memory at boot, mounting filesystems, modifying application code. This creates a barrier to adoption. Many applications that would benefit from huge pages never use them because the setup is too complex or the workload too dynamic.
Transparent Huge Pages (THP) reimagines this approach. Introduced in Linux 2.6.38, THP automatically promotes contiguous 4KB pages into 2MB huge pages, without any application modification. The backing of virtual memory regions with huge pages happens "transparently" from the application's perspective.
But this magic comes with tradeoffs. THP introduces background daemon activity, potential latency spikes, and memory overhead that can harm certain workloads. Understanding these tradeoffs is essential for making informed decisions about THP configuration in production systems.
By the end of this page, you will understand how THP works internally, master its configuration options (always, madvise, never), know when to enable or disable THP for different workloads, and recognize the symptoms of THP-related performance issues.
THP operates through two primary mechanisms: allocation-time promotion and background promotion via the khugepaged kernel thread.
Allocation-time promotion:
When an application allocates memory (through mmap, brk, etc.), the kernel attempts to back the allocation with huge pages immediately if the mapping is anonymous, the active THP mode allows it (always, or madvise with MADV_HUGEPAGE set), the region covers a full aligned 2MB range, and a free huge page can be allocated.
If huge page allocation succeeds, the entire 2MB region is mapped with a single entry at the Page Directory level (the PMD, in Linux kernel terms).
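The bookkeeping savings behind that single entry are easy to quantify. A minimal sketch of the arithmetic, assuming the standard x86-64 constants (4KB base pages, 2MB huge pages):

```shell
#!/bin/sh
# Page-table-entry arithmetic for 2MB huge pages on x86-64
PAGE_SIZE=4096
HPAGE_SIZE=$((2 * 1024 * 1024))

# Number of 4KB PTEs that one huge-page PMD entry replaces
ptes_per_hpage=$((HPAGE_SIZE / PAGE_SIZE))
echo "PTEs replaced by one PMD entry: $ptes_per_hpage"    # 512

# Entries needed to map a 1GB heap each way
heap=$((1024 * 1024 * 1024))
echo "1GB heap: $((heap / PAGE_SIZE)) small-page entries vs $((heap / HPAGE_SIZE)) huge-page entries"
```

Fewer entries means fewer TLB slots consumed per byte of memory mapped, which is where the performance benefit comes from.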
Background promotion (khugepaged):
When immediate allocation isn't possible, the khugepaged kernel thread continuously scans memory looking for opportunities to create huge pages from existing 4KB pages:
```c
/*
 * Conceptual representation of THP promotion decision making
 * Based on Linux kernel mm/khugepaged.c logic
 */
#include <stdbool.h>
#include <stdint.h>

#define HPAGE_SIZE (2UL * 1024 * 1024)
#define PAGE_SIZE  4096
#define PAGES_PER_HPAGE (HPAGE_SIZE / PAGE_SIZE)  // 512

struct vm_area_struct;  // Forward declaration
struct page;

/**
 * Check if a VMA is eligible for THP
 */
bool thp_vma_eligible(struct vm_area_struct *vma)
{
    // Check THP mode
    //  - "always":  All anonymous mappings eligible
    //  - "madvise": Only regions with MADV_HUGEPAGE
    //  - "never":   No promotion

    // Must be anonymous memory (not file-backed)
    if (vma_is_file_backed(vma))
        return false;

    // Must have write permission (copy-on-write optimization)
    if (!(vma->vm_flags & VM_WRITE))
        return false;

    // Must not be marked NOHUGEPAGE
    if (vma->vm_flags & VM_NOHUGEPAGE)
        return false;

    // In madvise mode, must have the HUGEPAGE hint
    if (thp_mode == THP_MADVISE) {
        if (!(vma->vm_flags & VM_HUGEPAGE))
            return false;
    }

    return true;
}

/**
 * Check if a specific address range can be collapsed to a huge page
 * Called by khugepaged
 */
bool can_collapse_range(unsigned long address, struct vm_area_struct *vma)
{
    unsigned long start = address & ~(HPAGE_SIZE - 1);  // Align to 2MB
    int unmapped_pages = 0;
    int swap_pages = 0;
    int max_unmapped = 0;  // Configurable via sysfs (max_ptes_none)

    // Check all 512 pages in the range
    for (int i = 0; i < PAGES_PER_HPAGE; i++) {
        unsigned long addr = start + (i * PAGE_SIZE);
        struct page *page = follow_page(vma, addr);

        if (page == NULL) {
            // Page not present - would need to fault in
            unmapped_pages++;
            if (unmapped_pages > max_unmapped)
                return false;
        } else if (page_is_in_swap(page)) {
            // Page is swapped out
            swap_pages++;
            if (swap_pages > 0)  // Usually don't collapse if any swapped
                return false;
        } else if (page_count(page) != 1) {
            // Page is shared - would need copy-on-write
            // May or may not collapse depending on settings
        }
    }

    // Check if a huge page allocation is likely to succeed
    if (!memory_has_contiguous_2mb())
        return false;

    return true;
}

/**
 * Perform the actual collapse operation
 * This is expensive - involves copying and page table updates
 */
int collapse_range_to_hugepage(unsigned long address, struct vm_area_struct *vma)
{
    struct page *hpage;
    unsigned long start = address & ~(HPAGE_SIZE - 1);

    // Allocate a new huge page
    hpage = alloc_hugepage(GFP_KERNEL | __GFP_COMP);
    if (!hpage)
        return -ENOMEM;

    // Copy content from 512 small pages
    for (int i = 0; i < PAGES_PER_HPAGE; i++) {
        void *src = page_address(follow_page(vma, start + i * PAGE_SIZE));
        void *dst = page_address(hpage) + (i * PAGE_SIZE);
        copy_page(dst, src);  // memcpy of PAGE_SIZE bytes
    }

    // Update page tables atomically
    // This requires locking and a TLB shootdown
    update_pmd_to_hugepage(vma, start, hpage);

    // Free the original small pages
    for (int i = 0; i < PAGES_PER_HPAGE; i++) {
        struct page *old = follow_page(vma, start + i * PAGE_SIZE);
        put_page(old);
    }

    return 0;
}

/*
 * Key points:
 *
 * 1. THP promotion is opportunistic, not guaranteed
 * 2. khugepaged runs continuously in the background
 * 3. Promotion involves copying ~2MB of data
 * 4. Page tables must be updated atomically
 * 5. TLB entries must be invalidated across all CPUs
 */
```

THP promotion isn't instant. The khugepaged daemon introduces a delay between when pages become eligible and when they're actually promoted. This delay can be tuned but is typically 10+ seconds. For short-lived allocations, THP benefits may never materialize.
THP behavior is controlled through /sys/kernel/mm/transparent_hugepage/enabled. There are three modes, each with distinct implications:
| Mode | Behavior | Use Case | Risk Level |
|---|---|---|---|
| always | All anonymous memory eligible | General-purpose servers | Medium-High |
| madvise | Only MADV_HUGEPAGE regions | Databases, JVMs, controlled apps | Low |
| never | THP completely disabled | Latency-critical, embedded | None |
```bash
#!/bin/bash
#
# Transparent Huge Pages Configuration Script
#

THP_PATH="/sys/kernel/mm/transparent_hugepage"

show_current_settings() {
    echo "=== CURRENT THP SETTINGS ==="
    echo ""

    # Main THP mode
    echo "THP Enabled Mode:"
    cat $THP_PATH/enabled
    echo ""

    # Defrag behavior
    echo "Defrag Mode:"
    cat $THP_PATH/defrag
    echo ""

    # khugepaged settings
    echo "khugepaged Settings:"
    echo "  Scan interval: $(cat $THP_PATH/khugepaged/scan_sleep_millisecs)ms"
    echo "  Pages to scan: $(cat $THP_PATH/khugepaged/pages_to_scan)"
    echo "  Max empty PTEs per collapse: $(cat $THP_PATH/khugepaged/max_ptes_none)"
    echo ""

    # Statistics
    echo "THP Statistics:"
    grep thp_ /proc/vmstat
}

set_mode() {
    local mode=$1
    case $mode in
        always|madvise|never)
            echo $mode > $THP_PATH/enabled
            echo "THP mode set to: $mode"
            ;;
        *)
            echo "Invalid mode. Use: always, madvise, or never"
            return 1
            ;;
    esac
}

# THP defrag modes:
#   always:  Always try to defrag on fault (can cause latency)
#   defer:   Defer defrag to kcompactd/khugepaged (lower latency)
#   madvise: Only defrag for MADV_HUGEPAGE regions
#   never:   Never defrag
set_defrag() {
    local mode=$1
    echo $mode > $THP_PATH/defrag
    echo "Defrag mode set to: $mode"
}

# Tune khugepaged for different workload profiles
tune_for_throughput() {
    echo "Tuning khugepaged for throughput..."

    # Scan more aggressively
    echo 100 > $THP_PATH/khugepaged/scan_sleep_millisecs
    echo 4096 > $THP_PATH/khugepaged/pages_to_scan

    # Allow more unmapped pages in a collapse range
    echo 511 > $THP_PATH/khugepaged/max_ptes_none

    echo "  - Reduced sleep interval"
    echo "  - Increased pages per scan"
    echo "  - More aggressive collapsing"
}

tune_for_latency() {
    echo "Tuning khugepaged for latency..."

    # Scan less aggressively to reduce CPU overhead
    echo 10000 > $THP_PATH/khugepaged/scan_sleep_millisecs
    echo 512 > $THP_PATH/khugepaged/pages_to_scan

    # Be conservative about collapsing
    echo 0 > $THP_PATH/khugepaged/max_ptes_none

    echo "  - Increased sleep interval"
    echo "  - Reduced pages per scan"
    echo "  - Conservative collapsing"
}

# Recommended production settings for databases
production_database() {
    echo "Configuring THP for database workload..."

    # Use madvise mode - let the app control THP usage
    set_mode madvise

    # Defer defragmentation to the background
    set_defrag defer

    # Conservative khugepaged settings
    echo 3000 > $THP_PATH/khugepaged/scan_sleep_millisecs
    echo 1024 > $THP_PATH/khugepaged/pages_to_scan

    echo "Done. Application should use madvise(MADV_HUGEPAGE) on its buffer pool."
}

# Completely disable THP (for Redis, etc.)
disable_completely() {
    echo "Completely disabling THP..."
    set_mode never
    set_defrag never
    echo "THP disabled. Restart applications to take full effect."
}

# Main
case "${1:-show}" in
    show)       show_current_settings ;;
    always|madvise|never) set_mode $1 ;;
    throughput) tune_for_throughput ;;
    latency)    tune_for_latency ;;
    database)   production_database ;;
    disable)    disable_completely ;;
    *) echo "Usage: $0 {show|always|madvise|never|throughput|latency|database|disable}" ;;
esac
```

Making settings persistent:
THP settings reset on reboot. To persist them:
```ini
# /etc/systemd/system/disable-thp.service
# Systemd service to disable THP at boot

[Unit]
Description=Disable Transparent Huge Pages
DefaultDependencies=no
After=local-fs.target
Before=sysinit.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'

[Install]
WantedBy=basic.target

# Enable with:
#   sudo systemctl enable disable-thp
#   sudo systemctl start disable-thp
```

The madvise mode provides the best of both worlds: applications that understand their memory patterns can opt-in to THP, while others use standard pages. This is the recommended mode for most production deployments.
Application integration:
```c
/*
 * Using madvise() for controlled THP usage
 *
 * Compile: gcc -o thp_madvise thp_madvise.c
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <errno.h>

#define HPAGE_SIZE (2UL * 1024 * 1024)

/**
 * Allocate memory with a THP hint
 *
 * @param size    Bytes to allocate (rounded up to 2MB)
 * @param use_thp If true, advise the kernel to use huge pages
 * @return Pointer to allocated memory, or NULL on failure
 */
void *alloc_with_thp_hint(size_t size, int use_thp)
{
    void *addr;
    size_t aligned_size;

    // Round up to a 2MB boundary for best THP results
    aligned_size = (size + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1);

    addr = mmap(NULL, aligned_size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }

    // Set THP advice
    if (use_thp) {
        // MADV_HUGEPAGE: hint that this region benefits from THP
        if (madvise(addr, aligned_size, MADV_HUGEPAGE) != 0) {
            perror("madvise MADV_HUGEPAGE");
            // Continue anyway - madvise failure is not fatal
        }
    } else {
        // MADV_NOHUGEPAGE: never use huge pages for this region
        if (madvise(addr, aligned_size, MADV_NOHUGEPAGE) != 0) {
            perror("madvise MADV_NOHUGEPAGE");
        }
    }

    return addr;
}

/**
 * Example: database buffer pool style allocation
 * Uses THP for the large persistent buffer, small pages for metadata
 */
typedef struct {
    void   *buffer;       // Main buffer - huge pages
    size_t  buffer_size;
    void   *metadata;     // Metadata - regular pages
    size_t  metadata_size;
} BufferPool;

BufferPool *create_buffer_pool(size_t buffer_mb)
{
    BufferPool *pool = malloc(sizeof(BufferPool));
    if (!pool)
        return NULL;

    pool->buffer_size = buffer_mb * 1024 * 1024;
    pool->metadata_size = 1024 * 1024;  // 1MB metadata

    // Allocate the main buffer with a THP hint
    printf("Allocating %zu MB buffer with THP...\n", buffer_mb);
    pool->buffer = alloc_with_thp_hint(pool->buffer_size, 1);
    if (!pool->buffer) {
        free(pool);
        return NULL;
    }

    // Allocate metadata WITHOUT THP (small, random access)
    printf("Allocating 1 MB metadata without THP...\n");
    pool->metadata = alloc_with_thp_hint(pool->metadata_size, 0);
    if (!pool->metadata) {
        munmap(pool->buffer, pool->buffer_size);
        free(pool);
        return NULL;
    }

    // Touch the memory to trigger allocation
    memset(pool->buffer, 0, pool->buffer_size);
    memset(pool->metadata, 0, pool->metadata_size);

    return pool;
}

void destroy_buffer_pool(BufferPool *pool)
{
    if (pool) {
        if (pool->buffer)
            munmap(pool->buffer, pool->buffer_size);
        if (pool->metadata)
            munmap(pool->metadata, pool->metadata_size);
        free(pool);
    }
}

/**
 * Check whether a memory region is backed by huge pages
 * Returns 1 if yes, 0 if no, -1 on error
 */
int check_thp_status(void *addr, size_t size)
{
    char line[256];
    FILE *f;
    int huge_pages = 0;

    f = fopen("/proc/self/smaps", "r");
    if (!f) {
        perror("fopen smaps");
        return -1;
    }

    unsigned long target = (unsigned long)addr;
    int in_target_region = 0;

    while (fgets(line, sizeof(line), f)) {
        unsigned long start, end;

        // Check for a region header ("start-end perms ...")
        if (sscanf(line, "%lx-%lx", &start, &end) == 2)
            in_target_region = (target >= start && target < end);

        // Check for AnonHugePages in the target region
        if (in_target_region && strstr(line, "AnonHugePages:")) {
            unsigned long kb;
            if (sscanf(line, "AnonHugePages: %lu kB", &kb) == 1) {
                huge_pages = kb > 0;
                break;
            }
        }
    }

    fclose(f);
    return huge_pages;
}

int main(void)
{
    printf("THP madvise() Demonstration\n");
    printf("---------------------------\n");

    // Create a buffer pool
    BufferPool *pool = create_buffer_pool(64);  // 64MB buffer
    if (!pool) {
        fprintf(stderr, "Failed to create buffer pool\n");
        return 1;
    }

    // Give khugepaged time to work
    printf("Waiting for THP promotion...\n");
    sleep(2);

    // Check THP status
    printf("Checking huge page backing:\n");
    int buffer_thp = check_thp_status(pool->buffer, pool->buffer_size);
    int metadata_thp = check_thp_status(pool->metadata, pool->metadata_size);

    printf("  Buffer (THP requested):  %s\n",
           buffer_thp > 0 ? "Using huge pages" :
           buffer_thp == 0 ? "Not using huge pages" : "Unknown");
    printf("  Metadata (THP disabled): %s\n",
           metadata_thp > 0 ? "Using huge pages" :
           metadata_thp == 0 ? "Not using huge pages" : "Unknown");

    // Show detailed smaps for the buffer region
    printf("\nBuffer region smaps (excerpt):\n");
    char cmd[128];
    snprintf(cmd, sizeof(cmd),
             "grep -A 20 '%lx' /proc/%d/smaps | head -25",
             (unsigned long)pool->buffer, getpid());
    system(cmd);

    // Cleanup
    destroy_buffer_pool(pool);
    return 0;
}
```

Despite the benefits, THP introduces several problems that have caused production incidents at scale. Understanding these issues is crucial for informed deployment decisions.
1. Memory Bloat:
THP can cause memory usage to balloon unexpectedly. A process allocating 2.1MB will consume 4MB (two 2MB huge pages), whereas with 4KB pages the rounding overhead would be less than a single page. For processes with many small-to-medium allocations, this waste compounds dramatically.
| Allocation | 4KB Pages | THP (2MB) | Overhead |
|---|---|---|---|
| 100 KB | 100 KB | 2 MB | 1,900% waste |
| 2.1 MB | 2.1 MB | 4 MB | 90% waste |
| 10 MB | 10 MB | 10 MB | No overhead |
| 100 MB | 100 MB | 100 MB | No overhead |
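The overhead figures above come straight from rounding each request up to the backing page size. A small sketch of that calculation (`round_up` is a helper defined here, not a system utility):

```shell
#!/bin/sh
# Rounding waste when an allocation is backed by 4KB vs 2MB pages.
# round_up SIZE GRANULE -> SIZE rounded up to a multiple of GRANULE
round_up() { echo $(( (($1 + $2 - 1) / $2) * $2 )); }

PAGE=4096
HPAGE=$((2 * 1024 * 1024))

for kb in 100 2150 10240; do
    bytes=$((kb * 1024))
    small=$(round_up "$bytes" "$PAGE")
    huge=$(round_up "$bytes" "$HPAGE")
    waste=$(( (huge - bytes) * 100 / bytes ))
    echo "${kb} KB request: ${small} B with 4KB pages, ${huge} B with THP (${waste}% waste)"
done
```

Running this reproduces the table: the smaller the allocation relative to 2MB, the worse the percentage waste; at exact 2MB multiples the waste disappears.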
2. Latency Spikes from Compaction:
When huge pages aren't available, THP triggers memory compactionβmoving pages to create contiguous regions. This can cause latency spikes of hundreds of milliseconds, catastrophic for latency-sensitive applications.
```bash
#!/bin/bash
#
# Monitor THP-related latency events
#

echo "=== THP LATENCY MONITORING ==="
echo ""
echo "Watching for THP-related events that can cause latency spikes..."
echo "Press Ctrl+C to stop"
echo ""

# Key counters to watch:
#   compact_stall:             Process stalled waiting for compaction
#   thp_fault_alloc:           Page faults that attempted huge page allocation
#   thp_fault_fallback:        Huge page allocation failed, fell back to 4KB
#   thp_collapse_alloc:        khugepaged allocations
#   thp_collapse_alloc_failed: khugepaged allocation failures

get_counter() {
    grep "^$1 " /proc/vmstat 2>/dev/null | awk '{print $2}'
}

# Initial values
PREV_COMPACT_STALL=$(get_counter compact_stall)
PREV_THP_FALLBACK=$(get_counter thp_fault_fallback)
PREV_COLLAPSE_FAIL=$(get_counter thp_collapse_alloc_failed)

while true; do
    sleep 1

    CURR_COMPACT_STALL=$(get_counter compact_stall)
    CURR_THP_FALLBACK=$(get_counter thp_fault_fallback)
    CURR_COLLAPSE_FAIL=$(get_counter thp_collapse_alloc_failed)

    # Calculate deltas
    DELTA_STALL=$((CURR_COMPACT_STALL - PREV_COMPACT_STALL))
    DELTA_FALLBACK=$((CURR_THP_FALLBACK - PREV_THP_FALLBACK))
    DELTA_COLLAPSE_FAIL=$((CURR_COLLAPSE_FAIL - PREV_COLLAPSE_FAIL))

    # Report if any activity
    if [ $DELTA_STALL -gt 0 ] || [ $DELTA_FALLBACK -gt 0 ] || [ $DELTA_COLLAPSE_FAIL -gt 0 ]; then
        echo "[$(date +%H:%M:%S)] Events:"
        [ $DELTA_STALL -gt 0 ]         && echo "  compact_stall: +$DELTA_STALL (LATENCY RISK!)"
        [ $DELTA_FALLBACK -gt 0 ]      && echo "  thp_fault_fallback: +$DELTA_FALLBACK"
        [ $DELTA_COLLAPSE_FAIL -gt 0 ] && echo "  collapse_alloc_failed: +$DELTA_COLLAPSE_FAIL"
    fi

    PREV_COMPACT_STALL=$CURR_COMPACT_STALL
    PREV_THP_FALLBACK=$CURR_THP_FALLBACK
    PREV_COLLAPSE_FAIL=$CURR_COLLAPSE_FAIL
done
```

3. khugepaged CPU Overhead:
The khugepaged daemon continuously scans memory and copies pages. On busy systems with high memory churn, this overhead becomes significant.
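One way to gauge this overhead is to read khugepaged's accumulated CPU time from /proc. The sketch below assumes the standard proc(5) stat layout, where utime and stime are fields 14 and 15 (in clock ticks); the helper name is this script's own:

```shell
#!/bin/sh
# Rough measure of CPU time khugepaged has consumed since boot.
# utime/stime are fields 14 and 15 of /proc/<pid>/stat, in clock ticks;
# khugepaged's comm contains no spaces, so plain awk field splitting works.
cpu_ticks_from_stat() {
    echo "$1" | awk '{print $14 + $15}'
}

pid=$(pgrep -x khugepaged)
if [ -n "$pid" ]; then
    ticks=$(cpu_ticks_from_stat "$(cat /proc/"$pid"/stat)")
    hz=$(getconf CLK_TCK)
    echo "khugepaged CPU time: $((ticks / hz)) seconds since boot"
else
    echo "khugepaged not running (THP may be disabled)"
fi
```

Sampling this value periodically shows whether the daemon's scan-and-copy work is growing fast enough to matter on your system.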
Redis explicitly warns: 'You have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis.' Always disable THP for Redis. This is one of the most common production misconfigurations.
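A preflight check in the same spirit as Redis's startup warning can catch this misconfiguration before deployment. A minimal sketch (the active mode is the bracketed token in the sysfs file; the function name is this script's own):

```shell
#!/bin/sh
# Extract the active mode (the bracketed token) from a THP sysfs file,
# e.g. "always madvise [never]" -> "never"
active_thp_mode() {
    sed -n 's/.*\[\([a-z+]*\)\].*/\1/p'
}

f=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$f" ]; then
    mode=$(active_thp_mode < "$f")
    if [ "$mode" != "never" ]; then
        echo "WARNING: THP is '$mode'; disable it before running Redis"
    fi
fi
```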
THP defrag settings control how aggressively the kernel searches for contiguous memory to satisfy huge page requests. This is separate from the enabled mode and is configured via /sys/kernel/mm/transparent_hugepage/defrag.
| Mode | Behavior | Latency Impact | Recommendation |
|---|---|---|---|
| always | Sync compaction on every fault | Highest | Only for batch workloads |
| defer | Async compaction via kcompactd | Low | Good default |
| defer+madvise | Sync for MADV_HUGEPAGE, async for rest | Medium | Best for mixed workloads |
| madvise | Only defrag for MADV_HUGEPAGE regions | Low | Recommended for databases |
| never | Never defrag for THP | None | Use with explicit huge pages |
The defrag decision flow:
```
THP Allocation Request
│
├── Check if huge page available immediately
│   │
│   ├── YES ──> Allocate huge page
│   │
│   └── NO ──> Check defrag mode
│       │
│       ├── "never"
│       │   └── Fall back to 4KB pages
│       │
│       ├── "madvise"
│       │   └── Has MADV_HUGEPAGE?
│       │       ├── YES ──> Sync compaction (blocks)
│       │       └── NO  ──> Fall back to 4KB
│       │
│       ├── "defer" or "defer+madvise"
│       │   └── Wake kcompactd
│       │       └── Fall back to 4KB (for now)
│       │           └── khugepaged will promote later
│       │
│       └── "always"
│           └── Sync compaction (blocks)
│               ├── Success ──> Huge page
│               └── Fail    ──> 4KB pages

Key: "blocks" means the calling thread waits, causing latency.
     "defer" modes return immediately; defrag happens in the background.
```

On Linux 4.6+, use 'defer+madvise' as the defrag mode. This gives synchronous compaction only where applications explicitly want it (via MADV_HUGEPAGE), while using the low-latency kcompactd path for everything else.
Effective THP management requires monitoring to understand what's happening in production. Key data sources include /proc/meminfo, /proc/vmstat, and per-process /proc/&lt;pid&gt;/smaps.
```bash
#!/bin/bash
#
# Comprehensive THP Status Report
#

echo "==========================================================================="
echo "                 TRANSPARENT HUGE PAGES STATUS REPORT"
echo "==========================================================================="
echo ""

# Section 1: Current Configuration
echo "--- CONFIGURATION ---"
echo ""
THP_PATH="/sys/kernel/mm/transparent_hugepage"
echo "THP Enabled: $(cat $THP_PATH/enabled)"
echo "Defrag Mode: $(cat $THP_PATH/defrag)"
echo ""
echo "khugepaged settings:"
echo "  scan_sleep_millisecs: $(cat $THP_PATH/khugepaged/scan_sleep_millisecs)"
echo "  pages_to_scan:        $(cat $THP_PATH/khugepaged/pages_to_scan)"
echo "  max_ptes_none:        $(cat $THP_PATH/khugepaged/max_ptes_none)"

# Section 2: Memory Usage
echo ""
echo "--- MEMORY USAGE ---"
echo ""
grep -E "^(AnonHugePages|ShmemHugePages|HugePages_|Hugepagesize)" /proc/meminfo

# Calculate THP percentage
ANON_HUGE=$(grep "^AnonHugePages:" /proc/meminfo | awk '{print $2}')
ANON_TOTAL=$(grep "^AnonPages:" /proc/meminfo | awk '{print $2}')
if [ -n "$ANON_HUGE" ] && [ -n "$ANON_TOTAL" ] && [ "$ANON_TOTAL" -gt 0 ]; then
    PERCENT=$(echo "scale=2; $ANON_HUGE * 100 / $ANON_TOTAL" | bc)
    echo ""
    echo "THP Coverage: ${PERCENT}% of anonymous memory"
fi

# Section 3: Statistics
echo ""
echo "--- THP STATISTICS (cumulative since boot) ---"
echo ""
echo "Page Fault Allocations:"
echo "  thp_fault_alloc:    $(grep '^thp_fault_alloc ' /proc/vmstat | awk '{print $2}')"
echo "  thp_fault_fallback: $(grep '^thp_fault_fallback ' /proc/vmstat | awk '{print $2}')"

echo ""
echo "khugepaged Activity:"
echo "  thp_collapse_alloc:        $(grep '^thp_collapse_alloc ' /proc/vmstat | awk '{print $2}')"
echo "  thp_collapse_alloc_failed: $(grep '^thp_collapse_alloc_failed ' /proc/vmstat | awk '{print $2}')"
echo "  pages_scanned: $(cat $THP_PATH/khugepaged/pages_scanned 2>/dev/null || echo N/A)"
echo "  full_scans:    $(cat $THP_PATH/khugepaged/full_scans 2>/dev/null || echo N/A)"

echo ""
echo "Splitting Events:"
echo "  thp_split_page: $(grep '^thp_split_page ' /proc/vmstat | awk '{print $2}')"
echo "  thp_split_pmd:  $(grep '^thp_split_pmd ' /proc/vmstat | awk '{print $2}')"

echo ""
echo "Compaction (latency indicator):"
echo "  compact_stall:   $(grep '^compact_stall ' /proc/vmstat | awk '{print $2}')"
echo "  compact_fail:    $(grep '^compact_fail ' /proc/vmstat | awk '{print $2}')"
echo "  compact_success: $(grep '^compact_success ' /proc/vmstat | awk '{print $2}')"

# Section 4: Top THP consumers
echo ""
echo "--- TOP THP CONSUMERS (by AnonHugePages) ---"
echo ""
printf "%-10s %-40s %15s\n" "PID" "COMMAND" "AnonHugePages"
echo "---------------------------------------------------------------------"

for pid in /proc/[0-9]*; do
    pid_num=$(basename "$pid")
    if [ -f "$pid/smaps_rollup" ]; then
        ahp=$(grep "^AnonHugePages:" "$pid/smaps_rollup" 2>/dev/null | awk '{print $2}')
        if [ -n "$ahp" ] && [ "$ahp" -gt 0 ]; then
            cmd=$(head -c 40 "$pid/comm" 2>/dev/null)
            echo "$pid_num|$cmd|$ahp"
        fi
    fi
done 2>/dev/null | sort -t'|' -k3 -rn | head -10 | \
while IFS='|' read -r pid cmd ahp; do
    printf "%-10s %-40s %12s kB\n" "$pid" "$cmd" "$ahp"
done

echo ""
echo "==========================================================================="
echo "Report generated: $(date)"
echo "==========================================================================="
```

Transparent Huge Pages offer automatic huge page benefits but come with significant tradeoffs. Here's the essential guidance:
| Workload Type | THP Mode | Defrag Mode | Notes |
|---|---|---|---|
| General purpose server | madvise | defer | Safe default |
| Database (PostgreSQL, MySQL) | madvise | defer | Use MADV_HUGEPAGE on buffer pool |
| Redis, MongoDB | never | never | Mandatory: THP causes severe issues |
| HPC, scientific computing | always | defer | Bandwidth-bound benefits |
| Low-latency trading | never | never | No latency spikes tolerated |
| Java/JVM large heap | madvise | defer+madvise | Set -XX:+UseTransparentHugePages |
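For automation, the table above can be encoded as a simple lookup. An illustrative sketch; the function name and workload keywords are this script's own convention, not a standard interface:

```shell
#!/bin/sh
# Map a workload type to the THP settings recommended in the table above:
# prints "<enabled-mode> <defrag-mode>".
recommend_thp() {
    case "$1" in
        general|database)      echo "madvise defer" ;;
        redis|mongodb|trading) echo "never never" ;;
        hpc)                   echo "always defer" ;;
        jvm)                   echo "madvise defer+madvise" ;;
        *)                     echo "unknown"; return 1 ;;
    esac
}

recommend_thp redis     # -> never never
recommend_thp database  # -> madvise defer
```

A configuration-management tool could feed these values into the sysfs files shown earlier.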
What's next:
We've covered both explicit huge pages and THP. The final page brings everything together: When to Use Huge Pages, a decision framework for choosing between standard pages, explicit huge pages, and THP based on workload characteristics, system constraints, and operational requirements.
You now understand Transparent Huge Pagesβhow they work, how to configure them, and critically, when they cause problems. The 'madvise' mode provides a safe middle ground, while workloads like Redis require THP to be completely disabled.