NFS abstracts network file access to appear local—but the network's presence can never be fully hidden. A local disk operation takes microseconds; a network round-trip takes milliseconds. This thousand-fold latency difference means that naive NFS usage can be painfully slow, while properly tuned NFS can closely approach local disk performance for many workloads.
Performance optimization in NFS requires understanding where time is spent: network latency, server processing, disk I/O, and the interactions between caching layers on client and server. Armed with this understanding, you can tune configurations, select appropriate mount options, and design applications that work with NFS's characteristics rather than against them.
This page provides a comprehensive guide to NFS performance: the fundamental bottlenecks, tuning parameters that matter, caching behaviors, and practical strategies for common workload patterns.
By the end of this page, you will understand NFS performance fundamentals, client and server caching mechanisms, critical tuning parameters, network optimization strategies, and how to diagnose and resolve common performance problems. You'll be able to configure NFS for optimal performance in your specific environment.
Every NFS operation incurs latency from multiple sources. Understanding these sources helps identify bottlenecks and focus optimization efforts effectively.
Latency Components of an NFS Operation
Consider a simple file read that misses all caches:
Total Latency = Client Processing
+ Network Latency (request)
+ Server Processing
+ Disk I/O
+ Network Latency (response)
+ Client Processing
Let's quantify typical values:
| Component | LAN (1Gbps) | WAN (100ms RTT) | Notes |
|---|---|---|---|
| Client RPC/XDR | 10-50 µs | 10-50 µs | CPU-bound, scales with data size |
| Network One-Way | 0.1-0.5 ms | 50 ms | Distance and congestion dependent |
| Server RPC Processing | 10-100 µs | 10-100 µs | Thread availability matters |
| Server Disk I/O (SSD) | 0.1-10 ms | 0.1-10 ms | Depends on cache hit |
| Server Disk I/O (HDD) | 5-15 ms | 5-15 ms | Seek time dominates |
| Total (cache miss) | 5-20 ms | 110-130 ms | Network dominates on WAN |
| Total (cache hit) | 0.5-2 ms | 100-105 ms | Still network-bound on WAN |
Key Insights:
- On LANs, disk I/O often dominates — For cache misses, the server's disk is the bottleneck. SSDs dramatically improve NFS performance.
- On WANs, network latency dominates — Even with SSDs, 100ms network RTT makes every operation slow. Reducing round-trips is critical.
- Caching is everything — The difference between 'data in cache' and 'data on disk' is 10-100x for local disk, but caching also avoids network round-trips entirely.
- Request size matters — Larger read/write sizes amortize per-operation overhead. A 1MB read is much more efficient than 256 × 4KB reads.
The most impactful performance optimization is reducing the number of network round-trips. Every technique we'll discuss—caching, read-ahead, compound operations, larger transfer sizes—ultimately serves this goal. One round-trip that transfers 1MB is far faster than 256 round-trips transferring 4KB each.
The NFS client employs multiple caching layers to reduce network traffic. Understanding and tuning these caches is essential for performance optimization.
Data Cache (Page Cache)
File contents are cached in the kernel's page cache, just like local files. When an application reads data, the NFS client:

1. Checks the page cache for the requested range
2. Returns cached data immediately if it is present and still considered valid
3. Otherwise issues a READ RPC to the server and caches the result
The challenge is cache validity: how does the client know if cached data is still current?
Close-to-Open Consistency
NFS implements close-to-open consistency by default:

- On open(), the client revalidates its cached data against the server (typically by checking the file's attributes, such as mtime)
- On close(), the client flushes all pending writes to the server
This means changes made on one client may not be visible to another client until it reopens the file.
/* Demonstrating close-to-open consistency */

/* Client A */
int fd = open("/mnt/nfs/file.txt", O_RDWR);
// Client validates cache with server - file mtime=10:00:00
char buf[100];
read(fd, buf, 100);  // Reads from cache or server

// ... time passes, Client B modifies file ...

read(fd, buf, 100);  // Still sees old data! (cache not revalidated)
close(fd);

/* Client A reopens */
fd = open("/mnt/nfs/file.txt", O_RDONLY);
// Client validates cache - mtime now 10:05:00, cache invalidated
read(fd, buf, 100);  // Now sees Client B's changes
close(fd);

/* Summary:
 * - Changes are visible after close() + open()
 * - During a single open session, cache may be stale
 * - This is a deliberate design trade-off for performance
 */

Attribute Cache
File attributes (size, mtime, mode, etc.) are cached separately with configurable timeouts:
| Mount Option | Default | Description |
|---|---|---|
| actimeo=n | n/a | Sets all four attribute timeouts to n seconds |
| acregmin=n | 3 | Min cache time for regular file attributes |
| acregmax=n | 60 | Max cache time for regular file attributes |
| acdirmin=n | 30 | Min cache time for directory attributes |
| acdirmax=n | 60 | Max cache time for directory attributes |
| noac | n/a | Disable attribute caching entirely |
The timeout increases from min to max based on how long since the file was modified—files that haven't changed recently get longer cache times.
Directory Cache (dentry cache / DNLC)
Directory lookups (name → inode mapping) are cached:
First access to /mnt/nfs/deep/path/to/file.txt:
LOOKUP "deep" → handle_deep (cached)
LOOKUP "path" → handle_path (cached)
LOOKUP "to" → handle_to (cached)
LOOKUP "file.txt" → handle_file (cached)
Second access:
All lookups hit cache, no network traffic!
Negative entries (non-existent files) are also cached, avoiding repeated lookups for missing files.
Cache Invalidation Triggers:

- The directory's attribute cache timeout (acdirmin/acdirmax) expires and the server reports a changed mtime
- The client itself modifies the directory (create, unlink, rename)
- The lookupcache=none mount option disables directory caching entirely
# Default caching - good for most workloads
mount -t nfs server:/export /mnt

# Aggressive caching - for read-mostly workloads with rare changes
mount -t nfs -o actimeo=3600 server:/export /mnt  # 1-hour attribute cache

# Reduced caching - for workloads needing fresher data
mount -t nfs -o actimeo=1 server:/export /mnt  # 1-second timeouts

# No attribute caching - strict consistency, poor performance
mount -t nfs -o noac server:/export /mnt  # WARNING: Significant performance impact!

# Disable directory caching (NFSv4)
mount -t nfs -o lookupcache=none server:/export /mnt

# /etc/fstab examples
server:/export /mnt/default nfs defaults 0 0
server:/export /mnt/read nfs ro,actimeo=3600 0 0
server:/export /mnt/strict nfs noac,sync 0 0

# View current cache statistics
cat /proc/fs/nfs/*/stats  # Client statistics
nfsstat -c                # Client summary

When facing consistency issues, it's tempting to set noac to disable all caching. This devastates performance—every stat(), open(), and many other operations require network round-trips. Usually, better solutions exist: application-level coordination, proper file locking, or accepting close-to-open semantics.
The NFS client employs sophisticated prefetching and write buffering to hide network latency from applications.
Read-Ahead: Prefetching Sequential Data
When the client detects sequential read patterns, it prefetches data before the application requests it:
Application reads:
read(offset=0, 4KB)
read(offset=4KB, 4KB)
read(offset=8KB, 4KB) ← Pattern detected: sequential!
NFS client behavior:
Application: read(offset=12KB, 4KB)
Client: Issues READ for 12KB AND prefetches 16KB-128KB
By time app asks for 16KB, data is already in cache!
Read-ahead window grows as sequential access continues and shrinks on random access or errors.
# View current read-ahead setting for NFS mount
cat /sys/class/bdi/*/read_ahead_kb
# Default is often 128KB or higher

# Set read-ahead for a specific device/mount
# First find the bdi (backing device info) for your NFS mount
mount | grep nfs
# /dev/nfs4 on /mnt type nfs4 (...)

# Set read-ahead to 1MB for better streaming performance
echo 1024 > /sys/class/bdi/0:XX/read_ahead_kb

# Or use blockdev (some systems)
blockdev --setra 2048 /dev/nfs4  # 2048 sectors = 1MB

# NFS-specific read sizing via mount options
mount -t nfs -o rsize=1048576 server:/export /mnt  # 1MB reads

# Application-level hint (posix_fadvise)
# Tells kernel access pattern for optimization
posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);  // Will read sequentially
posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);      // Random access pattern

Write-Behind: Buffering Writes for Efficiency
The NFS client buffers writes in memory before sending to the server:
- write() copies data into the client's page cache, and write() returns immediately
- Dirty pages are gathered into large WRITE RPCs (up to wsize worth of data)
- Data is pushed to the server in the background, and forcibly when close(), sync() or fsync() is called

Write-behind benefits: applications don't stall on network round-trips, many small writes coalesce into a few large RPCs, and network I/O overlaps with computation.

Flush triggers and their costs:
| Trigger | What Happens | Performance Impact |
|---|---|---|
| Buffer full | Background flush of dirty pages | None - asynchronous |
| Periodic timer | Flush pages older than threshold | None - asynchronous |
| close() | Flush all dirty pages for file | Blocks until server confirms |
| fsync() | Flush + wait for server commit | High - synchronous wait |
| sync() | Flush all NFS dirty pages | Very high - global flush |
| Memory pressure | Writeback to free memory | Variable |
Controlling Write Behavior
# Mount options affecting writes
mount -t nfs -o wsize=1048576 server:/export /mnt # 1MB writes
mount -t nfs -o async server:/export /mnt # Async (default)
mount -t nfs -o sync server:/export /mnt # Sync (each write waits)
# The 'sync' mount option is different from NFSv3 stable writes:
# - sync mount: every write() syscall waits for server ACK
# - async mount + stable=DATA_SYNC: server commits data, not metadata
# - async mount + stable=FILE_SYNC: server commits everything
Write Congestion Control:
When the server can't keep up with writes, the client has congestion control:
# View congestion thresholds
cat /proc/sys/sunrpc/tcp_slot_table_entries # Max concurrent RPCs
cat /proc/sys/fs/nfs/nfs_congestion_kb # Dirty data before throttling
# If application writes faster than server can handle:
# 1. Dirty pages accumulate
# 2. Hits nfs_congestion_kb threshold
# 3. Client throttles writes, application blocks
# This prevents unbounded memory consumption
With async writes, data is in client memory but not on server disk. A client crash loses this data. Applications requiring durability must use fsync() after critical writes, accepting the performance cost. Database applications typically use sync mounts or explicit fsync().
The size of NFS read and write operations has a dramatic impact on throughput. Per-operation overhead (RPC marshalling, network packets, disk I/O setup) is relatively constant, so larger operations are more efficient.
rsize and wsize: Read/Write Size
The rsize and wsize mount options control the maximum size of NFS READ and WRITE operations:
mount -t nfs -o rsize=1048576,wsize=1048576 server:/export /mnt
| Version | Maximum rsize/wsize |
|---|---|
| NFSv2 | 8 KB |
| NFSv3 | 1 MB+ (server negotiated) |
| NFSv4 | 1 MB+ (server negotiated) |
Why Larger is Better (Usually):
| rsize/wsize | Ops for 1GB | Overhead | Typical Throughput |
|---|---|---|---|
| 4 KB | 262,144 | Very High | 10-30 MB/s |
| 32 KB | 32,768 | High | 50-100 MB/s |
| 256 KB | 4,096 | Moderate | 150-300 MB/s |
| 1 MB | 1,024 | Low | 300-800 MB/s |
Negotiated vs. Specified:
Modern NFS clients and servers negotiate transfer sizes automatically. The client proposes its maximum, the server responds with what it supports, and both use the lesser value.
# View negotiated size
nfsstat -m
# Shows: rsize=1048576, wsize=1048576 (for example)
# If you specify smaller values, they're used
mount -t nfs -o rsize=65536 server:/export /mnt
# Result: rsize=65536 even if server supports 1MB
When Smaller Might Be Better:
Larger isn't always optimal:

- Memory-constrained clients: each in-flight request buffers up to rsize/wsize bytes
- UDP transports: a single lost fragment forces retransmission of the entire large RPC
- Small random I/O: requests never fill a large transfer size anyway
#!/bin/bash
# Benchmark different rsize/wsize values

SERVER="nfsserver"
EXPORT="/export/test"
MOUNTPOINT="/mnt/nfs_test"
TESTFILE="$MOUNTPOINT/testfile"

echo "Testing NFS read/write performance at different transfer sizes"
echo "============================================================="

for SIZE in 4096 16384 65536 262144 1048576; do
    SIZE_KB=$((SIZE / 1024))

    # Unmount if mounted
    umount $MOUNTPOINT 2>/dev/null

    # Mount with specific size
    mount -t nfs -o rsize=$SIZE,wsize=$SIZE $SERVER:$EXPORT $MOUNTPOINT

    # Clear caches
    sync
    echo 3 > /proc/sys/vm/drop_caches

    # Write test
    WRITE_SPEED=$(dd if=/dev/zero of=$TESTFILE bs=1M count=256 2>&1 | grep -oP '[\d.]+\s*[MG]B/s' | tail -1)

    # Clear cache for read test
    sync
    echo 3 > /proc/sys/vm/drop_caches

    # Read test
    READ_SPEED=$(dd if=$TESTFILE of=/dev/null bs=1M 2>&1 | grep -oP '[\d.]+\s*[MG]B/s' | tail -1)

    echo "rsize/wsize=${SIZE_KB}KB: Write=$WRITE_SPEED, Read=$READ_SPEED"

    rm -f $TESTFILE
done

# Cleanup
umount $MOUNTPOINT

Modern NFS implementations negotiate optimal transfer sizes automatically. The default (often 1MB) works well for most workloads. Only tune rsize/wsize if benchmarking shows improvement or you have specific constraints like limited memory.
The NFS server's configuration and resources directly impact all clients' performance. Optimizing the server provides benefits that multiply across every connected client.
NFS Server Threads
The number of nfsd kernel threads determines how many operations can be processed concurrently:
# View current thread count
cat /proc/fs/nfsd/threads
# Set thread count (until reboot)
echo 64 > /proc/fs/nfsd/threads
# Permanent configuration: /etc/nfs.conf
[nfsd]
threads=64
Guidelines for thread count:

- The traditional default of 8 threads is too low for busy servers; 32-64 is a common starting point
- More threads allow more parallel operations, but each thread consumes kernel memory
- Monitor /proc/net/rpc/nfsd to see if threads are fully utilized before adding more
# Comprehensive NFS server monitoring

# === Thread Utilization ===
cat /proc/net/rpc/nfsd
# Look for 'th' line: current threads, max threads, threads ever used

# Threads fully utilized example:
# th 8 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 100.000
# The 100.000 means all threads busy 100% of samples - need more threads!

# === NFS Statistics ===
nfsstat -s  # Server statistics
# Key metrics:
# - ops/sec per operation type
# - null procedure calls (often health checks)
# - getattr calls (high = poor client caching)

# === RPC Statistics ===
cat /proc/net/rpc/nfsd
# rc (reply cache): hits, misses, nocache
# High hits = lots of retransmits, possible network issues
#
# io: bytes read, bytes written
# Use to track throughput over time

# === Per-Operation Latency (requires nfsstat or sar) ===
nfsstat -s -l  # If available: per-op latency histograms

# === Export-specific Statistics ===
cat /proc/fs/nfsd/exports

# === I/O Wait and Disk Performance ===
iostat -x 1  # Watch disk utilization
# High %util on NFS server disks = storage bottleneck

# === Network Performance ===
sar -n DEV 1  # Network utilization per interface
# Check NFS server's network interface isn't saturated

Server Buffer Cache
The server's buffer cache dramatically impacts performance. Frequently-accessed files served from RAM are orders of magnitude faster than disk.
# Linux buffer cache is automatic based on available RAM
free -h
# "buff/cache" column shows caching memory
# For dedicated NFS servers, maximize memory
# 64GB+ RAM recommended for active datasets > 100GB
# Check cache effectiveness
cat /proc/meminfo | grep -E '^(Cached|Buffers|Active|Inactive)'
Export Options Affecting Performance:
| Option | Performance Impact | Recommendation |
|---|---|---|
| sync | Each write waits for disk | Data safety, slower |
| async | Writes buffered in RAM | Fast, but data at risk |
| no_subtree_check | Skip path verification | Faster, recommended |
| no_root_squash | No performance impact | Security consideration |
| wdelay | Batch writes slightly | Default, usually good |
ZFS is an excellent backend for NFS servers: ARC caches reads efficiently, ZIL handles synchronous writes, compression reduces storage needs, and snapshots provide backups. The combination of ZFS + NFS is popular in enterprise environments.
Network configuration can make or break NFS performance, especially for high-throughput or high-latency scenarios.
TCP Tuning for NFS
NFSv4 (and NFSv3 over TCP) benefits from TCP optimization:
# Increase TCP buffer sizes for high-bandwidth networks
# These are maximum values; actual sizes are auto-tuned
# /etc/sysctl.conf or /etc/sysctl.d/nfs.conf
# TCP receive buffer
net.core.rmem_max = 16777216
net.ipv4.tcp_rmem = 4096 262144 16777216
# TCP send buffer
net.core.wmem_max = 16777216
net.ipv4.tcp_wmem = 4096 262144 16777216
# Apply: sysctl -p
For high-latency networks (WAN), larger buffers allow more data in-flight, improving throughput.
MTU and Jumbo Frames
Larger MTU (Maximum Transmission Unit) reduces packet overhead:
| MTU | Packets for 1MB | Overhead |
|---|---|---|
| 1500 (standard) | 683 | High |
| 9000 (jumbo) | 114 | Low |
# Enable jumbo frames (requires switch support)
ip link set eth0 mtu 9000
# Verify with ping
ping -M do -s 8972 nfsserver # 8972 + 28 bytes header = 9000
# Permanent: add MTU=9000 to network config
Important: All devices in the path (client NIC, switch, server NIC) must support jumbo frames. Mismatched MTU causes fragmentation or packet drops.
| Symptom | Likely Cause | Diagnosis | Solution |
|---|---|---|---|
| Slow throughput | Bandwidth saturation | sar -n DEV; check util% | Upgrade network, or compress |
| High latency | Network congestion | ping -f (flood ping) | QoS, dedicated VLAN |
| Variable performance | Packet loss | netstat -s (retransmits) | Fix network, check cables |
| Sudden drops | MTU mismatch | ping -M do -s | Enable PMTUD or reduce MTU |
| Operation timeouts | Firewall issues | tcpdump for timeouts | Check firewall rules |
# Quick network health check for NFS

# 1. Basic connectivity
ping -c 3 nfsserver

# 2. MTU verification (test jumbo frames work)
ping -c 3 -M do -s 8972 nfsserver  # Will fail if jumbo not supported

# 3. TCP performance test (install iperf3 on both ends)
# Server: iperf3 -s
# Client: iperf3 -c nfsserver
# Should show line rate (e.g., 9.4 Gbps for 10GbE)

# 4. NFS-specific round-trip test
time rpcinfo -p nfsserver  # Should be < 1ms on LAN

# 5. Monitor retransmits during NFS activity
watch -n 1 'netstat -s | grep -E "(retrans|timeout)"'

# 6. Capture slow operations
tcpdump -i eth0 -w nfs_trace.pcap host nfsserver and port 2049 &
# Reproduce slow operation, then Ctrl+C
# Analyze with Wireshark: filter "nfs"

# 7. RPC layer diagnostics
rpcdebug -m nfs -s rpc  # Enable RPC debugging (verbose!)
dmesg | tail -50        # View debug output
rpcdebug -m nfs -c rpc  # Disable when done

For performance-critical deployments, use a dedicated network (VLAN or physical) for NFS traffic. This ensures other traffic doesn't compete for NFS bandwidth and simplifies network diagnostics. Many enterprises use 10GbE+ dedicated storage networks.
Experience with NFS reveals common patterns of performance problems. Knowing these patterns helps diagnose issues quickly.
Case Study: The Build System Disaster
A common pattern in software development environments using NFS for source code:
Symptom: Builds take 10x longer over NFS than local disk
Investigation:
nfsstat -c # On client during build
# Shows: GETATTR: 500,000/min (!)
# Shows: READ: 200/min
Diagnosis: Build system (make, ninja) stats every source file to check modification times. With thousands of source files and header dependencies, this generates massive attribute traffic.
Solutions (in order of preference):
- Increase attribute cache timeouts, e.g. actimeo=600, so repeated stat() calls are served from the client cache
- noac won't help here (it forces even more GETATTR traffic), avoid it
#!/bin/bash
# NFS performance diagnostic script

echo "=== NFS Client Statistics ==="
nfsstat -c

echo ""
echo "=== Mount Options ==="
mount | grep nfs

echo ""
echo "=== NFS Mount Detailed Stats ==="
nfsstat -m

echo ""
echo "=== Current NFS Activity (5 second sample) ==="
echo "Before:"
cat /proc/net/rpc/nfs | grep -v '^#'
sleep 5
echo "After:"
cat /proc/net/rpc/nfs | grep -v '^#'

echo ""
echo "=== Network Statistics ==="
netstat -s | grep -E "(retrans|timeout|reset)"

echo ""
echo "=== Diagnosis ==="
# Check for common problems

# High GETATTR rate?
GETATTR=$(nfsstat -c | awk '/getattr/ {print $2}')
if [ "$GETATTR" -gt 10000 ]; then
    echo "WARNING: High GETATTR rate ($GETATTR). Consider actimeo= tuning."
fi

# Check for sync mount
if mount | grep nfs | grep -q 'sync'; then
    echo "WARNING: Sync mount detected. Consider async for performance."
fi

# Check rsize/wsize
RSIZE=$(nfsstat -m | grep rsize | head -1)
if echo "$RSIZE" | grep -Eq 'rsize=[0-9]{1,4}([^0-9]|$)'; then
    echo "WARNING: Small rsize detected. Consider larger transfer sizes."
fi

Don't guess at performance problems—measure them. Use nfsstat, tcpdump, and application profiling to identify actual bottlenecks. A perceived 'NFS is slow' problem might actually be application behavior, network issues, or server disk limits.
We've explored the many facets of NFS performance. Here's a consolidated view of what matters most and best practices to follow:
| Setting | Where | When to Change |
|---|---|---|
| rsize/wsize | Mount option | Default usually optimal; test if different helps |
| actimeo | Mount option | Increase for read-mostly; decrease for frequent changes |
| nfsd threads | Server config | Increase if threads show 100% utilization |
| tcp_rmem/wmem | Kernel sysctl | Increase for high-bandwidth, high-latency networks |
| MTU | Network config | 9000 if all gear supports jumbo frames |
| async/sync export | Server exports | async for performance; sync for data safety |
The Performance Hierarchy
When troubleshooting or optimizing NFS, consider issues in this order:
1. Is the application NFS-appropriate? Some applications (many small random I/Os) will never perform well on NFS.
2. Is the network healthy? Packet loss, high latency, or saturation cause universal slowness.
3. Is the server adequately resourced? CPU, RAM, disk, network—any bottleneck here affects all clients.
4. Are mount options appropriate? rsize/wsize, caching timeouts, hard/soft mounting.
5. Is the application optimized for NFS? Batch operations, use buffering, avoid unnecessary stats.
Address higher-level issues before tuning lower-level parameters. A misconfigured network won't be fixed by adjusting read-ahead settings.
Congratulations! You've completed the Network File Systems module. You now understand NFS architecture from the ground up: the stateless design philosophy, the evolution of NFS versions, and how to optimize for real-world performance. This knowledge enables you to deploy, troubleshoot, and tune NFS in any environment—from simple file sharing to enterprise-scale infrastructure.