The Network File System's elegant transparency conceals a sophisticated multi-layered architecture. When a user opens a file, they experience a seamless local operation, but beneath the surface, an intricate dance of protocols, daemons, and kernel modules orchestrates communication between client and server.
Understanding NFS architecture is crucial for anyone deploying, troubleshooting, or optimizing NFS environments. It reveals why NFS behaves the way it does—its performance characteristics, failure modes, and tuning options all follow directly from architectural decisions.
This page dissects the NFS architecture from top to bottom, examining each layer and how they interact to create the illusion of local file access across the network.
By the end of this page, you will understand the complete NFS protocol stack, the role of each daemon and kernel component, how the mount protocol establishes connections, and how the client and server interact at each architectural layer. This knowledge is essential for debugging NFS issues and optimizing performance.
NFS doesn't exist in isolation—it's built upon a stack of protocols, each providing services to the layer above. Understanding this stack is fundamental to understanding NFS behavior.
The Complete Protocol Stack
From application to network, the layers are:
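Sketched end to end (each layer is examined in detail below):

┌────────────────────────────────────────┐
│ Application (POSIX system calls)       │
├────────────────────────────────────────┤
│ Virtual File System (VFS)              │
├────────────────────────────────────────┤
│ NFS Client                             │
├────────────────────────────────────────┤
│ RPC (Sun / ONC RPC)                    │
├────────────────────────────────────────┤
│ XDR (External Data Representation)     │
├────────────────────────────────────────┤
│ Transport (UDP or TCP)                 │
├────────────────────────────────────────┤
│ Network                                │
└────────────────────────────────────────┘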
Layer-by-Layer Analysis
1. Application Layer
Applications use standard POSIX system calls (open, read, write, stat, etc.). They have no awareness of whether files are local or remote—the same code works for both.
2. Virtual File System (VFS) Layer
The VFS provides a uniform interface abstracting different file system implementations. When an application performs a file operation, VFS:
- Resolves the path to a dentry and inode
- Identifies which file system type owns that inode
- Dispatches the call to that file system's registered operations—for an NFS mount, the NFS client's inode and file operations
3. NFS Client Layer
The NFS client translates VFS operations into NFS remote procedure calls. It:
- Maps VFS operations onto NFS procedures (LOOKUP, GETATTR, READ, WRITE, ...)
- Tracks the file handles the server returns for each file and directory
- Maintains attribute, directory, and data caches to avoid unnecessary RPCs
4. RPC Layer
Sun RPC (ONC RPC) provides the remote procedure call abstraction:
- Services are identified by program number, version, and procedure number
- Replies are matched to requests by transaction ID (XID)
- The layer handles timeouts and retransmission and carries authentication credentials
5. XDR Layer
External Data Representation (XDR) serializes data types into a canonical network format:
- Integers are encoded big-endian with fixed sizes
- Strings, arrays, unions, and opaque data have well-defined encodings
- Heterogeneous machines can exchange structured arguments and results without byte-order concerns
6. Transport Layer
UDP or TCP carries RPC messages:
- UDP is connectionless and low-overhead, but leaves loss recovery to RPC retransmission
- TCP is reliable and connection-oriented, behaves better on congested or wide-area links, and is the default in modern deployments (NFSv4 effectively requires it)
Each layer provides a clean abstraction that simplifies the layer above. VFS means NFS doesn't worry about POSIX details. RPC means NFS doesn't handle networking. XDR means NFS doesn't worry about byte ordering. This separation allows each layer to be developed, tested, and optimized independently.
The NFS server comprises several components working together to handle client requests. Understanding this architecture is essential for server configuration and performance tuning.
Core Server Daemons
rpcbind (portmapper)
The rpcbind daemon is the RPC service registry. It runs on the well-known port 111 and maps RPC program numbers to actual port numbers.
When an NFS service starts, it:
- Binds to a port (well-known or dynamically assigned)
- Registers with rpcbind, announcing its program number, supported versions, transport, and port
When a client wants to contact NFS, it:
- Asks rpcbind on the server (port 111) which port the desired program and version are listening on
- Connects directly to that port for all subsequent requests
# Show registered RPC services
$ rpcinfo -p localhost
program vers proto port service
100000 4 tcp 111 portmapper
100000 4 udp 111 portmapper
100003 3 tcp 2049 nfs
100003 3 udp 2049 nfs
100003 4 tcp 2049 nfs
100005 3 tcp 20048 mountd
100005 3 udp 20048 mountd
100021 4 tcp 46245 nlockmgr
100024 1 tcp 56952 status
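You can exercise this lookup path yourself with a NULL-procedure ping; here is one way, with an illustrative hostname (exact flags vary slightly between rpcinfo versions):
# Ping the NFS program (100003), version 3, over TCP on a remote server
$ rpcinfo -t nfs-server.example.com 100003 3
program 100003 version 3 ready and waiting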
mountd (Mount Daemon)
The mount daemon handles the mount protocol—the handshake that occurs before NFS access begins. When a client wants to mount an export:
- The client sends a MNT request naming the export path
- mountd consults /etc/exports to verify the client's access permission
- If access is allowed, mountd returns the root file handle for the export and records the mount
mountd responsibilities:
- Parsing and enforcing the /etc/exports configuration
- Handing out root file handles for authorized mount requests
- Tracking active mounts and answering the showmount command
Mount Protocol Sequence:

Client                                          Server
  |                                               |
  |------ PORTMAP: Lookup mountd ---------------->|  (port 111)
  |<----- mountd is on port 20048 ----------------|
  |                                               |
  |------ MNT: Mount /export/home --------------->|  (port 20048)
  |       (includes client hostname/IP)           |
  |                                               |  Server checks /etc/exports
  |                                               |  Server logs mount event
  |<----- MNT Reply: file_handle + status --------|
  |       (root file handle for export)           |
  |                                               |
  |------ PORTMAP: Lookup nfs ------------------->|  (port 111)
  |<----- nfs is on port 2049 --------------------|
  |                                               |
  |====== NFS Operations Begin ==================>|  (port 2049)
  |       (using root file handle)                |
  |                                               |
  | Unmount Protocol:                             |
  |------ UMNT: Unmount /export/home ------------>|
  |<----- UMNT Reply: OK -------------------------|
  |       (server logs unmount)                   |

nfsd (NFS Daemon)
The nfsd is the heart of the NFS server. In modern Linux implementations, nfsd is actually implemented in kernel space for performance, with a user-space component that starts and configures the kernel threads.
Kernel-mode nfsd (knfsd):
Most production NFS servers use kernel-mode NFS (knfsd) for performance reasons:
- Requests are served without a user/kernel context switch per operation
- The server works directly with the VFS, page cache, and network stack
- A pool of kernel threads serves many clients concurrently
# Start 8 NFS server threads
$ sudo systemctl start nfs-server
# Or manually specify thread count
$ sudo rpc.nfsd 8
# View nfsd threads
$ ps aux | grep nfsd
root 1234 0.0 0.0 0 0 ? S 10:00 [nfsd]
root 1235 0.0 0.0 0 0 ? S 10:00 [nfsd]
... (8 threads)
The number of nfsd threads affects concurrency—more threads handle more simultaneous requests but consume more resources.
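The running thread count can also be inspected and adjusted through the /proc interface covered later on this page (numbers illustrative):
# Current number of nfsd threads
$ cat /proc/fs/nfsd/threads
8
# Raise the pool to 16 threads on a busy server
$ echo 16 | sudo tee /proc/fs/nfsd/threads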
NFS traditionally uses dynamically assigned ports for mountd, lockd, and statd, making firewall configuration challenging. Modern NFS installations typically configure fixed ports for these services. NFSv4 simplifies this by multiplexing all operations through port 2049.
The NFS client is equally sophisticated, responsible for translating local file operations into network requests and managing the complexity of remote access transparently.
Client Components in Detail
VFS Integration
The NFS client registers with the VFS layer as a file system type. When a mount command specifies NFS:
mount -t nfs server:/export/home /home
VFS invokes the NFS module's mount function, which:
- Parses the mount options and resolves the server's address
- Obtains the root file handle for the export (via the mount protocol for NFSv2/v3, or directly over port 2049 for NFSv4)
- Constructs the superblock, root inode, and dentry that anchor the remote file system in the local namespace
File Handle Management
When an application opens a file, the client must translate the path to a file handle:
- Start from the export's root file handle obtained at mount time
- Issue a LOOKUP RPC for each path component, each call returning the handle for the next component
- Cache each (directory handle, name) → handle mapping so later accesses skip the RPCs
The file handle cache is critical for performance—without it, accessing /home/alice/projects/code/main.c would require a fresh LOOKUP RPC for every path component on every access.
/* Simplified path resolution with caching */
struct nfs_fh* nfs_lookup_path(const char* path) {
    struct nfs_fh* current_fh = get_mount_root_fh();
    char* component;
    char* path_copy = strdup(path);

    /* Split path and resolve component by component */
    for (component = strtok(path_copy, "/");
         component != NULL;
         component = strtok(NULL, "/")) {

        /* Check directory cache first */
        struct nfs_fh* cached = dir_cache_lookup(current_fh, component);
        if (cached != NULL) {
            current_fh = cached;
            continue;              /* Cache hit - no RPC needed */
        }

        /* Cache miss - must contact server */
        struct nfs_lookup_result result;
        int status = nfs_rpc_lookup(current_fh, component, &result);
        if (status != NFS_OK) {
            free(path_copy);
            return NULL;
        }

        /* Cache the result for future lookups */
        dir_cache_insert(current_fh, component, &result.file_handle);
        /* Also cache attributes to avoid GETATTR calls */
        attr_cache_insert(&result.file_handle, &result.attributes);

        current_fh = copy_fh(&result.file_handle);
    }

    free(path_copy);
    return current_fh;
}

The Caching Subsystem
Aggressive caching is essential for NFS performance. The client maintains several caches:
1. Attribute Cache
File attributes (size, mtime, permissions) are cached to avoid GETATTR calls. This is controlled by the acregmin/acregmax and acdirmin/acdirmax mount options (or disabled with noac); a sample mount invocation appears at the end of this list of caches.
- acregmin/acregmax — Attribute cache timeout for regular files (default: 3-60 seconds)
- acdirmin/acdirmax — Attribute cache timeout for directories (default: 30-60 seconds)
- noac — Disable attribute caching entirely (poor performance but strict consistency)
2. Directory Cache (DNLC/dcache)
Maps (parent_handle, name) → child_handle. Eliminates repeated LOOKUP calls for frequently accessed paths.
3. Data Cache (Page Cache)
File contents are cached in the kernel's page cache, just like local files. This enables:
- Read-ahead for sequential access
- Write-behind, with dirty pages flushed asynchronously and coalesced into larger WRITEs
- Repeated reads served from memory with no RPC at all
4. Request/Reply Matching
Outstanding RPCs are tracked by transaction ID (XID), so a retransmitted request can be matched to a reply that eventually arrives rather than being re-issued; the server-side counterpart is the duplicate request cache, which replays cached replies for retransmitted non-idempotent operations instead of re-executing them.
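As a point of reference, the attribute-cache timeouts from item 1 are set at mount time; a typical invocation might look like this (server name and mount point illustrative):
# Tune attribute caching: regular files 3-60s, directories 30-60s
$ sudo mount -t nfs -o acregmin=3,acregmax=60,acdirmin=30,acdirmax=60 server:/export/home /mnt
# Or trade performance for stricter consistency
$ sudo mount -t nfs -o noac server:/export/home /mnt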
Client-side caching creates a fundamental trade-off: better performance versus data consistency. A cached value might be stale if another client modified the file. We'll explore NFS's weak consistency semantics and cache invalidation strategies in the Performance Considerations page.
Let's examine the actual protocol messages exchanged between client and server. Understanding these details is invaluable for debugging NFS issues with tools like Wireshark or tcpdump.
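To follow along, you can capture the traffic for a mount or file access on the client and open the trace in Wireshark; the commands below are illustrative (interface and hostname will differ):
# Record NFS and portmapper traffic to a file for later analysis
$ sudo tcpdump -i eth0 -s 0 -w /tmp/nfs-trace.pcap 'host nfs-server and (port 2049 or port 111)'
# Quick on-screen decode instead of a capture file
$ sudo tcpdump -i eth0 -n port 2049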
RPC Message Structure
Every NFS operation is wrapped in an RPC message. The structure includes:
RPC Call (Request):
┌─────────────────────────────────────────────────────┐
│ Fragment Header (4 bytes) - TCP only │
├─────────────────────────────────────────────────────┤
│ XID (4 bytes) - Transaction ID for matching replies │
├─────────────────────────────────────────────────────┤
│ Message Type (4 bytes) - 0 = Call │
├─────────────────────────────────────────────────────┤
│ RPC Version (4 bytes) - Always 2 │
├─────────────────────────────────────────────────────┤
│ Program (4 bytes) - 100003 for NFS │
├─────────────────────────────────────────────────────┤
│ Version (4 bytes) - 2, 3, or 4 │
├─────────────────────────────────────────────────────┤
│ Procedure (4 bytes) - Operation number │
├─────────────────────────────────────────────────────┤
│ Credentials (variable) - Auth info │
├─────────────────────────────────────────────────────┤
│ Verifier (variable) - Auth verification │
├─────────────────────────────────────────────────────┤
│ Procedure Arguments (variable) - XDR encoded │
└─────────────────────────────────────────────────────┘
NFSv3 READ Operation Analysis

=== READ Request ===
RPC Header:
    XID:        0x12345678  (transaction identifier)
    MsgType:    CALL (0)
    RPCVers:    2
    Program:    100003 (NFS)
    ProgVers:   3 (NFSv3)
    Procedure:  6 (READ)

Credentials (AUTH_SYS):
    Stamp:      0x00000001
    Machine:    "client.example.com"
    UID:        1000 (alice)
    GID:        1000 (alice)
    GIDs:       [1000, 100, 27]  (supplementary groups)

READ Arguments:
    File Handle: 0x0000000a 00000001 40000000 00000010
                 00000000 00000002 00000000 00000000
                 (32 bytes, opaque to client)
    Offset:      0      (start of file)
    Count:       32768  (32KB requested)

=== READ Response ===
RPC Header:
    XID:        0x12345678  (matches request)
    MsgType:    REPLY (1)
    Status:     MSG_ACCEPTED

READ Result:
    Status:     NFS3_OK (0)
    File Attributes (post-op):
        Type:   Regular File
        Mode:   0644
        Size:   45231 bytes
        Mtime:  2024-01-15 10:30:00
        ...
    Count:      32768  (bytes read)
    EOF:        FALSE  (more data available)
    Data:       [32768 bytes of file content]

| Proc # | Name | Request Args | Response |
|---|---|---|---|
| 0 | NULL | None | None (ping/test) |
| 1 | GETATTR | file_handle | attributes |
| 2 | SETATTR | file_handle, new_attrs, guard | wcc_data |
| 3 | LOOKUP | dir_handle, name | file_handle, attributes |
| 6 | READ | file_handle, offset, count | attributes, count, eof, data |
| 7 | WRITE | file_handle, offset, count, stable, data | wcc_data, count, committed, verf |
| 8 | CREATE | dir_handle, name, how, attrs | file_handle, attributes, wcc_data |
| 12 | REMOVE | dir_handle, name | wcc_data |
| 14 | RENAME | from_dir, from_name, to_dir, to_name | wcc_data (both dirs) |
| 16 | READDIR | dir_handle, cookie, cookieverf, count | entries, eof |
| 17 | READDIRPLUS | dir_handle, cookie, verifier, dircount, maxcount | entries with attributes |
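On a live system, per-procedure counters show which of these operations dominate a workload; nfsstat breaks them down by protocol version:
# NFSv3 per-procedure call counts as seen by the client
$ nfsstat -c -3
# The same breakdown from the server's perspective
$ nfsstat -s -3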
Weak Cache Consistency (WCC)
NFSv3 introduced Weak Cache Consistency (WCC) data to help clients maintain cache coherence. For operations that modify files, the response includes:
- Pre-operation attributes: the file's size, mtime, and ctime as they were immediately before the operation
- Post-operation attributes: the full attribute set after the operation completed
The client can compare the pre-op attributes to its cached version:
- If they match, no other client changed the file in the meantime; the cache is still valid and can simply be updated with the post-op attributes
- If they differ, another client modified the file; the cached data and attributes must be invalidated
This isn't perfect consistency (changes between operations aren't detected), but it catches common cases without requiring server-side state.
/* NFSv3 Weak Cache Consistency Data */

/* Pre-operation attributes - lightweight, just for comparison */
struct wcc_attr {
    uint64_t size;       /* File size before operation */
    nfstime3 mtime;      /* Modification time before */
    nfstime3 ctime;      /* Change time before */
};

/* Optional pre-op attributes (may not be available) */
union pre_op_attr switch (bool attributes_follow) {
    case TRUE:
        wcc_attr attributes;
    case FALSE:
        void;
};

/* Full post-operation attributes */
union post_op_attr switch (bool attributes_follow) {
    case TRUE:
        fattr3 attributes;   /* Complete attribute set */
    case FALSE:
        void;
};

/* WCC data returned with modifying operations */
struct wcc_data {
    pre_op_attr before;   /* Attributes before change */
    post_op_attr after;   /* Attributes after change */
};

/* Example: Client cache coherence check */
void update_cache_from_wcc(struct inode* inode, struct wcc_data* wcc) {
    if (!wcc->before.attributes_follow)
        return;   /* No pre-op data, can't validate */

    /* Compare cached attributes with pre-op */
    if (inode->cached_size != wcc->before.attributes.size ||
        !times_equal(&inode->cached_mtime, &wcc->before.attributes.mtime)) {
        /* Cache is stale - someone else modified the file */
        invalidate_data_cache(inode);
        invalidate_attr_cache(inode);
    }

    /* Update cache with new attributes */
    if (wcc->after.attributes_follow) {
        update_cached_attrs(inode, &wcc->after.attributes);
    }
}

File handles are the cornerstone of NFS's stateless design. Let's examine their architecture in depth—how they're created, validated, and used to enable seamless file access without server-side session state.
File Handle Generation
When a client performs a LOOKUP, CREATE, or other operation that returns a file handle, the server constructs a handle that encodes enough information to identify the file in future requests.
The handle typically includes:
- A file system identifier (fsid) naming the exported file system
- The file's inode number within that file system
- The inode's generation number, to detect reuse
- Optionally, parent or export information used for extra verification
/* Linux NFS file handle structure (simplified from knfsd) */

struct knfsd_fh {
    uint8_t fh_size;          /* Actual handle size (varies) */
    uint8_t fh_auth_type;     /* Authentication/export type */
    uint8_t fh_fsid_type;     /* Type of fsid encoding */
    uint8_t fh_fileid_type;   /* Type of fileid encoding */

    /* Variable-size payload follows, contains: */
    union {
        struct {   /* For most local file systems */
            uint32_t fsid_major;     /* File system major number */
            uint32_t fsid_minor;     /* File system minor number */
            uint64_t inode;          /* Inode number */
            uint32_t generation;     /* Inode generation count */
            uint32_t parent_inode;   /* Parent inode (for verification) */
            uint32_t parent_gen;     /* Parent generation */
        } local;
        struct {   /* For UUID-identified file systems */
            uuid_t   fsid_uuid;      /* File system UUID */
            uint64_t inode;
            uint32_t generation;
        } uuid;
    } fh_payload;
};

/* Maximum file handle sizes */
#define NFS2_FHSIZE 32    /* NFSv2: exactly 32 bytes */
#define NFS3_FHSIZE 64    /* NFSv3: up to 64 bytes */
#define NFS4_FHSIZE 128   /* NFSv4: up to 128 bytes */

/* Handle validation on server */
int nfsd_validate_fh(struct knfsd_fh* fh, struct inode** result) {
    struct super_block* sb;
    struct dentry* dentry;
    struct inode* inode;

    /* Step 1: Find the file system */
    sb = nfsd_find_filesystem(fh->fh_payload.local.fsid_major,
                              fh->fh_payload.local.fsid_minor);
    if (!sb)
        return nfserr_stale;

    /* Step 2: Look up the inode */
    inode = nfsd_iget(sb, fh->fh_payload.local.inode);
    if (!inode || IS_ERR(inode))
        return nfserr_stale;

    /* Step 3: Validate generation number */
    if (inode->i_generation != fh->fh_payload.local.generation) {
        iput(inode);
        return nfserr_stale;   /* Inode reused - stale handle! */
    }

    /* Step 4: Verify file is within exported subtree */
    if (!nfsd_check_export_path(fh, inode)) {
        iput(inode);
        return nfserr_stale;
    }

    *result = inode;
    return 0;
}

Stale File Handle Detection
The generation number mechanism prevents a dangerous scenario:
1. A client obtains a handle for file A (say, inode 1000)
2. File A is deleted on the server
3. The server reuses inode 1000 for a brand-new file B
4. The client, unaware of the change, presents its old handle for file A
5. The server resolves inode 1000 and finds file B
6. Reads and writes meant for file A land on file B
Because the handle records the generation number assigned when file A was created, step 5 fails the generation check and the server returns a "stale file handle" error instead. Without generation numbers, steps 4-6 would access the wrong file silently—a catastrophic data corruption scenario.
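From an application's point of view, a rejected handle surfaces as the ESTALE errno, and the only recovery is to re-resolve the path. A minimal sketch (the helper name and retry policy are illustrative, not part of any NFS API):

/* Reopen-by-path recovery when the kernel reports a stale NFS handle. */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

ssize_t read_with_estale_retry(const char *path, int *fdp, void *buf, size_t len)
{
    ssize_t n = read(*fdp, buf, len);
    if (n < 0 && errno == ESTALE) {
        /* The handle behind this descriptor no longer maps to a valid
         * inode+generation on the server; reopen by path for a fresh one. */
        close(*fdp);
        *fdp = open(path, O_RDONLY);
        if (*fdp < 0)
            return -1;
        n = read(*fdp, buf, len);
    }
    return n;
}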
File handles must remain valid across server reboots. This means the handle cannot contain volatile information (like memory addresses). The handle's components (fsid, inode, generation) must be reconstructable from persistent on-disk data. Some file systems (like XFS with changing UUIDs) can cause issues with handle persistence.
Security Concerns with File Handles
File handles in NFSv2/v3 present security challenges:
- A handle acts as a bearer token: anyone who captures one on the wire can present it directly to nfsd, bypassing the mountd access check
- Handles never expire, so a leaked handle grants long-lived access
- Handles built from predictable values (fsid and small inode numbers) may be guessable
NFSv4 addresses some of these concerns with volatile file handles, stronger authentication, and pseudofs for export isolation.
Security in NFS has evolved significantly from its original trusted-network design to modern mechanisms supporting secure deployment across untrusted networks.
| Mechanism | Security Level | Description | Use Case |
|---|---|---|---|
| AUTH_NULL | None | No authentication | Testing only |
| AUTH_SYS (AUTH_UNIX) | Weak | Client-asserted UID/GID | Trusted LANs |
| AUTH_DH (Diffie-Hellman) | Medium | Cryptographic (rarely used) | Legacy secure NFS |
| RPCSEC_GSS / Kerberos | Strong | Full authentication, optional encryption | Enterprise/untrusted networks |
AUTH_SYS: The Default (and Weak) Scenario
The default NFS authentication is AUTH_SYS (historically called AUTH_UNIX). With this mechanism:
- The client asserts a UID, GID, and supplementary group list in every request
- The server applies those IDs to its permission checks without any cryptographic verification
- Trust rests entirely on the honesty of the client machine
The security implications are severe:
- Anyone with root on a permitted client (or able to spoof its address) can fabricate credentials for any user, as the example below shows
- UID/GID numbers must match across machines, or permissions lose their meaning
- Traffic is neither authenticated nor encrypted, so it can be sniffed or altered in transit
This is acceptable only in controlled, trusted network environments.
/* AUTH_SYS (AUTH_UNIX) credentials */
struct authsys_parms {
    uint32_t aup_time;        /* Timestamp (not verified) */
    char    *aup_machname;    /* Hostname of client */
    uint32_t aup_uid;         /* User ID (trusted blindly!) */
    uint32_t aup_gid;         /* Primary group ID */
    uint32_t aup_len;         /* Number of supplementary groups */
    uint32_t aup_gids[16];    /* Supplementary group IDs (max 16) */
};

/* The server receives this and simply believes it */
/* Example: impersonating root is trivial */

/* On malicious client: */
struct authsys_parms fake_creds = {
    .aup_time     = time(NULL),
    .aup_machname = "legit-client.example.com",
    .aup_uid      = 0,    /* Root! */
    .aup_gid      = 0,    /* Root group! */
    .aup_len      = 0,
};

/* Server cannot distinguish this from legitimate root request
 * unless root_squash is enabled on the export */

RPCSEC_GSS with Kerberos
For secure NFS deployment, RPCSEC_GSS with Kerberos provides:
- Cryptographic proof of each user's identity—no client-asserted UIDs
- Mutual authentication, so the client also knows it is talking to the genuine server
- Optional per-message integrity protection and encryption
Authentication Levels:
sec=krb5 — Authentication only. Client identity is cryptographically verified, but data is unencrypted.
sec=krb5i — Authentication + Integrity. Data is signed with cryptographic checksums to detect tampering.
sec=krb5p — Authentication + Integrity + Privacy. Full encryption of all data traffic.
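On the server side, the required flavor can be pinned per export with the sec= option; the entries below are illustrative:
# /etc/exports: require Kerberos with full privacy for this export
/export/secure   *(rw,sync,sec=krb5p,no_subtree_check)
# Several flavors may be listed, in order of preference
/export/shared   *(rw,sync,sec=krb5i:krb5,no_subtree_check)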
How Kerberos Integration Works:
- The user obtains a ticket-granting ticket from the KDC (kinit)
- The client's rpc.gssd acquires a service ticket for the server's nfs/ principal and establishes an RPCSEC_GSS security context
- Each RPC then carries GSS credentials; the server verifies them against its keytab and maps the Kerberos principal to a local identity
# Kerberos-secured NFS mount
mount -t nfs4 -o sec=krb5p server.example.com:/export /mnt
# Client needs valid Kerberos credentials
klist # Show current tickets
kinit alice@EXAMPLE.COM # Obtain ticket if needed
While Kerberos provides strong security, it requires significant infrastructure: a KDC, synchronized clocks, DNS entries, and key management. For many internal deployments, the complexity is justified. For simpler scenarios, network segmentation and firewall rules with AUTH_SYS may be acceptable.
Linux provides one of the most widely-used NFS implementations. Understanding its specific architecture helps with deployment and troubleshooting.
Kernel Module Structure
Linux NFS is implemented primarily in kernel modules:
$ lsmod | grep nfs
nfs [NFS client module]
nfsv3 [NFSv3 support]
nfsv4 [NFSv4 support]
nfsd [NFS server module]
auth_rpcgss [GSS authentication]
sunrpc [RPC infrastructure]
Server Implementation (knfsd):
The Linux NFS server uses a kernel-space implementation for performance:
- nfsd module handles the NFS protocol
- exportfs module manages exports
- lockd module handles file locking
- Multiple kernel threads (shown in ps as [nfsd]) handle concurrent requests
Configuration flows from user-space tools (exportfs, rpc.nfsd) to the kernel via /proc/fs/nfsd/.
#!/bin/bash
# Comprehensive NFS server setup and monitoring

# === Server Configuration ===

# Configure /etc/exports
cat > /etc/exports << 'EOF'
# Home directories - authenticated access
/export/home 192.168.1.0/24(rw,sync,no_subtree_check,root_squash)

# Shared project data
/export/projects *(rw,sync,no_subtree_check,all_squash,anonuid=1000,anongid=1000)

# Read-only repository
/export/repo *(ro,sync,no_subtree_check)
EOF

# Apply export configuration
exportfs -ra

# View current exports
exportfs -v

# Start NFS services
systemctl start rpcbind
systemctl start nfs-server

# Configure number of server threads (/etc/nfs.conf)
# [nfsd]
# threads=16

# === Monitoring and Statistics ===

# View NFS statistics
nfsstat -s    # Server statistics
nfsstat -c    # Client statistics

# Detailed RPC info
cat /proc/net/rpc/nfsd

# Current NFS sessions (NFSv4)
cat /proc/fs/nfsd/clients/*/info

# === Kernel Parameters ===

# View NFS-related kernel params
sysctl -a | grep nfs

# Important parameters:
# sunrpc.tcp_slot_table_entries - Max concurrent RPC requests
# fs.nfs.nfs_congestion_kb - Congestion control writeback threshold
# fs.nfs.nfs_acl_strict - Strict ACL enforcement

# === Log Analysis ===

# Enable NFS debug logging
rpcdebug -m nfsd -s all   # Server debug
rpcdebug -m nfs -s all    # Client debug

# Debug messages go to kernel log
dmesg | grep -i nfs

# Journal logs
journalctl -u nfs-server -f

The /proc/fs/nfsd Interface
Linux exposes NFS server configuration and status via the /proc filesystem:
| File | Purpose |
|---|---|
| /proc/fs/nfsd/threads | Number of nfsd threads (read/write) |
| /proc/fs/nfsd/exports | Currently active exports |
| /proc/fs/nfsd/pool_stats | Statistics per CPU pool |
| /proc/fs/nfsd/clients/ | NFSv4 client sessions |
| /proc/fs/nfsd/versions | Enabled NFS versions |
| /proc/fs/nfsd/max_block_size | Maximum read/write block size |
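A couple of these files are worth checking during troubleshooting (output varies by system):
# Which protocol versions the running server will accept
$ cat /proc/fs/nfsd/versions
# The export table as the kernel currently sees it (compare with exportfs -v)
$ cat /proc/fs/nfsd/exports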
Linux can simultaneously support NFSv3 and NFSv4. By default, clients negotiate the highest version supported by both sides. You can force specific versions with mount options (vers=3, vers=4, vers=4.1, vers=4.2) for testing or compatibility.
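For example (server name and mount point illustrative):
# Pin the protocol version for testing or compatibility
$ sudo mount -t nfs -o vers=3 server:/export/home /mnt
$ sudo mount -t nfs -o vers=4.2 server:/export/home /mnt
# Verify which version was actually negotiated
$ nfsstat -m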
We've thoroughly examined the multi-layered architecture that makes NFS work. This understanding is foundational for deploying, troubleshooting, and optimizing NFS systems. Let's consolidate the key points:
- NFS is a layered stack: application → VFS → NFS client → RPC → XDR → transport
- The server side combines rpcbind, mountd, and the kernel-space nfsd threads
- The client hides network latency behind aggressive caching of attributes, directory entries, and file data
- File handles encode fsid, inode, and generation numbers, enabling stateless operation and stale-handle detection
- Security ranges from trust-based AUTH_SYS on closed networks to Kerberos-backed RPCSEC_GSS (krb5, krb5i, krb5p)
What's Next
With the architecture understood, we'll explore the Stateless Protocol Design in depth. We'll examine why statelessness was chosen, how it enables crash recovery, the challenges it creates for certain operations (especially locking), and how NFS accommodates operations that inherently require state. This understanding is crucial for predicting NFS behavior in failure scenarios.
You now understand the complete NFS architecture—from the VFS layer through RPC to the wire protocol. You can trace the path of a file operation from application system call to disk blocks on a remote server. This architectural foundation enables effective deployment, debugging, and optimization of NFS systems.