When a diskless workstation powers on, a precise sequence of events unfolds in the span of milliseconds—a choreographed exchange between a client with no configuration and a server holding the answers. Understanding this operation at a packet-by-packet level is essential for troubleshooting, network design, and appreciating the elegance of early network bootstrapping.
This page dissects the complete RARP operation, from the moment the client's boot ROM begins executing to the successful configuration of its IP address. We examine timing constraints, network behavior, error scenarios, and the integration of RARP with the broader diskless boot process.
By the end of this page, you will understand the precise sequence of operations in a RARP transaction, timing considerations and retry strategies, how RARP frames traverse the network, error conditions and recovery mechanisms, server selection when multiple servers respond, and the complete integration of RARP within the diskless boot lifecycle.
A RARP transaction follows a well-defined lifecycle, with specific actions at each stage. Let's trace through every step in precise detail.
Phase 1: Client Initialization (0-100ms after power-on)
When the diskless workstation powers on:
// Boot ROM RARP Client - Initialization Phase
function initializeRARPClient():
    // Step 1: Initialize the NIC hardware
    nic = initializeNetworkInterface()

    // Step 2: Read our hardware address from NIC EEPROM
    myMAC = nic.readHardwareAddress()
    // e.g., myMAC = 00:1A:2B:3C:4D:5E

    // Step 3: Prepare the RARP request frame structure
    rarpRequest = {
        // Ethernet Header
        destMAC: FF:FF:FF:FF:FF:FF,      // Broadcast address
        srcMAC: myMAC,                   // Our MAC
        etherType: 0x8035,               // RARP protocol

        // RARP Payload
        hardwareType: 0x0001,            // Ethernet
        protocolType: 0x0800,            // IPv4
        hardwareLength: 6,               // MAC = 6 bytes
        protocolLength: 4,               // IPv4 = 4 bytes
        operation: 3,                    // RARP Request
        senderHardwareAddr: myMAC,       // Our MAC
        senderProtocolAddr: 0.0.0.0,     // Unknown
        targetHardwareAddr: myMAC,       // Query about ourselves
        targetProtocolAddr: 0.0.0.0      // This is what we need!
    }

    return rarpRequest

Phase 2: Request Transmission
The constructed RARP request is transmitted onto the network:
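As a rough illustration of what actually goes onto the wire, the request structure from Phase 1 can be serialized into bytes. This is a minimal Python sketch, not boot ROM code: the helper names `mac_to_bytes` and `build_rarp_request` are made up for this example, and the frame is only built in memory, not sent. Padding to the 60-byte minimum is done here for clarity; on real hardware the NIC or driver normally handles it.

```python
import struct

ETHERTYPE_RARP = 0x8035
OP_RARP_REQUEST = 3
BROADCAST_MAC = b"\xff" * 6
MIN_FRAME_NO_FCS = 60  # Ethernet minimum of 64 bytes minus the 4-byte FCS


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return raw


def build_rarp_request(client_mac: str) -> bytes:
    """Build a broadcast RARP request asking for client_mac's own IP address."""
    mac = mac_to_bytes(client_mac)
    # Ethernet header: destination, source, EtherType
    ethernet_header = BROADCAST_MAC + mac + struct.pack("!H", ETHERTYPE_RARP)
    # Fixed RARP fields: hardware type, protocol type, lengths, operation
    rarp_fixed = struct.pack("!HHBBH", 0x0001, 0x0800, 6, 4, OP_RARP_REQUEST)
    unknown_ip = bytes(4)  # 0.0.0.0, since the client has no address yet
    # Sender MAC, sender IP, target MAC, target IP
    rarp_addresses = mac + unknown_ip + mac + unknown_ip
    frame = ethernet_header + rarp_fixed + rarp_addresses
    return frame.ljust(MIN_FRAME_NO_FCS, b"\x00")


frame = build_rarp_request("00:1A:2B:3C:4D:5E")
print(len(frame), frame[:14].hex())
```

The 14-byte Ethernet header plus the 28-byte RARP payload come to 42 bytes, so 18 bytes of zero padding bring the frame up to the minimum size.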
Key timing considerations:
| Event | Typical Duration | Notes |
|---|---|---|
| Frame transmission | 50-100 μs | Depends on frame size and line speed |
| Switch propagation | 1-10 μs per hop | Store-and-forward adds latency |
| Broadcast flood | 1-100 μs | Parallel on most switches |
| Server processing | 100 μs - 10 ms | Database lookup time |
| Reply transmission | 50-100 μs | Similar to request |
| Total round-trip | 1-20 ms typical | Wide variation possible |
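The frame transmission row can be sanity-checked with simple arithmetic. The sketch below assumes a minimum-size 64-byte frame (a RARP frame is padded up to this) plus the 8-byte preamble and start-of-frame delimiter; the function name `serialization_delay_us` is chosen for this example.

```python
PREAMBLE_AND_SFD_BYTES = 8   # sent before every Ethernet frame
MIN_FRAME_BYTES = 64         # minimum Ethernet frame, including the FCS


def serialization_delay_us(frame_bytes: int, link_mbps: float) -> float:
    """Time to clock a frame onto the wire, in microseconds."""
    bits = (frame_bytes + PREAMBLE_AND_SFD_BYTES) * 8
    # bits divided by (megabits per second) gives microseconds
    return bits / link_mbps


for mbps in (10, 100, 1000):
    delay = serialization_delay_us(MIN_FRAME_BYTES, mbps)
    print(f"{mbps:>4} Mbps: {delay:.2f} us")
```

At 10 Mbps this gives 57.6 μs, which falls inside the 50-100 μs range in the table; at 100 Mbps and above the same frame takes only a few microseconds, so server processing dominates the round trip.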
On modern switched networks, broadcast frames are handled differently than in the hub-based networks of RARP's era. While the broadcast still reaches all ports (within the VLAN), switch buffers and processing introduce latencies that didn't exist with hubs. Some switches may rate-limit broadcasts, potentially affecting RARP performance in busy environments.
Phase 3: Server Reception and Processing
When the RARP server receives the broadcast request:
// RARP Server - Request Processing
function processRARPRequest(frame):
    // Step 1: Validate the frame
    if frame.etherType != 0x8035:
        return  // Not a RARP frame

    if frame.operation != 3:
        return  // Not a RARP Request

    // Step 2: Extract the target MAC (what the client is querying about)
    queryMAC = frame.targetHardwareAddr
    // e.g., queryMAC = 00:1A:2B:3C:4D:5E

    // Step 3: Look up in our database
    hostname = ethersDatabase.lookup(queryMAC)
    // e.g., ethersDatabase = { "00:1A:2B:3C:4D:5E": "workstation01" }

    if hostname == null:
        // Unknown MAC - let another server handle it
        log("Unknown MAC: " + queryMAC)
        return

    // Step 4: Resolve hostname to IP
    clientIP = hostsDatabase.resolve(hostname)
    // e.g., hostsDatabase = { "workstation01": "192.168.1.100" }

    if clientIP == null:
        log("Cannot resolve hostname: " + hostname)
        return

    // Step 5: Construct the RARP Reply
    rarpReply = {
        // Ethernet Header
        destMAC: queryMAC,               // Unicast to client
        srcMAC: myMAC,                   // Server's MAC
        etherType: 0x8035,               // RARP protocol

        // RARP Payload
        hardwareType: 0x0001,            // Ethernet
        protocolType: 0x0800,            // IPv4
        hardwareLength: 6,
        protocolLength: 4,
        operation: 4,                    // RARP Reply
        senderHardwareAddr: myMAC,       // Server's MAC
        senderProtocolAddr: myIP,        // Server's IP
        targetHardwareAddr: queryMAC,    // Client's MAC
        targetProtocolAddr: clientIP     // THE ANSWER!
    }

    // Step 6: Send the reply
    transmit(rarpReply)

Phase 4: Client Response Processing
When the client receives the RARP reply:
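One plausible way for a client to validate a reply and extract its address is sketched below in Python. It mirrors the field layout of the reply built by the server in Phase 3; the function name `parse_rarp_reply`, the server IP 192.168.1.10, and the specific set of checks are assumptions for illustration, not a description of any particular boot ROM.

```python
import ipaddress
import struct
from typing import Optional

ETHERTYPE_RARP = 0x8035
OP_RARP_REPLY = 4


def parse_rarp_reply(frame: bytes, my_mac: bytes) -> Optional[str]:
    """Return the assigned IPv4 address, or None if the frame is not a reply for us."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte RARP payload
        return None
    (ether_type,) = struct.unpack("!H", frame[12:14])
    if ether_type != ETHERTYPE_RARP:
        return None
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None  # not Ethernet/IPv4
    if oper != OP_RARP_REPLY:
        return None
    if frame[32:38] != my_mac:  # target hardware address must be ours
        return None
    return str(ipaddress.IPv4Address(frame[38:42]))  # target protocol address


# Hand-built reply matching the server example (server IP is hypothetical)
my_mac = bytes.fromhex("001a2b3c4d5e")
server_mac = bytes.fromhex("00aabbccddee")
reply = (
    my_mac + server_mac + struct.pack("!H", ETHERTYPE_RARP)
    + struct.pack("!HHBBH", 1, 0x0800, 6, 4, OP_RARP_REPLY)
    + server_mac + ipaddress.IPv4Address("192.168.1.10").packed
    + my_mac + ipaddress.IPv4Address("192.168.1.100").packed
)
print(parse_rarp_reply(reply, my_mac))  # 192.168.1.100
```

Frames that fail any check are simply ignored, so the client keeps waiting until its timeout expires.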
Because RARP operates over unreliable Ethernet broadcasts with no acknowledgment mechanism, robust timeout and retry handling is essential. The client must balance responsiveness against network load.
The Retry Dilemma:
RFC 903 Considerations:
RFC 903 explicitly does not specify timeout values, stating only that 'the requester should be prepared to retransmit the request.' Implementations have varied widely:
| Strategy | Initial Wait | Max Retries | Total Max Wait | Pros | Cons |
|---|---|---|---|---|---|
| Fixed interval | 4 sec | 5 | 20 sec | Simple implementation | May overload slow servers |
| Linear backoff | 1,2,3,4,5 sec | 5 | 15 sec | Some congestion adaptation | May give up too soon |
| Exponential backoff | 1,2,4,8,16 sec | 5 | 31 sec | Excellent congestion handling | Slow final retries |
| Exponential w/ cap | 1,2,4,8,8 sec | 5 | 23 sec | Balance of speed and safety | Slightly complex |
| Infinite retry | 4 sec | ∞ | ∞ | Never fails if server exists | Could hang forever |
Exponential Backoff Algorithm:
The recommended approach for production environments is exponential backoff with jitter:
base_timeout = 1 second
max_timeout = 16 seconds
max_retries = 10
jitter_range = 0.5 seconds

for attempt in 1..max_retries:
    timeout = min(base_timeout * 2^(attempt-1), max_timeout)
    jitter = random(-jitter_range, +jitter_range)
    actual_wait = timeout + jitter

    send_rarp_request()
    if response_received_within(actual_wait):
        return SUCCESS

    log("Retry " + attempt + " of " + max_retries)

return FAILURE
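The pseudocode above translates fairly directly into runnable Python. In this sketch the network I/O is passed in as two callables, `send_request` and `wait_for_reply`; both are hypothetical hooks invented for the example so that the retry logic can be exercised without a real network.

```python
import random
from typing import Callable, Optional


def rarp_with_backoff(
    send_request: Callable[[], None],
    wait_for_reply: Callable[[float], Optional[str]],
    base_timeout: float = 1.0,
    max_timeout: float = 16.0,
    max_retries: int = 10,
    jitter_range: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Send requests with capped exponential backoff; return the IP or None."""
    rng = rng or random.Random()
    for attempt in range(1, max_retries + 1):
        # Double the timeout on each attempt, up to the cap
        timeout = min(base_timeout * 2 ** (attempt - 1), max_timeout)
        actual_wait = timeout + rng.uniform(-jitter_range, jitter_range)
        send_request()
        reply = wait_for_reply(actual_wait)
        if reply is not None:
            return reply
    return None


# Fake network: the "server" answers only the third request
sent = []
waits = []


def fake_send() -> None:
    sent.append(len(sent) + 1)


def fake_wait(timeout: float) -> Optional[str]:
    waits.append(timeout)
    return "192.168.1.100" if len(sent) >= 3 else None


result = rarp_with_backoff(fake_send, fake_wait, rng=random.Random(0))
print(result, len(sent))  # 192.168.1.100 3
```

Injecting the I/O functions keeps the timing policy separate from the frame handling, which also makes the policy easy to test.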
The Role of Jitter:
In environments with many diskless workstations (e.g., a lab with 30 identical machines powered on simultaneously), adding random jitter prevents synchronized retry storms:
When an entire facility loses power and then regains it, every diskless workstation boots simultaneously. Without jitter, RARP servers face a 'boot storm' with potentially thousands of simultaneous requests. This was a known operational challenge in data centers with many diskless clients. Modern protocols like DHCP incorporate jitter by default.
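The effect of jitter can be illustrated with a toy simulation. The sketch below assumes every client sends its first request at the same instant and then retransmits on a 1, 2, 4, 8 second exponential schedule; it counts how many retransmissions land in the same 100 ms window. The function names, bucket size, and seed are arbitrary choices for this example.

```python
import random
from collections import Counter


def retry_times(jitter: float, rng: random.Random, retries: int = 4) -> list[float]:
    """Times (seconds after the first request) at which one client retransmits."""
    times, now = [], 0.0
    for attempt in range(retries):
        now += 2 ** attempt + rng.uniform(-jitter, jitter)
        times.append(now)
    return times


def peak_requests(clients: int, jitter: float, seed: int = 1, bucket: float = 0.1) -> int:
    """Largest number of retransmissions that fall into a single time bucket."""
    rng = random.Random(seed)
    buckets = Counter()
    for _ in range(clients):
        for t in retry_times(jitter, rng):
            buckets[round(t / bucket)] += 1
    return max(buckets.values())


print("no jitter:  ", peak_requests(30, jitter=0.0))  # 30: every client retries together
print("with jitter:", peak_requests(30, jitter=0.5))
```

Without jitter, all 30 clients retransmit in the same instant on every retry; with ±0.5 s of jitter the same retransmissions are spread across many buckets, so the server sees a much lower peak.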
Timeout Selection Factors:
The optimal timeout values depend on several environmental factors:
| Factor | Impact on Timeout | Recommendation |
|---|---|---|
| Network speed | Faster networks → shorter timeouts | 10 Mbps: 2-4s, 100 Mbps: 1-2s |
| Server load | Heavily loaded servers → longer timeouts | Monitor response times |
| Number of clients | More clients → more jitter | 0.5-1s jitter per 10 clients |
| Boot criticality | Critical systems → more retries | Infinite retry for vital systems |
| User tolerance | Low patience → faster initial timeout | 1s initial for interactive boot |
Implementation Example (Sun Boot ROM):
Sun Microsystems, a major producer of diskless workstations, used this strategy:
Understanding how RARP frames interact with network infrastructure—switches, hubs, bridges, and routers—is essential for proper deployment and troubleshooting.
RARP Request Propagation:
The RARP request uses the Ethernet broadcast address (FF:FF:FF:FF:FF:FF), triggering specific network behaviors:
| Infrastructure | Behavior | Impact |
|---|---|---|
| Hub | Floods out all ports | All connected devices receive the request |
| Unmanaged Switch | Floods out all ports | Same as hub for broadcasts |
| Managed Switch | Floods within VLAN | Contained to configured VLAN |
| Router | Drops the frame | RARP cannot cross routers |
| Bridge | Forwards to other segment | Extends broadcast domain |
Routers operate at Layer 3 (IP). A RARP frame (Layer 2) has no IP header, so routers cannot process or forward it. This fundamental limitation means RARP servers must exist on every network segment, even if the segments are adjacent. This was a major driver for developing BOOTP, which uses UDP/IP and can be relayed across routers.
RARP Reply Delivery:
Unlike the broadcast request, the RARP reply is a unicast frame:
This has implications for switched networks:
Switch MAC Table Dynamics:
The RARP exchange affects the switch's MAC address table:
Before RARP Request:
MAC Address Table:
00:AA:BB:CC:DD:EE (RARP Server) -> Port 5
... (other entries)
00:1A:2B:3C:4D:5E (Client) -> NOT PRESENT
After RARP Request (broadcast received):
MAC Address Table:
00:AA:BB:CC:DD:EE (RARP Server) -> Port 5
00:1A:2B:3C:4D:5E (Client) -> Port 1 (learned!)
... (other entries)
The switch learns the client's port from the source MAC of the request. When the reply comes from the server, the switch can deliver it directly to Port 1.
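The learning behavior described above can be modeled in a few lines. This is a simplified sketch of a learning switch, assuming a single VLAN, a five-port switch, and no table aging; the class name `LearningSwitch` is invented for the example.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    def __init__(self, ports: list[int]) -> None:
        self.ports = ports
        self.mac_table: dict[str, int] = {}

    def handle_frame(self, in_port: int, src: str, dst: str) -> list[int]:
        """Learn the source port, then return the ports the frame is sent out of."""
        self.mac_table[src.lower()] = in_port
        dst = dst.lower()
        if dst != BROADCAST and dst in self.mac_table:
            return [self.mac_table[dst]]  # known unicast: one port only
        # Broadcast or unknown destination: flood everywhere except the ingress port
        return [p for p in self.ports if p != in_port]


switch = LearningSwitch(ports=[1, 2, 3, 4, 5])
switch.mac_table["00:aa:bb:cc:dd:ee"] = 5  # server already known on port 5

flooded = switch.handle_frame(1, "00:1A:2B:3C:4D:5E", BROADCAST)
reply_ports = switch.handle_frame(5, "00:AA:BB:CC:DD:EE", "00:1A:2B:3C:4D:5E")
print(flooded, reply_ports)  # [2, 3, 4, 5] [1]
```

The broadcast request is flooded, but because it also teaches the switch where the client lives, the unicast reply goes out of Port 1 alone.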
Spanning Tree Considerations:
In networks with redundant paths and Spanning Tree Protocol (STP):
Recommendation: Configure edge switch ports with PortFast to eliminate STP delay for client devices.
In VLAN-segmented networks, ensure that diskless workstations and their RARP server are in the same VLAN. Since RARP broadcasts don't cross VLAN boundaries (they're Layer 2 constructs), mismatched VLAN assignment is a common misconfiguration that causes boot failures. Verify VLAN tagging on both client and server ports.
RARP operation can fail for numerous reasons. Understanding the error conditions and their symptoms enables effective troubleshooting.
Common Failure Scenarios:
| Symptom | Likely Cause | Diagnostic Approach | Resolution |
|---|---|---|---|
| No response at all | No server on segment | Check server availability, verify same VLAN | Deploy server to segment or fix VLAN config |
| No response at all | Firewall blocking RARP | Review firewall rules for EtherType 0x8035 | Allow RARP traffic |
| No response at all | Server not listening | Check if rarpd daemon is running | Start rarpd service |
| No response at all | Client MAC not in database | Check server logs for rejected queries | Add MAC to /etc/ethers |
| Intermittent failures | Network congestion | Monitor switch errors and broadcasts | Upgrade infrastructure, reduce broadcasts |
| Intermittent failures | Multiple replies colliding | Packet capture to verify | Reduce server count or add delays |
| Wrong IP received | Database mismatch | Verify all server databases are synchronized | Synchronize /etc/ethers across servers |
| Long boot time | Server overloaded | Check server CPU and disk I/O | Optimize server or add capacity |
Client-Side Error Handling:
The client has limited options for error handling due to boot ROM constraints:
Typical Error Messages:
Sun Boot ROM Messages:
"No carrier - transceiver cable problem?" → Physical layer issue
"RARP timed out" → No server response
"RARP request failed" → Multiple retries exhausted
3Com Boot ROM Messages:
"RPL: Adapter Error" → NIC problem
"RPL: Time-out waiting for server" → No RARP response
RARP has no mechanism for the server to indicate 'your MAC is not in my database.' The server simply ignores unknown requests. From the client's perspective, an unknown MAC and no server on the network look identical—both result in timeout. This makes debugging more challenging compared to protocols like DHCP that can send explicit rejection messages.
Server-Side Error Handling:
RARP servers should implement robust error handling:
Logging:
Jan 15 10:30:01 server rarpd[1234]: received request from 00:1A:2B:3C:4D:5E
Jan 15 10:30:01 server rarpd[1234]: 00:1A:2B:3C:4D:5E -> workstation01 -> 192.168.1.100
Jan 15 10:30:01 server rarpd[1234]: sending reply to 00:1A:2B:3C:4D:5E
Jan 15 10:30:15 server rarpd[1234]: received request from 00:DE:AD:BE:EF:00
Jan 15 10:30:15 server rarpd[1234]: no entry for 00:DE:AD:BE:EF:00 in /etc/ethers
Database Validation:
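One possible validation pass is sketched below: it checks that every MAC in an ethers-style file is well formed and that its hostname resolves in a hosts-style file, which is the two-step lookup the server performs. The parsing is deliberately simplified, and the entry for workstation02 is a hypothetical example added to trigger a failure.

```python
import re

# Accepts one or two hex digits per octet
MAC_RE = re.compile(r"^([0-9A-Fa-f]{1,2}:){5}[0-9A-Fa-f]{1,2}$")


def parse_ethers(text: str) -> dict[str, str]:
    """Map MAC -> hostname from /etc/ethers-style lines ('MAC hostname')."""
    entries = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments
        if len(fields) >= 2:
            entries[fields[0].lower()] = fields[1]
    return entries


def parse_hosts(text: str) -> dict[str, str]:
    """Map hostname (and aliases) -> IP from /etc/hosts-style lines."""
    entries = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        for name in fields[1:]:
            entries.setdefault(name, fields[0])
    return entries


def validate(ethers_text: str, hosts_text: str) -> list[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    hosts = parse_hosts(hosts_text)
    for mac, hostname in parse_ethers(ethers_text).items():
        if not MAC_RE.match(mac):
            problems.append(f"malformed MAC: {mac}")
        if hostname not in hosts:
            problems.append(f"{mac}: hostname {hostname!r} does not resolve")
    return problems


ethers = "00:1A:2B:3C:4D:5E workstation01\n00:DE:AD:BE:EF:00 workstation02\n"
hosts = "192.168.1.100 workstation01\n"
for problem in validate(ethers, hosts):
    print(problem)
```

Running a check like this whenever the files change catches the "hostname does not resolve" case before a client hits it at boot time, when the only symptom would be a timeout.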
Network-Level Debugging:
Packet capture is the definitive tool for RARP troubleshooting:
Using tcpdump:
# Capture all RARP traffic on interface eth0
tcpdump -i eth0 ether proto 0x8035 -v
# Sample output (illustrative; exact wording varies by tcpdump version):
10:30:01.123456 00:1A:2B:3C:4D:5E > ff:ff:ff:ff:ff:ff,
RARP-req who-is 00:1A:2B:3C:4D:5E tell 00:1A:2B:3C:4D:5E
10:30:01.125123 00:AA:BB:CC:DD:EE > 00:1A:2B:3C:4D:5E,
RARP-reply 00:1A:2B:3C:4D:5E at 192.168.1.100
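Using a Wireshark display filter:
eth.type == 0x8035
This shows only RARP frames and provides decoded field analysis. Wireshark decodes RARP with its ARP dissector, so the operation code is available as arp.opcode (3 for a request, 4 for a reply). To restrict what is captured rather than what is displayed, use the capture filter rarp.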
For reliability, production networks typically deploy multiple RARP servers on each segment. This redundancy introduces specific behaviors and potential issues that must be understood.
The Race Condition:
When multiple servers receive a RARP request simultaneously:
This racing is intentional. It provides natural redundancy: any server that knows the client can answer, and the first reply to arrive wins.
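The race can be approximated with a toy simulation, assuming the client accepts whichever reply arrives first and ignores the rest. The latency ranges, server names, and the function `simulate_race` are invented for illustration; real latencies depend on server load and network path.

```python
import random


def simulate_race(
    latency_ranges_ms: dict[str, tuple[float, float]],
    requests: int = 1000,
    seed: int = 42,
) -> dict[str, int]:
    """Count how often each server's reply arrives first."""
    rng = random.Random(seed)
    wins = {name: 0 for name in latency_ranges_ms}
    for _ in range(requests):
        # Draw one response latency per server for this request
        latencies = {
            name: rng.uniform(low, high)
            for name, (low, high) in latency_ranges_ms.items()
        }
        winner = min(latencies, key=latencies.get)
        wins[winner] += 1
    return wins


wins = simulate_race({"server_a": (1.0, 3.0), "server_b": (2.0, 6.0)})
print(wins)
```

With these assumed ranges the faster server wins the large majority of races, while the slower one still answers occasionally and would take over completely if the faster one failed.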
Load Distribution Effects:
Surprisingly, having multiple servers doesn't provide true load balancing for RARP:
| Scenario | Result |
|---|---|
| Servers with identical performance | Random distribution based on timing jitter |
| One server faster than others | Faster server handles most requests |
| Servers with different load | Less loaded server wins more often |
This means:
Database Consistency Critical:
All RARP servers on a segment must have identical databases:
| Consistency Issue | Symptom |
|---|---|
| Server A has MAC, Server B doesn't | Some boots succeed, some fail (depending on which wins race) |
| Servers have different IPs for same MAC | Client gets different IP on each boot |
| Hostname typo on one server | Sporadic failures for that host |
If Server A maps MAC X to IP 192.168.1.100 and Server B maps MAC X to IP 192.168.1.200, the client will receive a different address depending on which server responds first. This leads to intermittent, hard-to-diagnose issues where the workstation 'randomly' has different IPs. Always use automated synchronization for multi-server deployments.
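A consistency check between two servers can be as simple as comparing their effective MAC-to-IP mappings. The sketch below assumes each server's database has already been flattened into a dictionary; the function name `compare_mappings` and the sample entries are hypothetical.

```python
def compare_mappings(a: dict[str, str], b: dict[str, str]) -> dict[str, list[str]]:
    """Compare two MAC -> IP mappings taken from different RARP servers."""
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "conflicting": sorted(m for m in a.keys() & b.keys() if a[m] != b[m]),
    }


server_a = {
    "00:1a:2b:3c:4d:5e": "192.168.1.100",
    "00:de:ad:be:ef:00": "192.168.1.101",
}
server_b = {
    "00:1a:2b:3c:4d:5e": "192.168.1.200",
}

report = compare_mappings(server_a, server_b)
for category, macs in report.items():
    print(category, macs)
```

Entries listed under `conflicting` correspond to the "different IP on each boot" symptom, while entries present on only one server correspond to boots that succeed or fail depending on which server wins the race.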
Synchronization Strategies:
Recommended approach for multi-server deployments:
# On primary server, rsync to all secondaries every 5 minutes
*/5 * * * * rsync -av /etc/ethers /etc/hosts secondary1:/etc/
*/5 * * * * rsync -av /etc/ethers /etc/hosts secondary2:/etc/
# Or use NFS:
# All servers mount: nfs-server:/shared/etc/ethers -> /etc/ethers (read-only)
RARP is just the first step in the diskless workstation boot process. Understanding how RARP integrates with subsequent phases provides context for its role and limitations.
The Complete Diskless Boot Sequence:
Phase 1: RARP - Address Discovery
What RARP provides:
What RARP does NOT provide:
Workarounds vendors used:
| Missing Information | Workaround |
|---|---|
| Boot server address | Assume RARP server is also TFTP server |
| Boot filename | Derive from IP (e.g., C0A8016A for 192.168.1.106) |
| Subnet mask | Hard-code or derive from IP class |
| Default gateway | Assume not needed (local segment) |
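The "derive from IP class" workaround for the subnet mask amounts to classful addressing. A minimal sketch, assuming the traditional Class A, B, and C boundaries and ignoring any subnetting within a class (the function name `classful_netmask` is made up for this example):

```python
import ipaddress


def classful_netmask(ip: str) -> str:
    """Guess the netmask from the address class, as pre-CIDR hosts did."""
    first_octet = ipaddress.IPv4Address(ip).packed[0]
    if first_octet < 128:
        return "255.0.0.0"      # Class A
    if first_octet < 192:
        return "255.255.0.0"    # Class B
    if first_octet < 224:
        return "255.255.255.0"  # Class C
    raise ValueError(f"{ip} is not a Class A, B, or C unicast address")


for ip in ("10.0.0.1", "172.16.5.200", "192.168.1.100"):
    print(ip, classful_netmask(ip))
```

The guess is wrong whenever a network is subnetted differently from its class default, which is one reason later protocols deliver the mask explicitly.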
Phase 2: TFTP - Boot Image Download
After RARP, the client uses TFTP (Trivial File Transfer Protocol) to download its boot image:
1. Client sends TFTP Read Request to RARP server IP
2. Filename: derived from IP address in hex (e.g., C0A80164)
or architecture-specific (e.g., C0A80164.SUN4)
3. TFTP server sends boot image in 512-byte blocks
4. Client acknowledges each block
5. Boot image loaded into memory
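For a sense of scale, the read request and the number of data blocks can be worked out directly. The sketch below builds a TFTP read request (opcode 1, filename, transfer mode, each string NUL-terminated) and counts the 512-byte blocks for an image of a given size; the helper names are invented for this example and nothing is sent on the network.

```python
import struct

TFTP_OPCODE_RRQ = 1
TFTP_BLOCK_SIZE = 512


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode, filename, NUL, mode, NUL."""
    return (
        struct.pack("!H", TFTP_OPCODE_RRQ)
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"
    )


def data_block_count(image_size: int) -> int:
    """Number of DATA packets needed for an image of image_size bytes.

    A transfer ends with a block shorter than 512 bytes, so an image that is
    an exact multiple of 512 needs one extra, empty block.
    """
    return image_size // TFTP_BLOCK_SIZE + 1


print(build_rrq("C0A80164"))
print(data_block_count(4 * 1024 * 1024))  # 8193
```

Because each block must be acknowledged before the next is sent, a 4 MB image means over eight thousand request/response round trips, which is why the download dominates the early part of the boot timeline.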
The IP-to-Filename Mapping:
Sun's convention (widely adopted):
| Client IP | Hex IP | Filename |
|---|---|---|
| 192.168.1.100 | C0A80164 | C0A80164 |
| 10.0.0.1 | 0A000001 | 0A000001 |
| 172.16.5.200 | AC1005C8 | AC1005C8.SUN4 |
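The mapping in the table is just the four address bytes written as uppercase hexadecimal, optionally followed by an architecture suffix. A small sketch (the function name `boot_filename` is an assumption for illustration):

```python
import ipaddress
from typing import Optional


def boot_filename(ip: str, arch: Optional[str] = None) -> str:
    """Derive the TFTP boot filename from an IPv4 address."""
    hex_ip = ipaddress.IPv4Address(ip).packed.hex().upper()
    return f"{hex_ip}.{arch}" if arch else hex_ip


print(boot_filename("192.168.1.100"))         # C0A80164
print(boot_filename("10.0.0.1"))              # 0A000001
print(boot_filename("172.16.5.200", "SUN4"))  # AC1005C8.SUN4
```

The same function can be used on the server side to generate the symlink names when a new client is added.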
The server's TFTP directory contains:
/tftpboot/
C0A80164 -> symlink to sunos-boot-image
C0A80165 -> symlink to sunos-boot-image
sunos-boot-image
By deriving the filename from the IP address, the boot process avoided needing another protocol to discover the boot filename. The convention worked because the same administrator who adds a MAC to /etc/ethers also creates the TFTP symlink. This tight coupling was manageable in small deployments but became a scalability problem for large installations.
Phase 3: Operating System Initialization
Once the boot image is loaded and executing:
mount -t nfs server:/export/root/client1 /
Complete Timeline Example:
| Time | Event | Protocol |
|---|---|---|
| 0.0s | Power on | |
| 0.5s | Boot ROM starts | |
| 0.6s | RARP request sent | RARP |
| 0.7s | RARP reply received | RARP |
| 0.8s | TFTP request sent | TFTP |
| 5.0s | Boot image downloaded (4MB) | TFTP |
| 5.5s | Kernel executing | |
| 8.0s | NFS root mounted | NFS |
| 15.0s | Login prompt | |
We have dissected the complete operation of RARP at both the packet level and the system level. Let's consolidate the key operational concepts:
What's Next:
In the next page, we will explore BOOTP (Bootstrap Protocol), which extended RARP's concept to address its fundamental limitations. You will learn how BOOTP moved from Layer 2 to Layer 3, enabled complete boot configuration in a single exchange, and introduced the relay agent concept that allowed centralized servers to serve clients across routed networks.
You now understand RARP's operational mechanics in complete detail. You can trace through a RARP transaction packet by packet, identify and troubleshoot common failure scenarios, and explain how RARP integrates with the complete diskless boot process. This operational knowledge prepares you to appreciate how BOOTP improved upon RARP's foundation.