When a diskless workstation powers on, a precise sequence of events unfolds within milliseconds: a choreographed exchange between a client with no configuration and a server holding the answers. Understanding this operation at a packet-by-packet level is essential for troubleshooting, network design, and appreciating the elegance of early network bootstrapping.

This page dissects the complete RARP operation, from the moment the client's boot ROM begins executing to the successful configuration of its IP address. We examine timing constraints, network behavior, error scenarios, and the integration of RARP with the broader diskless boot process.
By the end of this page, you will understand the precise sequence of operations in a RARP transaction, timing considerations and retry strategies, how RARP frames traverse the network, error conditions and recovery mechanisms, server selection when multiple servers respond, and the complete integration of RARP within the diskless boot lifecycle.
A RARP transaction follows a well-defined lifecycle, with specific actions at each stage. Let's trace through every step in precise detail.

Phase 1: Client Initialization (roughly the first second after power-on)

When the diskless workstation powers on:

1. POST (Power-On Self-Test): Hardware diagnostics run
2. Boot ROM activation: The network boot ROM takes control
3. NIC initialization: The network interface card is configured
4. MAC address retrieval: The boot ROM reads the burned-in MAC address from the NIC's EEPROM
5. RARP frame construction: The request frame is built in memory
```
// Boot ROM RARP Client - Initialization Phase
function initializeRARPClient():
    // Step 1: Initialize the NIC hardware
    nic = initializeNetworkInterface()

    // Step 2: Read our hardware address from NIC EEPROM
    myMAC = nic.readHardwareAddress()
    // e.g., myMAC = 00:1A:2B:3C:4D:5E

    // Step 3: Prepare the RARP request frame structure
    rarpRequest = {
        // Ethernet Header
        destMAC: FF:FF:FF:FF:FF:FF,       // Broadcast address
        srcMAC: myMAC,                    // Our MAC
        etherType: 0x8035,                // RARP protocol

        // RARP Payload
        hardwareType: 0x0001,             // Ethernet
        protocolType: 0x0800,             // IPv4
        hardwareLength: 6,                // MAC = 6 bytes
        protocolLength: 4,                // IPv4 = 4 bytes
        operation: 3,                     // RARP Request
        senderHardwareAddr: myMAC,        // Our MAC
        senderProtocolAddr: 0.0.0.0,      // Unknown
        targetHardwareAddr: myMAC,        // Query about ourselves
        targetProtocolAddr: 0.0.0.0       // This is what we need!
    }

    return rarpRequest
```

Phase 2: Request Transmission

The constructed RARP request is transmitted onto the network:

1. Frame transmission: The NIC sends the Ethernet frame
2. Broadcast propagation: All devices on the segment receive the frame
3. Switch flooding: On switched networks, the frame floods all ports in the broadcast domain
4. Timer activation: The client starts a response timeout timer

Key timing considerations:

| Event | Typical Duration | Notes |
|-------|------------------|-------|
| Frame transmission | 50-100 μs | Depends on frame size and line speed; a minimum-size (64-byte) frame takes about 58 μs at 10 Mbps |
| Switch propagation | 1-10 μs per hop | Store-and-forward adds latency |
| Broadcast flood | 1-100 μs | Parallel on most switches |
| Server processing | 100 μs - 10 ms | Database lookup time |
| Reply transmission | 50-100 μs | Similar to request |
| Total round-trip | 1-20 ms typical | Wide variation possible |
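To make Phase 1 concrete, the following Python sketch serializes the request from the pseudocode into the RARP wire layout (a 14-byte Ethernet header plus a 28-byte payload), pads it to the Ethernet minimum, and estimates its serialization time. The function names are hypothetical, the 4-byte FCS is assumed to be appended by the NIC, and the timing ignores the inter-frame gap; treat it as an illustration rather than boot ROM code.

```python
import struct

ETHERTYPE_RARP = 0x8035
OP_RARP_REQUEST = 3

def mac_to_bytes(mac: str) -> bytes:
    """Convert a colon-separated MAC such as '00:1A:2B:3C:4D:5E' to 6 bytes."""
    return bytes(int(octet, 16) for octet in mac.split(":"))

def build_rarp_request(my_mac: str) -> bytes:
    """Build the Phase 1 frame: broadcast Ethernet header + RARP request payload.

    The 4-byte FCS is omitted because the NIC normally appends it.
    """
    mac = mac_to_bytes(my_mac)
    ethernet_header = b"\xff" * 6 + mac + struct.pack("!H", ETHERTYPE_RARP)
    rarp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        0x0001,           # hardware type: Ethernet
        0x0800,           # protocol type: IPv4
        6,                # hardware address length
        4,                # protocol address length
        OP_RARP_REQUEST,  # operation 3 = RARP request
        mac, bytes(4),    # sender MAC, sender IP (unknown, 0.0.0.0)
        mac, bytes(4),    # target MAC (ourselves), target IP (the answer we want)
    )
    frame = ethernet_header + rarp_payload   # 14 + 28 = 42 bytes
    return frame.ljust(60, b"\x00")          # pad to 60 bytes (64 once the FCS is added)

def wire_time_us(frame_len: int, bits_per_second: float) -> float:
    """Serialization time in microseconds, counting FCS and preamble/SFD.

    The 12-byte inter-frame gap is not included.
    """
    on_wire_bytes = frame_len + 4 + 8        # + FCS + preamble/SFD
    return on_wire_bytes * 8 * 1_000_000 / bits_per_second

frame = build_rarp_request("00:1A:2B:3C:4D:5E")
print(len(frame))                                  # 60
print(round(wire_time_us(len(frame), 10e6), 2))    # 57.6 (10 Mbps)
print(round(wire_time_us(len(frame), 100e6), 2))   # 5.76 (100 Mbps)
```

At 10 Mbps the result, about 58 μs, sits inside the 50-100 μs range in the table; at 100 Mbps the same frame needs roughly a tenth of that.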
On modern switched networks, broadcast frames are handled differently than in the hub-based networks of RARP's era. While the broadcast still reaches all ports (within the VLAN), switch buffers and processing introduce latencies that didn't exist with hubs. Some switches may rate-limit broadcasts, potentially affecting RARP performance in busy environments.
Phase 3: Server Reception and Processing

When the RARP server receives the broadcast request:

1. Frame reception: The NIC accepts the frame because it is addressed to the broadcast MAC (promiscuous mode is not required); since RARP does not run over IP, rarpd reads it through a raw link-layer interface such as BPF or a Linux packet socket
2. EtherType check: Verify 0x8035 for RARP
3. Operation check: Verify operation code = 3 (RARP Request)
4. Database lookup: Search for the Target Hardware Address in /etc/ethers
5. Hostname resolution: If found, resolve the hostname to an IP address via /etc/hosts
6. Reply construction: Build the RARP Reply frame
7. Unicast transmission: Send the reply directly to the requester's MAC
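Steps 4 and 5 amount to two chained table lookups. The Python sketch below assumes the traditional whitespace-separated file formats (`MAC hostname` in /etc/ethers, `IP name [aliases]` in /etc/hosts); the helper names are hypothetical, and a production rarpd may rely on library routines or NIS rather than parsing the files itself.

```python
def normalize_mac(mac: str) -> str:
    """Canonical lower-case form; also accepts Sun-style '8:0:20:1:2:3'."""
    return ":".join(f"{int(octet, 16):02x}" for octet in mac.split(":"))

def parse_ethers(text: str) -> dict:
    """Parse /etc/ethers lines of the form 'MAC hostname' into {mac: hostname}."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop comments, split on whitespace
        if len(fields) >= 2:
            table[normalize_mac(fields[0])] = fields[1]
    return table

def parse_hosts(text: str) -> dict:
    """Parse /etc/hosts lines of the form 'IP name [aliases...]' into {name: IP}."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            ip, *names = fields
            for name in names:
                table.setdefault(name, ip)       # keep the first entry for a name
    return table

def rarp_lookup(mac: str, ethers: dict, hosts: dict):
    """Steps 4-5: MAC -> hostname -> IP, or None (the server then stays silent)."""
    hostname = ethers.get(normalize_mac(mac))
    if hostname is None:
        return None
    return hosts.get(hostname)

ethers = parse_ethers("00:1A:2B:3C:4D:5E  workstation01\n# lab\n8:0:20:1:2:3 ws02\n")
hosts = parse_hosts("192.168.1.100 workstation01\n192.168.1.101 ws02 ws02-alias\n")
print(rarp_lookup("00:1a:2b:3c:4d:5e", ethers, hosts))   # 192.168.1.100
print(rarp_lookup("00:DE:AD:BE:EF:00", ethers, hosts))   # None
```

Returning `None` for an unknown MAC mirrors RARP's behavior: the server has no way to send a negative answer, so it simply does not reply.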
```
// RARP Server - Request Processing
function processRARPRequest(frame):
    // Step 1: Validate the frame
    if frame.etherType != 0x8035:
        return  // Not a RARP frame
    if frame.operation != 3:
        return  // Not a RARP Request

    // Step 2: Extract the target MAC (what the client is querying about)
    queryMAC = frame.targetHardwareAddr
    // e.g., queryMAC = 00:1A:2B:3C:4D:5E

    // Step 3: Look up in our database
    hostname = ethersDatabase.lookup(queryMAC)
    // e.g., ethersDatabase = { "00:1A:2B:3C:4D:5E": "workstation01" }

    if hostname == null:
        // Unknown MAC - let another server handle it
        log("Unknown MAC: " + queryMAC)
        return

    // Step 4: Resolve hostname to IP
    clientIP = hostsDatabase.resolve(hostname)
    // e.g., hostsDatabase = { "workstation01": "192.168.1.100" }

    if clientIP == null:
        log("Cannot resolve hostname: " + hostname)
        return

    // Step 5: Construct the RARP Reply
    rarpReply = {
        // Ethernet Header
        destMAC: queryMAC,                // Unicast to client
        srcMAC: myMAC,                    // Server's MAC
        etherType: 0x8035,                // RARP protocol

        // RARP Payload
        hardwareType: 0x0001,             // Ethernet
        protocolType: 0x0800,             // IPv4
        hardwareLength: 6,
        protocolLength: 4,
        operation: 4,                     // RARP Reply
        senderHardwareAddr: myMAC,        // Server's MAC
        senderProtocolAddr: myIP,         // Server's IP
        targetHardwareAddr: queryMAC,     // Client's MAC
        targetProtocolAddr: clientIP      // THE ANSWER!
    }

    // Step 6: Send the reply
    transmit(rarpReply)
```

Phase 4: Client Response Processing

When the client receives the RARP reply:

1. Frame reception: NIC captures the unicast frame
2. EtherType verification: Confirm 0x8035
3. Operation verification: Confirm operation code = 4 (RARP Reply)
4. MAC verification: Confirm Target Hardware Address matches our MAC
5. IP extraction: Read Target Protocol Address (our IP!)
6. IP configuration: Configure the local IP stack with the received address
7. Timer cancellation: Stop the retry timer
8. Boot continuation: Proceed to next phase (typically TFTP)
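Phase 4's checks can likewise be sketched in Python. This hypothetical `parse_rarp_reply` assumes the standard Ethernet/IPv4 layout (a 28-byte payload after the 14-byte header) and, besides the client's own address, also returns the sender protocol address, i.e. the server's IP, which some boot ROMs reuse as the TFTP server address.

```python
import ipaddress
import struct

def parse_rarp_reply(frame: bytes, my_mac: bytes):
    """Apply the Phase 4 checks to a received frame.

    Returns (my_ip, server_ip) as strings, or None if the frame is not a
    RARP reply addressed to my_mac.
    """
    if len(frame) < 42:                                    # 14-byte header + 28-byte payload
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != 0x8035:                                # step 2: EtherType
        return None
    htype, ptype, hlen, plen, op = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4) or op != 4:   # step 3: replies only
        return None
    if frame[32:38] != my_mac:                             # step 4: target MAC must be ours
        return None
    my_ip = ipaddress.IPv4Address(frame[38:42])            # step 5: target protocol address
    server_ip = ipaddress.IPv4Address(frame[28:32])        # sender protocol address
    return str(my_ip), str(server_ip)

client = bytes.fromhex("001a2b3c4d5e")
server = bytes.fromhex("00aabbccddee")
reply = (client + server + b"\x80\x35"
         + bytes.fromhex("0001080006040004")               # Ethernet, IPv4, 6, 4, op 4
         + server + bytes([192, 168, 1, 1])                # sender: server MAC / IP
         + client + bytes([192, 168, 1, 100]))             # target: our MAC / our IP
print(parse_rarp_reply(reply, client))   # ('192.168.1.100', '192.168.1.1')
```

Any frame that fails a check is silently ignored, so the client keeps waiting and the retry timer decides what happens next.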
Because RARP operates over unreliable Ethernet broadcasts with no acknowledgment mechanism, robust timeout and retry handling is essential. The client must balance responsiveness against network load.

The Retry Dilemma:

- Too aggressive: Floods the network with repeated requests, potentially overwhelming servers
- Too conservative: Delays boot time unnecessarily, frustrates users
- Just right: Recovers from transient failures without excessive load

RFC 903 Considerations:

RFC 903 explicitly does not specify timeout values, stating only that 'the requester should be prepared to retransmit the request.' Implementations have varied widely:
| Strategy | Wait per Attempt | Max Retries | Total Max Wait | Pros | Cons |
|---|---|---|---|---|---|
| Fixed interval | 4 sec | 5 | 20 sec | Simple implementation | May overload slow servers |
| Linear backoff | 1,2,3,4,5 sec | 5 | 15 sec | Some congestion adaptation | May give up too soon |
| Exponential backoff | 1,2,4,8,16 sec | 5 | 31 sec | Excellent congestion handling | Slow final retries |
| Exponential w/ cap | 1,2,4,8,8 sec | 5 | 23 sec | Balance of speed and safety | Slightly complex |
| Infinite retry | 4 sec | ∞ | ∞ | Never fails if server exists | Could hang forever |
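The Total Max Wait column is simply the sum of each schedule's waits. A small Python check, with hypothetical helper names, reproduces the table's figures:

```python
def fixed(interval, attempts):
    """Same wait before every retry."""
    return [interval] * attempts

def linear(step, attempts):
    """Wait grows by `step` each attempt: step, 2*step, 3*step, ..."""
    return [step * (i + 1) for i in range(attempts)]

def exponential(base, attempts, cap=None):
    """Wait doubles each attempt, optionally capped."""
    waits = [base * 2 ** i for i in range(attempts)]
    return waits if cap is None else [min(w, cap) for w in waits]

schedules = {
    "Fixed interval": fixed(4, 5),
    "Linear backoff": linear(1, 5),
    "Exponential backoff": exponential(1, 5),
    "Exponential w/ cap": exponential(1, 5, cap=8),
}

for name, waits in schedules.items():
    print(f"{name}: {waits} -> gives up after {sum(waits)} s")
# Totals: 20, 15, 31 and 23 seconds, matching the table
```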
Exponential Backoff Algorithm:

A widely recommended approach for production environments is exponential backoff with jitter:

```
base_timeout = 1 second
max_timeout = 16 seconds
max_retries = 10
jitter_range = 0.5 seconds

for attempt in 1..max_retries:
    timeout = min(base_timeout * 2^(attempt-1), max_timeout)
    jitter = random(-jitter_range, +jitter_range)
    actual_wait = timeout + jitter

    send_rarp_request()

    if response_received_within(actual_wait):
        return SUCCESS

    log("Retry " + attempt + " of " + max_retries)

return FAILURE
```

The Role of Jitter:

In environments with many diskless workstations (e.g., a lab with 30 identical machines powered on simultaneously), clients that time out together would otherwise retry together, hitting the server with the same burst on every round. Adding random jitter spreads those retries out and breaks up synchronized retry storms.
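A runnable Python version of the pseudocode above, offered as a sketch: `send_and_wait` is a hypothetical callback standing in for the boot ROM's transmit-and-listen routine, and a seeded `random.Random` is injected so the behavior can be reproduced in testing.

```python
import random

def rarp_with_backoff(send_and_wait, base_timeout=1.0, max_timeout=16.0,
                      max_retries=10, jitter_range=0.5, rng=None):
    """Capped exponential backoff with uniform jitter.

    send_and_wait(wait) is a hypothetical hook: it broadcasts one RARP
    request, listens for up to `wait` seconds, and returns the assigned
    IP address or None on timeout.
    """
    if rng is None:
        rng = random.Random()
    for attempt in range(1, max_retries + 1):
        timeout = min(base_timeout * 2 ** (attempt - 1), max_timeout)
        actual_wait = timeout + rng.uniform(-jitter_range, jitter_range)
        ip = send_and_wait(actual_wait)
        if ip is not None:
            return ip
        print(f"Attempt {attempt} of {max_retries} timed out")
    return None

# Usage: a simulated server that only answers the fourth request.
waits_seen = []

def flaky_server(wait):
    waits_seen.append(round(wait, 2))
    return "192.168.1.100" if len(waits_seen) == 4 else None

print(rarp_with_backoff(flaky_server, rng=random.Random(42)))   # 192.168.1.100
print(waits_seen)   # about 1, 2, 4 and 8 seconds, each shifted by up to ±0.5 s
```

Injecting the generator keeps the jitter testable. In a real lab each machine would need a different seed (for example one derived from its MAC address, as an assumption of this sketch); identical seeds on identical hardware would reproduce exactly the synchronized retries that jitter is meant to prevent.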
When an entire facility loses power and then regains it, every diskless workstation boots simultaneously. Without jitter, RARP servers face a 'boot storm' with potentially thousands of simultaneous requests. This was a known operational challenge in data centers with many diskless clients. Modern protocols like DHCP incorporate jitter by default.
Timeout Selection Factors:

The optimal timeout values depend on several environmental factors:

| Factor | Impact on Timeout | Recommendation |
|--------|-------------------|----------------|
| Network speed | Faster networks → shorter timeouts | 10 Mbps: 2-4s, 100 Mbps: 1-2s |
| Server load | Heavily loaded servers → longer timeouts | Monitor response times |
| Number of clients | More clients → more jitter | 0.5-1s jitter per 10 clients |
| Boot criticality | Critical systems → more retries | Infinite retry for vital systems |
| User tolerance | Low patience → faster initial timeout | 1s initial for interactive boot |

Implementation Example (Sun Boot ROM):

Sun Microsystems, a major producer of diskless workstations, used this strategy:

- Initial timeout: 4 seconds
- Backoff: Double each retry (4, 8, 16, 32...)
- Maximum timeout: 64 seconds
- Retry limit: Implementation-specific (often 5-10)
- On final failure: Display error and halt
Understanding how RARP frames interact with network infrastructure (switches, hubs, bridges, and routers) is essential for proper deployment and troubleshooting.

RARP Request Propagation:

The RARP request uses the Ethernet broadcast address (FF:FF:FF:FF:FF:FF), triggering specific network behaviors:

| Infrastructure | Behavior | Impact |
|----------------|----------|--------|
| Hub | Floods out all ports | All connected devices receive the request |
| Unmanaged Switch | Floods out all ports | Same as hub for broadcasts |
| Managed Switch | Floods within VLAN | Contained to configured VLAN |
| Router | Drops the frame | RARP cannot cross routers |
| Bridge | Forwards to other segment | Extends broadcast domain |
Routers operate at Layer 3 (IP). A RARP frame (Layer 2) has no IP header, so routers cannot process or forward it. This fundamental limitation means RARP servers must exist on every network segment, even if the segments are adjacent. This was a major driver for developing BOOTP, which uses UDP/IP and can be relayed across routers.
RARP Reply Delivery:

Unlike the broadcast request, the RARP reply is a unicast frame:

- Destination MAC: The specific MAC address of the requesting client
- Source MAC: The RARP server's MAC address

This has implications for switched networks:

1. MAC table learning: The switch learns the client's MAC from the request
2. Unicast forwarding: The reply goes only to the port where the client is connected
3. Reduced flooding: The reply doesn't consume bandwidth on uninvolved ports
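The learning-and-forwarding behavior above can be modeled in a few lines. This toy `LearningSwitch` class is hypothetical and ignores table aging, VLANs, and STP; it only shows why the broadcast request floods while the unicast reply reaches a single port.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

class LearningSwitch:
    """Toy model of MAC learning and forwarding (hypothetical class)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}                       # MAC -> port

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the sender's port, then return the set of egress ports."""
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            return self.ports - {in_port}         # flood broadcast / unknown unicast
        return set() if out_port == in_port else {out_port}

client, server = "00:1a:2b:3c:4d:5e", "00:aa:bb:cc:dd:ee"
switch = LearningSwitch(ports=range(1, 9))
switch.mac_table[server] = 5                      # server already known to the switch
print(sorted(switch.receive(1, client, BROADCAST)))  # request floods: [2, 3, 4, 5, 6, 7, 8]
print(sorted(switch.receive(5, server, client)))     # reply is unicast: [1]
```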
Switch MAC Table Dynamics:

The RARP exchange affects the switch's MAC address table:

Before RARP Request:

```
MAC Address Table:
  00:AA:BB:CC:DD:EE (RARP Server) -> Port 5
  ... (other entries)
  00:1A:2B:3C:4D:5E (Client)      -> NOT PRESENT
```

After RARP Request (broadcast received):

```
MAC Address Table:
  00:AA:BB:CC:DD:EE (RARP Server) -> Port 5
  00:1A:2B:3C:4D:5E (Client)      -> Port 1   (learned!)
  ... (other entries)
```

The switch learns the client's port from the source MAC of the request. When the reply comes from the server, the switch can deliver it directly to Port 1.

Spanning Tree Considerations:

In networks with redundant paths and Spanning Tree Protocol (STP):

- Initial boot delay: With classic 802.1D timers, a newly connected port spends about 30 seconds in the listening and learning states before it forwards frames; reconvergence after a topology change can take up to about 50 seconds
- PortFast mitigation: Modern switches use PortFast for edge ports, skipping the STP delay
- RARP timeout interaction: Requests sent before the port reaches the forwarding state are silently dropped; if the client's entire retry sequence finishes before then, the boot fails

Recommendation: Configure edge switch ports with PortFast to eliminate STP delay for client devices.
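To check whether a retry schedule survives STP, compare when each request is sent against the moment the port starts forwarding. This Python sketch assumes classic 802.1D defaults (two 15-second forward-delay periods, about 30 seconds, before a newly connected port forwards), assumes the client sends its first request as soon as its link comes up, and uses a hypothetical helper name.

```python
def first_forwarded_attempt(waits, port_ready_at):
    """Return the 1-based attempt sent after the port starts forwarding,
    or None if every attempt is sent while the port is still blocked.

    Attempt 1 is sent at t=0; each later attempt is sent when the
    previous attempt's wait expires.
    """
    send_time = 0.0
    for attempt, wait in enumerate(waits, start=1):
        if send_time >= port_ready_at:
            return attempt
        send_time += wait
    return None

STP_LISTEN_PLUS_LEARN = 30   # seconds: two 15 s 802.1D forward-delay periods

print(first_forwarded_attempt([1, 2, 4, 8, 16], STP_LISTEN_PLUS_LEARN))    # None (last send at t=15)
print(first_forwarded_attempt([4, 8, 16, 32], STP_LISTEN_PLUS_LEARN))      # None (last send at t=28)
print(first_forwarded_attempt([4, 8, 16, 32, 64], STP_LISTEN_PLUS_LEARN))  # 5 (sent at t=60)
print(first_forwarded_attempt([1, 2, 4, 8, 16], 0))                        # 1 (PortFast)
```

Under these assumptions, the five-attempt exponential schedule (1, 2, 4, 8, 16 s) never gets a frame through, and a 4, 8, 16, 32... schedule needs its fifth attempt at t = 60 s; with PortFast the very first request is forwarded.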
In VLAN-segmented networks, ensure that diskless workstations and their RARP server are in the same VLAN. Since RARP broadcasts don't cross VLAN boundaries (they're Layer 2 constructs), mismatched VLAN assignment is a common misconfiguration that causes boot failures. Verify VLAN tagging on both client and server ports.
RARP operation can fail for numerous reasons. Understanding the error conditions and their symptoms enables effective troubleshooting.

Common Failure Scenarios:
| Symptom | Likely Cause | Diagnostic Approach | Resolution |
|---|---|---|---|
| No response at all | No server on segment | Check server availability, verify same VLAN | Deploy server to segment or fix VLAN config |
| No response at all | Firewall blocking RARP | Review firewall rules for EtherType 0x8035 | Allow RARP traffic |
| No response at all | Server not listening | Check if rarpd daemon is running | Start rarpd service |
| No response at all | Client MAC not in database | Check server logs for rejected queries | Add MAC to /etc/ethers |
| Intermittent failures | Network congestion | Monitor switch errors and broadcasts | Upgrade infrastructure, reduce broadcasts |
| Intermittent failures | Multiple replies colliding | Packet capture to verify | Reduce server count or add delays |
| Wrong IP received | Database mismatch | Verify all server databases are synchronized | Synchronize /etc/ethers across servers |
| Long boot time | Server overloaded | Check server CPU and disk I/O | Optimize server or add capacity |
Client-Side Error Handling:

The client has limited options for error handling due to boot ROM constraints:

1. Retry with backoff: The primary recovery mechanism
2. Error display: Show status on console or LED indicators
3. Halt: Stop after exhausting retries
4. Alternative boot: Some boot ROMs can fall back to local disk

Typical Error Messages (exact wording varies by ROM version):

```
Sun Boot ROM Messages:
  "No carrier - transceiver cable problem?"  → Physical layer issue
  "RARP timed out"                           → No server response
  "RARP request failed"                      → Multiple retries exhausted

3Com Boot ROM Messages (RPL mode):
  "RPL: Adapter Error"                       → NIC problem
  "RPL: Time-out waiting for server"         → No boot server response
```

The 3Com messages carry the RPL (Remote Program Load) prefix. RPL is a separate boot protocol rather than RARP, so these are shown for comparison: a timeout there points at a missing RPL server, not a missing rarpd.
RARP has no mechanism for the server to indicate 'your MAC is not in my database.' The server simply ignores unknown requests. From the client's perspective, an unknown MAC and no server on the network look identical—both result in timeout. This makes debugging more challenging compared to protocols like DHCP that can send explicit rejection messages.
Server-Side Error Handling:

RARP servers should implement robust error handling:

Logging (illustrative output; exact messages depend on the rarpd implementation):

```
Jan 15 10:30:01 server rarpd[1234]: received request from 00:1A:2B:3C:4D:5E
Jan 15 10:30:01 server rarpd[1234]: 00:1A:2B:3C:4D:5E -> workstation01 -> 192.168.1.100
Jan 15 10:30:01 server rarpd[1234]: sending reply to 00:1A:2B:3C:4D:5E

Jan 15 10:30:15 server rarpd[1234]: received request from 00:DE:AD:BE:EF:00
Jan 15 10:30:15 server rarpd[1234]: no entry for 00:DE:AD:BE:EF:00 in /etc/ethers
```

Database Validation:
- Check /etc/ethers syntax on load
- Verify all hostnames can be resolved
- Validate that IP addresses are properly formatted
- Alert on duplicate MAC entries
- Monitor stale entries (old MACs that no longer exist)
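Most of the database validation checks above can be automated with a short script. This Python sketch is hypothetical: it assumes the hosts table has already been loaded as a name-to-IP dictionary and covers syntax, duplicate MACs, unresolvable hostnames, and malformed IPv4 addresses. Detecting stale entries needs inventory data it does not have.

```python
import ipaddress
import re

MAC_RE = re.compile(r"^[0-9a-fA-F]{1,2}(:[0-9a-fA-F]{1,2}){5}$")

def validate_ethers(ethers_text, hosts):
    """Return a list of problems found in /etc/ethers text.

    hosts is an already-loaded {hostname: ip_string} mapping (an assumption
    of this sketch; a real check might query the name service instead).
    """
    problems = []
    first_seen = {}                                  # normalized MAC -> line number
    for lineno, raw in enumerate(ethers_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue                                 # blank line or comment
        fields = line.split()
        if len(fields) != 2 or not MAC_RE.match(fields[0]):
            problems.append(f"line {lineno}: malformed entry {line!r}")
            continue
        mac = ":".join(f"{int(octet, 16):02x}" for octet in fields[0].split(":"))
        hostname = fields[1]
        if mac in first_seen:
            problems.append(f"line {lineno}: duplicate MAC {mac} (first on line {first_seen[mac]})")
        else:
            first_seen[mac] = lineno
        ip = hosts.get(hostname)
        if ip is None:
            problems.append(f"line {lineno}: hostname {hostname!r} does not resolve")
            continue
        try:
            ipaddress.IPv4Address(ip)
        except ipaddress.AddressValueError:
            problems.append(f"line {lineno}: {hostname!r} maps to invalid IPv4 address {ip!r}")
    return problems

hosts = {"workstation01": "192.168.1.100", "ws03": "192.168.1.300"}
sample = """\
00:1A:2B:3C:4D:5E workstation01
00:1a:2b:3c:4d:5e workstation01
zz:00 badline
00:DE:AD:BE:EF:00 ws02
8:0:20:1:2:3 ws03
"""
for problem in validate_ethers(sample, hosts):
    print(problem)
```

Running such a check before restarting rarpd catches the typos that otherwise surface only as silent client timeouts.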
Network-Level Debugging:

Packet capture is the definitive tool for RARP troubleshooting:

Using tcpdump:

```bash
# Capture all RARP traffic on interface eth0
tcpdump -i eth0 -v ether proto 0x8035
```

Sample output (illustrative; exact formatting varies between tcpdump versions):

```
10:30:01.123456 00:1A:2B:3C:4D:5E > ff:ff:ff:ff:ff:ff,
    RARP-req who-is 00:1A:2B:3C:4D:5E tell 00:1A:2B:3C:4D:5E
10:30:01.125123 00:AA:BB:CC:DD:EE > 00:1A:2B:3C:4D:5E,
    RARP-reply 00:1A:2B:3C:4D:5E at 192.168.1.100
```

Using a Wireshark display filter:

```
eth.type == 0x8035
```

This matches every frame carrying the RARP EtherType and provides decoded field analysis.
For reliability, production networks typically deploy multiple RARP servers on each segment. This redundancy introduces specific behaviors and potential issues that must be understood.

The Race Condition:

When multiple servers receive a RARP request simultaneously:

1. All servers receive the broadcast at nearly the same time
2. Each server looks up the MAC independently
3. All servers with matching entries respond with replies
4. Client accepts the first reply it receives
5. Subsequent replies are discarded by the client

This gives natural redundancy without any coordination between servers: as long as one server with a matching entry is running, the client boots.
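The race can be simulated to see which server wins. In this Python sketch each server's response time is modeled as a fixed base delay plus uniform jitter; the delay model and the `simulate_race` name are assumptions for illustration, not measurements.

```python
import random

def simulate_race(servers, trials, rng):
    """Count which server's reply reaches the client first.

    servers maps a name to (base_ms, jitter_ms, ip_or_None); None means the
    client's MAC is missing from that server's database, so it stays silent.
    """
    wins = {name: 0 for name in servers}
    wins["timeout"] = 0
    for _ in range(trials):
        replies = [
            (base_ms + rng.uniform(0, jitter_ms), name)
            for name, (base_ms, jitter_ms, ip) in servers.items()
            if ip is not None
        ]
        if replies:
            _, first = min(replies)       # the client keeps the earliest reply
            wins[first] += 1
        else:
            wins["timeout"] += 1          # nobody answered
    return wins

rng = random.Random(7)
print(simulate_race({"fast": (1.0, 0.5, "192.168.1.100"),
                     "slow": (5.0, 0.5, "192.168.1.100")}, 1000, rng))
# {'fast': 1000, 'slow': 0, 'timeout': 0}: fast's slowest reply (1.5 ms) beats slow's fastest (5.0 ms)
print(simulate_race({"a": (2.0, 1.0, "192.168.1.100"),
                     "b": (2.0, 1.0, "192.168.1.100")}, 1000, rng))
# roughly an even split between a and b
```

A consistently faster server wins every race, while evenly matched servers split the work roughly in half.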
Load Distribution Effects:

Surprisingly, having multiple servers doesn't provide true load balancing for RARP:

| Scenario | Result |
|----------|--------|
| Servers with identical performance | Random distribution based on timing jitter |
| One server faster than others | Faster server handles most requests |
| Servers with different load | Less loaded server wins more often |

This means:
- The fastest server typically handles most traffic
- 'Primary/backup' roles are determined by performance, not configuration
- All servers still process every request and send a reply; only the first reply to arrive is used

Database Consistency Is Critical:

All RARP servers on a segment must have identical databases:

| Consistency Issue | Symptom |
|-------------------|---------|
| Server A has MAC, Server B doesn't | Some boots succeed, some fail (depending on which wins the race) |
| Servers have different IPs for same MAC | Client gets different IP on each boot |
| Hostname typo on one server | Sporadic failures for that host |
If Server A maps MAC X to IP 192.168.1.100 and Server B maps MAC X to IP 192.168.1.200, the client will receive a different address depending on which server responds first. This leads to intermittent, hard-to-diagnose issues where the workstation 'randomly' has different IPs. Always use automated synchronization for multi-server deployments.
Synchronization Strategies:

1. Manual copying: Simple but error-prone; use for small deployments
2. Shared storage (NFS): All servers mount the same /etc/ethers file
3. Rsync cron job: Periodic synchronization from a master server
4. Configuration management: Ansible/Puppet/Chef manages all files
5. NIS (Network Information Service): Centralized database for ethers, hosts

Recommended approach for multi-server deployments:

```bash
# On primary server (crontab entries), rsync to all secondaries every 5 minutes
*/5 * * * * rsync -av /etc/ethers /etc/hosts secondary1:/etc/
*/5 * * * * rsync -av /etc/ethers /etc/hosts secondary2:/etc/

# Or use NFS:
# All servers mount: nfs-server:/shared/etc/ethers -> /etc/ethers (read-only)
```
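After a synchronization run it is worth verifying that every server would give the same answer. A minimal Python sketch, assuming each server's /etc/ethers and /etc/hosts have already been resolved into a MAC-to-IP dictionary; the function name is hypothetical.

```python
def compare_servers(tables):
    """Return {mac: {server: ip_or_None}} for every MAC the servers disagree on.

    tables maps a server name to that server's resolved {mac: ip} view.
    A missing entry shows up as None, so "known on A, unknown on B" is flagged too.
    """
    all_macs = set().union(*tables.values())
    conflicts = {}
    for mac in sorted(all_macs):
        answers = {server: table.get(mac) for server, table in tables.items()}
        if len(set(answers.values())) > 1:
            conflicts[mac] = answers
    return conflicts

views = {
    "serverA": {"00:1a:2b:3c:4d:5e": "192.168.1.100", "00:1a:2b:3c:4d:5f": "192.168.1.101"},
    "serverB": {"00:1a:2b:3c:4d:5e": "192.168.1.200", "00:1a:2b:3c:4d:5f": "192.168.1.101"},
    "serverC": {"00:1a:2b:3c:4d:5e": "192.168.1.100"},
}
for mac, answers in compare_servers(views).items():
    print(mac, answers)   # both MACs are flagged: conflicting IPs and a missing entry
```

An empty result means every client gets the same address no matter which server wins the race.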
RARP is just the first step in the diskless workstation boot process. Understanding how RARP integrates with subsequent phases provides context for its role and limitations.

The complete diskless boot sequence has three phases: RARP address discovery, TFTP boot image download, and operating system initialization.
Phase 1: RARP - Address Discovery

What RARP provides:
- The client's IP address

What RARP does NOT provide:
- Subnet mask
- Default gateway
- Boot server address
- Boot file name
- Any other configuration

Workarounds vendors used:

| Missing Information | Workaround |
|---------------------|------------|
| Boot server address | Assume RARP server is also TFTP server |
| Boot filename | Derive from IP (e.g., C0A8016A for 192.168.1.106) |
| Subnet mask | Hard-code or derive from IP class |
| Default gateway | Assume not needed (local segment) |

Phase 2: TFTP - Boot Image Download

After RARP, the client uses TFTP (Trivial File Transfer Protocol) to download its boot image:

1. Client sends a TFTP Read Request to the RARP server's IP
2. Filename: derived from the IP address in hex (e.g., C0A80164), or architecture-specific (e.g., C0A80164.SUN4)
3. TFTP server sends the boot image in 512-byte blocks
4. Client acknowledges each block
5. Boot image is loaded into memory

The IP-to-Filename Mapping:

Sun's convention (widely adopted):

| Client IP | Hex IP | Filename |
|-----------|--------|----------|
| 192.168.1.100 | C0A80164 | C0A80164 |
| 10.0.0.1 | 0A000001 | 0A000001 |
| 172.16.5.200 | AC1005C8 | AC1005C8.SUN4 |

The server's TFTP directory contains:

```
/tftpboot/
  C0A80164 -> symlink to sunos-boot-image
  C0A80165 -> symlink to sunos-boot-image
  sunos-boot-image
```
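Two of the vendor workarounds in the tables above are simple bit manipulations. A Python sketch using the standard `ipaddress` module: `boot_filename` formats the address as eight upper-case hex digits (with an optional architecture suffix), and `classful_mask` returns the pre-CIDR default mask. Both function names are hypothetical.

```python
import ipaddress

def boot_filename(ip: str, arch_suffix: str = "") -> str:
    """Sun-style boot filename: the IPv4 address as 8 upper-case hex digits."""
    name = f"{int(ipaddress.IPv4Address(ip)):08X}"
    return f"{name}.{arch_suffix}" if arch_suffix else name

def classful_mask(ip: str) -> str:
    """Pre-CIDR default mask derived from the address class (first octet)."""
    first_octet = int(ipaddress.IPv4Address(ip)) >> 24
    if first_octet < 128:
        return "255.0.0.0"        # class A
    if first_octet < 192:
        return "255.255.0.0"      # class B
    if first_octet < 224:
        return "255.255.255.0"    # class C
    raise ValueError(f"{ip} is class D/E and has no default mask")

print(boot_filename("192.168.1.100"))          # C0A80164
print(boot_filename("172.16.5.200", "SUN4"))   # AC1005C8.SUN4
print(classful_mask("172.16.5.200"))           # 255.255.0.0
```

Because each octet becomes exactly two hex digits, the filename is unique per address, which is what lets one /tftpboot directory serve every client on the segment.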
By deriving the filename from the IP address, the boot process avoided needing another protocol to discover the boot filename. The convention worked because the same administrator who adds a MAC to /etc/ethers also creates the TFTP symlink. This tight coupling was manageable in small deployments but became a scalability problem for large installations.
Phase 3: Operating System Initialization

Once the boot image is loaded and executing:

1. Kernel boots: The downloaded kernel initializes
2. Network reconfiguration: The kernel rebuilds its network configuration (possibly via another RARP exchange or using the boot ROM's values)
3. NFS root mount: The kernel mounts its root filesystem via NFS, conceptually equivalent to:
   ```
   mount -t nfs server:/export/root/client1 /
   ```
4. Init execution: /sbin/init runs, bringing up the system
5. Swap configuration: Swap may be local (if any disk) or NFS-mounted

Complete Timeline Example:

| Time | Event | Protocol |
|------|-------|----------|
| 0.0s | Power on | - |
| 0.5s | Boot ROM starts | - |
| 0.6s | RARP request sent | RARP |
| 0.7s | RARP reply received | RARP |
| 0.8s | TFTP request sent | TFTP |
| 5.0s | Boot image downloaded (4MB) | TFTP |
| 5.5s | Kernel executing | - |
| 8.0s | NFS root mounted | NFS |
| 15.0s | Login prompt | - |
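The timeline's download step can be sanity-checked with a wire-time lower bound. TFTP is lock-step: each 512-byte DATA packet must be acknowledged before the next is sent, and a final block shorter than 512 bytes (possibly empty) ends the transfer. The Python sketch below counts only serialization time for each DATA frame and its ACK, assuming untagged Ethernet, IPv4 without options, and a 4 MiB image; processing delays are ignored, so real transfers are slower. The function name is hypothetical.

```python
def tftp_lockstep_seconds(image_bytes: int, bits_per_second: float, block: int = 512) -> float:
    """Wire-time lower bound for a lock-step TFTP download over Ethernet."""
    headers = 4 + 8 + 20 + 14 + 4    # TFTP opcode+block#, UDP, IPv4, Ethernet header, FCS
    framing = 8 + 12                 # preamble/SFD + inter-frame gap

    def byte_times(tftp_data: int) -> int:
        # Frames shorter than the 64-byte Ethernet minimum are padded.
        return max(tftp_data + headers, 64) + framing

    full_blocks, last = divmod(image_bytes, block)
    sizes = [block] * full_blocks + [last]    # short (possibly empty) last block ends the transfer
    ack = byte_times(0)                       # an ACK carries no data
    total = sum(byte_times(size) + ack for size in sizes)
    return total * 8 / bits_per_second

image = 4 * 1024 * 1024                                  # assumed 4 MiB boot image
print(round(tftp_lockstep_seconds(image, 10e6), 2))      # 4.36 (seconds at 10 Mbps)
print(round(tftp_lockstep_seconds(image, 100e6), 3))     # 0.436 (seconds at 100 Mbps)
```

Under these assumptions, the 4.2 s download step in the timeline (0.8 s to 5.0 s) is in the right range for classic 10 Mbps Ethernet but slightly below this wire-time floor, so read the timeline as illustrative rather than measured.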
We have dissected the complete operation of RARP at both the packet level and the system level.
What's Next:

In the next page, we will explore BOOTP (Bootstrap Protocol), which extended RARP's concept to address its fundamental limitations. You will learn how BOOTP moved from Layer 2 to Layer 3, enabled complete boot configuration in a single exchange, and introduced the relay agent concept that allowed centralized servers to serve clients across routed networks.
You now understand RARP's operational mechanics in complete detail. You can trace through a RARP transaction packet by packet, identify and troubleshoot common failure scenarios, and explain how RARP integrates with the complete diskless boot process. This operational knowledge prepares you to appreciate how BOOTP improved upon RARP's foundation.