When a UDP datagram travels across the Internet, it faces an invisible threat that has nothing to do with network congestion, packet loss, or malicious attacks. The threat is silent corruption—bits that flip due to electromagnetic interference, faulty hardware, cosmic rays striking memory cells, or software bugs in routers. A single corrupted bit can change a financial transaction amount, misdirect critical medical data, or crash an application expecting valid input.
The UDP header contains a checksum field designed to detect such corruption. But here's the puzzle: the UDP header is remarkably small—only 8 bytes. It contains source port, destination port, length, and checksum. Notably absent from this header are the source and destination IP addresses.
Yet if the IP addresses were corrupted during transmission—causing the datagram to arrive at the wrong host entirely—wouldn't we want to detect that as corruption? Absolutely. This is where the concept of the pseudo-header emerges as an elegant solution to a fundamental design constraint.
By the end of this page, you will understand what the pseudo-header is, why it was designed this way, exactly what fields it contains for both IPv4 and IPv6, how it integrates with the checksum calculation, and the subtle engineering tradeoffs that led to this design. You'll appreciate why this seemingly abstract construct is essential to UDP's error detection, even though UDP itself makes no delivery guarantees.
To understand the pseudo-header, we must first understand the layered architecture problem it addresses. In the OSI and TCP/IP models, each layer is supposed to operate independently—encapsulating data from the layer above without concerning itself with lower-layer details. This principle of layer independence enables modular protocol design.
The UDP header's intentional minimalism:
UDP was designed to be the simplest possible transport protocol—a thin wrapper around application data that adds just enough functionality to enable multiplexing (via port numbers) and optional error detection (via checksum). The designers deliberately kept IP addresses out of the UDP header because:
If the UDP checksum only covered the UDP header and payload, corruption in the IP header's address fields would go undetected by UDP. A datagram could arrive at Host B when it was intended for Host C—and if the UDP checksum passed (because UDP data wasn't corrupted), the receiving application would accept garbage data or data meant for someone else.
The security and correctness implications:
Consider a scenario where a banking application sends UDP datagrams containing transaction records:
Now imagine a single bit flip in the IP header changes the destination address from 192.168.1.200 to 192.168.1.201 (an unauthorized server). If UDP's checksum only covered the UDP portion:
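This is precisely the accidental failure mode that the pseudo-header prevents. Note that the checksum guards against random corruption, not deliberate attack: an adversary who rewrites an address can simply recompute the checksum to match.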
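The effect can be illustrated with a minimal sketch. It compares a hypothetical checksum that covers only the UDP segment with the real scheme that also covers the pseudo-header. The source address, ports, and payload below are assumptions chosen for illustration; only the two destination addresses come from the scenario above.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """One's complement sum of 16-bit big-endian words, folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries until the sum fits in 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum16(data: bytes) -> int:
    """Complement of the folded one's complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """12-byte IPv4 pseudo-header (protocol 17 = UDP)."""
    return struct.pack("!4s4sBBH", socket.inet_aton(src_ip),
                       socket.inet_aton(dst_ip), 0, 17, udp_length)


# Illustrative values only: hypothetical source address, ports, and payload
SRC = "192.168.1.100"
INTENDED_DST = "192.168.1.200"
CORRUPTED_DST = "192.168.1.201"  # one bit away from the intended address
payload = b"TXN:1001;AMT:250.00"
udp_length = 8 + len(payload)
segment = struct.pack("!HHHH", 40000, 9000, udp_length, 0) + payload

# Hypothetical scheme: checksum over the UDP segment alone.
# The wrong host computes exactly the same value, so nothing is detected.
naive_sent = checksum16(segment)
naive_at_wrong_host = checksum16(segment)

# Actual scheme: the wrong host builds its pseudo-header from the
# (corrupted) destination address it sees, so the values disagree.
real_sent = checksum16(pseudo_header(SRC, INTENDED_DST, udp_length) + segment)
real_at_wrong_host = checksum16(pseudo_header(SRC, CORRUPTED_DST, udp_length) + segment)

print(naive_sent == naive_at_wrong_host)  # True: misdelivery goes unnoticed
print(real_sent == real_at_wrong_host)    # False: misdelivery is detected
```

A single changed address bit shifts the one's complement sum by a nonzero amount, which is why the second comparison cannot succeed.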
The pseudo-header is a conceptual data structure that exists only for the purpose of checksum calculation—it is never transmitted on the network. It's a clever mechanism that allows the transport layer to incorporate network-layer information into its integrity check without violating layer boundaries in the actual packet format.
The key insight:
Both the sender and receiver can construct the pseudo-header from information available at their respective ends:
By agreeing on a pseudo-header format and including it in checksum calculation, both sides can verify that the critical routing information hasn't been corrupted—without actually transmitting this redundant data.
The term 'pseudo' (from Greek 'pseudēs' meaning false) indicates this header is not real in the sense of being transmitted. It's a computational artifact—assembled temporarily, fed into the checksum algorithm, and then discarded. Think of it as a 'virtual header' that exists only in the mathematical calculation, like a variable that's computed but never stored.
The pseudo-header concept applies to both UDP and TCP:
Both transport protocols use this technique because both face the same architectural challenge. The pseudo-header format differs slightly between protocols (the 'Protocol' field value differs), but the principle is identical. This unified approach means that the integrity guarantees are consistent across connection-oriented and connectionless transport.
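As a small sketch of that shared structure, the same IPv4 pseudo-header builder can serve both protocols if the protocol number is passed in. The function name `ipv4_pseudo_header` and the example addresses are assumptions for illustration; 17 and 6 are the assigned protocol numbers for UDP and TCP.

```python
import socket
import struct

IPPROTO_UDP = 17  # assigned protocol number for UDP
IPPROTO_TCP = 6   # assigned protocol number for TCP


def ipv4_pseudo_header(src_ip: str, dst_ip: str,
                       protocol: int, segment_length: int) -> bytes:
    """Build the 12-byte IPv4 pseudo-header for a transport segment.

    segment_length is the transport header plus payload, in bytes.
    """
    return struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,                # zero byte
        protocol,         # the only field that differs between UDP and TCP
        segment_length,
    )


udp_ph = ipv4_pseudo_header("192.168.1.100", "10.0.0.50", IPPROTO_UDP, 28)
tcp_ph = ipv4_pseudo_header("192.168.1.100", "10.0.0.50", IPPROTO_TCP, 28)

print(udp_ph.hex())  # c0a801640a0000320011001c
print(tcp_ph.hex())  # c0a801640a0000320006001c
```

For the same addresses and segment length, the two pseudo-headers differ only in byte 9, the protocol byte.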
Conceptual flow of checksum calculation:
┌─────────────────────────────────────────────────────────────────┐
│                   CHECKSUM CALCULATION INPUT                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────────┐   ┌──────────────┐   ┌────────────────┐   │
│  │  Pseudo-Header   │ + │  UDP Header  │ + │  UDP Payload   │   │
│  │  (never sent)    │   │  (8 bytes)   │   │  (variable)    │   │
│  └──────────────────┘   └──────────────┘   └────────────────┘   │
│                                                                 │
│                                ↓                                │
│                     ┌─────────────────────┐                     │
│                     │ Checksum Algorithm  │                     │
│                     │  (1's complement)   │                     │
│                     └─────────────────────┘                     │
│                                ↓                                │
│                     ┌─────────────────────┐                     │
│                     │   16-bit Checksum   │                     │
│                     │ (stored in header)  │                     │
│                     └─────────────────────┘                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
When UDP operates over IPv4, the pseudo-header contains exactly 12 bytes of information extracted from or derived from the IP header. Understanding each field and its purpose is essential for implementing checksum calculation correctly.
IPv4 Pseudo-Header Layout:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source IP Address                       |  Bytes 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination IP Address                     |  Bytes 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Zero      |   Protocol    |          UDP Length           |  Bytes 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field | Size | Value/Source | Purpose |
|---|---|---|---|
| Source IP Address | 32 bits (4 bytes) | Copied from IPv4 header | Detects corruption of the sender's address, so the receiver does not attribute the datagram to the wrong host |
| Destination IP Address | 32 bits (4 bytes) | Copied from IPv4 header | Ensures datagram reached intended recipient; detects misdelivery due to address corruption |
| Zero | 8 bits (1 byte) | Always 0x00 | Padding that extends the 8-bit protocol number to a full 16-bit word; keeps the pseudo-header at a fixed 12 bytes |
| Protocol | 8 bits (1 byte) | 17 (0x11) for UDP | Confirms correct transport protocol handling; prevents TCP data being processed by UDP handler |
| UDP Length | 16 bits (2 bytes) | UDP header plus payload length, matching the UDP header's Length field | Covers the payload boundary, so a corrupted length is likely to fail the checksum |
The protocol field (17 for UDP, 6 for TCP) ensures that if a bit flip corrupts the IP header's protocol field, causing the wrong transport handler to receive the data, the checksum will fail. Without this, UDP data could be misinterpreted as TCP data or vice versa—potentially causing protocol state corruption or security vulnerabilities.
Byte-by-byte construction example:
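Let's construct a pseudo-header for a UDP datagram sent from 192.168.1.100 to 10.0.0.50, with protocol 17 (UDP) and a UDP length of 28 bytes (8-byte header plus 20-byte payload):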
Byte Position:  [0]  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]  [10] [11]
Hex Values:     C0   A8   01   64   0A   00   00   32   00   11   00   1C
                └─── Source IP ──┘  └──── Dest IP ───┘  │    │    └ Length ┘
                  192.168.1.100         10.0.0.50       Zero Protocol  28
                                                             (UDP=17)
This 12-byte pseudo-header is prepended (conceptually) to the UDP header and payload before checksum calculation. The result is that any corruption to these critical fields—source IP, destination IP, protocol type, or UDP length—will cause the checksum to fail at the receiver.
```python
import socket
import struct


def construct_ipv4_pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """
    Constructs the IPv4 pseudo-header for UDP checksum calculation.

    Args:
        src_ip: Source IPv4 address as dotted-decimal string
        dst_ip: Destination IPv4 address as dotted-decimal string
        udp_length: Total UDP segment length (header + payload)

    Returns:
        12-byte pseudo-header as bytes object
    """
    # Convert IP addresses from string to packed binary format
    src_ip_bytes = socket.inet_aton(src_ip)  # 4 bytes
    dst_ip_bytes = socket.inet_aton(dst_ip)  # 4 bytes

    # Protocol number for UDP is 17 (0x11)
    UDP_PROTOCOL = 17

    # Zero padding byte
    zero_padding = 0

    # Pack the pseudo-header:
    #   '!'  = network byte order (big-endian)
    #   '4s' = 4-byte string (source IP)
    #   '4s' = 4-byte string (destination IP)
    #   'B'  = unsigned char (zero padding)
    #   'B'  = unsigned char (protocol)
    #   'H'  = unsigned short (UDP length)
    pseudo_header = struct.pack(
        '!4s4sBBH',
        src_ip_bytes,
        dst_ip_bytes,
        zero_padding,
        UDP_PROTOCOL,
        udp_length
    )

    return pseudo_header


# Example usage
if __name__ == "__main__":
    pseudo = construct_ipv4_pseudo_header(
        src_ip="192.168.1.100",
        dst_ip="10.0.0.50",
        udp_length=28
    )

    print(f"Pseudo-header length: {len(pseudo)} bytes")
    print(f"Pseudo-header (hex): {pseudo.hex()}")
    # Output (hex): c0a801640a0000320011001c
    #   c0a80164 = source IP, 0a000032 = destination IP,
    #   00 = zero, 11 = protocol (UDP), 001c = length (28)
```

IPv6 introduced a significantly larger address space (128-bit addresses versus IPv4's 32-bit), which necessitated redesigning the pseudo-header. The IPv6 pseudo-header is 40 bytes, more than three times the size of the IPv4 version, primarily because of the expanded address fields.
IPv6 Pseudo-Header Layout:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                      Source IPv6 Address                      +
|                                                               |
+                                                               +
|                                                               |  Bytes 0-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                   Destination IPv6 Address                    +
|                                                               |
+                                                               +
|                                                               |  Bytes 16-31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Upper-Layer Packet Length                   |  Bytes 32-35
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Zero (24 bits)                 |  Next Header  |  Bytes 36-39
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field | Size | Value/Source | Purpose |
|---|---|---|---|
| Source IPv6 Address | 128 bits (16 bytes) | From IPv6 header | Detects corruption of the sender's address; the full 128-bit address is covered |
| Destination IPv6 Address | 128 bits (16 bytes) | From IPv6 header | Ensures delivery to intended recipient across global IPv6 network |
| Upper-Layer Packet Length | 32 bits (4 bytes) | UDP segment length | Supports jumbograms (packets > 64KB); expanded from IPv4's 16-bit field |
| Zero | 24 bits (3 bytes) | Always 0x000000 | Padding and reserved space; ensures 32-bit alignment |
| Next Header | 8 bits (1 byte) | 17 (0x11) for UDP | Equivalent to IPv4's Protocol field; identifies upper-layer protocol |
Beyond the larger addresses, note that IPv6 uses 'Upper-Layer Packet Length' (32 bits) instead of IPv4's 16-bit UDP Length. This enables UDP to support IPv6 jumbograms—packets larger than 65,535 bytes. Also, 'Next Header' replaces 'Protocol' to align with IPv6 extension header terminology.
Practical implications of the larger pseudo-header:
The IPv6 pseudo-header's 40-byte size means more data enters the checksum calculation. This has several implications:
IPv6 pseudo-header construction example:
For a UDP datagram with source address 2001:db8::1, destination address 2001:db8::2, and a UDP length of 28 bytes:
Bytes 0-15: 20 01 0d b8 00 00 00 00 00 00 00 00 00 00 00 01 (Source)
Bytes 16-31: 20 01 0d b8 00 00 00 00 00 00 00 00 00 00 00 02 (Destination)
Bytes 32-35: 00 00 00 1C (Length = 28)
Bytes 36-39: 00 00 00 11 (Zero + Next Header = 17)
Total: 40 bytes of pseudo-header feeding into the checksum algorithm.
```python
import socket
import struct


def construct_ipv6_pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """
    Constructs the IPv6 pseudo-header for UDP checksum calculation.

    Args:
        src_ip: Source IPv6 address as string (e.g., '2001:db8::1')
        dst_ip: Destination IPv6 address as string
        udp_length: Total UDP segment length (header + payload)

    Returns:
        40-byte pseudo-header as bytes object
    """
    # Convert IPv6 addresses to packed 16-byte format
    src_ip_bytes = socket.inet_pton(socket.AF_INET6, src_ip)  # 16 bytes
    dst_ip_bytes = socket.inet_pton(socket.AF_INET6, dst_ip)  # 16 bytes

    # Next Header value for UDP is 17 (0x11)
    UDP_NEXT_HEADER = 17

    # Pack the pseudo-header:
    #   '!'   = network byte order (big-endian)
    #   '16s' = 16-byte string (IPv6 source address)
    #   '16s' = 16-byte string (IPv6 destination address)
    #   'I'   = unsigned int (32-bit upper-layer packet length)
    #   '3x'  = 3 bytes of zero padding
    #   'B'   = unsigned char (next header)
    pseudo_header = struct.pack(
        '!16s16sI3xB',
        src_ip_bytes,
        dst_ip_bytes,
        udp_length,
        UDP_NEXT_HEADER
    )

    return pseudo_header


# Example usage
if __name__ == "__main__":
    pseudo = construct_ipv6_pseudo_header(
        src_ip="2001:db8::1",
        dst_ip="2001:db8::2",
        udp_length=28
    )

    print(f"Pseudo-header length: {len(pseudo)} bytes")
    print(f"Pseudo-header (hex): {pseudo.hex()}")
    # The hex string is printed without separators; grouped for reading:
    # Source:      2001 0db8 0000 0000 0000 0000 0000 0001
    # Destination: 2001 0db8 0000 0000 0000 0000 0000 0002
    # Length:      0000001c (28)
    # Zero+Next:   00000011
```

The pseudo-header represents a careful balance between several competing engineering concerns. Understanding these trade-offs reveals the thoughtful design behind what might seem like a simple mechanism.
Trade-off 1: Layer purity vs. end-to-end integrity
Strict layer separation would dictate that UDP knows nothing about IP addresses—that's the network layer's concern. But the pseudo-header deliberately 'leaks' network layer information upward to provide stronger end-to-end guarantees.
The designers decided that integrity trumps architectural purity. A packet that arrives at the wrong destination due to address corruption is worse than a slightly impure layer design.
Trade-off 2: Bandwidth efficiency vs. redundancy
An alternative design could embed IP addresses directly in the UDP header. This would:
The pseudo-header avoids this bandwidth waste while maintaining equivalent protection: an elegant solution that uses computation instead of transmission.
When a NAT device modifies the source IP address in a packet's IP header, it MUST also update the UDP checksum (unless the IPv4 checksum field is zero, which means no checksum was computed). Since the pseudo-header includes the source IP, changing the address without updating the checksum would cause the receiver to reject the packet. This adds processing overhead to NAT devices and is one reason some IPv4 implementations set the UDP checksum to zero when transmitting over trusted networks.
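A NAT does not have to re-sum the whole datagram. The sketch below applies the incremental update described in RFC 1624 (HC' = ~(~HC + ~m + m')) to each 16-bit word of the rewritten address, and compares the result with a full recomputation. The function names and the translated address 203.0.113.7 are assumptions for illustration, not part of any particular NAT implementation.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """One's complement sum of 16-bit big-endian words, folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def full_udp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Full IPv4 UDP checksum; `segment` must have its checksum field zeroed."""
    pseudo = struct.pack("!4s4sBBH", socket.inet_aton(src_ip),
                         socket.inet_aton(dst_ip), 0, 17, len(segment))
    checksum = ~ones_complement_sum(pseudo + segment) & 0xFFFF
    return checksum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF


def nat_update_checksum(old_checksum: int, old_ip: str, new_ip: str) -> int:
    """Incrementally adjust a UDP checksum after one address is rewritten."""
    if old_checksum == 0:
        return 0  # IPv4 "no checksum" marker: nothing to update
    old_words = struct.unpack("!HH", socket.inet_aton(old_ip))
    new_words = struct.unpack("!HH", socket.inet_aton(new_ip))
    total = ~old_checksum & 0xFFFF          # ~HC
    for old, new in zip(old_words, new_words):
        total += (~old & 0xFFFF) + new      # + ~m + m'
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return (~total & 0xFFFF) or 0xFFFF


payload = b"Hello, UDP!"
segment = struct.pack("!HHHH", 12345, 53, 8 + len(payload), 0) + payload

before = full_udp_checksum("192.168.1.100", "10.0.0.50", segment)
after = nat_update_checksum(before, "192.168.1.100", "203.0.113.7")

# The incremental result matches a full recomputation with the new source
print(after == full_udp_checksum("203.0.113.7", "10.0.0.50", segment))  # True
```

The incremental form touches only the four address bytes, so its cost does not depend on the payload size.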
Historical context and RFC evolution:
The pseudo-header concept was established in the original UDP specification (RFC 768, 1980) and has remained stable for over four decades. When IPv6 was developed, the concept was preserved and extended in RFC 2460 (1998) and its successor, RFC 8200 (2017). This longevity speaks to the fundamental soundness of the design.
Why not just trust IP header checksum?
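IPv4 has its own header checksum, so one might ask: if IP verifies its header integrity, why does UDP need to include IP fields in its checksum? Two reasons stand out. The IPv4 header checksum is verified and recomputed at every hop (each router changes the TTL), so it protects the header only link by link and cannot catch corruption introduced inside a router. And IPv6 drops the header checksum entirely, leaving the transport checksum as the only check on the addresses.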
Implementing pseudo-header handling correctly is essential for any UDP stack. Here we examine the practical considerations and common pitfalls.
Order of operations:
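The checksum calculation must follow a precise sequence at the sender: build the pseudo-header, set the UDP header's checksum field to zero, concatenate pseudo-header, UDP header, and payload, compute the one's complement sum of the 16-bit words, take the one's complement of that sum, and substitute 0xFFFF if the result is zero.
At the receiver, the pseudo-header is rebuilt from the IP header of the received packet and summed together with the UDP segment exactly as received, checksum field included. The datagram is accepted only if the folded sum is 0xFFFF.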
```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """
    Calculate the one's complement sum of 16-bit words.
    Handles odd-length data by padding with zero.
    """
    if len(data) % 2 == 1:
        data += b'\x00'  # Pad with zero byte if odd length

    total = 0
    for i in range(0, len(data), 2):
        word = (data[i] << 8) + data[i + 1]
        total += word

    # Fold carries back into the low 16 bits. A single fold can itself
    # produce a carry, so repeat until the value fits in 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)

    return total


def calculate_udp_checksum(
    src_ip: str,
    dst_ip: str,
    src_port: int,
    dst_port: int,
    payload: bytes
) -> int:
    """
    Calculate the complete UDP checksum including pseudo-header.

    Returns:
        16-bit checksum value, or 0xFFFF if computed checksum is 0
    """
    udp_length = 8 + len(payload)  # 8-byte header + payload

    # Step 1: Construct pseudo-header (IPv4)
    pseudo_header = struct.pack(
        '!4s4sBBH',
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,           # Zero padding
        17,          # Protocol (UDP)
        udp_length
    )

    # Step 2: Construct UDP header with checksum = 0
    udp_header = struct.pack(
        '!HHHH',
        src_port,
        dst_port,
        udp_length,
        0            # Checksum placeholder
    )

    # Step 3: Combine for checksum calculation
    complete_data = pseudo_header + udp_header + payload

    # Step 4: Calculate one's complement sum
    checksum = ones_complement_sum(complete_data)

    # Step 5: Take one's complement of the sum
    checksum = ~checksum & 0xFFFF

    # Step 6: Per RFC 768, if checksum is 0, use 0xFFFF
    # (because 0 means "no checksum computed")
    if checksum == 0:
        checksum = 0xFFFF

    return checksum


# Example
checksum = calculate_udp_checksum(
    src_ip="192.168.1.100",
    dst_ip="10.0.0.50",
    src_port=12345,
    dst_port=53,
    payload=b"Hello, UDP!"
)
print(f"Computed checksum: 0x{checksum:04X}")
# Prints: Computed checksum: 0x5978
```

When calculating the checksum, the UDP header's checksum field MUST be set to zero, not its eventual value. This is a common implementation bug. The checksum calculation produces the value that goes in this field, so using any other value during calculation will produce incorrect results.
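The receiver-side check can be sketched in the same style. This is a minimal illustration for IPv4 only, assuming the caller has already extracted the addresses from the IP header; the function name `verify_udp_checksum_ipv4` is hypothetical. The received checksum field is left in place, and a valid datagram sums to 0xFFFF.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """One's complement sum of 16-bit big-endian words, folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def verify_udp_checksum_ipv4(src_ip: str, dst_ip: str, udp_segment: bytes) -> bool:
    """Return True if the UDP segment passes the checksum (or carries none)."""
    if len(udp_segment) < 8:
        return False  # too short to hold a UDP header
    _, _, udp_length, received = struct.unpack("!HHHH", udp_segment[:8])
    if udp_length < 8 or udp_length > len(udp_segment):
        return False  # length field inconsistent with the data we hold
    if received == 0:
        return True   # IPv4 only: sender chose not to compute a checksum

    pseudo = struct.pack("!4s4sBBH", socket.inet_aton(src_ip),
                         socket.inet_aton(dst_ip), 0, 17, udp_length)
    # Sum everything as received, checksum field included
    return ones_complement_sum(pseudo + udp_segment[:udp_length]) == 0xFFFF


# Same datagram as the sender-side example; 0x5978 is its checksum
segment = struct.pack("!HHHH", 12345, 53, 19, 0x5978) + b"Hello, UDP!"

print(verify_udp_checksum_ipv4("192.168.1.100", "10.0.0.50", segment))  # True
print(verify_udp_checksum_ipv4("192.168.1.100", "10.0.0.51", segment))  # False
```

The second call shows the pseudo-header doing its job: the segment bytes are untouched, but a different destination address makes the check fail.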
Understanding the pseudo-header is essential when analyzing network traffic or debugging protocol issues. Tools like Wireshark automatically handle pseudo-header construction, but knowing what's happening enables deeper analysis.
What Wireshark shows:
When you capture a UDP packet in Wireshark and examine the checksum, you'll see:
User Datagram Protocol, Src Port: 12345, Dst Port: 53
Source Port: 12345
Destination Port: 53
Length: 19
Checksum: 0x7a3b [correct]
[Checksum Status: Good]
[Calculated Checksum: 0x7a3b]
Wireshark reconstructs the pseudo-header from the IP header, combines it with the UDP segment, and verifies the checksum. If there's a mismatch, you'll see:
Checksum: 0x7a3b [incorrect, should be 0x8c4f]
[Checksum Status: Bad]
If you see checksum failures in captures: (1) Check if checksum offloading is enabled—modern NICs calculate checksums in hardware, so captures before transmission may show placeholder values. (2) Verify NAT isn't modifying addresses without recalculating checksums. (3) Check for MTU issues causing fragmentation where checksums can't be validated until reassembly.
Checksum offloading and capture artifacts:
Modern network interface cards (NICs) support checksum offloading—the NIC calculates checksums in hardware just before transmission. This creates a confusing situation when capturing packets:
To see accurate checksums, capture on the receiving host or on an intermediate device (switch mirror port, network tap).
Manual verification example:
If you need to verify a UDP checksum manually:
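One way to carry out the manual check is sketched below. It assumes you have exported the raw bytes of an unfragmented IPv4 packet, starting at the IP header, as a hex string. The function name `verify_captured_ipv4_udp` and the sample packet are illustrative assumptions; the sample's IPv4 header checksum is left as zero because this sketch does not examine it.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """One's complement sum of 16-bit big-endian words, folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def verify_captured_ipv4_udp(packet: bytes) -> bool:
    """Verify the UDP checksum of a raw, unfragmented IPv4 packet."""
    if len(packet) < 28 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet carrying a full UDP header")
    ip_header_len = (packet[0] & 0x0F) * 4   # IHL is counted in 32-bit words
    protocol = packet[9]
    if protocol != 17:
        raise ValueError("IP payload is not UDP")
    src, dst = packet[12:16], packet[16:20]  # addresses already in wire format

    udp = packet[ip_header_len:]
    udp_length = struct.unpack("!H", udp[4:6])[0]
    if udp_length < 8 or udp_length > len(udp):
        return False
    if udp[6:8] == b"\x00\x00":
        return True                          # no checksum was computed (IPv4)

    # Rebuild the pseudo-header from the IP header fields
    pseudo = src + dst + struct.pack("!BBH", 0, protocol, udp_length)
    return ones_complement_sum(pseudo + udp[:udp_length]) == 0xFFFF


# Sample packet: 192.168.1.100:12345 -> 10.0.0.50:53, payload "Hello, UDP!"
packet = bytes.fromhex(
    "45000027" "00000000" "40110000"    # IPv4 header (header checksum zeroed)
    "c0a80164" "0a000032"               # source and destination addresses
    "30390035" "00135978"               # UDP ports, length 19, checksum 0x5978
    "48656c6c6f2c2055445021"            # payload
)
print(verify_captured_ipv4_udp(packet))  # True
```

If the capture was taken on the sending host with checksum offloading enabled, a check like this may fail even though the packet on the wire is fine.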
We've explored the UDP pseudo-header in depth—from its conceptual origins to practical implementation. Let's consolidate the key insights:
What's next:
Now that we understand the pseudo-header and why it exists, we'll examine the complete checksum calculation process. The next page covers the mathematical algorithm—the one's complement arithmetic, handling of odd-length data, and the specific steps to produce a valid UDP checksum.
You now understand the UDP pseudo-header: its purpose, structure for both IPv4 and IPv6, design rationale, and implementation requirements. This foundation is essential for the checksum calculation we'll explore next.