Every ESP-protected packet carries within its structure the machinery for confidentiality, integrity, and authentication. Understanding this structure is not merely academic—it's essential for troubleshooting VPN issues, analyzing packet captures, optimizing performance, and implementing security systems correctly.
The ESP packet format is elegantly designed, balancing security requirements against overhead constraints. Unlike protocols where headers simply identify fields, ESP's format defines critical boundaries: where encryption starts and ends, what gets authenticated, and how padding ensures both security and block cipher alignment.
In this page, we'll dissect the ESP packet byte by byte, understanding the purpose of every field, the constraints that shaped its design, and the practical implications for network operations.
By the end of this page, you will understand the complete ESP packet structure including the ESP header fields (SPI, Sequence Number), the encrypted payload body, ESP trailer components (Padding, Pad Length, Next Header), and the Integrity Check Value (ICV). You'll learn exactly what gets encrypted versus authenticated, how padding works, and how to interpret ESP packets in packet captures.
An ESP packet consists of four main components, arranged in a specific order that enables both encryption and authentication operations:
1. ESP Header (8 bytes, unencrypted)
2. Payload Data (variable, encrypted)
3. ESP Trailer (variable, encrypted)
4. Integrity Check Value (ICV) (variable, unencrypted)
The Critical Boundaries:
Understanding what gets encrypted versus what gets authenticated is crucial:
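Encrypted Portion: the Payload Data and the ESP Trailer.
Authenticated Portion: the ESP Header, the Payload Data, and the ESP Trailer (everything in the ESP packet except the ICV itself).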
This means an attacker can see the ESP header (knowing which Security Association is being used and the sequence number) but cannot read or modify the payload without detection.
The ESP header (SPI and Sequence Number) remains unencrypted because the receiver needs this information to locate the correct Security Association before decryption can begin. Without knowing which SA applies, the receiver wouldn't know which decryption key to use. The SPI acts as an index into the SA database, making decryption possible.
The ESP header contains just two fields—8 bytes total—yet these carry critical information for processing the packet.
Security Parameters Index (SPI) — 32 bits
The SPI is a 32-bit identifier that, combined with the destination IP address and protocol (ESP = 50), uniquely identifies the Security Association under which this packet should be processed.
Key Characteristics:
SPI Selection:
| Field | Size | Offset | Purpose | Value Range |
|---|---|---|---|---|
| SPI | 32 bits (4 bytes) | 0 | Identifies Security Association | 256 - 2³²-1 (0x100 - 0xFFFFFFFF) |
| Sequence Number | 32 bits (4 bytes) | 4 | Anti-replay protection | 1 - 2³²-1 (wraps require SA rekey) |
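The table above maps directly onto a two-field binary layout. As a minimal sketch, assuming the bytes that follow the outer IP header are already in hand, the two fields could be pulled out like this (`parse_esp_header` is a hypothetical helper name, not part of any library):

```python
import struct

def parse_esp_header(esp_bytes: bytes) -> tuple:
    """Return (spi, sequence_number) from the first 8 bytes of an ESP packet."""
    if len(esp_bytes) < 8:
        raise ValueError("ESP header needs at least 8 bytes")
    # Both fields are 32-bit unsigned integers in network (big-endian) order:
    # SPI at offset 0, Sequence Number at offset 4.
    spi, seq = struct.unpack("!II", esp_bytes[:8])
    return spi, seq

# Illustrative header bytes: SPI 0xCAFEBABE, sequence number 1.
spi, seq = parse_esp_header(bytes.fromhex("cafebabe00000001"))
print(hex(spi), seq)
```

Everything after these 8 bytes is opaque until the SA identified by the SPI has been looked up.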
Sequence Number — 32 bits
The Sequence Number is a 32-bit unsigned integer that provides anti-replay protection. It starts at 1 when an SA is established and increments by 1 for each packet sent.
Critical Behavior:
Extended Sequence Numbers (ESN):
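For high-volume connections, 2³² packets (~4.3 billion) might be exhausted too quickly. RFC 4303 defines Extended Sequence Numbers, a 64-bit counter of which only the low-order 32 bits are transmitted in each packet; both peers track the high-order 32 bits locally.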
At 10 Gbps with minimum-sized packets, an SA can exhaust 2³² sequence numbers in a matter of minutes. High-throughput deployments MUST use Extended Sequence Numbers (ESN) or configure aggressive SA lifetimes to trigger rekeying before exhaustion. Letting the sequence number wrap without a rekey is a critical vulnerability, because the anti-replay mechanism fails completely.
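To put a number on how quickly a 32-bit sequence space runs out, here is a rough back-of-the-envelope sketch. It assumes 84 bytes on the wire per packet (a 64-byte minimum Ethernet frame plus preamble and inter-frame gap); real ESP packets are larger, so actual exhaustion takes somewhat longer. The function name and parameters are illustrative only.

```python
def seconds_to_exhaust(link_bps: float, wire_bytes_per_packet: int, seq_bits: int = 32) -> float:
    """Seconds until a sequence space of seq_bits is used up at line rate."""
    packets_per_second = link_bps / (wire_bytes_per_packet * 8)
    return (2 ** seq_bits) / packets_per_second

# Assumption: 10 Gbps link, 84 wire bytes per packet.
t32 = seconds_to_exhaust(10e9, 84)
t64 = seconds_to_exhaust(10e9, 84, seq_bits=64)

print(f"32-bit sequence space: {t32 / 60:.1f} minutes")
print(f"64-bit ESN space:      {t64 / (365.25 * 86400):.0f} years")
```

Under these assumptions the 32-bit space lasts minutes, while the 64-bit ESN space lasts tens of thousands of years.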
ESP Header Format (8 bytes / 64 bits)
═══════════════════════════════════════════════════════════════════
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
│                Security Parameters Index (SPI)                │
├───────────────────────────────────────────────────────────────┤
│                        Sequence Number                        │
├───────────────────────────────────────────────────────────────┤
│                                                               │
~                    Payload Data (variable)                    ~
│                          [Encrypted]                          │
├───────────────────────────────────────────────────────────────┤

Note: What follows is encrypted - cannot be seen in packet capture without decryption keys.

Many encryption algorithms require an Initialization Vector (IV) to ensure that identical plaintexts produce different ciphertexts. The IV's placement and handling in ESP depend on the encryption algorithm used.
Purpose of the IV:
Without an IV, encrypting the same plaintext with the same key always produces the same ciphertext. This creates a vulnerability—an attacker observing traffic could detect repeated messages and potentially deduce patterns or content. The IV introduces randomness into each encryption operation, ensuring unique ciphertexts even for identical plaintexts.
IV Characteristics:
| Algorithm | IV Size | IV Requirements | IV Source |
|---|---|---|---|
| AES-CBC | 16 bytes (128 bits) | Unpredictable, random | CSPRNG per-packet |
| 3DES-CBC | 8 bytes (64 bits) | Unpredictable, random | CSPRNG per-packet |
| AES-CTR | 8 bytes (64 bits) | Unique (never reused with same key) | Counter or random |
| AES-GCM | 8 bytes (64 bits)* | Unique (never reused with same key) | Counter, typically 64-bit |
| ChaCha20-Poly1305 | 8 bytes | Unique (never reused with same key) | Counter-based |
*Note: AES-GCM technically uses a 12-byte nonce; ESP builds it from a 4-byte salt held in the SA followed by the 8-byte explicit IV carried in each packet.
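The nonce construction described in the note is simple concatenation. The sketch below assumes the salt-then-IV layout used for AES-GCM in ESP and a counter-based explicit IV; `gcm_nonce` and `counter_iv` are hypothetical helper names, and the salt value is made up for illustration.

```python
import struct

def counter_iv(counter: int) -> bytes:
    """Encode a per-packet counter as the 8-byte explicit IV (big-endian)."""
    return struct.pack("!Q", counter)

def gcm_nonce(salt: bytes, explicit_iv: bytes) -> bytes:
    """Build the 12-byte GCM nonce: 4-byte SA salt followed by the 8-byte IV."""
    if len(salt) != 4 or len(explicit_iv) != 8:
        raise ValueError("expected a 4-byte salt and an 8-byte IV")
    return salt + explicit_iv

salt = bytes.fromhex("aabbccdd")        # illustrative value; really derived with the SA keys
nonce = gcm_nonce(salt, counter_iv(1))  # nonce for the first packet
print(nonce.hex())
```

Only the 8-byte IV travels in the packet; the salt never appears on the wire.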
IV Placement in ESP:
The IV is carried at the front of the Payload Data field, but because the receiver needs it to begin decryption, it is transmitted in the clear. In practice, it sits immediately after the ESP header and ahead of the ciphertext.
Explicit vs Implicit IV:
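Some algorithms allow implicit IV derivation from the sequence number and SA parameters, reducing per-packet overhead. For example, RFC 8750 defines implicit-IV variants of AES-GCM, AES-CCM, and ChaCha20-Poly1305 that derive the IV from the sequence number and omit the 8-byte IV from the packet.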
For counter-mode algorithms (AES-GCM, AES-CTR, ChaCha20), reusing an IV with the same key is a devastating failure—it can reveal the XOR of two plaintexts and potentially the authentication key. Implementations must guarantee IV uniqueness, typically by deriving IV from the sequence number. This is why sequence number wrap requires SA termination.
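The "XOR of two plaintexts" leak can be shown without any real cipher. The sketch below uses a toy keystream built from SHA-256 as a stand-in for AES-CTR or ChaCha20 (an assumption made purely so the example needs only the standard library); the effect of reusing a key and nonce is the same for any stream-style cipher.

```python
import hashlib

def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream (NOT a real cipher, illustration only)."""
    out = b""
    block = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + block.to_bytes(4, "big")).digest()
        block += 1
    return out[:length]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = b"k" * 32
nonce = b"\x00" * 12            # the mistake: same nonce for both packets
p1 = b"attack at dawn!!"
p2 = b"retreat at noon!"

c1 = xor_bytes(p1, toy_keystream(key, nonce, len(p1)))
c2 = xor_bytes(p2, toy_keystream(key, nonce, len(p2)))

# The keystream cancels out: an observer learns p1 XOR p2 without the key.
leaked = xor_bytes(c1, c2)
print(leaked == xor_bytes(p1, p2))
```

With a distinct nonce per packet the two keystreams differ and the cancellation no longer happens.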
The Payload Data field contains the actual protected content—what we're ultimately trying to secure. Its contents differ based on whether ESP operates in transport mode or tunnel mode.
Transport Mode Payload:
In transport mode, ESP protects only the upper-layer protocol (TCP, UDP, ICMP, etc.) without encapsulating the original IP header:
[Original IP Header][ESP Header][IV][TCP/UDP/etc. Header + Data][ESP Trailer][ICV]
                                    \______________ Encrypted ______________/
Tunnel Mode Payload:
In tunnel mode, ESP encapsulates the entire original IP packet within a new IP packet:
[New IP Header][ESP Header][IV][Original IP Header + Complete Packet][ESP Trailer][ICV]
                               \___________________ Encrypted ___________________/
Payload Size Considerations:
The payload field has no fixed size limit imposed by ESP itself, but practical constraints exist:
MTU Impact Calculation:
For a typical IPSec tunnel with AES-GCM:
Original MTU: 1500 bytes
- Outer IP header: 20 bytes
- ESP header: 8 bytes
- IV: 8 bytes
- ESP trailer (padding + pad length + next header): 2-17 bytes
- ICV: 16 bytes
─────────────────
Maximum payload: 1431-1446 bytes
(Inner IP header + original data)
This MTU reduction causes fragmentation if not accounted for—a common source of VPN performance issues.
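The subtraction above is easy to parameterize for other algorithm choices. This is a minimal sketch; the function name and defaults are assumptions chosen to match the AES-GCM figures listed in the calculation, with the trailer size passed in explicitly because it depends on how much padding is added.

```python
def max_inner_packet(link_mtu: int = 1500, outer_ip: int = 20, esp_header: int = 8,
                     iv: int = 8, icv: int = 16, trailer: int = 2) -> int:
    """Largest inner packet (inner IP header + data) that fits without fragmentation."""
    return link_mtu - outer_ip - esp_header - iv - trailer - icv

best_case = max_inner_packet(trailer=2)    # no padding: pad length + next header only
worst_case = max_inner_packet(trailer=17)  # 15 bytes of padding plus the 2 fixed bytes
print(best_case, worst_case)
```

A tunnel interface MTU at or below the worst-case value avoids fragmentation regardless of how much padding a given packet needs.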
VPN tunnels frequently suffer from MTU-related black holes when Path MTU Discovery (PMTUD) fails. Best practices include: (1) Configure tunnel endpoints with reduced MTU values, (2) Enable tunnel MSS clamping for TCP traffic, (3) Consider DF-bit clearing with careful fragmentation handling. Unexplained 'some sites work, some don't' symptoms often trace to MTU issues.
The ESP trailer follows the payload data and contains three components critical for proper packet processing: Padding, Pad Length, and Next Header. Unlike the ESP header, the trailer is encrypted—visible only after decryption.
Padding (0-255 bytes):
Padding serves multiple purposes in ESP:
Block Cipher Alignment: Many encryption algorithms (AES-CBC, 3DES-CBC) operate on fixed-size blocks (16 bytes for AES). The plaintext must be padded to a multiple of the block size.
Traffic Flow Confidentiality (TFC): Padding can obscure the actual message length, preventing traffic analysis based on packet sizes.
Alignment Requirements: Some hardware implementations require 4-byte or 8-byte alignment of specific fields.
Padding Content:
RFC 4303 specifies that padding bytes should contain sequential values (1, 2, 3, ... n). This pattern serves as a verification mechanism—the receiver can check that padding follows the expected pattern after decryption.
| Field | Size | Value/Content | Purpose |
|---|---|---|---|
| Padding | 0-255 bytes | 0x01 0x02 0x03 ... 0xNN (sequential) | Block alignment + traffic flow confidentiality |
| Pad Length | 1 byte | 0-255 (number of padding bytes) | Enables receiver to remove padding |
| Next Header | 1 byte | IP protocol number | Identifies encapsulated protocol |
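The trailer fields in the table can be assembled in a few lines. As a sketch, assuming the sender pads so that payload plus trailer is a multiple of the cipher block size, and using a hypothetical helper name:

```python
def build_esp_trailer(payload_len: int, next_header: int, block_size: int = 16) -> bytes:
    """Return Padding || Pad Length || Next Header for a payload of payload_len bytes."""
    # Payload + padding + 2 fixed trailer bytes must be a multiple of block_size.
    pad_len = (-(payload_len + 2)) % block_size
    # Sequential padding content: 0x01, 0x02, ..., pad_len.
    padding = bytes(range(1, pad_len + 1))
    return padding + bytes([pad_len, next_header])

# An 84-byte inner IPv4 packet (Next Header = 4) with 16-byte alignment.
trailer = build_esp_trailer(84, next_header=4)
print(len(trailer), trailer.hex())
```

For an 84-byte payload this yields 10 padding bytes and a 12-byte trailer, so the encrypted section is 96 bytes.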
Pad Length (1 byte):
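The Pad Length field indicates how many bytes of padding precede it. After decryption, the receiver reads this byte, removes that many padding bytes (optionally checking that they follow the sequential pattern), and is left with the original payload.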
Next Header (1 byte):
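The Next Header field identifies what type of data is contained in the payload, using IP protocol numbers: for example, 4 for an encapsulated IPv4 packet (tunnel mode), or 6 for TCP and 17 for UDP (transport mode).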
This field is essential because, after decryption, the receiver must know how to process the decrypted data. It occupies the same semantic position as the IP Protocol field, enabling seamless reintegration of decrypted packets into the IP stack.
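On the receiving side, the trailer is read backwards from the end of the decrypted data. A minimal sketch, assuming the sender used the sequential padding pattern and that the receiver chooses to verify it (`strip_esp_trailer` is a hypothetical name):

```python
def strip_esp_trailer(plaintext: bytes) -> tuple:
    """Split decrypted ESP data into (payload, next_header), checking the padding."""
    if len(plaintext) < 2:
        raise ValueError("too short to contain Pad Length and Next Header")
    pad_len = plaintext[-2]       # second-to-last byte
    next_header = plaintext[-1]   # last byte
    payload_end = len(plaintext) - 2 - pad_len
    if payload_end < 0:
        raise ValueError("Pad Length exceeds available data")
    padding = plaintext[payload_end:len(plaintext) - 2]
    if padding != bytes(range(1, pad_len + 1)):
        raise ValueError("padding does not follow the 1, 2, 3, ... pattern")
    return plaintext[:payload_end], next_header

decrypted = b"data!!" + bytes([1, 2, 3, 4, 5]) + bytes([5, 4])
payload, next_header = strip_esp_trailer(decrypted)
print(payload, next_header)
```

The returned Next Header value then selects the handler for the payload, exactly as the IP Protocol field would.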
The Next Header field is encrypted as part of the trailer. This is intentional—it prevents an attacker from knowing what protocol is being carried (TCP? UDP? ICMP?) without decryption. This provides modest traffic analysis protection, hiding the nature of encapsulated traffic from observers.
ESP Trailer Structure (example with 5 bytes padding)
═══════════════════════════════════════════════════════════════════
Data Field (before trailer):
│ ...encrypted payload data... │

ESP Trailer:
├───────┬───────┬───────┬───────┬───────┬───────────┬───────────┤
│ 0x01  │ 0x02  │ 0x03  │ 0x04  │ 0x05  │ Pad Length│Next Header│
│(pad)  │(pad)  │(pad)  │(pad)  │(pad)  │    = 5    │    = 4    │
├───────┴───────┴───────┴───────┴───────┴───────────┴───────────┤
│          5 bytes of padding           │  1 byte   │  1 byte   │

Next Header = 4 means IPv4 (tunnel mode, encapsulating IPv4 packet)
Next Header = 6 would mean TCP (transport mode, encapsulating TCP)
Next Header = 17 would mean UDP (transport mode, encapsulating UDP)

The Integrity Check Value (ICV) is the cryptographic tag that provides authentication and integrity verification. It's computed over the ESP header, payload, and trailer, and appended to the end of the packet (after the ESP trailer, unencrypted).
ICV Computation:
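The ICV is calculated using a cryptographic authentication algorithm, typically either a truncated HMAC (such as HMAC-SHA-256-128) or the authentication tag of an AEAD algorithm (such as AES-GCM).
Coverage:
The ICV covers (authenticates): the ESP header (SPI and Sequence Number), the payload data including any IV, and the ESP trailer.
The ICV does NOT cover (and cannot cover): the outer IP header, or the ICV field itself.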
| Algorithm | Full Output | ICV Size (Truncated) | Security Level |
|---|---|---|---|
| HMAC-SHA-256-128 | 256 bits | 16 bytes (128 bits) | Strong |
| HMAC-SHA-384-192 | 384 bits | 24 bytes (192 bits) | Very Strong |
| HMAC-SHA-512-256 | 512 bits | 32 bytes (256 bits) | Highest |
| AES-GCM (AEAD) | 128 bits | 16 bytes (128 bits) | Strong + Encryption |
| AES-GMAC (Auth only) | 128 bits | 16 bytes (128 bits) | Strong (no encryption) |
| HMAC-SHA-1-96 (legacy) | 160 bits | 12 bytes (96 bits) | Deprecated — avoid |
ICV Verification Process:
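When a receiver processes an ESP packet, it looks up the SA from the SPI, checks the sequence number against the anti-replay window, recomputes the ICV over the received ESP header, payload, and trailer, and compares the result with the ICV in the packet. Only if the two match does it decrypt the payload; otherwise the packet is silently discarded.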
The 'silent discard' for failed verification is deliberate—sending an error would confirm receipt and could be exploited for oracle attacks.
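The verify-then-discard behavior can be sketched with Python's standard `hmac` module. This assumes HMAC-SHA-256-128 as a separate integrity algorithm (not an AEAD), no Extended Sequence Numbers, and hypothetical function names; returning `None` stands in for the silent discard.

```python
import hashlib
import hmac
from typing import Optional

ICV_LEN = 16  # HMAC-SHA-256 truncated to 128 bits

def compute_icv(auth_key: bytes, authenticated_data: bytes) -> bytes:
    """ICV over ESP header + IV + ciphertext (which includes the encrypted trailer)."""
    return hmac.new(auth_key, authenticated_data, hashlib.sha256).digest()[:ICV_LEN]

def verify_esp(auth_key: bytes, esp_packet: bytes) -> Optional[bytes]:
    """Return the authenticated bytes (without ICV), or None to discard silently."""
    if len(esp_packet) < 8 + ICV_LEN:
        return None
    body, received_icv = esp_packet[:-ICV_LEN], esp_packet[-ICV_LEN:]
    expected_icv = compute_icv(auth_key, body)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected_icv, received_icv):
        return None
    return body

key = b"\x11" * 32
body = bytes.fromhex("cafebabe00000001") + b"\x00" * 16 + b"ciphertext-bytes"
packet = body + compute_icv(key, body)

tampered = bytearray(packet)
tampered[20] ^= 0x01  # flip one bit inside the protected region

print(verify_esp(key, packet) == body, verify_esp(key, bytes(tampered)))
```

Decryption would only be attempted on the bytes returned by a successful verification.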
AEAD Algorithms:
Modern deployments increasingly use Authenticated Encryption with Associated Data (AEAD) algorithms like AES-GCM. These combine encryption and authentication in a single cryptographic operation, so the ICV is the AEAD authentication tag rather than the output of a separate HMAC.
While ESP technically permits null authentication (no ICV), this should never be used in practice. Without integrity checking, an attacker can modify encrypted packets—even without knowing the key—to cause unpredictable decryption results. This can enable bit-flipping attacks, especially against CBC-mode encryption. Always configure both encryption AND authentication.
Let's walk through a complete ESP packet to see how all components fit together. Consider an ESP packet in tunnel mode using AES-GCM-256, protecting an ICMP echo request.
Complete ESP Packet in Tunnel Mode (AES-256-GCM)
═══════════════════════════════════════════════════════════════════════════════

OUTER IP HEADER (20 bytes) - NOT authenticated, NOT encrypted
├─────────────────────────────────────────────────────────────────────────────┤
│ Version: 4 │ IHL: 5 │ TOS: 0x00 │ Total Length: 148                         │
│ Identification: 0x1234 │ Flags: 0 │ Fragment Offset: 0                      │
│ TTL: 64 │ Protocol: 50 (ESP) │ Header Checksum: 0xABCD                      │
│ Source IP: 10.1.1.1 (VPN Gateway A)                                         │
│ Destination IP: 10.2.2.1 (VPN Gateway B)                                    │
├─────────────────────────────────────────────────────────────────────────────┤

ESP HEADER (8 bytes) - Authenticated, NOT encrypted
├─────────────────────────────────────────────────────────────────────────────┤
│ SPI: 0xCAFEBABE (3405691582)                                                │
│ Sequence Number: 0x00000001 (1)                                             │
├─────────────────────────────────────────────────────────────────────────────┤

INITIALIZATION VECTOR (8 bytes) - Authenticated, NOT "encrypted" but public
├─────────────────────────────────────────────────────────────────────────────┤
│ IV: 0x0001020304050607 (derived from counter)                               │
├─────────────────────────────────────────────────────────────────────────────┤

ENCRYPTED PAYLOAD - Authenticated AND encrypted
┌─────────────────────────────────────────────────────────────────────────────┐
│ [This is what's encrypted - cannot be seen in capture without key]          │
│                                                                             │
│ INNER IP HEADER (20 bytes):                                                 │
│   Version: 4 │ IHL: 5 │ TOS: 0x00 │ Total Length: 84                        │
│   Source IP: 192.168.1.100 (actual source)                                  │
│   Destination IP: 192.168.2.200 (actual destination)                        │
│   Protocol: 1 (ICMP)                                                        │
│                                                                             │
│ ICMP DATA (64 bytes):                                                       │
│   Type: 8 (Echo Request) │ Code: 0                                          │
│   Identifier: 0x1234 │ Sequence: 1                                          │
│   Payload: "Hello from ping..."                                             │
│                                                                             │
│ ESP TRAILER (12 bytes):                                                     │
│   Padding: 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0A (10 bytes)     │
│   Pad Length: 0x0A (10)                                                     │
│   Next Header: 0x04 (4 = IPv4)                                              │
└─────────────────────────────────────────────────────────────────────────────┘

ICV (16 bytes) - NOT encrypted
├─────────────────────────────────────────────────────────────────────────────┤
│ Authentication Tag: 0x1A2B3C4D5E6F7A8B9C0D1E2F3A4B5C6D                      │
│ (AES-GCM computed over: ESP Header + IV + Ciphertext + ESP Trailer)         │
└─────────────────────────────────────────────────────────────────────────────┘

TOTAL PACKET: 20 (outer IP) + 8 (ESP) + 8 (IV) + 20 (inner IP) + 64 (ICMP) + 12 (trailer) + 16 (ICV) = 148 bytes

Overhead Added by ESP: 148 - 84 (original packet) = 64 bytes (76% overhead!)

Key Observations:
The outer IP addresses are visible: Anyone sniffing the network sees traffic between VPN gateways, not original hosts
SPI and Sequence Number are visible: An observer knows which SA is being used and can count packets
Everything else is hidden: Inner IP addresses, protocol, and data are encrypted
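Significant overhead: A small 84-byte ICMP packet becomes 148 bytes, a 76% increase. For larger packets the percentage drops (for a 1500-byte payload, roughly 4%).
Padding ensures alignment: here the 10 padding bytes bring the encrypted section (84 + 12 = 96 bytes) to a multiple of 16, the AES block size. AES-GCM is a counter mode, so strictly it needs only the 4-byte alignment that ESP itself requires; block-multiple padding like this is what CBC-mode ciphers need.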
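The overhead arithmetic in this walkthrough can be reproduced for any inner packet size. The sketch below assumes the same parameters as the example (20-byte outer IPv4 header, 8-byte IV, 16-byte ICV, padding to a 16-byte boundary); `esp_tunnel_size` is a hypothetical helper.

```python
def esp_tunnel_size(inner_len: int, block_size: int = 16, outer_ip: int = 20,
                    esp_header: int = 8, iv: int = 8, icv: int = 16) -> int:
    """Total on-the-wire IP packet size for an inner packet of inner_len bytes."""
    pad_len = (-(inner_len + 2)) % block_size   # padding before Pad Length + Next Header
    trailer = pad_len + 2
    return outer_ip + esp_header + iv + inner_len + trailer + icv

for inner in (84, 1400):
    total = esp_tunnel_size(inner)
    overhead = total - inner
    print(f"{inner:5d} -> {total:5d} bytes, overhead {overhead} ({overhead / inner:.1%})")
```

The fixed 52 bytes of headers, IV, and ICV dominate for small packets, which is why the percentage falls as the inner packet grows.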
When analyzing ESP packets in Wireshark, you'll see the outer IP header, ESP header (SPI + Sequence), and encrypted payload blob. Without decryption keys, Wireshark cannot interpret the encrypted portion. Wireshark can decrypt ESP if you provide the SA keys—useful for debugging but obviously not available for traffic you don't control.
The ESP packet format is a carefully engineered structure that enables comprehensive security services while maintaining reasonable efficiency. Understanding this format is essential for anyone implementing, troubleshooting, or analyzing IPSec deployments.
What's Next:
Now that we understand ESP's packet structure, we'll dive deeper into the encryption mechanisms ESP employs. You'll learn how symmetric encryption algorithms like AES work within ESP, key derivation from SA parameters, cipher mode selection and implications, and best practices for algorithm configuration in modern deployments.
You now understand the complete ESP packet format—every field, every boundary, every purpose. This knowledge enables you to analyze packet captures, troubleshoot VPN issues, and understand the overhead implications of ESP protection. Next, we'll explore how ESP's encryption mechanisms actually protect payload data.