Encapsulating Security Payload (ESP)

Level: Intermediate

Duration: 60 mins

Topic: Encapsulating Security Payload

ESP Format

Anatomy of a Secure Packet

Every ESP-protected packet carries within its structure the machinery for confidentiality, integrity, and authentication. Understanding this structure is not merely academic—it's essential for troubleshooting VPN issues, analyzing packet captures, optimizing performance, and implementing security systems correctly.

The ESP packet format is elegantly designed, balancing security requirements against overhead constraints. Its layout does more than name fields: it defines critical boundaries, namely where encryption starts and ends, what gets authenticated, and how padding provides block cipher alignment along with some protection against traffic analysis.

On this page, we'll dissect the ESP packet byte by byte, understanding the purpose of every field, the constraints that shaped its design, and the practical implications for network operations.

What You Will Learn

By the end of this page, you will understand the complete ESP packet structure including the ESP header fields (SPI, Sequence Number), the encrypted payload body, ESP trailer components (Padding, Pad Length, Next Header), and the Integrity Check Value (ICV). You'll learn exactly what gets encrypted versus authenticated, how padding works, and how to interpret ESP packets in packet captures.

ESP Packet Overview

An ESP packet consists of four main components, arranged in a specific order that enables both encryption and authentication operations:

1. ESP Header (8 bytes, unencrypted)

Security Parameters Index (SPI) — 32 bits
Sequence Number — 32 bits

2. Payload Data (variable, encrypted)

Original IP packet (tunnel mode) or upper-layer protocol (transport mode)
Initialization Vector (IV), if required by the algorithm (carried at the start of this field, in cleartext)

3. ESP Trailer (variable, encrypted)

Padding — 0-255 bytes
Pad Length — 8 bits
Next Header — 8 bits

4. Integrity Check Value (ICV) (variable, unencrypted)

Computed over ESP header, payload, and trailer
Length depends on authentication algorithm (typically 12-16 bytes)

The Critical Boundaries:

Understanding what gets encrypted versus what gets authenticated is crucial:

Encrypted Portion:

Starts after the ESP header and any explicit IV (the IV itself travels in cleartext)
Includes: Payload data, Padding, Pad Length, Next Header
Does NOT include: ESP header (SPI, Sequence Number) or the IV

Authenticated Portion:

Starts at ESP header
Includes: SPI, Sequence Number, IV (if present), Payload, Padding, Pad Length, Next Header
Does NOT include: The ICV itself (it's the result of authentication)

This means an attacker can see the ESP header (knowing which Security Association is being used and the sequence number) but cannot read or modify the payload without detection.
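To make these boundaries concrete, here is a small Python sketch that computes the byte range of each region from a set of field sizes. The function name `esp_regions` and its parameters are illustrative, not taken from any library; offsets are measured from the first byte of the ESP header and follow the generic RFC 4303 layout, where the ICV covers everything that precedes it.

```python
ESP_HEADER_LEN = 8      # SPI (4 bytes) + Sequence Number (4 bytes)
TRAILER_FIXED_LEN = 2   # Pad Length (1 byte) + Next Header (1 byte)

def esp_regions(iv_len, payload_len, pad_len, icv_len):
    """Return [start, end) byte ranges of each ESP region, relative to the ESP header."""
    enc_start = ESP_HEADER_LEN + iv_len  # encryption begins after any explicit IV
    enc_end = enc_start + payload_len + pad_len + TRAILER_FIXED_LEN
    return {
        "header": (0, ESP_HEADER_LEN),          # cleartext, authenticated
        "iv": (ESP_HEADER_LEN, enc_start),      # cleartext, authenticated
        "encrypted": (enc_start, enc_end),      # payload + padding + trailer
        "authenticated": (0, enc_end),          # everything except the ICV
        "icv": (enc_end, enc_end + icv_len),    # the tag itself
    }

for name, (start, end) in esp_regions(8, 84, 10, 16).items():
    print(f"{name:>13}: bytes {start}-{end - 1}")
```

With the sizes used in the complete AES-GCM example later on this page (8-byte IV, 84-byte inner packet, 10 bytes of padding, 16-byte ICV), the encrypted region spans bytes 16-111 and the ICV occupies bytes 112-127.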

Why Not Encrypt the Header?

The ESP header (SPI and Sequence Number) remains unencrypted because the receiver needs this information to locate the correct Security Association before decryption can begin. Without knowing which SA applies, the receiver wouldn't know which decryption key to use. The SPI acts as an index into the SA database, making decryption possible.

ESP Header Fields

The ESP header contains just two fields—8 bytes total—yet these carry critical information for processing the packet.

Security Parameters Index (SPI) — 32 bits

The SPI is a 32-bit identifier that, combined with the destination IP address and protocol (ESP = 50), uniquely identifies the Security Association under which this packet should be processed.

Key Characteristics:

Assigned by the receiving side during SA establishment
Must be unique within the receiver's SA database for the given destination
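Values 1-255 reserved by IANA for future use
Value 0 is reserved for local, implementation-specific use and must never be sent on the wire (a key manager may use it internally to mean 'no SA exists yet')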
Typically appears as 8 hexadecimal digits (e.g., 0xCAFEBABE)

SPI Selection:

IPSec implementations automatically assign SPIs during IKE negotiation
Common approaches: random selection, sequential allocation, hash-based generation
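The sender learns which SPI to use from the receiver, which announces its chosen inbound SPI during SA negotiation (IKEv1 Phase 2 / Quick Mode, or the IKEv2 IKE_AUTH and CREATE_CHILD_SA exchanges)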
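A minimal sketch of the random-selection approach, assuming the receiver tracks its in-use inbound SPIs in a plain Python set (a hypothetical stand-in for a real SA database). It draws from the cryptographically secure `secrets` module and never returns a reserved value (0-255).

```python
import secrets

def allocate_spi(in_use):
    """Pick a random, unused inbound SPI in 256..2**32-1 and record it in `in_use`."""
    while True:
        # randbelow(n) returns 0..n-1, so the result lands in 256..2**32-1
        spi = secrets.randbelow(2**32 - 256) + 256
        if spi not in in_use:  # must be unique among this receiver's SAs
            in_use.add(spi)
            return spi

active_spis = set()
print(hex(allocate_spi(active_spis)))
```

Random selection also makes SPIs hard to guess, which is a mild benefit over sequential allocation; the retry loop almost never runs more than once because the space is so large.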

ESP Header Field Details
Field	Size	Offset	Purpose	Value Range
SPI	32 bits (4 bytes)	0	Identifies Security Association	256 - 2³²-1 (0x100 - 0xFFFFFFFF)
Sequence Number	32 bits (4 bytes)	4	Anti-replay protection	1 - 2³²-1 (SA must be rekeyed before wrap)
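Both fields are 32-bit unsigned integers in network (big-endian) byte order, so Python's standard `struct` module can build and parse the header directly. The helper names below are hypothetical; the range checks mirror the table above.

```python
import struct

# Two unsigned 32-bit integers in network byte order: SPI, then Sequence Number.
ESP_HEADER = struct.Struct("!II")

def build_esp_header(spi, seq):
    if not 256 <= spi <= 0xFFFFFFFF:
        raise ValueError("SPI must be in 256..2**32-1 (0-255 are reserved)")
    if not 1 <= seq <= 0xFFFFFFFF:
        raise ValueError("Sequence Number must be in 1..2**32-1")
    return ESP_HEADER.pack(spi, seq)

def parse_esp_header(packet):
    """Return (spi, seq, rest), where rest is everything after the 8-byte header."""
    if len(packet) < ESP_HEADER.size:
        raise ValueError("packet shorter than an ESP header")
    spi, seq = ESP_HEADER.unpack_from(packet, 0)
    return spi, seq, packet[ESP_HEADER.size:]

hdr = build_esp_header(0xCAFEBABE, 1)
print(hdr.hex())  # cafebabe00000001
```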

Sequence Number — 32 bits

The Sequence Number is a 32-bit unsigned integer that provides anti-replay protection. The sender's counter is initialized to 0 when an SA is established and incremented by 1 before each packet is sent, so the first packet carries Sequence Number 1.

Critical Behavior:

Never wraps: If the sequence number reaches 2³²-1, the SA MUST be terminated and a new SA established
Monotonically increasing: Sender never decrements or reuses sequence numbers
Receiver verification: Receiver maintains a sliding window to detect replays
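Both sides of this behavior fit in a few lines. The sketch below uses a common bitmap-based sliding window; the class names and the 64-packet default window are illustrative choices, and it handles 32-bit sequence numbers only (no ESN).

```python
class SequenceCounter:
    """Sender side: hands out 1, 2, 3, ... and refuses to wrap."""

    def __init__(self):
        self.last = 0  # counter starts at 0 when the SA is created

    def next(self):
        if self.last == 0xFFFFFFFF:
            raise OverflowError("sequence space exhausted: establish a new SA")
        self.last += 1
        return self.last


class ReplayWindow:
    """Receiver side: sliding anti-replay window for 32-bit sequence numbers.

    `top` is the highest sequence number accepted so far; bit i of `bitmap`
    is set if sequence number top - i has already been accepted.
    """

    def __init__(self, size=64):
        self.size = size
        self.top = 0
        self.bitmap = 0

    def check(self, seq):
        """True if seq is new and not left of the window. Does not change state."""
        if seq == 0:
            return False                   # 0 is never a valid on-the-wire value
        if seq > self.top:
            return True                    # ahead of the window: new packet
        offset = self.top - seq
        if offset >= self.size:
            return False                   # too old to judge: reject
        return not ((self.bitmap >> offset) & 1)

    def update(self, seq):
        """Mark seq as received. Call only after the packet's ICV has verified."""
        if seq > self.top:
            shift = seq - self.top
            if shift >= self.size:
                self.bitmap = 1            # jumped past the whole window
            else:
                mask = (1 << self.size) - 1
                self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.top = seq
        else:
            self.bitmap |= 1 << (self.top - seq)
```

Checking before the ICV but updating only afterwards matters: otherwise a forged packet with a huge sequence number could slide the window forward and make genuine packets look stale.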

Extended Sequence Numbers (ESN):

For high-volume connections, 2³² packets (~4.3 billion) might be exhausted too quickly. RFC 4303 defines Extended Sequence Numbers:

64-bit sequence number space (2⁶⁴ packets before rekey)
Only lower 32 bits transmitted in header (bandwidth conservation)
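Upper 32 bits maintained implicitly by both ends (never transmitted, but included in the ICV computation)
Negotiated per SA during key exchange (in IKEv2, as the ESN transform in the SA proposal)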
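Because only the low 32 bits travel on the wire, the receiver has to infer the high 32 bits from its anti-replay state. The sketch below follows the inference rule described in RFC 4303 Appendix A; `top` is the highest 64-bit sequence number accepted so far and `window` is the anti-replay window size (both names are our own). A wrong guess is not silent: the high-order bits feed into the ICV, so the packet simply fails verification.

```python
MOD32 = 1 << 32

def infer_esn(seq_low, top, window=64):
    """Rebuild a 64-bit sequence number from its transmitted low 32 bits."""
    t_low, t_high = top % MOD32, top // MOD32
    bottom = (t_low - window + 1) % MOD32  # low 32 bits of the window's left edge
    if t_low >= window - 1:
        # Window lies inside one 2**32 subspace: values below its left edge
        # are taken to belong to the next subspace.
        high = t_high if seq_low >= bottom else t_high + 1
    else:
        # Window straddles a 2**32 boundary: values at or above the left edge
        # belong to the previous subspace (if one exists), the rest to the current one.
        high = t_high - 1 if (seq_low >= bottom and t_high > 0) else t_high
    return high * MOD32 + seq_low

print(infer_esn(5, 2**32 - 1) == 2**32 + 5)  # True: the low bits have just wrapped
```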

Sequence Number Exhaustion

At 10 Gbps with minimum-sized packets, an SA could exhaust 2³² sequence numbers in minutes: at the 14.88 million packets per second that 64-byte Ethernet frames allow, it takes about 4.8 minutes. High-throughput deployments MUST use Extended Sequence Numbers (ESN) or configure aggressive SA lifetimes to trigger rekeying before exhaustion. Sequence number wrap without rekey creates a critical vulnerability: the anti-replay mechanism fails completely.
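The arithmetic behind this warning is easy to check. This back-of-the-envelope sketch assumes every frame is the same size and the link runs at full line rate; real traffic mixes sizes, so treat the results as worst cases. The function name is illustrative.

```python
def seconds_to_exhaust(link_bps, wire_bytes, seq_space=2**32):
    """Time until an SA has sent `seq_space` packets at full line rate."""
    packets_per_second = link_bps / (wire_bytes * 8)
    return seq_space / packets_per_second

# 84 bytes on the wire: 64-byte minimum Ethernet frame + 8-byte preamble + 12-byte gap.
# Real ESP packets are larger, so this is an upper bound on the packet rate.
small = seconds_to_exhaust(10e9, 84)
# 1538 bytes on the wire: 1500-byte IP packet + 18 bytes Ethernet header/FCS + 20 bytes preamble/gap
large = seconds_to_exhaust(10e9, 1538)
print(f"{small / 60:.1f} min, {large / 3600:.1f} h")  # 4.8 min, 1.5 h
```

Even with full-size packets the 32-bit space lasts only about an hour and a half at 10 Gbps, which is why ESN (passing `seq_space=2**64`) or short SA lifetimes are needed.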

esp_header_structure.txt
ESP Header Format (8 bytes / 64 bits)
═══════════════════════════════════════════════════════════════════
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
│                Security Parameters Index (SPI)                │
├───────────────────────────────────────────────────────────────┤
│                    Sequence Number                            │
├───────────────────────────────────────────────────────────────┤
│                                                               │
~                    Payload Data (variable)                    ~
│                        [Encrypted]                            │
├───────────────────────────────────────────────────────────────┤
 
Note: What follows is encrypted - cannot be seen in packet capture
without decryption keys.

Initialization Vector (IV)

Many encryption algorithms require an Initialization Vector (IV) to ensure that identical plaintexts produce different ciphertexts. The IV's placement and handling in ESP depends on the encryption algorithm used.

Purpose of the IV:

Without an IV, encrypting the same plaintext with the same key always produces the same ciphertext. This creates a vulnerability—an attacker observing traffic could detect repeated messages and potentially deduce patterns or content. The IV introduces randomness into each encryption operation, ensuring unique ciphertexts even for identical plaintexts.

IV Characteristics:

Must be unpredictable for CBC mode (random, or for example an encrypted counter; a plain counter is predictable and unsafe)
Must be unique per-packet for counter modes (CTR, GCM)
Length depends on algorithm (typically 8-16 bytes)
Transmitted in cleartext (but this is cryptographically safe)

IV Requirements by Algorithm Type
Algorithm	IV Size	IV Requirements	IV Source
AES-CBC	16 bytes (128 bits)	Unpredictable, random	CSPRNG per-packet
3DES-CBC	8 bytes (64 bits)	Unpredictable, random	CSPRNG per-packet
AES-CTR	8 bytes (64 bits)	Unique (never reused with same key)	Counter or random
AES-GCM	8 bytes (64 bits)*	Unique (never reused with same key)	Counter, typically 64-bit
ChaCha20-Poly1305	8 bytes	Unique (never reused with same key)	Counter-based

*Note: AES-GCM actually uses a 12-byte nonce. ESP builds it from a 4-byte salt, derived from the SA's keying material, followed by the 8-byte explicit IV carried in the packet.
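A minimal sketch of how that 12-byte nonce is assembled, assuming a per-SA 64-bit counter as the explicit IV source (one of the options in the table). The class name is hypothetical, and the salt is simply passed in; in a real SA it comes from the keying material.

```python
import struct

class GcmIvSource:
    """Hypothetical per-SA explicit IV generator: a 64-bit counter that never repeats."""

    def __init__(self, salt):
        if len(salt) != 4:
            raise ValueError("ESP AES-GCM uses a 4-byte salt")
        self.salt = salt
        self.counter = 0

    def next_iv_and_nonce(self):
        """Return (8-byte explicit IV sent in the packet, 12-byte nonce fed to AES-GCM)."""
        if self.counter == 2**64 - 1:
            raise OverflowError("IV space exhausted: rekey")
        self.counter += 1
        iv = struct.pack("!Q", self.counter)  # 8 bytes, big-endian
        return iv, self.salt + iv             # nonce = salt || explicit IV

src = GcmIvSource(bytes.fromhex("deadbeef"))
iv, nonce = src.next_iv_and_nonce()
print(iv.hex(), nonce.hex())  # 0000000000000001 deadbeef0000000000000001
```

A counter satisfies the "unique, never reused with the same key" requirement without needing a random number generator per packet; unpredictability is not required for GCM.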

IV Placement in ESP:

The IV is carried inside the Payload Data field, but since the receiver needs it to begin decryption, it is sent in cleartext. In practice:

IV is transmitted at the beginning of the Payload Data field
Receiver extracts IV before decryption
IV is counted as part of the Payload Data for protocol purposes, even though it is not itself encrypted
IV is included in ICV calculation (authenticated)

Explicit vs Implicit IV:

Some algorithms allow implicit IV derivation from the sequence number and SA parameters, reducing per-packet overhead. For example:

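Implicit IV with AES-GCM-ESP: 8 bytes saved per packet
Computation: nonce = SA-derived salt || IV, where the 8-byte IV is rebuilt from the sequence number instead of being transmitted
With ESN the IV is the full 64-bit sequence number; without ESN it is the 32-bit Sequence Number preceded by four zero bytes (RFC 8750)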
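A sketch of the implicit-IV variant, assuming the RFC 8750 construction: nothing IV-related is transmitted, and both peers rebuild the same 8 bytes from the sequence number (the full 64-bit value when ESN is in use, otherwise the 32-bit Sequence Number preceded by four zero bytes). The function name is hypothetical.

```python
import struct

def implicit_nonce(salt, seq, esn_enabled):
    """12-byte AEAD nonce for an SA using implicit IVs: salt || 8-byte IV from seq."""
    limit = 2**64 if esn_enabled else 2**32
    if not 1 <= seq < limit:
        raise ValueError("sequence number out of range for this SA")
    # Big-endian 64-bit encoding; when seq < 2**32 the top four bytes are zero,
    # which matches the non-ESN layout (four zero bytes, then the 32-bit number).
    iv = struct.pack("!Q", seq)
    return salt + iv

print(implicit_nonce(b"SALT", 1, esn_enabled=False).hex())  # 53414c540000000000000001
```

Since the sequence number never repeats within an SA, the nonce never repeats either, which is exactly the uniqueness property counter-mode AEADs need.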

IV Reuse is Catastrophic

For counter-mode algorithms (AES-GCM, AES-CTR, ChaCha20), reusing an IV with the same key is a devastating failure: it reveals the XOR of the two plaintexts and potentially the authentication key, enabling forgeries. Implementations must guarantee IV uniqueness, typically by deriving the IV from a per-SA counter or the sequence number. When the IV is derived from the sequence number, this is one more reason sequence number wrap requires SA termination.
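The plaintext-leak half of that failure can be demonstrated without any cryptographic library. Counter modes encrypt by XOR-ing the plaintext with a keystream derived from the key and nonce, so a repeated nonce means a repeated keystream. In this sketch random bytes stand in for the keystream; a real AES-CTR or AES-GCM keystream behaves the same way under reuse. The authentication-key recovery mentioned above is not shown.

```python
import os

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

keystream = os.urandom(32)  # stand-in for the keystream of one (key, nonce) pair

p1 = b"transfer $100 to account 12345!!"
p2 = b"transfer $900 to account 99999!!"
c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)  # nonce reused, so the keystream is reused too

# An eavesdropper holding only c1 and c2 learns p1 XOR p2: the keystream cancels out.
leak = xor_bytes(c1, c2)
assert leak == xor_bytes(p1, p2)

# If either plaintext is known or guessable, the other one falls out directly.
assert xor_bytes(leak, p1) == p2
```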

ESP Payload Data

The Payload Data field contains the actual protected content—what we're ultimately trying to secure. Its contents differ based on whether ESP operates in transport mode or tunnel mode.

Transport Mode Payload:

In transport mode, ESP protects only the upper-layer protocol (TCP, UDP, ICMP, etc.) without encapsulating the original IP header:

[Original IP Header][ESP Header][IV][TCP/UDP/etc. Header + Data][ESP Trailer][ICV]
                                    \______________ Encrypted ______________/

Original IP addresses remain visible (unencrypted in IP header)
Only the upper-layer protocol data is protected
Used for host-to-host communication

Tunnel Mode Payload:

In tunnel mode, ESP encapsulates the entire original IP packet within a new IP packet:

[New IP Header][ESP Header][IV][Original IP Header + Complete Packet][ESP Trailer][ICV]
                               \___________________ Encrypted ___________________/

Original IP addresses are hidden (encrypted inside ESP)
Entire original packet protected, including inner IP header
Used for site-to-site VPNs (gateway-to-gateway)

Transport Mode

•Payload: Upper-layer data only
•Original IP Header: Preserved, visible
•Overhead: Lower (no IP header duplication)
•Use Case: Host-to-host communication
•NAT: Works with NAT-T, but because NAT rewrites the visible IP header, inner TCP/UDP checksums need fixing up

Tunnel Mode

•Payload: Complete original IP packet
•Original IP Header: Encrypted, hidden
•Overhead: Higher (outer + inner IP headers)
•Use Case: Site-to-site VPNs, remote access
•NAT: Works cleanly with NAT-T (UDP encapsulation)

Payload Size Considerations:

The payload field has no fixed size limit imposed by ESP itself, but practical constraints exist:

MTU Limitations: ESP overhead reduces available space for payload
Fragmentation Issues: Tunnel mode creates larger packets that may require fragmentation
Block Cipher Alignment: Payload + padding must align to cipher block size

MTU Impact Calculation:

For a typical IPSec tunnel with AES-GCM:

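Original MTU: 1500 bytes
- Outer IP header: 20 bytes
- ESP header: 8 bytes
- IV: 8 bytes
- ICV: 16 bytes
─────────────────
Room for the encrypted part: 1448 bytes
- Pad Length + Next Header: 2 bytes
- Padding: 0-3 bytes (AES-GCM in ESP needs only 4-byte alignment)
─────────────────
Maximum payload: 1446 bytes (1448 is already a multiple of 4, so no padding is needed)
(Inner IP header + original data)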

This MTU reduction causes fragmentation if not accounted for—a common source of VPN performance issues.
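The same calculation as a reusable sketch. The defaults assume an AES-GCM tunnel with a 20-byte outer IPv4 header; `align` is the boundary that payload plus trailer must reach (4 bytes for AES-GCM in ESP, the cipher block size for CBC modes). With NAT-T, the extra 8-byte UDP header can be folded into `outer_ip`. The function name is our own.

```python
def max_inner_packet(mtu=1500, outer_ip=20, esp_hdr=8, iv=8, icv=16, align=4):
    """Largest inner packet that fits in a single outer packet of size `mtu`."""
    room = mtu - outer_ip - esp_hdr - iv - icv  # bytes left for the encrypted part
    room -= room % align                        # encrypted part must be a multiple of align
    return room - 2                             # minus Pad Length and Next Header

print(max_inner_packet())                        # AES-GCM: 1446
print(max_inner_packet(iv=16, icv=16, align=16)) # AES-CBC + HMAC-SHA-256-128: 1438
```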

Path MTU Discovery

VPN tunnels frequently suffer from MTU-related black holes when Path MTU Discovery (PMTUD) fails. Best practices include: (1) Configure tunnel endpoints with reduced MTU values, (2) Enable tunnel MSS clamping for TCP traffic, (3) Consider DF-bit clearing with careful fragmentation handling. Unexplained 'some sites work, some don't' symptoms often trace to MTU issues.
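A sketch of the MSS clamping arithmetic, assuming the 1446-byte maximum inner packet of an AES-GCM tunnel on a 1500-byte link and option-free inner IP and TCP headers. Many operators clamp somewhat lower to leave headroom for NAT-T or other encapsulation; the exact value is a local choice.

```python
def clamped_mss(max_inner_packet, ip_header=20, tcp_header=20):
    """TCP MSS that keeps a full-size segment, plus its headers, inside the tunnel."""
    return max_inner_packet - ip_header - tcp_header

print(clamped_mss(1446))                # IPv4 inside the tunnel: 1406
print(clamped_mss(1446, ip_header=40))  # IPv6 inside the tunnel: 1386
```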

ESP Trailer

The ESP trailer follows the payload data and contains three components critical for proper packet processing: Padding, Pad Length, and Next Header. Unlike the ESP header, the trailer is encrypted—visible only after decryption.

Padding (0-255 bytes):

Padding serves multiple purposes in ESP:

Block Cipher Alignment: Block-mode ciphers such as AES-CBC and 3DES-CBC operate on fixed-size blocks (16 bytes for AES, 8 bytes for 3DES), so the plaintext must be padded to a multiple of the block size.
Traffic Flow Confidentiality (TFC): Extra padding can obscure the actual message length, making traffic analysis based on packet sizes harder.
Alignment Requirements: Independently of the cipher, RFC 4303 requires the Pad Length and Next Header fields to be right-aligned within a 4-byte word, so the ciphertext always ends on a 4-byte boundary. This is the only alignment padding that stream-like modes such as AES-GCM need.

Padding Content:

Unless the encryption algorithm specifies otherwise, RFC 4303 fills the padding with sequential byte values (1, 2, 3, ... n). This pattern gives the receiver a sanity check: after decryption, it can verify that the padding matches the expected sequence.
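A minimal sketch of trailer construction with the default padding pattern. `align` is the boundary that payload plus trailer must reach (at least 4 bytes; the block size for CBC modes), and the optional `extra_blocks` adds whole alignment units of padding, as a sender might for traffic flow confidentiality. The names are illustrative.

```python
def build_trailer(payload_len, next_header, align=4, extra_blocks=0):
    """Return Padding || Pad Length || Next Header for a payload of payload_len bytes."""
    pad_len = -(payload_len + 2) % align    # smallest padding that reaches the boundary
    pad_len += extra_blocks * align         # optional extra padding (TFC)
    if pad_len > 255:
        raise ValueError("Padding is limited to 255 bytes")
    padding = bytes(range(1, pad_len + 1))  # default contents: 1, 2, 3, ...
    return padding + bytes([pad_len, next_header])

trailer = build_trailer(84, next_header=4, align=16)
print(len(trailer), trailer.hex())  # 12 0102030405060708090a0a04
```

With 16-byte alignment this reproduces the 12-byte trailer of the complete packet example later on this page; with the default 4-byte alignment the same payload needs only 2 bytes of padding.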

ESP Trailer Fields
Field	Size	Value/Content	Purpose
Padding	0-255 bytes	0x01 0x02 0x03 ... 0xNN (sequential)	Block alignment + traffic flow confidentiality
Pad Length	1 byte	0-255 (number of padding bytes)	Enables receiver to remove padding
Next Header	1 byte	IP protocol number	Identifies encapsulated protocol

Pad Length (1 byte):

The Pad Length field indicates how many bytes of padding precede it. After decryption, the receiver:

Reads the Next Header and Pad Length fields (the last two bytes of the decrypted data)
Strips those two bytes, then removes Pad Length bytes of padding that precede them
What remains is the original payload
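The receiver's side of the trailer, as a sketch that assumes the algorithm uses the default padding pattern (so the contents can be checked) and that the ICV has already been verified. The function and table names are illustrative; the protocol numbers are the ones listed in this section.

```python
# Next Header values mentioned in this section
NEXT_HEADER_NAMES = {1: "ICMP", 4: "IPv4", 6: "TCP", 17: "UDP", 41: "IPv6"}

def strip_trailer(plaintext):
    """Split decrypted ESP data into (payload, next_header)."""
    if len(plaintext) < 2:
        raise ValueError("too short to hold Pad Length and Next Header")
    pad_len, next_header = plaintext[-2], plaintext[-1]
    payload_end = len(plaintext) - 2 - pad_len
    if payload_end < 0:
        raise ValueError("Pad Length larger than the decrypted data")
    # Check the default 1, 2, 3, ... padding pattern before trusting the result.
    if plaintext[payload_end:-2] != bytes(range(1, pad_len + 1)):
        raise ValueError("padding does not match the expected pattern")
    return plaintext[:payload_end], next_header

payload, nh = strip_trailer(b"hello" + bytes([1, 2, 3]) + bytes([3, 17]))
print(payload, NEXT_HEADER_NAMES.get(nh, "other"))  # b'hello' UDP
```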

Next Header (1 byte):

The Next Header field identifies what type of data is contained in the payload:

In Transport Mode: The upper-layer protocol (TCP=6, UDP=17, ICMP=1, etc.)
In Tunnel Mode: Always IP (4 for IPv4, 41 for IPv6)

This field is essential because, after decryption, the receiver must know how to process the decrypted data. It occupies the same semantic position as the IP Protocol field, enabling seamless reintegration of decrypted packets into the IP stack.

Why Encrypt the Next Header?

The Next Header field is encrypted as part of the trailer. This is intentional—it prevents an attacker from knowing what protocol is being carried (TCP? UDP? ICMP?) without decryption. This provides modest traffic analysis protection, hiding the nature of encapsulated traffic from observers.

esp_trailer_example.txt
ESP Trailer Structure (example with 5 bytes padding)
═══════════════════════════════════════════════════════════════════
 
Data Field (before trailer):
│ ...encrypted payload data...                              │
 
ESP Trailer:
├───────┬───────┬───────┬───────┬───────┬───────────┬───────────┤
│ 0x01  │ 0x02  │ 0x03  │ 0x04  │ 0x05  │ Pad Length│Next Header│
│(pad)  │(pad)  │(pad)  │(pad)  │(pad)  │    = 5    │   = 4     │
├───────┴───────┴───────┴───────┴───────┴───────────┴───────────┤
         5 bytes of padding              │   1 byte  │  1 byte   │
 
Next Header = 4 means IPv4 (tunnel mode, encapsulating IPv4 packet)
Next Header = 6 would mean TCP (transport mode, encapsulating TCP)
Next Header = 17 would mean UDP (transport mode, encapsulating UDP)

Integrity Check Value (ICV)

The Integrity Check Value (ICV) is the cryptographic tag that provides authentication and integrity verification. It's computed over the ESP header, payload, and trailer, and appended to the end of the packet (after the ESP trailer, unencrypted).

ICV Computation:

The ICV is calculated using a cryptographic authentication algorithm, typically either:

HMAC (Hash-based Message Authentication Code): Uses a hash function (SHA-256, SHA-384, SHA-512) with a secret key
AEAD Tag: For algorithms like AES-GCM, the authentication tag is a natural output of the encryption process

Coverage:

The ICV covers (authenticates):

ESP header (SPI + Sequence Number)
IV (if explicit)
Encrypted payload
ESP trailer (Padding + Pad Length + Next Header)

The ICV does NOT cover (and cannot cover):

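Outer IP header (ESP protects only from the ESP header onward; leaving the outer header unprotected is what lets NAT devices rewrite addresses without invalidating the ICV, unlike AH)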
The ICV field itself (circular dependency)

Common ICV Algorithms and Sizes
Algorithm	Full Output	ICV Size (Truncated)	Security Level
HMAC-SHA-256-128	256 bits	16 bytes (128 bits)	Strong
HMAC-SHA-384-192	384 bits	24 bytes (192 bits)	Very Strong
HMAC-SHA-512-256	512 bits	32 bytes (256 bits)	Highest
AES-GCM (AEAD)	128 bits	16 bytes (128 bits)	Strong + Encryption
AES-GMAC (Auth only)	128 bits	16 bytes (128 bits)	Strong (no encryption)
HMAC-SHA-1-96 (legacy)	160 bits	12 bytes (96 bits)	Legacy; avoid in new deployments

ICV Verification Process:

When a receiver processes an ESP packet:

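Extract SPI and Sequence Number from ESP header
Look up Security Association in SA database
Retrieve authentication algorithm and key from SA
Check the Sequence Number against the anti-replay window (a cheap way to drop duplicates before doing any cryptography)
Compute expected ICV over received (ESP header + IV + encrypted payload + trailer)
Compare computed ICV with received ICV
If mismatch: Discard packet silently (no ICMP error, as a matter of security principle)
If match: Update the anti-replay window and proceed to decryption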

The 'silent discard' for failed verification is deliberate—sending an error would confirm receipt and could be exploited for oracle attacks.
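The receive-side checks can be sketched as one function. Everything here is an illustrative assumption: the SA database is a dict keyed by (SPI, destination, protocol), the ICV is HMAC-SHA-256-128, and a simple "must be higher than anything accepted so far" rule stands in for a real sliding replay window. Note `hmac.compare_digest`, which compares in constant time so the check itself does not leak timing information.

```python
import hashlib
import hmac
import struct
from dataclasses import dataclass

ESP_PROTO = 50

@dataclass
class InboundSA:
    """Hypothetical minimal inbound SA holding only what these checks need."""
    auth_key: bytes
    icv_len: int = 16
    highest_seq: int = 0  # simplified replay state (stand-in for a sliding window)

def verify_inbound(sad, dst, packet):
    """Return IV + ciphertext if the packet passes all checks, else None (silent drop)."""
    if len(packet) < 8:
        return None
    spi, seq = struct.unpack_from("!II", packet)
    sa = sad.get((spi, dst, ESP_PROTO))      # locate the SA, and with it the key
    if sa is None or len(packet) < 8 + sa.icv_len:
        return None
    if seq <= sa.highest_seq:                # cheap replay check before any crypto
        return None
    body, received = packet[:-sa.icv_len], packet[-sa.icv_len:]
    expected = hmac.new(sa.auth_key, body, hashlib.sha256).digest()[:sa.icv_len]
    if not hmac.compare_digest(expected, received):
        return None                          # discard silently: no ICMP error
    sa.highest_seq = seq                     # update replay state only after success
    return body[8:]                          # hand IV + ciphertext to decryption
```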

AEAD Algorithms:

Modern deployments increasingly use Authenticated Encryption with Associated Data (AEAD) algorithms like AES-GCM. These combine encryption and authentication in a single cryptographic operation:

Single key for both operations
Better performance (one pass over data)
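Encryption and authentication come as one vetted construction, so there is no way to combine a cipher and a MAC incorrectly
Tag computation covers the ciphertext plus Associated Data (AAD): the SPI and Sequence Number from the ESP header (with ESN, the full 64-bit sequence number)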
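Building the associated data is simple byte packing. This sketch assumes the AES-GCM-for-ESP layout of RFC 4106: the SPI followed by the 32-bit Sequence Number, or, with ESN, the SPI followed by the full 64-bit sequence number, whose upper half is never transmitted. The function name is hypothetical.

```python
import struct

def gcm_aad(spi, seq, esn=False):
    """Associated data for AES-GCM in ESP: 8 bytes normally, 12 bytes with ESN."""
    if esn:
        return struct.pack("!IQ", spi, seq)  # SPI || 64-bit extended sequence number
    return struct.pack("!II", spi, seq)      # SPI || 32-bit Sequence Number

print(gcm_aad(0xCAFEBABE, 1).hex())                     # cafebabe00000001
print(gcm_aad(0xCAFEBABE, 2**32 + 1, esn=True).hex())   # cafebabe0000000100000001
```

With a library such as the Python `cryptography` package, these bytes would be passed as the `associated_data` argument of `AESGCM.encrypt`, together with the 12-byte nonce.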

Always Use ICV

While ESP technically permits null authentication (no ICV), this should never be used in practice. Without integrity checking, an attacker can modify encrypted packets—even without knowing the key—to cause unpredictable decryption results. This can enable bit-flipping attacks, especially against CBC-mode encryption. Always configure both encryption AND authentication.

Complete ESP Packet Example

Let's walk through a complete ESP packet to see how all components fit together. Consider an ESP packet in tunnel mode using AES-256-GCM, protecting an ICMP echo request.

esp_packet_analysis.txt
Complete ESP Packet in Tunnel Mode (AES-256-GCM)
═══════════════════════════════════════════════════════════════════════════════
 
OUTER IP HEADER (20 bytes) - NOT authenticated, NOT encrypted
├─────────────────────────────────────────────────────────────────────────────┤
│ Version: 4 │ IHL: 5 │ TOS: 0x00 │ Total Length: 148                        │
│ Identification: 0x1234 │ Flags: 0 │ Fragment Offset: 0                     │
│ TTL: 64 │ Protocol: 50 (ESP) │ Header Checksum: 0xABCD                    │
│ Source IP: 10.1.1.1 (VPN Gateway A)                                        │
│ Destination IP: 10.2.2.1 (VPN Gateway B)                                   │
├─────────────────────────────────────────────────────────────────────────────┤
 
ESP HEADER (8 bytes) - Authenticated, NOT encrypted
├─────────────────────────────────────────────────────────────────────────────┤
│ SPI: 0xCAFEBABE (3405691582)                                               │
│ Sequence Number: 0x00000001 (1)                                            │
├─────────────────────────────────────────────────────────────────────────────┤
 
INITIALIZATION VECTOR (8 bytes) - Authenticated, NOT "encrypted" but public
├─────────────────────────────────────────────────────────────────────────────┤
│ IV: 0x0001020304050607 (derived from counter)                              │
├─────────────────────────────────────────────────────────────────────────────┤
 
ENCRYPTED PAYLOAD - Authenticated AND encrypted
┌─────────────────────────────────────────────────────────────────────────────┐
│ [This is what's encrypted - cannot be seen in capture without key]         │
│                                                                             │
│ INNER IP HEADER (20 bytes):                                                │
│   Version: 4 │ IHL: 5 │ TOS: 0x00 │ Total Length: 84                      │
│   Source IP: 192.168.1.100 (actual source)                                 │
│   Destination IP: 192.168.2.200 (actual destination)                       │
│   Protocol: 1 (ICMP)                                                       │
│                                                                             │
│ ICMP DATA (64 bytes):                                                      │
│   Type: 8 (Echo Request) │ Code: 0                                        │
│   Identifier: 0x1234 │ Sequence: 1                                        │
│   Payload: "Hello from ping..."                                           │
│                                                                             │
│ ESP TRAILER (12 bytes):                                                    │
│   Padding: 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0A (10 bytes)   │
│   Pad Length: 0x0A (10)                                                   │
│   Next Header: 0x04 (4 = IPv4)                                            │
└─────────────────────────────────────────────────────────────────────────────┘
 
ICV (16 bytes) - NOT encrypted
├─────────────────────────────────────────────────────────────────────────────┤
│ Authentication Tag: 0x1A2B3C4D5E6F7A8B9C0D1E2F3A4B5C6D                     │
│ (GCM tag over AAD = SPI + Sequence Number, plus the ciphertext, which      │
│  already contains the encrypted trailer; the IV enters via the nonce)      │
└─────────────────────────────────────────────────────────────────────────────┘
 
TOTAL PACKET: 20 (outer IP) + 8 (ESP) + 8 (IV) + 20 (inner IP) + 
              64 (ICMP) + 12 (trailer) + 16 (ICV) = 148 bytes
 
Overhead Added by ESP: 148 - 84 (original packet) = 64 bytes (76% overhead!)

Key Observations:

The outer IP addresses are visible: Anyone sniffing the network sees traffic between VPN gateways, not original hosts
SPI and Sequence Number are visible: An observer knows which SA is being used and can count packets
Everything else is hidden: Inner IP addresses, protocol, and data are encrypted
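Significant overhead: A small 84-byte ICMP packet becomes 148 bytes, 76% overhead. For larger packets the percentage drops sharply: a 1400-byte inner packet gains only about 56-60 bytes, roughly 4%
Padding ensures alignment: here the sender padded the encrypted section to a multiple of 16 bytes (84 + 10 + 2 = 96). AES-GCM in ESP only requires 4-byte alignment, so 2 bytes of padding would have sufficed; extra padding is permitted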
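The arithmetic in the example can be reproduced with a small helper. It assumes the same AES-256-GCM tunnel layout (20-byte outer IPv4 header, 8-byte IV, 16-byte ICV) and, to match the example, pads to 16 bytes; passing `align=4` shows the smaller packet that ESP's minimum 4-byte alignment would allow. The function name is illustrative.

```python
def tunnel_packet_size(inner_len, outer_ip=20, esp_hdr=8, iv=8, icv=16, align=16):
    """Total outer packet size for an inner packet of inner_len bytes."""
    pad = -(inner_len + 2) % align  # padding needed to reach the alignment boundary
    return outer_ip + esp_hdr + iv + inner_len + pad + 2 + icv

size = tunnel_packet_size(84)
overhead = size - 84
print(size, overhead, f"{overhead / 84:.0%}")    # 148 64 76%

big = tunnel_packet_size(1400, align=4)
print(big - 1400, f"{(big - 1400) / 1400:.0%}")  # 56 4%
```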

Packet Capture Analysis

When analyzing ESP packets in Wireshark, you'll see the outer IP header, ESP header (SPI + Sequence), and encrypted payload blob. Without decryption keys, Wireshark cannot interpret the encrypted portion. Wireshark can decrypt ESP if you provide the SA keys—useful for debugging but obviously not available for traffic you don't control.

Summary and Key Takeaways

The ESP packet format is a carefully engineered structure that enables comprehensive security services while maintaining reasonable efficiency. Understanding this format is essential for anyone implementing, troubleshooting, or analyzing IPSec deployments.

Key Takeaways

•ESP header (8 bytes) contains SPI and Sequence Number—visible to observers but sufficient for SA lookup and anti-replay
•The IV enables secure encryption by ensuring unique ciphertexts; its length and requirements depend on the chosen algorithm
•Payload contains either upper-layer data (transport mode) or complete inner IP packet (tunnel mode), fully encrypted
•ESP trailer includes padding for block alignment, Pad Length, and Next Header, all encrypted, which hides the inner protocol and hinders traffic analysis
•ICV authenticates the entire ESP packet (except outer IP header), enabling integrity verification before decryption
•AEAD algorithms like AES-GCM combine encryption and authentication efficiently, reducing complexity and improving security

What's Next:

Now that we understand ESP's packet structure, we'll dive deeper into the encryption mechanisms ESP employs. You'll learn how symmetric encryption algorithms like AES work within ESP, key derivation from SA parameters, cipher mode selection and implications, and best practices for algorithm configuration in modern deployments.

You now understand the complete ESP packet format—every field, every boundary, every purpose. This knowledge enables you to analyze packet captures, troubleshoot VPN issues, and understand the overhead implications of ESP protection. Next, we'll explore how ESP's encryption mechanisms actually protect payload data.

2 / 5

Loading learning content...

Computer NetworksEncapsulating Security Payload

Encapsulating Security Payload (ESP)

LevelIntermediate

Duration60 mins

TopicEncapsulating Security Payload

2 / 5

ESP Format

Anatomy of a Secure Packet

In this page, we'll dissect the ESP packet byte by byte, understanding the purpose of every field, the constraints that shaped its design, and the practical implications for network operations.

What You Will Learn

ESP Packet Overview

An ESP packet consists of four main components, arranged in a specific order that enables both encryption and authentication operations:

1. ESP Header (8 bytes, unencrypted)

Security Parameters Index (SPI) — 32 bits
Sequence Number — 32 bits

2. Payload Data (variable, encrypted)

Original IP packet (tunnel mode) or upper-layer protocol (transport mode)
Initialization Vector (IV) if required by algorithm

3. ESP Trailer (variable, encrypted)

Padding — 0-255 bytes
Pad Length — 8 bits
Next Header — 8 bits

4. Integrity Check Value (ICV) (variable, unencrypted)

Computed over ESP header, payload, and trailer
Length depends on authentication algorithm (typically 12-16 bytes)

Converting Mermaid diagram...

The Critical Boundaries:

Understanding what gets encrypted versus what gets authenticated is crucial:

Encrypted Portion:

Starts immediately after ESP header (or after IV, conceptually)
Includes: Payload data, Padding, Pad Length, Next Header
Does NOT include: ESP header (SPI, Sequence Number)

Authenticated Portion:

Starts at ESP header
Includes: SPI, Sequence Number, Payload, Padding, Pad Length, Next Header
Does NOT include: The ICV itself (it's the result of authentication)

This means an attacker can see the ESP header (knowing which Security Association is being used and the sequence number) but cannot read or modify the payload without detection.

Why Not Encrypt the Header?

ESP Header Fields

The ESP header contains just two fields—8 bytes total—yet these carry critical information for processing the packet.

Security Parameters Index (SPI) — 32 bits

The SPI is a 32-bit identifier that, combined with the destination IP address and protocol (ESP = 50), uniquely identifies the Security Association under which this packet should be processed.

Key Characteristics:

Assigned by the receiving side during SA establishment
Must be unique within the receiver's SA database for the given destination
Values 1-255 reserved by IANA for future use
Value 0 is reserved (indicates 'no SA')
Typically appears as 8 hexadecimal digits (e.g., 0xCAFEBABE)

SPI Selection:

IPSec implementations automatically assign SPIs during IKE negotiation
Common approaches: random selection, sequential allocation, hash-based generation
The sender learns the SPI to use when the receiver proposes it during IKE Phase 2

ESP Header Field Details
Field	Size	Offset	Purpose	Value Range
SPI	32 bits (4 bytes)	0	Identifies Security Association	256 - 2³²-1 (0x100 - 0xFFFFFFFF)
Sequence Number	32 bits (4 bytes)	4	Anti-replay protection	1 - 2³²-1 (wraps require SA rekey)

Sequence Number — 32 bits

The Sequence Number is a 32-bit unsigned integer that provides anti-replay protection. It starts at 1 when an SA is established and increments by 1 for each packet sent.

Critical Behavior:

Never wraps: If the sequence number reaches 2³²-1, the SA MUST be terminated and a new SA established
Monotonically increasing: Sender never decrements or reuses sequence numbers
Receiver verification: Receiver maintains a sliding window to detect replays

Extended Sequence Numbers (ESN):

For high-volume connections, 2³² packets (~4.3 billion) might be exhausted too quickly. RFC 4303 defines Extended Sequence Numbers:

64-bit sequence number space (2⁶⁴ packets before rekey)
Only lower 32 bits transmitted in header (bandwidth conservation)
Upper 32 bits implicitly maintained by both ends
Negotiated during IKE SA establishment

Sequence Number Exhaustion

esp_header_structure.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
ESP Header Format (8 bytes / 64 bits)
═══════════════════════════════════════════════════════════════════
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
│                Security Parameters Index (SPI)                │
├───────────────────────────────────────────────────────────────┤
│                    Sequence Number                            │
├───────────────────────────────────────────────────────────────┤
│                                                               │
~                    Payload Data (variable)                    ~
│                        [Encrypted]                            │
├───────────────────────────────────────────────────────────────┤
 
Note: What follows is encrypted - cannot be seen in packet capture
without decryption keys.

Initialization Vector (IV)

Purpose of the IV:

IV Characteristics:

Must be unpredictable for CBC mode (random or counter-based)
Must be unique per-packet for counter modes (CTR, GCM)
Length depends on algorithm (typically 8-16 bytes)
Transmitted in cleartext (but this is cryptographically safe)

IV Requirements by Algorithm Type
Algorithm	IV Size	IV Requirements	IV Source
AES-CBC	16 bytes (128 bits)	Unpredictable, random	CSPRNG per-packet
3DES-CBC	8 bytes (64 bits)	Unpredictable, random	CSPRNG per-packet
AES-CTR	8 bytes (64 bits)	Unique (never reused with same key)	Counter or random
AES-GCM	8 bytes (64 bits)*	Unique (never reused with same key)	Counter, typically 64-bit
ChaCha20-Poly1305	8 bytes	Unique (never reused with same key)	Counter-based

*Note: AES-GCM technically uses a 12-byte nonce, but ESP implementations typically derive it from an 8-byte explicit IV concatenated with a 4-byte salt from the SA.

IV Placement in ESP:

The IV conceptually falls within the encrypted payload area, but since it's needed to begin decryption, it must be accessible before decryption. In practice:

IV is transmitted at the beginning of the Payload Data field
Receiver extracts IV before decryption
IV is considered part of the encrypted data for protocol purposes
IV is included in ICV calculation (authenticated)

Explicit vs Implicit IV:

Some algorithms allow implicit IV derivation from the sequence number and SA parameters, reducing per-packet overhead. For example:

Implicit IV with AES-GCM-ESP: 8 bytes saved per packet
Computation: IV = SA-derived-salt || Sequence Number
Requires Extended Sequence Number (ESN) for 64-bit uniqueness

IV Reuse is Catastrophic

ESP Payload Data

The Payload Data field contains the actual protected content—what we're ultimately trying to secure. Its contents differ based on whether ESP operates in transport mode or tunnel mode.

Transport Mode Payload:

In transport mode, ESP protects only the upper-layer protocol (TCP, UDP, ICMP, etc.) without encapsulating the original IP header:

[Original IP Header][ESP Header][IV][TCP/UDP/etc. Header + Data][ESP Trailer][ICV]
                                \________ Encrypted _________/

Original IP addresses remain visible (unencrypted in IP header)
Only the upper-layer protocol data is protected
Used for host-to-host communication

Tunnel Mode Payload:

In tunnel mode, ESP encapsulates the entire original IP packet within a new IP packet:

[New IP Header][ESP Header][IV][Original IP Header + Complete Packet][ESP Trailer][ICV]
                               \______________ Encrypted ______________/

Original IP addresses are hidden (encrypted inside ESP)
Entire original packet protected, including inner IP header
Used for site-to-site VPNs (gateway-to-gateway)

Transport Mode

•Payload: Upper-layer data only
•Original IP Header: Preserved, visible
•Overhead: Lower (no IP header duplication)
•Use Case: Host-to-host (end-to-end between the communicating hosts)
•NAT: Limited compatibility

Tunnel Mode

•Payload: Complete original IP packet
•Original IP Header: Encrypted, hidden
•Overhead: Higher (outer + inner IP headers)
•Use Case: Site-to-site VPNs, remote access
•NAT: Full compatibility with NAT-T

Payload Size Considerations:

The payload field has no fixed size limit imposed by ESP itself, but practical constraints exist:

MTU Limitations: ESP overhead reduces available space for payload
Fragmentation Issues: Tunnel mode creates larger packets that may require fragmentation
Block Cipher Alignment: Payload + padding + Pad Length + Next Header must be a multiple of the cipher block size, and in all cases a multiple of 4 bytes

MTU Impact Calculation:

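For a typical IPsec tunnel with AES-GCM (IPv4 outer header, no NAT-T):

Original MTU: 1500 bytes
- Outer IP header: 20 bytes
- ESP header: 8 bytes
- IV: 8 bytes
- ESP trailer (padding + pad length + next header): 2-5 bytes
- ICV: 16 bytes
─────────────────
Maximum payload: 1446 bytes
(Inner IP header + original data; AES-GCM only needs 4-byte alignment, and the
remaining 1448 bytes are already a multiple of 4, so no padding is needed at this size)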

This MTU reduction causes fragmentation if not accounted for—a common source of VPN performance issues.
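The same arithmetic fits in a small function, so other cipher choices can be plugged in. This sketch assumes an IPv4 outer header without options and no NAT-T UDP encapsulation, which would cost another 8 bytes. It also assumes a 16-byte IV for the AES-CBC comparison case. `max_inner_payload` is a hypothetical helper.

```python
def max_inner_payload(mtu: int, outer_ip: int, iv_len: int, icv_len: int,
                      align: int) -> int:
    """Largest payload (inner IP packet in tunnel mode) that fits in one ESP packet.

    align is the boundary the encrypted section must end on: the cipher block
    size for CBC modes, or ESP's 4-byte minimum for AES-GCM.
    """
    align = max(align, 4)                            # ESP always needs a 4-byte boundary
    room = mtu - outer_ip - 8 - iv_len - icv_len     # 8 = SPI + Sequence Number
    encrypted = room - room % align                  # largest aligned section that fits
    return encrypted - 2                             # minus Pad Length + Next Header


print(max_inner_payload(1500, 20, iv_len=8, icv_len=16, align=4))    # AES-GCM: 1446
print(max_inner_payload(1500, 20, iv_len=16, icv_len=16, align=16))  # AES-CBC + HMAC-SHA-256-128: 1438
```

Clamping the inner MTU (or TCP MSS) to a value like this on the tunnel interface is one way to avoid fragmenting the outer packet.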

Path MTU Discovery

ESP Trailer

Padding (0-255 bytes):

Padding serves multiple purposes in ESP:

Block Cipher Alignment: Block-mode algorithms (AES-CBC, 3DES-CBC) operate on fixed-size blocks (16 bytes for AES, 8 bytes for 3DES). The plaintext, including Pad Length and Next Header, must be padded to a multiple of the block size.
Traffic Flow Confidentiality (TFC): Padding can obscure the actual message length, preventing traffic analysis based on packet sizes.
Alignment Requirements: Regardless of cipher, ESP requires the Pad Length and Next Header fields to be right-aligned within a 4-byte word, so the ciphertext always ends on a 4-byte boundary.

Padding Content:

ESP Trailer Fields
Field	Size	Value/Content	Purpose
Padding	0-255 bytes	0x01 0x02 0x03 ... 0xNN (sequential)	Block alignment + traffic flow confidentiality
Pad Length	1 byte	0-255 (number of padding bytes)	Enables receiver to remove padding
Next Header	1 byte	IP protocol number	Identifies encapsulated protocol
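The default padding scheme in the table can be sketched as a trailer builder. This assumes the sender pads only as much as alignment requires, with no extra TFC padding. `build_esp_trailer` is a hypothetical name, and `block_size=16` models a 16-byte block cipher such as AES-CBC.

```python
def build_esp_trailer(payload_len: int, next_header: int, block_size: int = 4) -> bytes:
    """Build Padding || Pad Length || Next Header using ESP's default padding.

    Padding makes (payload + padding + 2) a multiple of block_size, never
    less than ESP's 4-byte requirement. Padding bytes are 0x01, 0x02, 0x03, ...
    """
    align = max(block_size, 4)
    pad_len = (-(payload_len + 2)) % align      # bytes needed to reach the boundary
    padding = bytes(range(1, pad_len + 1))      # sequential 1, 2, 3, ... pad_len
    return padding + bytes([pad_len, next_header])


# 84-byte inner IPv4 packet, 16-byte blocks: 10 bytes of padding
print(build_esp_trailer(84, next_header=4, block_size=16).hex())  # 0102030405060708090a0a04
```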

Pad Length (1 byte):

The Pad Length field indicates how many bytes of padding precede it. After decryption, the receiver:

Reads the last two bytes: Pad Length and Next Header
Removes those 2 bytes plus Pad Length bytes of padding from the end of the decrypted data
What remains is the original payload
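Here is a minimal sketch of receiver-side trailer removal on already-decrypted bytes. The padding-content check assumes the sender used the default 1, 2, 3, ... padding. `strip_esp_trailer` is a hypothetical helper.

```python
def strip_esp_trailer(plaintext: bytes) -> tuple[bytes, int]:
    """Remove Padding, Pad Length, and Next Header from decrypted ESP data.

    Returns (payload, next_header); raises ValueError on a malformed trailer.
    """
    if len(plaintext) < 2:
        raise ValueError("too short to hold Pad Length and Next Header")
    pad_len, next_header = plaintext[-2], plaintext[-1]
    end = len(plaintext) - 2 - pad_len           # payload ends where padding begins
    if end < 0:
        raise ValueError("Pad Length exceeds available data")
    padding = plaintext[end:len(plaintext) - 2]
    # Optional check, assuming default sequential padding from the sender
    if padding != bytes(range(1, pad_len + 1)):
        raise ValueError("unexpected padding contents")
    return plaintext[:end], next_header


data = b"inner packet" + bytes([1, 2, 3, 4, 5, 5, 4])  # 5 pad bytes, Pad Length 5, Next Header 4
payload, nh = strip_esp_trailer(data)
print(payload, nh)  # b'inner packet' 4
```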

Next Header (1 byte):

The Next Header field identifies what type of data is contained in the payload:

In Transport Mode: The upper-layer protocol (TCP=6, UDP=17, ICMP=1, etc.)
In Tunnel Mode: Always IP (4 for IPv4, 41 for IPv6)
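This is a lookup sketch of how the Next Header value follows from the mode. It uses the IANA protocol numbers listed above. `esp_next_header` and `PROTOCOL_NUMBERS` are hypothetical names, and the table is deliberately limited to the protocols mentioned on this page.

```python
# IANA protocol numbers that appear in the ESP Next Header field on this page
PROTOCOL_NUMBERS = {"ICMP": 1, "IPv4": 4, "TCP": 6, "UDP": 17, "IPv6": 41}


def esp_next_header(mode: str, inner: str) -> int:
    """Next Header value for the ESP trailer.

    Transport mode: the upper-layer protocol being protected (e.g. "TCP").
    Tunnel mode: the version of the encapsulated IP packet ("IPv4" or "IPv6").
    """
    if mode == "tunnel":
        if inner not in ("IPv4", "IPv6"):
            raise ValueError("tunnel mode always carries a complete IP packet")
    elif mode != "transport":
        raise ValueError(f"unknown mode: {mode!r}")
    return PROTOCOL_NUMBERS[inner]


print(esp_next_header("transport", "TCP"))  # 6
print(esp_next_header("tunnel", "IPv4"))    # 4
```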

Why Encrypt the Next Header?

esp_trailer_example.txt
ESP Trailer Structure (example with 5 bytes padding)
═══════════════════════════════════════════════════════════════════
 
Data Field (before trailer):
│ ...encrypted payload data...                              │
 
ESP Trailer:
├───────┬───────┬───────┬───────┬───────┬───────────┬───────────┤
│ 0x01  │ 0x02  │ 0x03  │ 0x04  │ 0x05  │ Pad Length│Next Header│
│(pad)  │(pad)  │(pad)  │(pad)  │(pad)  │    = 5    │   = 4     │
├───────┴───────┴───────┴───────┴───────┴───────────┴───────────┤
         5 bytes of padding              │   1 byte  │  1 byte   │
 
Next Header = 4 means IPv4 (tunnel mode, encapsulating IPv4 packet)
Next Header = 6 would mean TCP (transport mode, encapsulating TCP)
Next Header = 17 would mean UDP (transport mode, encapsulating UDP)

Integrity Check Value (ICV)

ICV Computation:

The ICV is calculated using a cryptographic authentication algorithm, typically either:

HMAC (Hash-based Message Authentication Code): Uses a hash function (SHA-256, SHA-384, SHA-512) with a secret key
AEAD Tag: For algorithms like AES-GCM, the authentication tag is a natural output of the encryption process

Coverage:

The ICV covers (authenticates):

ESP header (SPI + Sequence Number)
IV (if explicit)
Encrypted payload
ESP trailer (Padding + Pad Length + Next Header)

The ICV does NOT cover (and cannot cover):

Outer IP header (would break NAT traversal)
The ICV field itself (circular dependency)

Common ICV Algorithms and Sizes
Algorithm	Full Output	ICV Size (Truncated)	Security Level
HMAC-SHA-256-128	256 bits	16 bytes (128 bits)	Strong
HMAC-SHA-384-192	384 bits	24 bytes (192 bits)	Very Strong
HMAC-SHA-512-256	512 bits	32 bytes (256 bits)	Highest
AES-GCM (AEAD)	128 bits	16 bytes (128 bits)	Strong + Encryption
AES-GMAC (Auth only)	128 bits	16 bytes (128 bits)	Strong (no encryption)
HMAC-SHA-1-96 (legacy)	160 bits	12 bytes (96 bits)	Deprecated — avoid

ICV Verification Process:

When a receiver processes an ESP packet:

Extract SPI from ESP header
Look up Security Association in SA database
Retrieve authentication algorithm and key from SA
Compute the expected ICV over the received ESP header, IV, and ciphertext (encrypted payload + trailer)
Compare computed ICV with received ICV using a constant-time comparison
If mismatch: Discard packet silently (no ICMP error—security principle)
If match: Proceed to decryption

The 'silent discard' for failed verification is deliberate—sending an error would confirm receipt and could be exploited for oracle attacks.
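The verification steps and the silent-discard rule can be sketched with Python's standard `hmac` module. The sketch assumes HMAC-SHA-256-128 from the table above. It also assumes a 32-bit sequence number, since with ESN the high-order bits would also be fed into the ICV. Finally, it assumes a key already retrieved from the SA, so the SPI lookup is not shown. `compute_icv` and `verify_esp` are hypothetical names.

```python
import hashlib
import hmac
from typing import Optional

ICV_LEN = 16  # HMAC-SHA-256-128: 256-bit HMAC truncated to 128 bits


def compute_icv(key: bytes, esp_header: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """ICV over ESP header + IV + ciphertext (encrypted payload and trailer)."""
    mac = hmac.new(key, esp_header + iv + ciphertext, hashlib.sha256).digest()
    return mac[:ICV_LEN]


def verify_esp(key: bytes, packet: bytes, iv_len: int) -> Optional[bytes]:
    """Return IV + ciphertext if the ICV matches, else None (silent discard)."""
    if len(packet) < 8 + iv_len + ICV_LEN:
        return None                                      # malformed: drop quietly
    body, received_icv = packet[:-ICV_LEN], packet[-ICV_LEN:]
    esp_header, iv, ciphertext = body[:8], body[8:8 + iv_len], body[8 + iv_len:]
    expected = compute_icv(key, esp_header, iv, ciphertext)
    if not hmac.compare_digest(expected, received_icv):  # constant-time compare
        return None                                      # no ICMP error, no reply
    return iv + ciphertext                               # only now attempt decryption


key = b"\x0b" * 32                                       # hypothetical SA key
body = bytes.fromhex("cafebabe00000001") + bytes(16) + b"\x42" * 32
packet = body + compute_icv(key, body[:8], body[8:24], body[24:])
print(verify_esp(key, packet, iv_len=16) is not None)    # True
tampered = packet[:30] + b"\x00" + packet[31:]
print(verify_esp(key, tampered, iv_len=16))              # None
```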

AEAD Algorithms:

Modern deployments increasingly use Authenticated Encryption with Associated Data (AEAD) algorithms like AES-GCM. These combine encryption and authentication in a single cryptographic operation:

Single key for both operations
Better performance (one pass over data)
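Security is analyzed for the combined construction, rather than depending on the correct composition of a separate cipher and MAC
Tag computation uses the ESP header (SPI + Sequence Number, or SPI + the full 64-bit ESN) as Associated Data (AAD)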
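This minimal sketch shows how the associated data is assembled. It assumes the RFC 4106 format: SPI plus the 32-bit sequence number (8 bytes), or SPI plus the full 64-bit ESN (12 bytes) when ESN is in use. `gcm_aad` is hypothetical. The actual GCM encryption call would come from a cryptography library and is not shown.

```python
def gcm_aad(spi: int, seq: int, esn: bool) -> bytes:
    """AAD for AES-GCM in ESP (RFC 4106 format assumed).

    Without ESN: SPI (4 bytes) || 32-bit Sequence Number   -> 8 bytes
    With ESN:    SPI (4 bytes) || 64-bit Extended Seq. No. -> 12 bytes,
                 although only the low 32 bits appear in the ESP header
    """
    seq_len = 8 if esn else 4
    return spi.to_bytes(4, "big") + seq.to_bytes(seq_len, "big")


print(gcm_aad(0xCAFEBABE, 1, esn=False).hex())         # cafebabe00000001
print(gcm_aad(0xCAFEBABE, 2**32 + 1, esn=True).hex())  # cafebabe0000000100000001
```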

Always Use ICV

Complete ESP Packet Example

Let's walk through a complete ESP packet to see how all components fit together. Consider an ESP packet in tunnel mode using AES-256-GCM, protecting an ICMP echo request.

esp_packet_analysis.txt
Complete ESP Packet in Tunnel Mode (AES-256-GCM)
═══════════════════════════════════════════════════════════════════════════════
 
OUTER IP HEADER (20 bytes) - NOT authenticated, NOT encrypted
├─────────────────────────────────────────────────────────────────────────────┤
│ Version: 4 │ IHL: 5 │ TOS: 0x00 │ Total Length: 148                        │
│ Identification: 0x1234 │ Flags: 0 │ Fragment Offset: 0                     │
│ TTL: 64 │ Protocol: 50 (ESP) │ Header Checksum: 0xABCD                    │
│ Source IP: 10.1.1.1 (VPN Gateway A)                                        │
│ Destination IP: 10.2.2.1 (VPN Gateway B)                                   │
├─────────────────────────────────────────────────────────────────────────────┤
 
ESP HEADER (8 bytes) - Authenticated, NOT encrypted
├─────────────────────────────────────────────────────────────────────────────┤
│ SPI: 0xCAFEBABE (3405691582)                                               │
│ Sequence Number: 0x00000001 (1)                                            │
├─────────────────────────────────────────────────────────────────────────────┤
 
INITIALIZATION VECTOR (8 bytes) - Authenticated, NOT encrypted (sent in the clear)
├─────────────────────────────────────────────────────────────────────────────┤
│ IV: 0x0001020304050607 (derived from counter)                              │
├─────────────────────────────────────────────────────────────────────────────┤
 
ENCRYPTED PAYLOAD - Authenticated AND encrypted
┌─────────────────────────────────────────────────────────────────────────────┐
│ [This is what's encrypted - cannot be seen in capture without key]         │
│                                                                             │
│ INNER IP HEADER (20 bytes):                                                │
│   Version: 4 │ IHL: 5 │ TOS: 0x00 │ Total Length: 84                      │
│   Source IP: 192.168.1.100 (actual source)                                 │
│   Destination IP: 192.168.2.200 (actual destination)                       │
│   Protocol: 1 (ICMP)                                                       │
│                                                                             │
│ ICMP DATA (64 bytes):                                                      │
│   Type: 8 (Echo Request) │ Code: 0                                        │
│   Identifier: 0x1234 │ Sequence: 1                                        │
│   Payload: "Hello from ping..."                                           │
│                                                                             │
│ ESP TRAILER (12 bytes):                                                    │
│   Padding: 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0A (10 bytes)   │
│   Pad Length: 0x0A (10)                                                   │
│   Next Header: 0x04 (4 = IPv4)                                            │
└─────────────────────────────────────────────────────────────────────────────┘
 
ICV (16 bytes) - NOT encrypted
├─────────────────────────────────────────────────────────────────────────────┤
│ Authentication Tag: 0x1A2B3C4D5E6F7A8B9C0D1E2F3A4B5C6D                     │
│ (AES-GCM tag over AAD = ESP Header and the ciphertext, which includes the  │
│  encrypted ESP Trailer; the IV is authenticated via the nonce)              │
└─────────────────────────────────────────────────────────────────────────────┘
 
TOTAL PACKET: 20 (outer IP) + 8 (ESP) + 8 (IV) + 20 (inner IP) + 
              64 (ICMP) + 12 (trailer) + 16 (ICV) = 148 bytes
 
Overhead Added by ESP: 148 - 84 (original packet) = 64 bytes (76% overhead!)

Key Observations:

The outer IP addresses are visible: Anyone sniffing the network sees traffic between VPN gateways, not original hosts
SPI and Sequence Number are visible: An observer knows which SA is being used and can count packets
Everything else is hidden: Inner IP addresses, protocol, and data are encrypted
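Significant overhead: A small 84-byte ICMP packet becomes 148 bytes, 76% overhead. For larger packets the fixed cost (54 bytes plus 0-15 bytes of padding when padding to 16 bytes as here) is amortized: a 1,400-byte inner packet carries roughly 4-5% overhead
Padding ensures alignment: this example pads the encrypted section (84 + 10 + 2 = 96 bytes) to a multiple of 16. AES-GCM does not itself require block alignment; ESP only needs a 4-byte boundary, so 2 bytes of padding would also have been valid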
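The arithmetic in the example can be checked with a short sketch. The defaults mirror the example: an 8-byte IV, a 16-byte ICV, and an IPv4 outer header without options. They also follow its choice of padding the encrypted section to 16 bytes. `esp_tunnel_size` is a hypothetical helper.

```python
def esp_tunnel_size(inner_len: int, iv_len: int = 8, icv_len: int = 16,
                    align: int = 16, outer_ip: int = 20) -> tuple[int, int, int]:
    """Return (pad_len, total_len, overhead) for one tunnel-mode ESP packet.

    align is the boundary the encrypted section (inner packet + padding +
    Pad Length + Next Header) is padded to; ESP never allows less than 4.
    """
    pad_len = (-(inner_len + 2)) % max(align, 4)
    total = outer_ip + 8 + iv_len + inner_len + pad_len + 2 + icv_len  # 8 = SPI + Seq
    return pad_len, total, total - inner_len


for inner in (84, 1400):
    pad, total, overhead = esp_tunnel_size(inner)
    print(inner, pad, total, f"{overhead / inner:.0%}")
# 84 10 148 76%
# 1400 6 1460 4%
```

Passing `align=4` shows the minimum AES-GCM case: the 84-byte packet then needs only 2 bytes of padding and grows to 140 bytes.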

Packet Capture Analysis

Summary and Key Takeaways

Key Takeaways

•ESP header (8 bytes) contains SPI and Sequence Number—visible to observers but sufficient for SA lookup and anti-replay
•The IV enables secure encryption by ensuring unique ciphertexts; its length and requirements depend on the chosen algorithm
•Payload contains either upper-layer data (transport mode) or complete inner IP packet (tunnel mode), fully encrypted
•ESP trailer includes padding for block alignment, Pad Length, and Next Header—all encrypted to prevent traffic analysis
•ICV authenticates the entire ESP packet (except outer IP header), enabling integrity verification before decryption
•AEAD algorithms like AES-GCM combine encryption and authentication efficiently, reducing complexity and improving security
