Encryption alone is not enough. A packet encrypted with the strongest cipher becomes useless—or worse, dangerous—if an attacker can modify it without detection. ESP's authentication mechanisms provide the cryptographic guarantee that packets are exactly as the sender transmitted them, originating from a legitimate source, and not replayed from previous sessions.
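Authentication in ESP encompasses three related but distinct services:
- Data integrity: the packet was not modified in transit
- Data origin authentication: the packet came from a peer that holds the SA's key
- Anti-replay protection: the packet is not a retransmitted copy of an earlier one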
These services work together to ensure that even across hostile networks, receivers can trust the authenticity and freshness of every packet they process.
By the end of this page, you will understand how ESP provides authentication through HMAC algorithms and AEAD constructions, the structure and verification of the Integrity Check Value (ICV), how anti-replay protection works via sequence numbers and sliding windows, the critical relationship between encryption and authentication (and why both are necessary), and best practices for configuring authentication in production deployments.
To understand why authentication is essential, consider what encryption alone provides—and what it doesn't.
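What Encryption Provides:
- Confidentiality: an eavesdropper cannot read the payload
What Encryption Does NOT Provide:
- Integrity: ciphertext can be altered without the receiver noticing
- Data origin authentication: nothing proves who produced the ciphertext
- Replay protection: a captured packet decrypts just as well the second time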
The Modification Problem:
Without authentication, an attacker can modify encrypted packets in ways that produce predictable changes in the decrypted output. This is especially problematic with stream ciphers and counter modes:
| Attack | Mechanism | Potential Impact |
|---|---|---|
| Bit-Flipping Attack | Flip bits in ciphertext → predictable bit changes in plaintext (CTR, stream ciphers) | Modify financial amounts, change destinations, corrupt data |
| Block Reordering | Rearrange encrypted blocks (ECB mode) | Shuffle data sections, corrupt structured messages |
| Padding Oracle | Modify padding, observe decryption errors (CBC mode) | Decrypt without key through error analysis |
| Replay Attack | Resend previously captured valid packets | Duplicate transactions, redo commands, amplify traffic |
| Cut-and-Paste | Combine encrypted fragments from different messages | Construct malicious messages from legitimate components |
Example: Bit-Flipping Attack on CTR Mode
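In counter mode encryption:
- The ciphertext is the plaintext XORed with a keystream: C = P ⊕ KS
- Flipping bit i of C flips exactly bit i of the decrypted P, and nothing else
- An attacker who knows or guesses part of P can therefore rewrite it to any value of the same length without knowing the key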
Authentication prevents this by detecting any ciphertext modification before decryption occurs.
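The bit-flipping attack can be reproduced in a few lines. The sketch below uses a toy keystream built from SHA-256 as a stand-in for AES-CTR (an assumption made only to keep the example dependency-free; it is not a real cipher), and the message format is hypothetical.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream (stand-in for AES-CTR, illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ctr_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # In counter mode, encryption and decryption are the same XOR operation.
    return xor_bytes(data, keystream(key, nonce, len(data)))


key, nonce = b"k" * 16, b"n" * 8
plaintext = b"PAY 0100 USD"
ciphertext = ctr_crypt(key, nonce, plaintext)

# The attacker guesses that bytes 4..7 hold "0100" and wants "9999".
# No key is needed: XOR the difference into the ciphertext.
delta = xor_bytes(b"0100", b"9999")
tampered = bytearray(ciphertext)
for i, d in enumerate(delta):
    tampered[4 + i] ^= d

recovered = ctr_crypt(key, nonce, bytes(tampered))
print(recovered)  # b'PAY 9999 USD'
```

The receiver decrypts the tampered packet without any error, which is exactly why an integrity check has to run before decryption.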
Data Origin Authentication:
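Authentication also establishes who sent the data. In symmetric key systems, this is implicit:
- Only the two SA endpoints hold the authentication key
- A valid ICV can only be computed by someone who holds that key
- The receiver knows it did not create the packet itself
- Therefore the packet must have come from the peer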
This logical chain provides data origin authentication as a side effect of integrity verification.
Never use encryption without authentication. This is not a theoretical concern: practical attacks such as POODLE and Lucky13 exploited CBC encryption combined with improper authentication ordering (the padding was processed before, or independently of, the MAC check). ESP with a non-AEAD cipher and no integrity algorithm (null integrity) should never be deployed, even when encryption is enabled.
For non-AEAD encryption modes (like AES-CBC), ESP uses HMAC (Hash-based Message Authentication Code) to provide integrity and authentication. HMAC combines a cryptographic hash function with a secret key to produce an authentication tag that verifies both integrity and authenticity.
HMAC Construction:
HMAC uses nested hash operations with inner and outer padding:
HMAC(K, M) = H((K' ⊕ opad) || H((K' ⊕ ipad) || M))
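Where:
- H is the underlying hash function (for example SHA-256)
- K' is the key padded with zero bytes to the hash block size (a key longer than one block is hashed first)
- ipad is the byte 0x36 repeated to fill one block
- opad is the byte 0x5C repeated to fill one block
- ⊕ is bitwise XOR and || is concatenation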
Why HMAC Instead of Simple Hash?
Using a plain hash (like SHA-256(key || message)) is vulnerable to length-extension attacks. HMAC's nested structure prevents this and provides provable security properties when the underlying hash function is secure.
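The nested construction can be written out directly with `hashlib` and checked against Python's `hmac` module. This is a sketch for understanding only; the key and message values are placeholders, and production code should call `hmac` rather than reimplement it.

```python
import hashlib
import hmac


def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """HMAC(K, M) = H((K' xor opad) || H((K' xor ipad) || M)) with H = SHA-256."""
    block_size = 64  # SHA-256 processes 64-byte blocks
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()  # long keys are hashed first
    k_prime = key.ljust(block_size, b"\x00")  # pad with zeros to one block
    ipad = bytes(b ^ 0x36 for b in k_prime)
    opad = bytes(b ^ 0x5C for b in k_prime)
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()


key = b"example-auth-key"  # placeholder key
msg = b"ESP header || IV || ciphertext"  # placeholder message

manual = hmac_sha256_manual(key, msg)
library = hmac.new(key, msg, hashlib.sha256).digest()
print(hmac.compare_digest(manual, library))  # True

# ESP's HMAC-SHA-256-128 keeps only the first 16 bytes as the ICV.
icv = manual[:16]
```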
| Algorithm | Hash Function | Full Output | ICV (Truncated) | Status |
|---|---|---|---|---|
| HMAC-SHA-256-128 | SHA-256 | 256 bits | 128 bits (16 bytes) | Recommended |
| HMAC-SHA-384-192 | SHA-384 | 384 bits | 192 bits (24 bytes) | Recommended |
| HMAC-SHA-512-256 | SHA-512 | 512 bits | 256 bits (32 bytes) | Recommended |
| HMAC-SHA-1-96 | SHA-1 | 160 bits | 96 bits (12 bytes) | Deprecated |
| HMAC-MD5-96 | MD5 | 128 bits | 96 bits (12 bytes) | Avoid |
ICV Truncation:
ESP truncates HMAC output to reduce per-packet overhead. For example, HMAC-SHA-256 produces 256 bits but ESP uses only the first 128 bits (HMAC-SHA-256-128).
Is Truncation Safe?
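Yes, when done correctly. An attacker who does not know the key can do no better than guess a t-bit tag, succeeding with probability about 2^-t per attempt, and every attempt has to be sent to the receiver for verification. For the 128-bit ICV of HMAC-SHA-256-128 that means on the order of 2^128 attempts, which is computationally infeasible. Truncating to half the hash output, as ESP does for the SHA-2 family, keeps this bound far out of reach while revealing less of the full MAC output to an observer.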
HMAC Key Derivation:
The HMAC key is derived from IKE keying material, separate from the encryption key:
Using separate keys for encryption and authentication is a security requirement—never reuse the same key for both operations.
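As a rough sketch of how two independent keys can come out of one block of keying material, the code below follows my reading of IKEv2 (RFC 7296): keying material is expanded with an iterated PRF and, for ESP, the encryption key is taken first and the integrity key after it. The PRF choice (HMAC-SHA-256), the key lengths, and the input values are assumptions for illustration, and the sketch derives keys for one direction only.

```python
import hashlib
import hmac


def prf_plus(key: bytes, seed: bytes, length: int) -> bytes:
    """Iterated PRF expansion: T1 = prf(K, S | 0x01), Tn = prf(K, Tn-1 | S | n)."""
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(key, block + seed + bytes([counter]), hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]


def derive_esp_keys(sk_d: bytes, ni: bytes, nr: bytes,
                    enc_len: int = 16, auth_len: int = 32):
    """Split keying material into (encryption key, authentication key)."""
    keymat = prf_plus(sk_d, ni + nr, enc_len + auth_len)
    # Encryption key first, integrity key from the bytes that follow.
    return keymat[:enc_len], keymat[enc_len:]


# Placeholder inputs; real values come out of the IKE exchange.
sk_d = b"\x0b" * 32
ni, nr = b"\x01" * 16, b"\x02" * 16

enc_key, auth_key = derive_esp_keys(sk_d, ni, nr)
print(len(enc_key), len(auth_key))  # 16 32
```

Because the two keys are disjoint slices of pseudorandom output, learning one reveals nothing useful about the other.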
While HMAC-SHA-1 is not directly vulnerable to SHA-1's collision attacks (HMAC uses SHA-1 differently than certificate signing), best practice is to migrate away from SHA-1. The cryptographic community has deprecated SHA-1 comprehensively, and modern systems should use SHA-256 or stronger. RFC 8221 downgrades HMAC-SHA-1-96 from MUST to MUST-, signalling that it is expected to be phased out.
Authenticated Encryption with Associated Data (AEAD) algorithms combine encryption and authentication in a single cryptographic operation. For ESP, AEAD eliminates the need for separate HMAC computation, improving both efficiency and security.
AEAD Concept:
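AEAD algorithms take four inputs:
- Key: the secret key shared by both endpoints
- Nonce: a value that must never repeat under the same key
- Plaintext: the data to encrypt and authenticate
- Associated Data (AAD): data to authenticate but not encrypt
And produce two outputs:
- Ciphertext: the encrypted plaintext
- Authentication tag: covers both the ciphertext and the associated data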
Associated Data in ESP:
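The Associated Data for ESP is the ESP header (SPI + Sequence Number). This data is:
- Authenticated: any change to the SPI or sequence number invalidates the tag
- Not encrypted: it travels in the clear so the receiver can look up the SA and run the anti-replay check before decrypting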
| Algorithm | Tag Size | Nonce Size | Key Size | Notes |
|---|---|---|---|---|
| AES-128-GCM | 128 bits (16 bytes) | 96 bits (12 bytes) | 128 bits | Hardware accelerated, widely deployed |
| AES-256-GCM | 128 bits (16 bytes) | 96 bits (12 bytes) | 256 bits | Best choice for high security |
| ChaCha20-Poly1305 | 128 bits (16 bytes) | 96 bits (12 bytes) | 256 bits | Excellent without AES-NI |
| AES-CCM | Variable (4-16 bytes) | 56-104 bits | 128/256 bits | Less common in ESP |
AES-GCM Deep Dive:
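AES-GCM (Galois/Counter Mode) is the dominant AEAD algorithm for ESP. It combines:
- AES in counter mode (AES-CTR) for confidentiality
- GHASH, a keyed hash over GF(2^128), for authentication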
GCM Authentication Process:
1. Encrypt plaintext using AES-CTR
2. Compute GHASH over:
- Associated Data (ESP header)
- Ciphertext
- Lengths of AAD and ciphertext
3. XOR the GHASH result with AES(Key, Nonce || 0³¹ || 1) to produce the tag
ChaCha20-Poly1305:
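For environments without AES hardware acceleration, ChaCha20-Poly1305 provides comparable security with excellent software performance:
- ChaCha20: a stream cipher with a 256-bit key, built from add, rotate, and XOR operations
- Poly1305: a one-time authenticator that produces the 128-bit tag
- No table lookups, which makes constant-time software implementations straightforward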
For new deployments, always choose AEAD algorithms (AES-GCM or ChaCha20-Poly1305). They provide combined encryption+authentication with fewer configuration errors, better performance, and stronger security guarantees than separate encryption and HMAC.
The Integrity Check Value (ICV) is the authentication tag appended to ESP packets. Whether computed via HMAC or as part of AEAD, the ICV provides the cryptographic proof of integrity and authenticity.
ICV Coverage:
The ICV is computed over specific portions of the ESP packet:
For HMAC (with CBC/CTR encryption):
ICV = HMAC-SHA-256(AuthKey, ESP_Header || IV || Ciphertext || ESP_Trailer)
For AEAD (GCM/ChaCha20-Poly1305):
(Ciphertext, Tag) = AES-GCM(Key, Nonce, Plaintext, AAD)
where AAD = ESP_Header (SPI || Sequence Number)
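A small sketch of how the AEAD inputs might be assembled for AES-GCM in ESP. It assumes the RFC 4106 layout as I understand it: the 12-byte nonce is a 4-byte salt from the SA followed by the 8-byte per-packet IV, and the AAD is the SPI followed by the sequence number (the full 64-bit value when ESN is in use). The function names are hypothetical.

```python
import struct


def build_gcm_aad(spi: int, seq: int, esn: bool = False) -> bytes:
    """AAD = SPI || Sequence Number, in network byte order."""
    if esn:
        return struct.pack("!IQ", spi, seq)           # 4 + 8 = 12 bytes
    return struct.pack("!II", spi, seq & 0xFFFFFFFF)  # 4 + 4 = 8 bytes


def build_gcm_nonce(salt: bytes, iv: bytes) -> bytes:
    """Nonce = 4-byte SA salt || 8-byte per-packet IV (96 bits total)."""
    if len(salt) != 4 or len(iv) != 8:
        raise ValueError("expected a 4-byte salt and an 8-byte IV")
    return salt + iv


aad = build_gcm_aad(spi=0x12345678, seq=42)
nonce = build_gcm_nonce(salt=b"\xaa\xbb\xcc\xdd", iv=(42).to_bytes(8, "big"))
print(aad.hex())   # 123456780000002a
print(len(nonce))  # 12
```

Using a counter as the IV, as in the usage lines, is one common way to guarantee the nonce never repeats under a key.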
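What's Authenticated:
- SPI and Sequence Number (the ESP header)
- IV (explicitly under HMAC; through the nonce under AEAD)
- Encrypted payload
- Encrypted ESP trailer (padding, pad length, next header)
What's NOT Authenticated:
- The outer IP header (ESP, unlike AH, does not cover it)
- The ICV field itself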
```python
import hashlib
import hmac

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def compute_esp_hmac(auth_key, esp_header, iv, ciphertext, esp_trailer):
    """
    Compute ICV for ESP using HMAC-SHA-256

    Args:
        auth_key: 256-bit authentication key from SA
        esp_header: SPI (4 bytes) + Sequence Number (4 bytes)
        iv: Initialization Vector (algorithm dependent)
        ciphertext: Encrypted payload
        esp_trailer: Encrypted padding + pad_length + next_header

    Returns:
        128-bit truncated ICV
    """
    # Concatenate authenticated data
    authenticated_data = esp_header + iv + ciphertext + esp_trailer

    # Compute HMAC-SHA-256
    full_mac = hmac.new(auth_key, authenticated_data, hashlib.sha256).digest()

    # Truncate to 128 bits (16 bytes) for HMAC-SHA-256-128
    return full_mac[:16]


def verify_esp_hmac(auth_key, esp_header, iv, ciphertext, esp_trailer, received_icv):
    """
    Verify received ICV matches computed ICV

    Returns:
        True if verification passes, False otherwise
    """
    expected_icv = compute_esp_hmac(auth_key, esp_header, iv, ciphertext, esp_trailer)

    # Constant-time comparison to prevent timing attacks
    return hmac.compare_digest(expected_icv, received_icv)


def aead_authenticate(key, nonce, ciphertext, aad, received_tag):
    """
    For AEAD, authentication is integrated with decryption.
    This is conceptual - actual implementation uses crypto library.
    """
    aesgcm = AESGCM(key)
    try:
        # Decrypt and verify in single atomic operation
        plaintext = aesgcm.decrypt(nonce, ciphertext + received_tag, aad)
        return True, plaintext
    except InvalidTag:
        # Authentication failed - ciphertext, tag, or AAD was tampered with
        return False, None
```

ICV Placement:
The ICV is appended to the end of the ESP packet, after the encrypted ESP trailer:
[ESP Header][IV][Encrypted Payload + Trailer][ICV]
The ICV field itself is transmitted unencrypted.
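Verification Process:
1. Look up the SA using the SPI
2. Check the sequence number against the anti-replay window and drop obvious duplicates early
3. Recompute the ICV over the received packet and compare it with the received ICV
4. If the ICV does not match, discard the packet; if it matches, update the anti-replay window
5. Only then decrypt the payload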
ICV comparison must use constant-time operations. If comparison short-circuits on first mismatched byte, an attacker can measure response timing to determine how many bytes match—potentially forging valid ICVs byte by byte. Always use cryptographic comparison functions (hmac.compare_digest in Python, crypto.timingSafeEqual in Node.js).
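The placement and comparison rules can be combined into a short receive-side sketch. It assumes AES-CBC with a 16-byte IV and HMAC-SHA-256-128 with a 16-byte ICV; the function names and the sample packet are hypothetical, and SA lookup, the replay check, and decryption are left out.

```python
import hashlib
import hmac
import struct

IV_LEN = 16   # assumed: AES-CBC
ICV_LEN = 16  # assumed: HMAC-SHA-256-128


def split_esp_packet(packet: bytes):
    """Split an ESP packet into (spi, seq, iv, ciphertext, icv)."""
    if len(packet) < 8 + IV_LEN + ICV_LEN:
        raise ValueError("packet too short")
    spi, seq = struct.unpack("!II", packet[:8])
    iv = packet[8:8 + IV_LEN]
    ciphertext = packet[8 + IV_LEN:-ICV_LEN]  # payload plus encrypted trailer
    icv = packet[-ICV_LEN:]
    return spi, seq, iv, ciphertext, icv


def icv_is_valid(auth_key: bytes, packet: bytes) -> bool:
    """Recompute the ICV over everything before it; compare in constant time."""
    expected = hmac.new(auth_key, packet[:-ICV_LEN], hashlib.sha256).digest()[:ICV_LEN]
    return hmac.compare_digest(expected, packet[-ICV_LEN:])


# Build a sample packet, then flip one ciphertext bit.
auth_key = b"\x01" * 32
body = struct.pack("!II", 0x1234, 7) + b"\x00" * IV_LEN + b"ciphertext+trailer"
packet = body + hmac.new(auth_key, body, hashlib.sha256).digest()[:ICV_LEN]

tampered = bytearray(packet)
tampered[30] ^= 0x01

print(icv_is_valid(auth_key, packet))           # True
print(icv_is_valid(auth_key, bytes(tampered)))  # False
```

A receiver would call `icv_is_valid` before touching the ciphertext, so a tampered packet is dropped without ever reaching the decryption code.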
Even with encryption and authentication, packets can be captured and retransmitted. Anti-replay protection ensures that each packet is processed exactly once, preventing replay attacks where legitimate encrypted packets are resent to duplicate their effect.
The Replay Threat:
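Consider a banking application where an encrypted, authenticated packet transfers $1000 from Account A to Account B. Without anti-replay:
1. The attacker captures the packet on the wire (no key is needed)
2. The attacker retransmits the identical packet later
3. The receiver verifies the ICV, which is still valid, and decrypts the packet
4. The transfer executes a second time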
Anti-replay protection detects duplicate packets and rejects them.
ESP Sequence Number:
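Every ESP packet contains a 32-bit Sequence Number that:
- Starts at 1 for the first packet sent on an SA
- Increases by one with every packet sent
- Is covered by the ICV, so it cannot be altered undetected
- Must not wrap while anti-replay is enabled; the SA is rekeyed first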
| Component | Location | Size | Purpose |
|---|---|---|---|
| Sequence Number | ESP Header | 32 bits | Monotonically increasing per-packet counter |
| Extended SN (ESN) | Implicit (not transmitted) | 32 bits upper | Extends counter to 64 bits for high-throughput |
| Replay Window | Receiver state | 32-8192 bits | Bitmap tracking received sequence numbers |
| Window Position | Receiver state | 32/64 bits | Highest sequence number received |
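On the sending side the bookkeeping is simple. The following sketch, with a hypothetical class name, keeps a counter that starts at zero, hands out 1 for the first packet, and refuses to wrap so that a sequence number is never reused under the same keys.

```python
class SenderSequence:
    """Per-SA outbound sequence counter (illustrative sketch)."""

    def __init__(self, esn: bool = False):
        # With ESN the counter is 64 bits wide; otherwise 32 bits.
        self.limit = (1 << 64) - 1 if esn else (1 << 32) - 1
        self.counter = 0  # initialised to 0 when the SA is established

    def next(self) -> int:
        """Return the sequence number for the next outbound packet."""
        if self.counter >= self.limit:
            # Wrapping would reuse a sequence number; a new SA is required.
            raise OverflowError("sequence space exhausted; rekey the SA")
        self.counter += 1
        return self.counter

    @staticmethod
    def wire_value(seq: int) -> int:
        """Only the low 32 bits are transmitted, even with ESN."""
        return seq & 0xFFFFFFFF


seq = SenderSequence()
print(seq.next())  # 1
print(seq.next())  # 2
```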
Sliding Window Mechanism:
The receiver maintains a sliding window (typically 32 or 64 packets wide) representing recently received sequence numbers:
Window Example (size = 32):
Received: 42, 43, 45, 47, 48 (44, 46 not yet received)
Window right edge = 48 (highest received). Window left edge = 48 - 32 + 1 = 17. The table shows the upper end of the window:

| Sequence number | 42 | 43 | 44 | 45 | 46 | 47 | 48 |
|---|---|---|---|---|---|---|---|
| Status | ✓ | ✓ | ○ | ✓ | ○ | ✓ | ✓ |

✓ = Received ○ = Not yet received (acceptable if it arrives later)
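Reception Rules:
- Sequence number above the right edge: accept if the ICV verifies, then slide the window forward
- Inside the window and not yet marked: accept if the ICV verifies, then mark it as received
- Inside the window and already marked: reject as a replay
- Below the left edge: reject as too old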
```python
class AntiReplayWindow:
    """
    ESP Anti-Replay Window Implementation

    Call check_and_update only for packets whose ICV has already been
    verified; otherwise forged packets could advance the window.
    """

    def __init__(self, window_size=64):
        self.window_size = window_size
        self.window_right = 0  # Highest sequence number received
        self.bitmap = 0        # Bit i set if (window_right - i) received

    def check_and_update(self, sequence_number):
        """
        Check if packet should be accepted and update window.

        Returns:
            True if packet is acceptable, False if replay/too-old
        """
        # First packet initializes window
        if self.window_right == 0:
            self.window_right = sequence_number
            self.bitmap = 1
            return True

        # Calculate position relative to window
        diff = sequence_number - self.window_right

        if diff > 0:
            # New packet ahead of window: slide window right
            if diff >= self.window_size:
                # Way ahead - reset bitmap
                self.bitmap = 1
            else:
                # Slide and set new bit
                self.bitmap = (self.bitmap << diff) | 1
                # Mask to window size
                self.bitmap &= (1 << self.window_size) - 1
            self.window_right = sequence_number
            return True

        if diff == 0:
            # Duplicate of most recent packet
            return False

        # Packet behind window_right
        index = -diff  # Position in window (1 = one behind, etc.)
        if index >= self.window_size:
            # Too old - outside window
            return False

        # Check if already received
        mask = 1 << index
        if self.bitmap & mask:
            # Already received - replay!
            return False

        # Not received yet - accept and mark
        self.bitmap |= mask
        return True

    def get_window_status(self):
        """Debug: Show window state"""
        status = []
        for i in range(self.window_size):
            seq = self.window_right - i
            received = bool(self.bitmap & (1 << i))
            status.append(f"{seq}: {'✓' if received else '○'}")
        return status[::-1]  # Oldest to newest
```

Larger windows accommodate more out-of-order delivery but consume more memory and widen the range of old sequence numbers the receiver will still accept. The default of 64 works for most deployments. High-latency, high-bandwidth links may benefit from larger windows (128-1024). Consider your network's reordering characteristics when tuning.
For high-throughput connections, 32-bit sequence numbers can exhaust too quickly. Extended Sequence Numbers (ESN) extend the sequence space to 64 bits while transmitting only 32 bits per packet.
The Exhaustion Problem:
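At 10 Gbps with 1000-byte packets:
- Packet rate: 10 × 10^9 / 8000 = 1.25 million packets per second
- Time to exhaust 2^32 sequence numbers: about 3,436 seconds (~57 minutes)
At 100 Gbps with 1000-byte packets:
- Packet rate: 12.5 million packets per second
- Time to exhaust 2^32 sequence numbers: about 344 seconds (~5.7 minutes)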
Rekeying this frequently is undesirable—ESN provides the solution.
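The exhaustion times follow from simple arithmetic, sketched below. The function name is hypothetical, and the calculation assumes every packet is exactly the stated size and the link is fully saturated.

```python
def seconds_to_exhaust(link_bps: float, packet_bytes: int, seq_bits: int = 32) -> float:
    """Seconds until a seq_bits-wide counter runs out at line rate."""
    packets_per_second = link_bps / (packet_bytes * 8)
    return (2 ** seq_bits) / packets_per_second


SECONDS_PER_YEAR = 365.25 * 24 * 3600

t10 = seconds_to_exhaust(10e9, 1000)            # 32-bit counter at 10 Gbps
t100 = seconds_to_exhaust(100e9, 1000)          # 32-bit counter at 100 Gbps
t10_esn = seconds_to_exhaust(10e9, 1000, 64)    # 64-bit ESN at 10 Gbps

print(f"10 Gbps, 32-bit:  {t10 / 60:.1f} minutes")
print(f"100 Gbps, 32-bit: {t100 / 60:.1f} minutes")
print(f"10 Gbps, 64-bit:  {t10_esn / SECONDS_PER_YEAR:,.0f} years")
```

Smaller packets raise the packet rate and shorten these times proportionally, so real traffic mixes can exhaust a 32-bit counter even faster.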
ESN Mechanism:
| Aspect | 32-bit Sequence | 64-bit ESN |
|---|---|---|
| Transmitted bits | 32 bits | 32 bits (same) |
| Total sequence space | 2^32 (~4.3 billion) | 2^64 (~18 quintillion) |
| Time to exhaust @10Gbps (1000-byte packets) | ~57 minutes | ~468,000 years |
| ICV computation | Uses 32-bit SN | Uses full 64-bit ESN |
| Negotiation | Default | Must be negotiated in IKE |
ESN ICV Computation:
When ESN is used, the full 64-bit sequence number is included in authentication:
ICV = Auth(Key, ESP_Header(SPI || Low32_Seq) || IV || Ciphertext || Trailer || High32_Seq)
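Note that the high-order 32 bits are appended to the authenticated data after the trailer for ICV computation, but they are NOT transmitted: both sender and receiver compute them independently. This layout applies to separate integrity algorithms such as HMAC. With combined-mode (AEAD) algorithms the algorithm specification decides where the high-order bits go; AES-GCM, for example, places the full 64-bit sequence number in the associated data.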
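The effect of the implicit high-order bits can be seen in a short sketch for the HMAC case. It assumes HMAC-SHA-256-128, and the function name and sample values are hypothetical. Two packets whose sequence numbers differ only in the high 32 bits look identical on the wire apart from their ICVs.

```python
import hashlib
import hmac
import struct


def esn_packet_and_icv(auth_key: bytes, spi: int, seq64: int,
                       iv: bytes, ciphertext_and_trailer: bytes):
    """Return (bytes sent before the ICV, 16-byte ICV) for a 64-bit ESN."""
    low32 = seq64 & 0xFFFFFFFF
    high32 = seq64 >> 32
    # Only the low 32 bits appear in the transmitted ESP header.
    wire = struct.pack("!II", spi, low32) + iv + ciphertext_and_trailer
    # The high 32 bits are appended for the MAC computation only.
    authenticated = wire + struct.pack("!I", high32)
    icv = hmac.new(auth_key, authenticated, hashlib.sha256).digest()[:16]
    return wire, icv


key = b"\x02" * 32
iv, body = b"\x00" * 16, b"encrypted payload and trailer"

wire_a, icv_a = esn_packet_and_icv(key, 0x1000, 5, iv, body)
wire_b, icv_b = esn_packet_and_icv(key, 0x1000, (1 << 32) + 5, iv, body)

print(wire_a == wire_b)  # True: same transmitted bytes before the ICV
print(icv_a == icv_b)    # False: the hidden high-order bits changed the ICV
```

This is why a packet from an earlier pass through the 32-bit space cannot be replayed after the counter wraps: its ICV no longer verifies.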
Receiver ESN Handling:
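The receiver must infer the correct high-order bits:
- It tracks the full 64-bit value of the highest sequence number authenticated so far
- In the common case, received low-order bits that fall inside or ahead of the current window reuse the current high-order bits
- Low-order bits that appear far behind the window are taken to mean the 32-bit counter has wrapped, so the next high-order value is tried
- The ICV check confirms or rejects the guess, because a wrong high-order value produces a different ICV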
Any deployment expecting more than ~1 billion packets per SA lifetime should use ESN. This includes most data center, cloud, and high-bandwidth VPN deployments. ESN is negotiated during IKE Child SA creation—ensure both endpoints support and request it.
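The receiver-side inference of the high-order bits can be sketched as follows. The case split follows my reading of RFC 4303 Appendix A; the function name is hypothetical, and the result is only a candidate that the ICV check must then confirm.

```python
def infer_esn(seq_low: int, top: int, window: int = 64) -> int:
    """Guess the full 64-bit sequence number from its low 32 bits.

    top is the highest 64-bit sequence number authenticated so far.
    """
    mask = 0xFFFFFFFF
    t_high, t_low = top >> 32, top & mask
    bottom_low = (t_low - window + 1) & mask  # low bits of the window's left edge

    if t_low >= window - 1:
        # Window lies inside a single 2^32 subspace.
        high = t_high if seq_low >= bottom_low else t_high + 1
    else:
        # Window straddles a 2^32 boundary.
        high = t_high - 1 if seq_low >= bottom_low else t_high

    if high < 0:
        raise ValueError("sequence number precedes the start of the SA")
    return (high << 32) | seq_low


# Window near the top of the first subspace: a small low value means a wrap.
print(hex(infer_esn(0x00000003, top=0xFFFFFFF0)))   # 0x100000003
# Window just past a wrap: a large low value belongs to the previous subspace.
print(hex(infer_esn(0xFFFFFFF0, top=0x100000005)))  # 0xfffffff0
```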
Proper authentication configuration is critical for ESP security. Misconfigurations can completely undermine protection, even with strong algorithms.
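Algorithm Selection:
- Prefer AEAD algorithms (AES-GCM or ChaCha20-Poly1305) for new deployments
- With non-AEAD ciphers, use HMAC-SHA-256-128 or stronger
- Avoid HMAC-MD5-96 and migrate away from HMAC-SHA-1-96
- Never combine a non-AEAD cipher with null integrity
Anti-Replay Configuration:
- Keep anti-replay enabled
- Start from the default window of 64 and enlarge it only when reordering on the path requires it
- Negotiate ESN on high-throughput SAs
Operational Considerations:
- Log and monitor ICV failures and replay drops
- Rekey before the sequence number space is exhausted
- Make sure ICV comparison is constant-time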
Authentication protects against network-layer attacks but doesn't replace application-layer security. Continue using TLS for web services, SSH for remote access, etc. ESP authentication defends the network transport; application security defends the application logic. The two layers provide complementary protection.
ESP authentication provides the critical guarantees that encrypted data hasn't been tampered with, comes from a legitimate source, and isn't a replay of previously captured traffic. Without authentication, encryption alone is vulnerable to numerous attacks.
What's Next:
With ESP's encryption and authentication mechanisms covered, we'll examine ESP modes of operation—transport mode versus tunnel mode. You'll learn when each mode is appropriate, how they differ in their protection boundaries, their interaction with NAT, and how to choose the correct mode for different deployment scenarios.
You now understand ESP authentication comprehensively—from HMAC computation to AEAD integration, ICV verification to anti-replay protection. This knowledge is essential for configuring secure VPN tunnels and diagnosing authentication-related failures. Next, we'll explore how ESP operates differently in transport and tunnel modes.