Of all overlay encapsulation protocols developed for network virtualization, VXLAN (Virtual eXtensible LAN) has emerged as the undisputed industry standard. Originally developed by VMware, Cisco, and other vendors and standardized in RFC 7348, VXLAN provides the encapsulation mechanism that enables Layer 2 networks to span across Layer 3 infrastructure at massive scale.
VXLAN's dominance stems from several key advantages: massive scale (16 million network segments versus 4,094 VLANs), operation over any ordinary IP underlay, and broad multi-vendor hardware and software support.
This page will take you through every aspect of VXLAN—from packet format to control plane options to production deployment patterns. By the end, you will be able to:

- Decode the VXLAN packet format and headers
- Compare control plane options (multicast, ingress replication, EVPN)
- Configure VXLAN on Open vSwitch and the Linux bridge
- Understand hardware offload capabilities
- Troubleshoot common VXLAN issues
- Apply best practices for production deployments
VXLAN uses UDP encapsulation to carry Layer 2 Ethernet frames across Layer 3 networks. Understanding the exact packet structure is essential for troubleshooting and understanding protocol behavior.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer Ethernet Header (14 bytes) |
| Destination MAC | Source MAC | Type |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer IP Header (20 bytes) |
| Ver | IHL | ToS | Total Length | Identification |
| Flags | Fragment Offset | TTL | Proto=17 | Header Checksum |
| Source IP Address |
| Destination IP Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer UDP Header (8 bytes) |
| Source Port (hash) | Dest Port = 4789 |
| UDP Length | UDP Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| VXLAN Header (8 bytes) |
|R|R|R|R|I|R|R|R|            Reserved (24 bits)                 |
|        VXLAN Network Identifier (VNI)         | Reserved (8)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Inner Ethernet Frame |
| (Original L2 frame with Dest MAC, Src MAC, Type, Payload) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The encapsulation adds 50 bytes of overhead in total:

- Outer Ethernet Header (14 bytes)
- Outer IP Header (20 bytes for IPv4)
- Outer UDP Header (8 bytes)
- VXLAN Header (8 bytes)
| Bit Position | Name | Value | Meaning |
|---|---|---|---|
| Bit 0-3 | Reserved | 0 | Must be zero on transmit, ignored on receive |
| Bit 4 | I (VNI Flag) | 1 | VNI is valid; MUST be set to 1 |
| Bit 5-7 | Reserved | 0 | Must be zero on transmit, ignored on receive |
```c
/* VXLAN header as defined in RFC 7348 */
struct vxlanhdr {
    __be32 vx_flags;   /* Flags (I flag at bit 27, network byte order) */
    __be32 vx_vni;     /* VNI in upper 24 bits, reserved in lower 8 */
};

/* VNI extraction from VXLAN header */
#define VXLAN_VNI_MASK   0xffffff00
#define VXLAN_VNI_SHIFT  8
#define VXLAN_FLAGS_I    0x08000000  /* I flag position */

static inline __u32 vxlan_vni(struct vxlanhdr *vxh)
{
    return (ntohl(vxh->vx_vni) & VXLAN_VNI_MASK) >> VXLAN_VNI_SHIFT;
}

static inline bool vxlan_valid(struct vxlanhdr *vxh)
{
    return (ntohl(vxh->vx_flags) & VXLAN_FLAGS_I) != 0;
}

/* Example: VXLAN encapsulation structure
 * Total overhead: 50 bytes (14 + 20 + 8 + 8)
 *
 * +----------------+
 * | Outer Eth (14) |
 * +----------------+
 * | Outer IP (20)  |
 * +----------------+
 * | Outer UDP (8)  |
 * +----------------+
 * | VXLAN Hdr (8)  |
 * +----------------+
 * |   Inner Eth    |
 * |     Frame      |
 * |   (variable)   |
 * +----------------+
 */
```

The outer UDP source port is intentionally randomized (typically a hash of the inner 5-tuple) to provide entropy for L3/L4 ECMP hashing in the underlay. This ensures that different flows between the same VTEP pair take different physical paths, achieving good load distribution across parallel links.
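To make the bit layout concrete, the same flag check and VNI extraction can be reproduced with bash arithmetic on a sample header. The hex string below is a hand-constructed example (flags byte 0x08, VNI 5000), not captured traffic:

```shell
#!/bin/bash
# Sample 8-byte VXLAN header:
#   flags = 0x08 (I bit set), 3 reserved bytes,
#   VNI   = 0x001388 (5000), 1 reserved byte
HDR="0800000000138800"

FLAGS=$((16#${HDR:0:2}))    # first byte: flags
VNI=$((16#${HDR:8:6}))      # bytes 4-6: 24-bit VNI

if [ $((FLAGS & 0x08)) -ne 0 ]; then
    echo "I flag set, VNI=$VNI"
fi
```

Running this prints `I flag set, VNI=5000`, matching the mask-and-shift logic of the C helpers above.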
The VXLAN Network Identifier (VNI) is the core segmentation mechanism in VXLAN. Each VNI defines an isolated Layer 2 broadcast domain, allowing millions of independent virtual networks to coexist on the same physical infrastructure.
With 24 bits, VNIs range from 0 to 16,777,215 (many implementations reserve VNI 0 and 16,777,215 for internal use).
This represents a 4,096x increase over VLAN capacity, enabling true multi-tenant cloud scale.
In hybrid environments with both overlay networks and traditional VLANs, VNIs are often mapped to VLANs at the network edge:
Ingress (Physical to Overlay): the edge VTEP strips the 802.1Q VLAN tag and encapsulates the frame with the VNI mapped to that VLAN.
Egress (Overlay to Physical): the VTEP decapsulates the frame and re-tags it with the corresponding VLAN before forwarding onto the physical network.
This mapping is configurable—VNI 5000 could map to VLAN 100 if needed.
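On a Linux edge device, one way to realize such a mapping is the bridge vlan_tunnel facility. The sketch below is illustrative (interface names and the VLAN 100 to VNI 5000 pairing are assumptions) and requires a reasonably recent kernel and iproute2:

```shell
# VXLAN device in "external" mode: the VNI comes from per-VLAN tunnel metadata
ip link add vxlan-edge type vxlan external dstport 4789 local 192.168.1.10
ip link add br-edge type bridge
ip link set br-edge type bridge vlan_filtering 1
ip link set vxlan-edge master br-edge
ip link set eth1 master br-edge          # physical (VLAN-tagged) side

# Enable per-VLAN tunnel mapping on the VXLAN bridge port
bridge link set dev vxlan-edge vlan_tunnel on

# Map VLAN 100 <-> VNI 5000 at the edge
bridge vlan add dev vxlan-edge vid 100
bridge vlan add dev vxlan-edge vid 100 tunnel_info id 5000
bridge vlan add dev eth1 vid 100
```

With this in place, frames arriving on eth1 tagged VLAN 100 are encapsulated with VNI 5000, and decapsulated VNI 5000 frames leave eth1 tagged VLAN 100.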
Complete Layer 2 Isolation: VMs in different VNIs cannot communicate at Layer 2, even if they have MAC address collisions. Each VNI is a completely independent Ethernet domain.
Overlapping IP Spaces: Different VNIs can use identical IP subnets. Tenant A's 10.0.0.0/24 in VNI 1000 is completely isolated from Tenant B's 10.0.0.0/24 in VNI 2000.
Independent Policies: Security policies, QoS, and forwarding rules can differ per VNI.
| VNI Range | Purpose | Mapping | Example Use |
|---|---|---|---|
| 1-1000 | Infrastructure networks | 1:1 with management VLANs | Management, storage, vMotion |
| 1001-4000 | Internal tenant networks | Mapped to customer VLANs | Enterprise workloads |
| 10000-19999 | Production tenant networks | Dynamic allocation | Cloud tenant isolation |
| 20000-29999 | Development/Test | Dynamic allocation | Non-production workloads |
| 100000+ | Service provider scale | Multi-DC with location prefix | Large cloud deployments |
While 16 million VNIs seem limitless, control plane scalability often becomes the practical limit. Each VNI represents a separate broadcast domain that must be managed. Some deployments group related workloads into shared VNIs to reduce control plane overhead, relying on security groups rather than VNI isolation for microsegmentation.
VXLAN as specified in RFC 7348 only defines the data plane—how packets are encapsulated. It leaves the control plane (how VTEPs discover each other and learn VM locations) undefined. This flexibility has led to multiple control plane approaches:
The original VXLAN approach uses IP multicast for BUM (Broadcast, Unknown unicast, Multicast) traffic:
Configuration: Each VNI is mapped to an IP multicast group, and every VTEP serving that VNI joins the group.
BUM Traffic Handling: BUM frames are encapsulated once and sent to the VNI's multicast group; the underlay replicates them to all member VTEPs.
Advantages: Efficient replication in the underlay; no per-VTEP flood lists to maintain.
Disadvantages: Requires a multicast-enabled underlay (IGMP/PIM), which many operators find difficult to run at scale.
When multicast isn't available, the source VTEP can unicast BUM traffic to each remote VTEP:
Configuration: Each VTEP is statically configured with the list of remote VTEPs participating in each VNI (the flood list).
BUM Traffic Handling: The ingress VTEP creates a separate unicast copy of each BUM frame for every remote VTEP in the flood list (head-end replication).
Advantages: Works over any unicast IP underlay; no multicast routing required.
Disadvantages: Replication load grows linearly with the number of remote VTEPs, and static flood lists are tedious to maintain without automation.
The gold standard for production VXLAN deployments is EVPN (Ethernet VPN) with BGP as the control plane:
How It Works: VTEPs run BGP with the L2VPN EVPN address family, advertising locally learned MAC/IP bindings and their own tunnel endpoints as EVPN routes; remote VTEPs import these routes, replacing data-plane flood-and-learn with control-plane learning.
EVPN Route Types for VXLAN: Type-2 (MAC/IP Advertisement), Type-3 (Inclusive Multicast Ethernet Tag), and Type-5 (IP Prefix) carry the forwarding state VXLAN needs; each is examined below.
Advantages: Control-plane MAC learning eliminates unknown-unicast flooding, Type-2 routes enable ARP suppression, and BGP scales to very large multi-vendor deployments.
| Aspect | Multicast | Ingress Replication | BGP EVPN |
|---|---|---|---|
| Underlay Requirement | Multicast-enabled | Unicast IP only | BGP infrastructure |
| BUM Traffic Handling | Efficient multicast | N×M unicast copies | Selective replication |
| MAC Learning | Data plane (flood/learn) | Data plane (flood/learn) | Control plane (no flood) |
| ARP Suppression | Not supported | Not supported | Native support (Type-2) |
| Scalability | Medium (multicast limits) | Low (replication load) | Very High (BGP scales) |
| Operational Complexity | Medium (multicast ops) | Low (static lists) | High (BGP operations) |
| Multi-Vendor | Good | Good | Excellent (IETF standard) |
| Recommended For | Legacy/simple | Small deployments | Production at scale |
For any production deployment beyond a single rack or simple lab, BGP EVPN is strongly recommended. The operational efficiency gains from ARP suppression, the elimination of flooding, and the scalability of BGP far outweigh the initial setup complexity. All major network vendors support EVPN-VXLAN.
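As a taste of what this looks like in practice, a minimal FRR configuration sketch for EVPN-VXLAN might resemble the following. The ASN, neighbor address, and reliance on advertise-all-vni are illustrative assumptions; real deployments vary considerably:

```shell
# Sketch: enable the EVPN address family in FRR for all locally
# defined VNIs (run on each VTEP; assumes FRR with bgpd enabled)
vtysh <<'EOF'
configure terminal
router bgp 64512
 neighbor 192.168.1.1 remote-as 64512
 address-family l2vpn evpn
  neighbor 192.168.1.1 activate
  advertise-all-vni
 exit-address-family
exit
EOF
```

With advertise-all-vni, FRR originates Type-2 and Type-3 routes for every VXLAN interface it discovers on the host, so VTEP discovery and MAC learning happen entirely in BGP.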
BGP EVPN provides a sophisticated, standards-based control plane for VXLAN. Let's examine how EVPN routes enable efficient VXLAN operation.
This is the workhorse route type for VXLAN. When a VM powers on or is detected by a VTEP:
Route Type 2 - MAC/IP Advertisement
├── Route Distinguisher: 192.168.1.10:100 (VTEP + VNI combo)
├── Ethernet Segment Identifier: 0 (single-homed)
├── Ethernet Tag ID: 0
├── MAC Address Length: 48
├── MAC Address: 00:50:56:01:02:03
├── IP Address Length: 32 (or 0 if MAC-only)
├── IP Address: 10.0.1.50
├── Extended Communities:
│ ├── Route Target: 64512:5000 (imported by VNI 5000 members)
│ └── Encapsulation: VXLAN
└── Next Hop: 192.168.1.10 (VTEP IP, used as tunnel destination)
The Route Target (RT) determines which VTEPs import the route. VTEPs configured for VNI 5000 import routes with RT 64512:5000.
Type-3 routes allow VTEPs to discover each other for BUM traffic replication:
Route Type 3 - Inclusive Multicast
├── Route Distinguisher: 192.168.1.10:100
├── Ethernet Tag ID: 0
├── IP Address Length: 32
├── Originating Router's IP: 192.168.1.10 (VTEP IP)
├── Extended Communities:
│ ├── Route Target: 64512:5000
│ └── PMSI Tunnel Attribute:
│ ├── Tunnel Type: Ingress Replication
│ └── Tunnel Endpoint: 192.168.1.10
└── Importing VTEPs add this VTEP to VNI 5000 flood list
When a VTEP needs to flood BUM traffic, it sends copies to all VTEPs that advertised Type-3 routes for that VNI.
Type-5 routes enable inter-VNI (inter-subnet) routing at the VTEP:
Route Type 5 - IP Prefix
├── Route Distinguisher: 192.168.1.10:100
├── Ethernet Tag ID: 0
├── IP Prefix Length: 24
├── IP Prefix: 10.0.1.0/24
├── Gateway IP: 0.0.0.0 (or specific gateway)
├── MPLS Label: VNI for L3 routing (L3VNI)
├── Extended Communities:
│ ├── Route Target: 64512:3000 (L3 VRF RT)
│ └── Router MAC: 00:00:00:11:22:33 (for inter-subnet routing)
└── Enables distributed routing between subnets
In Symmetric Integrated Routing and Bridging (IRB), both ingress and egress VTEPs perform routing. Traffic between different subnets uses a shared L3VNI for the VRF, enabling optimal routing without hairpinning through a central gateway. This is the recommended model for EVPN-VXLAN deployments with inter-subnet traffic.
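On a Linux VTEP running an EVPN stack such as FRR, the shared L3VNI is commonly realized as a VXLAN device and bridge enslaved to a VRF. The following is a sketch with assumed names, table numbers, and IDs:

```shell
# VRF holding the tenant's routed traffic
ip link add vrf-tenant type vrf table 100
ip link set vrf-tenant up

# L3VNI 3000: dedicated VXLAN device and bridge, placed in the VRF
ip link add vxlan3000 type vxlan id 3000 local 192.168.1.10 \
    dstport 4789 nolearning
ip link add br-l3vni type bridge
ip link set vxlan3000 master br-l3vni
ip link set br-l3vni master vrf-tenant
ip link set vxlan3000 up
ip link set br-l3vni up
```

Inter-subnet traffic is routed into vrf-tenant on the ingress VTEP, carried across the fabric in VNI 3000, and routed again out of the VRF on the egress VTEP, which is exactly the symmetric IRB model described above.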
Let's examine practical VXLAN configuration on common platforms. We'll cover Open vSwitch, Linux native bridge, and summarize switch configurations.
OVS is the most common software VTEP, used extensively in OpenStack, Kubernetes, and other cloud platforms.
```bash
#!/bin/bash
# Open vSwitch VXLAN Configuration Example

# Create integration bridge (main switching bridge)
ovs-vsctl add-br br-int

# Add VXLAN tunnel port with static remote VTEP
ovs-vsctl add-port br-int vxlan0 -- \
    set interface vxlan0 type=vxlan \
    options:local_ip=192.168.1.10 \
    options:remote_ip=192.168.2.20 \
    options:key=5000 \
    options:dst_port=4789

# View tunnel port
ovs-vsctl show
# Output:
#   Bridge br-int
#       Port vxlan0
#           Interface vxlan0
#               type: vxlan
#               options: {dst_port="4789", key="5000",
#                         local_ip="192.168.1.10", remote_ip="192.168.2.20"}

# ALTERNATIVE: VXLAN port with flow-based VNI (multiple VNIs per tunnel)
ovs-vsctl add-port br-int vxlan-tun -- \
    set interface vxlan-tun type=vxlan \
    options:local_ip=192.168.1.10 \
    options:remote_ip=flow \
    options:key=flow \
    options:dst_port=4789

# OpenFlow rules to set VNI per traffic flow
ovs-ofctl add-flow br-int \
    "table=0,in_port=LOCAL,dl_dst=aa:bb:cc:dd:ee:ff,actions=set_field:5000->tun_id,output:vxlan-tun"

# View VXLAN tunnel statistics
ovs-ofctl dump-ports br-int vxlan0

# Display FDB entries (MAC-to-tunnel mappings)
ovs-appctl fdb/show br-int
```

The Linux kernel includes native VXLAN support via the ip command, useful for lightweight deployments without OVS.
```bash
#!/bin/bash
# Linux Native VXLAN Configuration

# Create VXLAN interface (choose ONE of the following variants)
ip link add vxlan5000 type vxlan \
    id 5000 \
    local 192.168.1.10 \
    dstport 4789 \
    learning \
    noarp

# For multicast-based control plane:
ip link add vxlan5000 type vxlan \
    id 5000 \
    local 192.168.1.10 \
    group 239.1.1.5 \
    dev eth0 \
    dstport 4789

# For ingress replication (static remote VTEPs):
ip link add vxlan5000 type vxlan \
    id 5000 \
    local 192.168.1.10 \
    dstport 4789 \
    nolearning

# Add FDB entries for remote VTEPs (the BUM flood list)
bridge fdb append 00:00:00:00:00:00 dev vxlan5000 dst 192.168.2.20
bridge fdb append 00:00:00:00:00:00 dev vxlan5000 dst 192.168.3.30

# Add specific MAC-to-VTEP mappings
bridge fdb add aa:bb:cc:dd:ee:ff dev vxlan5000 dst 192.168.2.20

# Bring interface up
ip link set vxlan5000 up

# Add to bridge for switching with other local VMs
ip link add br-tenant type bridge
ip link set vxlan5000 master br-tenant
ip link set br-tenant up

# Verify configuration
ip -d link show vxlan5000
# Output:
# 5: vxlan5000: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue
#    master br-tenant state UNKNOWN mode DEFAULT group default qlen 1000
#    link/ether 7e:5b:3c:4d:2a:1f brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68 maxmtu 65535
#    vxlan id 5000 local 192.168.1.10 srcport 0 0 dstport 4789 ...

# View FDB entries
bridge fdb show dev vxlan5000
```

Notice the VXLAN interface shows mtu 1450—this accounts for the 50-byte encapsulation overhead. Ensure VM interfaces or the overlay bridge also use this reduced MTU, or configure the underlay for jumbo frames (9000 bytes recommended) to support a full 1500-byte inner MTU.
Software VXLAN processing consumes CPU cycles—encapsulation, decapsulation, and checksum calculations add up at high packet rates. Hardware offload moves these operations to NIC ASICs, dramatically reducing CPU load and increasing throughput.
Stateless Offloads (widely supported)
TX Offloads (transmit path): checksum offload for the outer and inner headers, and tunnel-aware segmentation (TSO/GSO of the inner payload with outer headers replicated per segment).
RX Offloads (receive path): checksum validation of encapsulated frames and GRO coalescing of tunneled flows, plus RSS hashing on inner headers where the NIC can parse VXLAN.
Stateful/Flow Offloads (SmartNICs)
Advanced NICs (NVIDIA Mellanox ConnectX, Intel IPU, Broadcom Stingray) support TC flower offload or OVS-DPDK offload:
```bash
# Check VXLAN (UDP tunnel) offload capabilities on NIC
ethtool -k eth0 | grep -E "udp_tnl|udp_tunnel"
# Output:
# tx-udp_tnl-segmentation: on
# tx-udp_tnl-csum-segmentation: on
# rx-udp_tunnel-port-offload: on

# Verify offload is enabled (same as -k)
ethtool --show-offload eth0

# Note: the kernel's UDP tunnel offload API notifies the NIC of the
# VXLAN UDP port (4789) automatically when a vxlan device is created.

# Check hardware flow offload capability
ethtool -k eth0 | grep hw-tc-offload
# hw-tc-offload: on   (flow offload available)

# Enable TC flower hardware offload for OVS
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch

# Verify flows are offloaded
tc filter show dev eth0 ingress
# Offloaded flows show: in_hw in_hw_count 1

# Check offload statistics
ethtool -S eth0 | grep vxlan
# Shows VXLAN-specific counters if NIC supports them
```

| Scenario | Software Only | Stateless Offload | Full Flow Offload |
|---|---|---|---|
| Throughput (64-byte packets) | ~5 Mpps | ~10 Mpps | ~100+ Mpps |
| Throughput (1400-byte packets) | ~40 Gbps | ~100 Gbps | Line Rate |
| CPU Usage (per 10Gbps) | ~200% | ~50% | ~0% |
| Latency | ~20-50 µs | ~10-20 µs | ~5 µs |
| Maximum Rules | N/A | N/A | Thousands to millions |
For high-performance cloud and NFV workloads, SmartNICs with flow offload capability are increasingly essential. A $500-1500 SmartNIC provides performance equivalent to 4-8 additional CPU cores that would cost much more and consume more power. Consider SmartNICs for any deployment expecting 25Gbps+ throughput per server.
VXLAN troubleshooting requires a systematic approach, working from physical connectivity up through the overlay stack. Here's a comprehensive troubleshooting methodology.
Layer 1: Underlay Connectivity
Layer 2: VXLAN Tunnel Status
Layer 3: Control Plane
Layer 4: Application Connectivity
```bash
#!/bin/bash
# VXLAN Troubleshooting Guide

# ==== STEP 1: Underlay Connectivity ====
# Verify VTEP-to-VTEP IP reachability
ping -c 3 192.168.2.20   # Remote VTEP IP

# Check for MTU issues (send large ICMP with DF bit)
ping -c 3 -s 1450 -M do 192.168.2.20
# If this fails but small ping works: MTU problem

# Verify UDP 4789 is reachable
nc -vuz 192.168.2.20 4789

# ==== STEP 2: VXLAN Tunnel Status ====
# Open vSwitch: Check tunnel state
ovs-vsctl show | grep -A5 vxlan
ovs-ofctl dump-ports br-int vxlan0

# Linux native: Check VXLAN interface
ip -d link show vxlan5000
ip -s link show vxlan5000   # Statistics

# ==== STEP 3: FDB/MAC Learning ====
# OVS: Display MAC table
ovs-appctl fdb/show br-int

# Linux bridge: Display FDB
bridge fdb show dev vxlan5000

# Check if specific MAC is learned
bridge fdb show | grep "aa:bb:cc:dd:ee:ff"

# ==== STEP 4: Packet Capture ====
# Capture VXLAN encapsulated traffic on underlay
tcpdump -i eth0 -nn "udp port 4789" -c 10

# Capture and decode VXLAN (requires tcpdump 4.9+)
tcpdump -i eth0 -nn "port 4789" -c 5 -vv

# Capture inner frames (on VXLAN interface)
tcpdump -i vxlan5000 -nn

# Use tshark for detailed VXLAN decode
tshark -i eth0 -f "udp port 4789" \
    -T fields -e vxlan.vni -e eth.src -e eth.dst -e ip.src -e ip.dst

# ==== STEP 5: Common Issues ====
# MTU too small: Symptoms are TCP works, large UDP fails
#   Solution: Increase underlay MTU or decrease overlay MTU

# VNI mismatch: VMs can't communicate
#   Check: ovs-vsctl get interface vxlan0 options | grep key

# VTEP IP mismatch: Tunnel shows up but no traffic
#   Verify local_ip matches source IP in tcpdump

# Firewall blocking: No encapsulated traffic seen
#   Check: iptables -L -n | grep 4789; firewall-cmd --list-ports
```

VXLAN as specified in RFC 7348 provides no inherent security—frames are encapsulated in plain UDP without encryption or authentication. Understanding these security implications is critical for production deployments.
1. No Encryption: VXLAN traffic is transmitted in cleartext. Anyone with access to the underlay network (physical switch port mirror, compromised router) can capture and read tenant traffic, including full payloads and protocol metadata.
2. No Authentication: VTEPs do not authenticate each other. A rogue VTEP can inject frames into any VNI, impersonate tenant workloads, or attract traffic by spoofing MAC addresses.
3. VNI Space Vulnerability: VNIs are not a security boundary—they're a logical segmentation mechanism. An attacker who can send VXLAN packets can target any VNI simply by changing the VNI field.
Underlay Segmentation: Restrict the underlay network carrying VTEP traffic to trusted infrastructure. Place VTEP IPs in a dedicated, firewalled subnet that tenant workloads cannot reach.
IPsec/WireGuard Encapsulation For highly sensitive traffic, encrypt VXLAN at the underlay:
VM → VTEP → IPsec Tunnel → Underlay → IPsec Tunnel → VTEP → VM
This adds encryption and authentication but increases overhead and complexity.
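A sketch of the WireGuard variant, with illustrative addresses, ports, and a placeholder peer key; the VXLAN endpoints are then bound to the tunnel's inner addresses so all encapsulated traffic rides the encrypted link:

```shell
# WireGuard point-to-point tunnel between the two VTEP hosts
# (<REMOTE_PUBLIC_KEY> is a placeholder for the peer's public key)
ip link add wg0 type wireguard
wg set wg0 private-key /etc/wireguard/privatekey \
    peer <REMOTE_PUBLIC_KEY> \
    endpoint 203.0.113.2:51820 \
    allowed-ips 10.255.0.2/32
ip addr add 10.255.0.1/30 dev wg0
ip link set wg0 up

# VXLAN endpoints use the WireGuard inner addresses as VTEP IPs
ip link add vxlan5000 type vxlan id 5000 \
    local 10.255.0.1 dstport 4789 nolearning
bridge fdb append 00:00:00:00:00:00 dev vxlan5000 dst 10.255.0.2
```

Remember to budget for both overheads when setting MTUs: the WireGuard tunnel and the VXLAN encapsulation each reduce the usable inner frame size.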
MACsec (802.1AE) Hardware-based encryption at Layer 2, providing line-rate encryption between adjacent devices. Effective for securing underlay links.
Firewall and Security Groups: Even if VXLAN itself isn't secure, apply rigorous firewall rules: permit UDP 4789 only between known VTEP addresses, and enforce per-workload security groups inside the overlay.
| Threat | Mitigation | Overhead | Complexity |
|---|---|---|---|
| Eavesdropping | IPsec/WireGuard encryption | ~15-20% | High |
| Eavesdropping | MACsec on underlay links | ~0% (hardware) | Medium |
| Rogue VTEP injection | Firewall whitelist VTEP IPs | ~0% | Low |
| VNI hopping | Per-VM security groups | ~5% | Medium |
| Traffic analysis | Traffic shaping/padding | Variable | High |
| DoS attacks | Rate limiting per source | ~0% | Low |
Never rely on VNI isolation as your only security control. VNIs provide logical separation, not cryptographic isolation. Any workload with access to the underlay network could potentially inject packets into any VNI. Always combine VNI segmentation with proper firewall rules and, for sensitive workloads, encryption.
VXLAN has become the universal language of overlay networking—the protocol that enables massive-scale network virtualization across datacenter and cloud environments. Its combination of simplicity, scalability, and broad industry support has made it the dominant choice for production deployments.
You now possess comprehensive knowledge of VXLAN—from packet format through control plane options to production deployment and troubleshooting. This knowledge is directly applicable to OpenStack, VMware NSX, Kubernetes networking, and any modern cloud infrastructure.
Next Up: We'll explore Network Segmentation—the design patterns and technologies for dividing networks into isolated security and administrative domains, using the overlay and virtual switching capabilities we've covered to implement robust multi-tier architectures.