Of all overlay encapsulation protocols developed for network virtualization, VXLAN (Virtual eXtensible LAN) has emerged as the undisputed industry standard. Originally developed by VMware, Cisco, and other vendors and standardized in RFC 7348, VXLAN provides the encapsulation mechanism that enables Layer 2 networks to span across Layer 3 infrastructure at massive scale.
VXLAN's dominance stems from several key advantages: massive scale (16 million network segments versus 4,094 VLANs), operation over any ordinary IP underlay, and broad multi-vendor hardware and software support.
This page will take you through every aspect of VXLAN—from packet format to control plane options to production deployment patterns. By the end, you will be able to:

- Decode the VXLAN packet format and headers
- Compare control plane options (multicast, ingress replication, EVPN)
- Configure VXLAN on Open vSwitch and the Linux bridge
- Understand hardware offload capabilities
- Troubleshoot common VXLAN issues
- Apply best practices for production deployments
VXLAN uses UDP encapsulation to carry Layer 2 Ethernet frames across Layer 3 networks. Understanding the exact packet structure is essential for troubleshooting and understanding protocol behavior.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer Ethernet Header (14 bytes) |
| Destination MAC | Source MAC | Type |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer IP Header (20 bytes) |
| Ver | IHL | ToS | Total Length | Identification |
| Flags | Fragment Offset | TTL | Proto=17 | Header Checksum |
| Source IP Address |
| Destination IP Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer UDP Header (8 bytes) |
| Source Port (hash) | Dest Port = 4789 |
| UDP Length | UDP Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| VXLAN Header (8 bytes) |
|R|R|R|R|I|R|R|R|            Reserved (24 bits)                 |
|        VXLAN Network Identifier (VNI)         | Reserved (8)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Inner Ethernet Frame |
| (Original L2 frame with Dest MAC, Src MAC, Type, Payload) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The encapsulation adds 50 bytes of overhead in total:

- Outer Ethernet Header (14 bytes)
- Outer IP Header (20 bytes for IPv4)
- Outer UDP Header (8 bytes)
- VXLAN Header (8 bytes)
| Bit Position | Name | Value | Meaning |
|---|---|---|---|
| Bit 0-3 | Reserved | 0 | Must be zero on transmit, ignored on receive |
| Bit 4 | I (VNI Flag) | 1 | VNI is valid; MUST be set to 1 |
| Bit 5-7 | Reserved | 0 | Must be zero on transmit, ignored on receive |
```c
/* VXLAN header as defined in RFC 7348 */
struct vxlanhdr {
    __be32 vx_flags;   /* Flags (I flag at bit 27, network byte order) */
    __be32 vx_vni;     /* VNI in upper 24 bits, reserved in lower 8 */
};

/* VNI extraction from VXLAN header */
#define VXLAN_VNI_MASK   0xffffff00
#define VXLAN_VNI_SHIFT  8
#define VXLAN_FLAGS_I    0x08000000  /* I flag position */

static inline __u32 vxlan_vni(struct vxlanhdr *vxh)
{
    return (ntohl(vxh->vx_vni) & VXLAN_VNI_MASK) >> VXLAN_VNI_SHIFT;
}

static inline bool vxlan_valid(struct vxlanhdr *vxh)
{
    return (ntohl(vxh->vx_flags) & VXLAN_FLAGS_I) != 0;
}

/* Example: VXLAN encapsulation structure
 * Total overhead: 50 bytes (14 + 20 + 8 + 8)
 *
 * +----------------+
 * | Outer Eth (14) |
 * +----------------+
 * | Outer IP (20)  |
 * +----------------+
 * | Outer UDP (8)  |
 * +----------------+
 * | VXLAN Hdr (8)  |
 * +----------------+
 * |   Inner Eth    |
 * |     Frame      |
 * |   (variable)   |
 * +----------------+
 */
```

The outer UDP source port is intentionally randomized (typically a hash of the inner 5-tuple) to provide entropy for L3/L4 ECMP hashing in the underlay. This ensures that different flows between the same VTEP pair take different physical paths, achieving good load distribution across parallel links.
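To make the bit layout concrete, the same flag check and VNI extraction can be reproduced with bash arithmetic on a sample header. The hex string below is a hand-constructed example (flags byte 0x08, VNI 5000), not captured traffic:

```shell
#!/bin/bash
# Sample 8-byte VXLAN header:
#   flags = 0x08 (I bit set), 3 reserved bytes,
#   VNI   = 0x001388 (5000), 1 reserved byte
HDR="0800000000138800"

FLAGS=$((16#${HDR:0:2}))    # first byte: flags
VNI=$((16#${HDR:8:6}))      # bytes 4-6: 24-bit VNI

if [ $((FLAGS & 0x08)) -ne 0 ]; then
    echo "I flag set, VNI=$VNI"
fi
```

Running this prints `I flag set, VNI=5000`, matching the mask-and-shift logic of the C helpers above.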
The VXLAN Network Identifier (VNI) is the core segmentation mechanism in VXLAN. Each VNI defines an isolated Layer 2 broadcast domain, allowing millions of independent virtual networks to coexist on the same physical infrastructure.
With 24 bits, VNIs range from 0 to 16,777,215 (many implementations reserve VNI 0 and 16,777,215 for internal use).
This represents a 4,096x increase over VLAN capacity, enabling true multi-tenant cloud scale.
In hybrid environments with both overlay networks and traditional VLANs, VNIs are often mapped to VLANs at the network edge:
Ingress (Physical to Overlay): the edge VTEP strips the 802.1Q VLAN tag and encapsulates the frame with the VNI mapped to that VLAN.
Egress (Overlay to Physical): the VTEP decapsulates the frame and re-tags it with the corresponding VLAN before forwarding onto the physical network.
This mapping is configurable—VNI 5000 could map to VLAN 100 if needed.
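On a Linux edge device, one way to realize such a mapping is the bridge vlan_tunnel facility. The sketch below is illustrative (interface names and the VLAN 100 to VNI 5000 pairing are assumptions) and requires a reasonably recent kernel and iproute2:

```shell
# VXLAN device in "external" mode: the VNI comes from per-VLAN tunnel metadata
ip link add vxlan-edge type vxlan external dstport 4789 local 192.168.1.10
ip link add br-edge type bridge
ip link set br-edge type bridge vlan_filtering 1
ip link set vxlan-edge master br-edge
ip link set eth1 master br-edge          # physical (VLAN-tagged) side

# Enable per-VLAN tunnel mapping on the VXLAN bridge port
bridge link set dev vxlan-edge vlan_tunnel on

# Map VLAN 100 <-> VNI 5000 at the edge
bridge vlan add dev vxlan-edge vid 100
bridge vlan add dev vxlan-edge vid 100 tunnel_info id 5000
bridge vlan add dev eth1 vid 100
```

With this in place, frames arriving on eth1 tagged VLAN 100 are encapsulated with VNI 5000, and decapsulated VNI 5000 frames leave eth1 tagged VLAN 100.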
Complete Layer 2 Isolation: VMs in different VNIs cannot communicate at Layer 2, even if they have MAC address collisions. Each VNI is a completely independent Ethernet domain.
Overlapping IP Spaces: Different VNIs can use identical IP subnets. Tenant A's 10.0.0.0/24 in VNI 1000 is completely isolated from Tenant B's 10.0.0.0/24 in VNI 2000.
Independent Policies: Security policies, QoS, and forwarding rules can differ per VNI.
| VNI Range | Purpose | Mapping | Example Use |
|---|---|---|---|
| 1-1000 | Infrastructure networks | 1:1 with management VLANs | Management, storage, vMotion |
| 1001-4000 | Internal tenant networks | Mapped to customer VLANs | Enterprise workloads |
| 10000-19999 | Production tenant networks | Dynamic allocation | Cloud tenant isolation |
| 20000-29999 | Development/Test | Dynamic allocation | Non-production workloads |
| 100000+ | Service provider scale | Multi-DC with location prefix | Large cloud deployments |
While 16 million VNIs seem limitless, control plane scalability often becomes the practical limit. Each VNI represents a separate broadcast domain that must be managed. Some deployments group related workloads into shared VNIs to reduce control plane overhead, relying on security groups rather than VNI isolation for microsegmentation.
VXLAN as specified in RFC 7348 only defines the data plane—how packets are encapsulated. It leaves the control plane (how VTEPs discover each other and learn VM locations) undefined. This flexibility has led to multiple control plane approaches:
The original VXLAN approach uses IP multicast for BUM (Broadcast, Unknown unicast, Multicast) traffic:
Configuration: Each VNI is mapped to an IP multicast group, and every VTEP serving that VNI joins the group.
BUM Traffic Handling: BUM frames are encapsulated once and sent to the VNI's multicast group; the underlay replicates them to all member VTEPs.
Advantages: Efficient replication in the underlay; no per-VTEP flood lists to maintain.
Disadvantages: Requires a multicast-enabled underlay (IGMP/PIM), which many operators find difficult to run at scale.
When multicast isn't available, the source VTEP can unicast BUM traffic to each remote VTEP:
Configuration: Each VTEP is statically configured with the list of remote VTEPs participating in each VNI (the flood list).
BUM Traffic Handling: The ingress VTEP creates a separate unicast copy of each BUM frame for every remote VTEP in the flood list (head-end replication).
Advantages: Works over any unicast IP underlay; no multicast routing required.
Disadvantages: Replication load grows linearly with the number of remote VTEPs, and static flood lists are tedious to maintain without automation.
The gold standard for production VXLAN deployments is EVPN (Ethernet VPN) with BGP as the control plane:
How It Works: VTEPs run BGP with the L2VPN EVPN address family, advertising locally learned MAC/IP bindings and their own tunnel endpoints as EVPN routes; remote VTEPs import these routes, replacing data-plane flood-and-learn with control-plane learning.
EVPN Route Types for VXLAN: Type-2 (MAC/IP Advertisement), Type-3 (Inclusive Multicast Ethernet Tag), and Type-5 (IP Prefix) carry the forwarding state VXLAN needs; each is examined below.
Advantages: Control-plane MAC learning eliminates unknown-unicast flooding, Type-2 routes enable ARP suppression, and BGP scales to very large multi-vendor deployments.
| Aspect | Multicast | Ingress Replication | BGP EVPN |
|---|---|---|---|
| Underlay Requirement | Multicast-enabled | Unicast IP only | BGP infrastructure |
| BUM Traffic Handling | Efficient multicast | N×M unicast copies | Selective replication |
| MAC Learning | Data plane (flood/learn) | Data plane (flood/learn) | Control plane (no flood) |
| ARP Suppression | Not supported | Not supported | Native support (Type-2) |
| Scalability | Medium (multicast limits) | Low (replication load) | Very High (BGP scales) |
| Operational Complexity | Medium (multicast ops) | Low (static lists) | High (BGP operations) |
| Multi-Vendor | Good | Good | Excellent (IETF standard) |
| Recommended For | Legacy/simple | Small deployments | Production at scale |
For any production deployment beyond a single rack or simple lab, BGP EVPN is strongly recommended. The operational efficiency gains from ARP suppression, the elimination of flooding, and the scalability of BGP far outweigh the initial setup complexity. All major network vendors support EVPN-VXLAN.
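As a taste of what this looks like in practice, a minimal FRR configuration sketch for EVPN-VXLAN might resemble the following. The ASN, neighbor address, and reliance on advertise-all-vni are illustrative assumptions; real deployments vary considerably:

```shell
# Sketch: enable the EVPN address family in FRR for all locally
# defined VNIs (run on each VTEP; assumes FRR with bgpd enabled)
vtysh <<'EOF'
configure terminal
router bgp 64512
 neighbor 192.168.1.1 remote-as 64512
 address-family l2vpn evpn
  neighbor 192.168.1.1 activate
  advertise-all-vni
 exit-address-family
exit
EOF
```

With advertise-all-vni, FRR originates Type-2 and Type-3 routes for every VXLAN interface it discovers on the host, so VTEP discovery and MAC learning happen entirely in BGP.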
BGP EVPN provides a sophisticated, standards-based control plane for VXLAN. Let's examine how EVPN routes enable efficient VXLAN operation.
This is the workhorse route type for VXLAN. When a VM powers on or is detected by a VTEP:
Route Type 2 - MAC/IP Advertisement
├── Route Distinguisher: 192.168.1.10:100 (VTEP + VNI combo)
├── Ethernet Segment Identifier: 0 (single-homed)
├── Ethernet Tag ID: 0
├── MAC Address Length: 48
├── MAC Address: 00:50:56:01:02:03
├── IP Address Length: 32 (or 0 if MAC-only)
├── IP Address: 10.0.1.50
├── Extended Communities:
│ ├── Route Target: 64512:5000 (imported by VNI 5000 members)
│ └── Encapsulation: VXLAN
└── Next Hop: 192.168.1.10 (VTEP IP, used as tunnel destination)
The Route Target (RT) determines which VTEPs import the route. VTEPs configured for VNI 5000 import routes with RT 64512:5000.
Type-3 routes allow VTEPs to discover each other for BUM traffic replication:
Route Type 3 - Inclusive Multicast
├── Route Distinguisher: 192.168.1.10:100
├── Ethernet Tag ID: 0
├── IP Address Length: 32
├── Originating Router's IP: 192.168.1.10 (VTEP IP)
├── Extended Communities:
│ ├── Route Target: 64512:5000
│ └── PMSI Tunnel Attribute:
│ ├── Tunnel Type: Ingress Replication
│ └── Tunnel Endpoint: 192.168.1.10
└── Importing VTEPs add this VTEP to VNI 5000 flood list
When a VTEP needs to flood BUM traffic, it sends copies to all VTEPs that advertised Type-3 routes for that VNI.
Type-5 routes enable inter-VNI (inter-subnet) routing at the VTEP:
Route Type 5 - IP Prefix
├── Route Distinguisher: 192.168.1.10:100
├── Ethernet Tag ID: 0
├── IP Prefix Length: 24
├── IP Prefix: 10.0.1.0/24
├── Gateway IP: 0.0.0.0 (or specific gateway)
├── MPLS Label: VNI for L3 routing (L3VNI)
├── Extended Communities:
│ ├── Route Target: 64512:3000 (L3 VRF RT)
│ └── Router MAC: 00:00:00:11:22:33 (for inter-subnet routing)
└── Enables distributed routing between subnets
In Symmetric Integrated Routing and Bridging (IRB), both ingress and egress VTEPs perform routing. Traffic between different subnets uses a shared L3VNI for the VRF, enabling optimal routing without hairpinning through a central gateway. This is the recommended model for EVPN-VXLAN deployments with inter-subnet traffic.
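On a Linux VTEP running an EVPN stack such as FRR, the shared L3VNI is commonly realized as a VXLAN device and bridge enslaved to a VRF. The following is a sketch with assumed names, table numbers, and IDs:

```shell
# VRF holding the tenant's routed traffic
ip link add vrf-tenant type vrf table 100
ip link set vrf-tenant up

# L3VNI 3000: dedicated VXLAN device and bridge, placed in the VRF
ip link add vxlan3000 type vxlan id 3000 local 192.168.1.10 \
    dstport 4789 nolearning
ip link add br-l3vni type bridge
ip link set vxlan3000 master br-l3vni
ip link set br-l3vni master vrf-tenant
ip link set vxlan3000 up
ip link set br-l3vni up
```

Inter-subnet traffic is routed into vrf-tenant on the ingress VTEP, carried across the fabric in VNI 3000, and routed again out of the VRF on the egress VTEP, which is exactly the symmetric IRB model described above.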
Let's examine practical VXLAN configuration on common platforms. We'll cover Open vSwitch, Linux native bridge, and summarize switch configurations.
OVS is the most common software VTEP, used extensively in OpenStack, Kubernetes, and other cloud platforms.
```bash
#!/bin/bash
# Open vSwitch VXLAN Configuration Example

# Create integration bridge (main switching bridge)
ovs-vsctl add-br br-int

# Add VXLAN tunnel port with static remote VTEP
ovs-vsctl add-port br-int vxlan0 -- \
    set interface vxlan0 type=vxlan \
    options:local_ip=192.168.1.10 \
    options:remote_ip=192.168.2.20 \
    options:key=5000 \
    options:dst_port=4789

# View tunnel port
ovs-vsctl show
# Output:
#   Bridge br-int
#       Port vxlan0
#           Interface vxlan0
#               type: vxlan
#               options: {dst_port="4789", key="5000",
#                         local_ip="192.168.1.10", remote_ip="192.168.2.20"}

# ALTERNATIVE: VXLAN port with flow-based VNI (multiple VNIs per tunnel)
ovs-vsctl add-port br-int vxlan-tun -- \
    set interface vxlan-tun type=vxlan \
    options:local_ip=192.168.1.10 \
    options:remote_ip=flow \
    options:key=flow \
    options:dst_port=4789

# OpenFlow rules to set VNI per traffic flow
ovs-ofctl add-flow br-int \
    "table=0,in_port=LOCAL,dl_dst=aa:bb:cc:dd:ee:ff,actions=set_field:5000->tun_id,output:vxlan-tun"

# View VXLAN tunnel statistics
ovs-ofctl dump-ports br-int vxlan0

# Display FDB entries (MAC-to-tunnel mappings)
ovs-appctl fdb/show br-int
```

The Linux kernel includes native VXLAN support via the ip command, useful for lightweight deployments without OVS.
```bash
#!/bin/bash
# Linux Native VXLAN Configuration

# Create VXLAN interface (choose ONE of the following variants)
ip link add vxlan5000 type vxlan \
    id 5000 \
    local 192.168.1.10 \
    dstport 4789 \
    learning \
    noarp

# For multicast-based control plane:
ip link add vxlan5000 type vxlan \
    id 5000 \
    local 192.168.1.10 \
    group 239.1.1.5 \
    dev eth0 \
    dstport 4789

# For ingress replication (static remote VTEPs):
ip link add vxlan5000 type vxlan \
    id 5000 \
    local 192.168.1.10 \
    dstport 4789 \
    nolearning

# Add FDB entries for remote VTEPs (the BUM flood list)
bridge fdb append 00:00:00:00:00:00 dev vxlan5000 dst 192.168.2.20
bridge fdb append 00:00:00:00:00:00 dev vxlan5000 dst 192.168.3.30

# Add specific MAC-to-VTEP mappings
bridge fdb add aa:bb:cc:dd:ee:ff dev vxlan5000 dst 192.168.2.20

# Bring interface up
ip link set vxlan5000 up

# Add to bridge for switching with other local VMs
ip link add br-tenant type bridge
ip link set vxlan5000 master br-tenant
ip link set br-tenant up

# Verify configuration
ip -d link show vxlan5000
# Output:
# 5: vxlan5000: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue
#    master br-tenant state UNKNOWN mode DEFAULT group default qlen 1000
#    link/ether 7e:5b:3c:4d:2a:1f brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68 maxmtu 65535
#    vxlan id 5000 local 192.168.1.10 srcport 0 0 dstport 4789 ...

# View FDB entries
bridge fdb show dev vxlan5000
```

Notice the VXLAN interface shows mtu 1450—this accounts for the 50-byte encapsulation overhead. Ensure VM interfaces or the overlay bridge also use this reduced MTU, or configure the underlay for jumbo frames (9000 bytes recommended) to support a full 1500-byte inner MTU.
Software VXLAN processing consumes CPU cycles—encapsulation, decapsulation, and checksum calculations add up at high packet rates. Hardware offload moves these operations to NIC ASICs, dramatically reducing CPU load and increasing throughput.
Stateless Offloads (widely supported)
TX Offloads (transmit path): checksum offload for the outer and inner headers, and tunnel-aware segmentation (TSO/GSO of the inner payload with outer headers replicated per segment).
RX Offloads (receive path): checksum validation of encapsulated frames and GRO coalescing of tunneled flows, plus RSS hashing on inner headers where the NIC can parse VXLAN.
Stateful/Flow Offloads (SmartNICs)
Advanced NICs (NVIDIA Mellanox ConnectX, Intel IPU, Broadcom Stingray) support TC flower offload or OVS-DPDK offload:
```bash
# Check VXLAN (UDP tunnel) offload capabilities on NIC
ethtool -k eth0 | grep -E "udp_tnl|udp_tunnel"
# Output:
# tx-udp_tnl-segmentation: on
# tx-udp_tnl-csum-segmentation: on
# rx-udp_tunnel-port-offload: on

# Verify offload is enabled (same as -k)
ethtool --show-offload eth0

# Note: the kernel's UDP tunnel offload API notifies the NIC of the
# VXLAN UDP port (4789) automatically when a vxlan device is created.

# Check hardware flow offload capability
ethtool -k eth0 | grep hw-tc-offload
# hw-tc-offload: on   (flow offload available)

# Enable TC flower hardware offload for OVS
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch

# Verify flows are offloaded
tc filter show dev eth0 ingress
# Offloaded flows show: in_hw in_hw_count 1

# Check offload statistics
ethtool -S eth0 | grep vxlan
# Shows VXLAN-specific counters if NIC supports them
```

| Scenario | Software Only | Stateless Offload | Full Flow Offload |
|---|---|---|---|
| Throughput (64-byte packets) | ~5 Mpps | ~10 Mpps | ~100+ Mpps |
| Throughput (1400-byte packets) | ~40 Gbps | ~100 Gbps | Line Rate |
| CPU Usage (per 10Gbps) | ~200% | ~50% | ~0% |
| Latency | ~20-50 µs | ~10-20 µs | ~5 µs |
| Maximum Rules | N/A | N/A | Thousands to millions |
For high-performance cloud and NFV workloads, SmartNICs with flow offload capability are increasingly essential. A $500-1500 SmartNIC provides performance equivalent to 4-8 additional CPU cores that would cost much more and consume more power. Consider SmartNICs for any deployment expecting 25Gbps+ throughput per server.
VXLAN troubleshooting requires a systematic approach, working from physical connectivity up through the overlay stack. Here's a comprehensive troubleshooting methodology.
Layer 1: Underlay Connectivity
Layer 2: VXLAN Tunnel Status
Layer 3: Control Plane
Layer 4: Application Connectivity
```bash
#!/bin/bash
# VXLAN Troubleshooting Guide

# ==== STEP 1: Underlay Connectivity ====
# Verify VTEP-to-VTEP IP reachability
ping -c 3 192.168.2.20   # Remote VTEP IP

# Check for MTU issues (send large ICMP with DF bit)
ping -c 3 -s 1450 -M do 192.168.2.20
# If this fails but small ping works: MTU problem

# Verify UDP 4789 is reachable
nc -vuz 192.168.2.20 4789

# ==== STEP 2: VXLAN Tunnel Status ====
# Open vSwitch: Check tunnel state
ovs-vsctl show | grep -A5 vxlan
ovs-ofctl dump-ports br-int vxlan0

# Linux native: Check VXLAN interface
ip -d link show vxlan5000
ip -s link show vxlan5000   # Statistics

# ==== STEP 3: FDB/MAC Learning ====
# OVS: Display MAC table
ovs-appctl fdb/show br-int

# Linux bridge: Display FDB
bridge fdb show dev vxlan5000

# Check if specific MAC is learned
bridge fdb show | grep "aa:bb:cc:dd:ee:ff"

# ==== STEP 4: Packet Capture ====
# Capture VXLAN encapsulated traffic on underlay
tcpdump -i eth0 -nn "udp port 4789" -c 10

# Capture and decode VXLAN (requires tcpdump 4.9+)
tcpdump -i eth0 -nn "port 4789" -c 5 -vv

# Capture inner frames (on VXLAN interface)
tcpdump -i vxlan5000 -nn

# Use tshark for detailed VXLAN decode
tshark -i eth0 -f "udp port 4789" \
    -T fields -e vxlan.vni -e eth.src -e eth.dst -e ip.src -e ip.dst

# ==== STEP 5: Common Issues ====
# MTU too small: Symptoms are TCP works, large UDP fails
#   Solution: Increase underlay MTU or decrease overlay MTU

# VNI mismatch: VMs can't communicate
#   Check: ovs-vsctl get interface vxlan0 options | grep key

# VTEP IP mismatch: Tunnel shows up but no traffic
#   Verify local_ip matches source IP in tcpdump

# Firewall blocking: No encapsulated traffic seen
#   Check: iptables -L -n | grep 4789; firewall-cmd --list-ports
```

VXLAN as specified in RFC 7348 provides no inherent security—frames are encapsulated in plain UDP without encryption or authentication. Understanding these security implications is critical for production deployments.
1. No Encryption: VXLAN traffic is transmitted in cleartext. Anyone with access to the underlay network (physical switch port mirror, compromised router) can capture and read tenant traffic, including full payloads and protocol metadata.
2. No Authentication: VTEPs do not authenticate each other. A rogue VTEP can inject frames into any VNI, impersonate tenant workloads, or attract traffic by spoofing MAC addresses.
3. VNI Space Vulnerability: VNIs are not a security boundary—they're a logical segmentation mechanism. An attacker who can send VXLAN packets can target any VNI simply by changing the VNI field.
Underlay Segmentation: Restrict the underlay network carrying VTEP traffic to trusted infrastructure. Place VTEP IPs in a dedicated, firewalled subnet that tenant workloads cannot reach.
IPsec/WireGuard Encapsulation For highly sensitive traffic, encrypt VXLAN at the underlay:
VM → VTEP → IPsec Tunnel → Underlay → IPsec Tunnel → VTEP → VM
This adds encryption and authentication but increases overhead and complexity.
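A sketch of the WireGuard variant, with illustrative addresses, ports, and a placeholder peer key; the VXLAN endpoints are then bound to the tunnel's inner addresses so all encapsulated traffic rides the encrypted link:

```shell
# WireGuard point-to-point tunnel between the two VTEP hosts
# (<REMOTE_PUBLIC_KEY> is a placeholder for the peer's public key)
ip link add wg0 type wireguard
wg set wg0 private-key /etc/wireguard/privatekey \
    peer <REMOTE_PUBLIC_KEY> \
    endpoint 203.0.113.2:51820 \
    allowed-ips 10.255.0.2/32
ip addr add 10.255.0.1/30 dev wg0
ip link set wg0 up

# VXLAN endpoints use the WireGuard inner addresses as VTEP IPs
ip link add vxlan5000 type vxlan id 5000 \
    local 10.255.0.1 dstport 4789 nolearning
bridge fdb append 00:00:00:00:00:00 dev vxlan5000 dst 10.255.0.2
```

Remember to budget for both overheads when setting MTUs: the WireGuard tunnel and the VXLAN encapsulation each reduce the usable inner frame size.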
MACsec (802.1AE) Hardware-based encryption at Layer 2, providing line-rate encryption between adjacent devices. Effective for securing underlay links.
Firewall and Security Groups: Even if VXLAN itself isn't secure, apply rigorous firewall rules: permit UDP 4789 only between known VTEP addresses, and enforce per-workload security groups inside the overlay.
| Threat | Mitigation | Overhead | Complexity |
|---|---|---|---|
| Eavesdropping | IPsec/WireGuard encryption | ~15-20% | High |
| Eavesdropping | MACsec on underlay links | ~0% (hardware) | Medium |
| Rogue VTEP injection | Firewall whitelist VTEP IPs | ~0% | Low |
| VNI hopping | Per-VM security groups | ~5% | Medium |
| Traffic analysis | Traffic shaping/padding | Variable | High |
| DoS attacks | Rate limiting per source | ~0% | Low |
Never rely on VNI isolation as your only security control. VNIs provide logical separation, not cryptographic isolation. Any workload with access to the underlay network could potentially inject packets into any VNI. Always combine VNI segmentation with proper firewall rules and, for sensitive workloads, encryption.
VXLAN has become the universal language of overlay networking—the protocol that enables massive-scale network virtualization across datacenter and cloud environments. Its combination of simplicity, scalability, and broad industry support has made it the dominant choice for production deployments.
You now possess comprehensive knowledge of VXLAN—from packet format through control plane options to production deployment and troubleshooting. This knowledge is directly applicable to OpenStack, VMware NSX, Kubernetes networking, and any modern cloud infrastructure.
Next Up: We'll explore Network Segmentation—the design patterns and technologies for dividing networks into isolated security and administrative domains, using the overlay and virtual switching capabilities we've covered to implement robust multi-tier architectures.