Throughout this module, we've discussed container networking concepts—namespaces, virtual Ethernet pairs, Kubernetes' flat network model. But who actually implements these concepts? Who assigns IP addresses to pods, programs routing tables, configures iptables rules?
The answer is CNI plugins. The Container Network Interface (CNI) is a specification that defines how container runtimes interact with network plugins. The runtime tells the plugin: "I need networking for this container." The plugin does whatever is necessary—create interfaces, assign IPs, configure routes—and returns the result.
Understanding CNI plugins is essential because your choice of CNI plugin fundamentally determines your cluster's network behavior, performance characteristics, security capabilities, and operational complexity.
By the end of this page, you will understand the CNI specification, the plugin invocation lifecycle, and the architectures of major CNI plugins: Calico (BGP and VXLAN modes), Cilium (eBPF), Flannel (simplicity), and cloud-native options such as AWS VPC CNI. You'll know how to choose the right plugin for your environment and troubleshoot CNI issues.
The Container Network Interface (CNI) is a specification maintained by the Cloud Native Computing Foundation (CNCF). It defines a minimal, JSON-based interface between container runtimes (like containerd, CRI-O) and network plugins.
Why standardize?
Before CNI, each container runtime had its own networking API—Docker had libnetwork, rkt had its own model, and so on. CNI provides a common interface so that:

- Any conforming runtime (containerd, CRI-O) can use any conforming network plugin.
- Plugin authors implement networking logic once rather than once per runtime.
- Plugins can be composed in chains (for example, a main plugin plus port-mapping or bandwidth plugins).

A CNI network configuration is a small JSON file. Here is a typical configuration for the reference bridge plugin:
{ "cniVersion": "1.0.0", "name": "my-network", "type": "bridge", "bridge": "cni0", "isDefaultGateway": true, "ipMasq": true, "ipam": { "type": "host-local", "subnet": "10.244.0.0/16", "routes": [ { "dst": "0.0.0.0/0" } ] }}CNI invocation flow:
When kubelet needs to create a pod:

1. The container runtime creates the pod sandbox and its network namespace.
2. It reads the CNI configuration from /etc/cni/net.d/ and locates the plugin binary under /opt/cni/bin.
3. It invokes the plugin with CNI_COMMAND=ADD, passing parameters as environment variables and the configuration on stdin.
4. The plugin creates interfaces, allocates an IP address (usually by delegating to an IPAM plugin), and prints a JSON result.
5. When the pod is deleted, the runtime invokes the plugin again with CNI_COMMAND=DEL to release the IP and clean up interfaces.
```bash
# CNI plugin invocation (what the runtime does internally)

# Environment variables passed to the plugin
export CNI_COMMAND=ADD
export CNI_CONTAINERID=abc123def456
export CNI_NETNS=/var/run/netns/ns-abc123
export CNI_IFNAME=eth0
export CNI_PATH=/opt/cni/bin

# Config passed via stdin
cat /etc/cni/net.d/10-bridge.conf | /opt/cni/bin/bridge

# Plugin returns JSON result:
{
  "cniVersion": "1.0.0",
  "interfaces": [
    { "name": "eth0", "mac": "02:42:ac:11:00:02", "sandbox": "/var/run/netns/ns-abc123" }
  ],
  "ips": [
    { "address": "10.244.1.5/24", "gateway": "10.244.1.1" }
  ],
  "routes": [
    { "dst": "0.0.0.0/0", "gw": "10.244.1.1" }
  ]
}
```

IP Address Management (IPAM) is often handled by a separate plugin. The main CNI plugin (e.g., bridge) delegates IP allocation to an IPAM plugin (e.g., host-local, whereabouts). This separation allows mixing and matching networking and IP management strategies.
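To see that delegation in isolation, an IPAM plugin can be exercised on its own. The sketch below assumes the standard host-local binary is installed under /opt/cni/bin; the container ID and netns path are placeholders for illustration.

```bash
# Invoke the host-local IPAM plugin directly (normally the main plugin does this).
# CNI_CONTAINERID and CNI_NETNS values here are placeholders.
export CNI_COMMAND=ADD
export CNI_CONTAINERID=ipam-demo
export CNI_NETNS=/var/run/netns/ipam-demo
export CNI_IFNAME=eth0
export CNI_PATH=/opt/cni/bin

echo '{
  "cniVersion": "1.0.0",
  "name": "my-network",
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.0.0/16"
  }
}' | /opt/cni/bin/host-local
# The result contains only "ips" and "routes"; the calling plugin merges it
# into its own result and configures the pod interface itself.
```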
Calico is one of the most popular CNI plugins, known for its performance, scalability, and rich network policy support. Originally designed around BGP (Border Gateway Protocol) for routing, it now also supports VXLAN and eBPF modes.
Architecture:
Calico runs a calico-node DaemonSet on each node containing:

- Felix, the agent that programs routes, interfaces, and policy rules on the node
- BIRD, a BGP daemon that advertises pod routes to other nodes (in BGP mode)
- confd, which renders BIRD configuration from Calico's datastore
Networking modes:
| Mode | How It Works | Pros | Cons |
|---|---|---|---|
| BGP (native routing) | Routes advertised via BGP to network fabric | No encapsulation overhead; line-rate performance | Requires BGP-capable network or route reflectors |
| VXLAN overlay | Encapsulates pod traffic in VXLAN tunnels | Works on any network; no BGP needed | ~10% overhead from encapsulation |
| IP-in-IP | Encapsulates in IP-in-IP tunnels | Simpler than VXLAN; works across subnets | Not supported by all cloud providers |
| eBPF dataplane | eBPF replaces iptables for packet processing | Lower latency; better observability | Requires kernel 5.3+; newer |
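The encapsulation mode is configured per IP pool. The manifest below is an illustrative sketch (CIDR and mode values are examples) using the projectcalico.org/v3 API:

```yaml
# Illustrative Calico IPPool: CrossSubnet encapsulates only traffic that
# crosses a subnet boundary; same-subnet traffic is routed natively.
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.244.0.0/16
  vxlanMode: CrossSubnet   # Never | Always | CrossSubnet
  ipipMode: Never
  natOutgoing: true
```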
```bash
# View Calico node status
kubectl get pods -n kube-system -l k8s-app=calico-node
# NAME                READY   STATUS    RESTARTS   AGE
# calico-node-abc12   1/1     Running   0          30d
# calico-node-def34   1/1     Running   0          30d

# Check Calico node status (if calicoctl is installed)
calicoctl node status
# Calico process is running.
# IPv4 BGP status
# +---------------+-------------------+-------+----------+
# | PEER ADDRESS  | PEER TYPE         | STATE | SINCE    |
# +---------------+-------------------+-------+----------+
# | 10.0.1.11     | node-to-node mesh | up    | 12:00:00 |
# +---------------+-------------------+-------+----------+

# View IP pool configuration
calicoctl get ippool -o yaml
# Shows CIDR, encapsulation mode, NAT settings

# View routes on a node (BGP-advertised routes appear here)
ip route | grep bird
# 10.244.2.0/24 via 10.0.1.11 dev eth0 proto bird

# View Calico network policies
kubectl get networkpolicies.crd.projectcalico.org -A
```

In large clusters, full-mesh BGP (every node peers with every other) doesn't scale. Calico can use route reflectors—dedicated nodes that aggregate routes and reduce peering complexity from O(n²) to O(n).
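A route reflector topology is configured declaratively. The sketch below is illustrative: it disables the node-to-node mesh and peers every node with nodes labelled as reflectors. The label and AS number are examples, and each reflector node additionally needs routeReflectorClusterID set on its Calico Node resource.

```yaml
# Sketch: turn off full-mesh BGP and peer all nodes with labelled route reflectors.
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: false
  asNumber: 64512
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: peer-to-route-reflectors
spec:
  nodeSelector: all()                        # every node...
  peerSelector: route-reflector == 'true'    # ...peers with reflector-labelled nodes
```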
Cilium leverages eBPF (extended Berkeley Packet Filter) to provide high-performance networking, security, and observability. Instead of using iptables for packet processing, Cilium attaches eBPF programs directly to the Linux kernel's networking stack.
Why eBPF?
iptables rules are evaluated linearly—as rules accumulate (thousands in large clusters), performance degrades. eBPF provides:

- Hash-table lookups instead of linear rule traversal, so policy and Service lookups stay fast as the cluster grows
- Programs attached directly at kernel hook points (tc, XDP, sockets), bypassing much of the iptables/netfilter path
- Rich in-kernel visibility into flows, which Cilium exposes through Hubble
Cilium architecture:

- A cilium-agent DaemonSet on each node that compiles and attaches eBPF programs and enforces policy
- A cilium-operator Deployment that handles cluster-wide tasks such as IP address management
- Hubble, an observability layer that exposes flow data collected by the eBPF datapath
Key differentiators:
| Feature | Traditional (iptables) | Cilium (eBPF) |
|---|---|---|
| Policy evaluation | O(n) linear rules | O(1) hash lookups |
| kube-proxy | Required for Service load balancing | Can be replaced entirely by native eBPF load balancing |
| L7 policies | Requires proxy | Native HTTP/gRPC awareness |
| Observability | Conntrack + logging | Hubble flow visibility |
| Service mesh | Requires sidecars | Optional sidecar-less mode |
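As an example of the L7 awareness, a Cilium-specific policy can match on HTTP methods and paths rather than just ports. The policy below is an illustrative sketch (labels and paths are placeholders) using the cilium.io/v2 API:

```yaml
# Allow only GET /api/* from frontend pods to backend pods on port 80;
# other HTTP requests are rejected at L7 by the eBPF/proxy datapath.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-l7-policy
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "80"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "/api/.*"
```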
```bash
# View Cilium status
cilium status
# KVStore:          Ok
# Kubernetes:       Ok   1.27
# Cluster health:   Ok   2/2 reachable
# BPF:              Ok   Attached

# View endpoint (pod) status
cilium endpoint list
# ENDPOINT   POLICY    IDENTITY   LABELS
# 12345      Enabled   45678      k8s:app=frontend
# 12346      Enabled   45679      k8s:app=backend

# View attached BPF programs
sudo bpftool prog show
# Shows all loaded eBPF programs

# View Cilium's eBPF maps
cilium bpf lb list
# Shows load balancer map entries

cilium bpf policy list
# Shows network policy map entries

# Hubble observability
hubble observe --namespace default
# Shows real-time network flows:
# TIMESTAMP   SOURCE     DESTINATION   TYPE      VERDICT
# 12:00:01    frontend   backend:80    L7/HTTP   FORWARDED

# Export flows as JSON for further processing
hubble observe --output jsonpb
```

Cilium can completely replace kube-proxy for Service load balancing. This eliminates thousands of iptables rules and provides better performance, especially for clusters with many Services. Enable with --set kubeProxyReplacement=strict during installation.
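A typical way to enable this is at install time via Helm. The commands below are a sketch: the chart and kubeProxyReplacement flag follow from the text above, while the API server host and port are placeholders you must supply because, without kube-proxy, Cilium needs a direct path to the control plane.

```bash
# Sketch: install Cilium with kube-proxy replacement enabled.
# <api-server-host> and the port are placeholders for your control plane endpoint.
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost=<api-server-host> \
  --set k8sServicePort=6443
```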
Flannel is one of the oldest and simplest CNI plugins. Developed by CoreOS, it focuses on one thing: providing a flat overlay network for containers. No Network Policies, no advanced features—just networking that works.
Philosophy:
Flannel aims to be the simplest way to get Kubernetes networking running. For many use cases, especially development environments and smaller clusters, this simplicity is valuable. If you need Network Policies, pair Flannel with Calico (for policy enforcement only).
| Backend | Transport | Performance | Requirements |
|---|---|---|---|
| VXLAN | UDP encapsulation (port 8472) | Good (some overhead) | None (works everywhere) |
| host-gw | Direct routing (no encapsulation) | Best | All nodes on same L2 network |
| WireGuard | Encrypted WireGuard tunnels | Good | WireGuard kernel module |
| UDP (deprecated) | User-space UDP | Poor | Fallback only |
```yaml
# Flannel ConfigMap (typical kube-flannel.yml)
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-flannel-cfg
  namespace: kube-flannel
data:
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan",
        "VNI": 1,
        "DirectRouting": true
      }
    }
---
# The Flannel DaemonSet runs on each node and creates:
# - a flannel.1 VXLAN interface
# - routes for other nodes' pod CIDRs
# - a bridge (cni0) for local pods
```
```bash
# View Flannel pods
kubectl get pods -n kube-flannel
# NAME                    READY   STATUS    RESTARTS   AGE
# kube-flannel-ds-abc12   1/1     Running   0          10d

# View the VXLAN interface
ip -d link show flannel.1
# flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 ...
#     vxlan id 1 local 10.0.1.10 dev eth0 srcport 0 0 dstport 8472

# View the bridge for local pods
ip link show cni0
# cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 ...

# View routes to other nodes' pod networks
ip route | grep flannel
# 10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
# 10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink

# View Flannel's VXLAN FDB (MAC-to-VTEP mapping)
bridge fdb show dev flannel.1
# 92:7b:... dst 10.0.1.11 self permanent
```

Flannel does not implement Kubernetes NetworkPolicy—all pods can communicate freely. For production environments requiring network segmentation, pair Flannel with Calico's policy engine (the Canal combination) or choose a CNI like Calico or Cilium that includes policy support.
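To make this concrete, a standard NetworkPolicy like the sketch below is accepted by the API server but has no effect when plain Flannel is the only CNI—there is no agent to enforce it. The namespace is a placeholder.

```yaml
# Accepted by the API server, but silently unenforced under plain Flannel.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: default
spec:
  podSelector: {}        # selects every pod in the namespace
  policyTypes:
    - Ingress            # no ingress rules listed => deny all ingress (if enforced)
```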
Major cloud providers offer CNI plugins that integrate directly with their virtual networking infrastructure. Instead of overlay networks, pods receive IPs from the cloud VPC/VNET, making them directly routable from cloud resources.
Advantages of cloud-native CNIs:

- No encapsulation overhead—pod traffic is ordinary VPC/VNET traffic
- Pods are directly routable from other cloud resources (VMs, load balancers, managed databases)
- Cloud networking features such as security groups and flow logs can apply to pod IPs directly
AWS VPC CNI assigns pod IPs from your VPC subnets. Each node can attach secondary Elastic Network Interfaces (ENIs), and each ENI can hold multiple IPs assigned to pods.
How it works:

1. The aws-node DaemonSet runs an IPAM daemon (ipamd) on each node.
2. ipamd attaches secondary ENIs to the instance and pre-allocates a warm pool of secondary IPs from the VPC subnet.
3. When a pod is scheduled, the CNI plugin assigns one of these warm IPs and wires a veth pair into the pod's network namespace.
4. Pod traffic leaves through the ENI as ordinary VPC traffic—no overlay, no encapsulation.
Limitations:

- The maximum number of pods per node is capped by the instance type's ENI and IP-per-ENI limits (unless prefix delegation is enabled, as sketched below)
- Pods consume real VPC IPs, so small subnets can be exhausted quickly
- Attaching new ENIs takes time, which can slow pod startup during rapid scale-ups
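Prefix delegation mitigates the density limit by assigning /28 prefixes to ENIs instead of individual secondary IPs. A common way to turn it on—a sketch that assumes the standard aws-node DaemonSet, a recent VPC CNI version, and Nitro-based instance types—is to set an environment variable on the DaemonSet:

```bash
# Enable prefix delegation on the AWS VPC CNI; newly provisioned nodes then
# allocate /28 prefixes per ENI instead of individual IPs.
kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true

# Verify the setting took effect
kubectl describe daemonset aws-node -n kube-system | grep ENABLE_PREFIX_DELEGATION
```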
```bash
# View the AWS CNI DaemonSet
kubectl get pods -n kube-system -l k8s-app=aws-node

# Check IP capacity per node
kubectl get node -o custom-columns=NAME:.metadata.name,PODS:.status.allocatable.pods

# View ENIs attached to the node (run on the node)
curl http://169.254.169.254/latest/meta-data/network/interfaces/macs/
# Lists MAC addresses of attached ENIs

# Common issue: IP exhaustion
# Error: "failed to assign an IP address to container"
# Solution: use larger subnets or enable prefix delegation
```

Selecting a CNI plugin depends on your environment, requirements, and operational constraints. Here's a decision framework:
| Scenario | Recommended CNI | Reason |
|---|---|---|
| Cloud (AWS EKS) | AWS VPC CNI + Calico (policies) | Native VPC integration; add Calico for policies |
| Cloud (Azure AKS) | Azure CNI + Cilium | VNET integration + eBPF performance |
| Cloud (GKE) | GKE Dataplane V2 | Native, managed, eBPF-based |
| On-prem (BGP network) | Calico (BGP mode) | Native routing; integrates with network fabric |
| On-prem (no BGP) | Calico (VXLAN) or Cilium | Overlay works anywhere |
| Maximum performance | Cilium (eBPF) | Kernel-native packet processing |
| Maximum simplicity | Flannel | Minimal config; add Calico if policies needed |
| Multi-cluster | Cilium Cluster Mesh | Cross-cluster service discovery |
| Service mesh included | Cilium | Sidecar-less service mesh option |
For learning or development, start with the default CNI your platform provides. As requirements grow, you can migrate—but plan carefully, as CNI changes often require cluster recreation or careful node-by-node migration.
CNI issues typically manifest as pods stuck in ContainerCreating state, failing network connections, or IP address conflicts. Here's a systematic approach to diagnosis:
```bash
# 1. Check pod status for CNI errors
kubectl describe pod <pod-name>
# Look for events like:
#   "Failed to create pod sandbox: CNI error"
#   "NetworkPlugin cni failed to set up pod network"

# 2. Check that CNI plugin pods are running
kubectl get pods -n kube-system | grep -E "calico|cilium|flannel|weave"
# Ensure all CNI pods are Running and Ready

# 3. View CNI pod logs for errors
kubectl logs -n kube-system daemonset/<cni-daemonset> -c <container>
# Common issues: IPAM exhaustion, connectivity to the API server

# 4. Check the CNI configuration on the node
ls /etc/cni/net.d/
# Should contain CNI config files (e.g., 10-calico.conflist)

cat /etc/cni/net.d/10-calico.conflist
# Verify the configuration is valid

# 5. Test the CNI binary manually (on the node)
export CNI_PATH=/opt/cni/bin
export CNI_CONTAINERID=test123
export CNI_NETNS=/var/run/netns/test
export CNI_IFNAME=eth0
export CNI_COMMAND=VERSION
/opt/cni/bin/calico < /etc/cni/net.d/10-calico.conflist

# 6. Check kubelet logs for CNI errors
journalctl -u kubelet | grep -i cni

# 7. Common issues:
# - "no IP addresses available": IPAM pool exhausted
# - "network not ready": CNI not installed or misconfigured
# - "failed to connect to calico datastore": etcd/API connectivity
```

| Issue | Symptoms | Cause | Solution |
|---|---|---|---|
| IPAM exhaustion | Pods stuck in ContainerCreating | No IPs left in pool | Expand IPAM pool or delete unused IPs |
| CNI binary not found | Pod creation fails immediately | CNI not installed | Install CNI plugin (check /opt/cni/bin) |
| Network unreachable | Pods can't reach each other | Routing not configured | Check CNI agent logs; verify routes |
| MTU mismatch | Large packets drop | Encapsulation overhead not accounted | Reduce pod MTU (typically 1450 for VXLAN) |
| Firewall blocking | Cross-node traffic fails | Host firewall blocking overlay | Allow VXLAN (UDP 8472), BGP (TCP 179) |
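For the MTU case in particular, a quick way to confirm a mismatch is to compare the MTU inside a pod with the node's overlay interface. The pod name and interface below are placeholders; substitute your pod and your CNI's overlay device.

```bash
# Compare pod MTU with the node's overlay interface MTU.
kubectl exec <pod-name> -- ip link show eth0 | grep mtu
ip link show flannel.1 | grep mtu      # run on the node hosting the pod

# VXLAN adds ~50 bytes of headers, so the overlay interface is typically
# node MTU - 50 (e.g., 1450 on a 1500-byte network); the pod MTU must not
# exceed the overlay MTU or large packets will be dropped.
```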
We've covered CNI plugins comprehensively—from the specification to major implementations and troubleshooting.
Module Complete:
Congratulations! You've completed the Containerized Networking module. You now understand the full stack—from Linux namespaces and veth pairs, through Docker and Kubernetes networking, to service meshes and CNI plugins. Whether you're debugging pod connectivity issues or designing multi-cluster architectures, this foundation will serve you when operating container infrastructure at any scale.