Traditional load balancing requires dedicated hardware appliances—expensive boxes positioned as traffic chokepoints. These appliances see all traffic, make distribution decisions, and often become bottlenecks themselves. Scaling requires buying more boxes. High availability requires pairs of boxes. The appliances become critical infrastructure with their own management overhead.
SDN eliminates the need for dedicated load balancer appliances. With programmable switches and centralized control, load balancing becomes a software function distributed across the network fabric. Every switch participates in traffic distribution. The controller implements load balancing algorithms as applications. Scaling means adding capacity, not buying specialized hardware.
This approach—where commodity switches perform load balancing under software control—represents a fundamental shift from purpose-built appliances to general-purpose, programmable infrastructure. The same switches that forward regular traffic can distribute it intelligently across servers, services, or network paths.
By the end of this page, you will understand SDN-based load balancing including server load balancing without appliances, network path load balancing, load balancing algorithms (round robin, weighted, adaptive), Layer 4 vs Layer 7 approaches, and integration with health checking and service discovery.
SDN implements load balancing by programming switches to distribute traffic according to controller-defined policies.
Traditional Hardware Load Balancer:
[Clients] → [Load Balancer Appliance] → [Server Pool]
↓
- Single device
- Dedicated hardware
- ASIC-based decisions
- Potential bottleneck
- Expensive HA pairs
SDN Distributed Load Balancing:
[Clients] → [Switch 1] → [Server Pool]
↓
[Switch 2] → [Server Pool]
↓
[Switch N] → [Server Pool]
- All switches participate
- Controller defines policy
- Distributed execution
- No single bottleneck
1. Proactive Rule Installation:
Controller pre-installs distribution rules:
Switch receives rules:
Rule 1: Match(dst=VIP, hash mod 3 = 0) → Forward(server1)
Rule 2: Match(dst=VIP, hash mod 3 = 1) → Forward(server2)
Rule 3: Match(dst=VIP, hash mod 3 = 2) → Forward(server3)
Advantages: No per-flow controller involvement, line-rate forwarding
Limitations: Static distribution, limited algorithm flexibility
2. Reactive Per-Flow Assignment:
First packet triggers controller decision:
1. New flow arrives at switch (first packet)
2. Switch sends Packet-In to controller
3. Controller applies LB algorithm, selects server
4. Controller installs flow-specific rule
5. Subsequent packets forwarded by switch (no controller)
Advantages: Full algorithm flexibility, per-flow decisions
Limitations: Latency on first packet, controller load
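As a concrete illustration of this reactive path, here is a minimal Packet-In handler sketch. The controller calls (`install_flow`, `get_host_port`, `send_packet_out`) follow the hypothetical controller API used in this page's examples; the field names and the `idle_timeout` value are illustrative assumptions, not a specific controller's interface.

```python
from itertools import cycle

VIP = "10.100.1.1"
SERVERS = cycle(["10.1.1.10", "10.1.1.11", "10.1.1.12"])  # pool behind the VIP

def handle_packet_in(controller, switch_id, pkt):
    """Handle the first packet of a new flow addressed to the VIP."""
    if pkt["ip_dst"] != VIP:
        return  # not load-balanced traffic

    server_ip = next(SERVERS)  # step 3: apply the LB algorithm (plain round robin here)

    # Step 4: install a flow-specific rule so later packets never reach the controller
    controller.install_flow(
        switch_id=switch_id,
        priority=2000,
        match={"ip_src": pkt["ip_src"], "tcp_src": pkt["tcp_src"],
               "ip_dst": VIP, "tcp_dst": pkt["tcp_dst"]},
        actions=[{"type": "SET_FIELD", "field": "ip_dst", "value": server_ip},
                 {"type": "OUTPUT", "port": controller.get_host_port(server_ip)}],
        idle_timeout=30,  # let the entry expire once the connection goes quiet
    )

    # Step 5: re-inject this first packet so it is not lost while the rule installs
    controller.send_packet_out(switch_id, pkt, rewrite={"ip_dst": server_ip})
```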
3. Hybrid Approach:
Combine proactive rules with reactive refinement: hash-based default rules (installed proactively) keep every flow forwarding at line rate, while the controller reactively installs flow-specific overrides for flows that need special treatment, such as large or long-lived flows.
OpenFlow group tables are ideal for load balancing. A SELECT group with multiple buckets distributes traffic according to configured weights. The switch performs the selection at line rate without controller involvement, while the controller can update weights dynamically based on server load.
SDN server load balancing distributes client requests across a pool of servers implementing the same service.
Clients connect to a Virtual IP address; the network distributes connections to actual servers:
Configuration:
Virtual IP: 10.100.1.1:443 (what clients connect to)
Server Pool:
- 10.1.1.10:443 (weight: 3)
- 10.1.1.11:443 (weight: 2)
- 10.1.1.12:443 (weight: 1)
Traffic Flow:
1. Client sends packet to VIP (10.100.1.1)
2. Switch matches VIP, applies LB decision
3. Switch rewrites destination to selected server
4. Server responds (source NAT may be needed depending on topology)
5. On the return path, the switch rewrites the response's source address back to the VIP
6. Client sees consistent VIP throughout connection
OpenFlow SELECT groups enable efficient switch-based load balancing:
Group ID: 100
Type: SELECT
Buckets:
Bucket 1 (weight: 50):
Actions:
- Set-Field(ip_dst=10.1.1.10)
- Output(port=1)
Bucket 2 (weight: 33):
Actions:
- Set-Field(ip_dst=10.1.1.11)
- Output(port=2)
Bucket 3 (weight: 17):
Actions:
- Set-Field(ip_dst=10.1.1.12)
- Output(port=3)
Flow Rule:
Match: ip_dst=10.100.1.1, tcp_dst=443
Actions: Group(100)
The switch uses a hash of packet headers to select a bucket, ensuring the same flow always reaches the same server (session persistence).
"""SDN Server Load BalancingDemonstrates VIP-based load balancing with health checks""" from dataclasses import dataclassfrom typing import Dict, List, Optionalfrom enum import Enumimport hashlibimport time class ServerState(Enum): HEALTHY = "healthy" UNHEALTHY = "unhealthy" DRAINING = "draining" # Finishing existing connections @dataclassclass Server: ip: str port: int weight: int = 1 state: ServerState = ServerState.HEALTHY active_connections: int = 0 last_health_check: float = 0 @dataclassclass VirtualService: vip: str vip_port: int name: str servers: List[Server] algorithm: str = "weighted_round_robin" health_check_interval: int = 5 # seconds session_persistence: bool = True class SDNServerLoadBalancer: """ SDN-based server load balancing application. Distributes traffic across server pools using programmable switches. """ def __init__(self, controller, health_checker): self.controller = controller self.health_checker = health_checker self.services: Dict[str, VirtualService] = {} self.round_robin_counters: Dict[str, int] = {} def register_service(self, service: VirtualService): """Register a virtual service for load balancing.""" self.services[f"{service.vip}:{service.vip_port}"] = service self.round_robin_counters[service.name] = 0 # Start health checking self.health_checker.add_targets( [s.ip for s in service.servers], callback=lambda ip, healthy: self._on_health_change( service.name, ip, healthy ) ) # Install initial flow rules self._install_lb_rules(service) def _install_lb_rules(self, service: VirtualService): """Install OpenFlow rules for load balancing.""" healthy_servers = [ s for s in service.servers if s.state == ServerState.HEALTHY ] if not healthy_servers: # No healthy servers - install drop rule with alert self._install_service_unavailable(service) return # Calculate bucket weights total_weight = sum(s.weight for s in healthy_servers) buckets = [] for server in healthy_servers: bucket_weight = int((server.weight / total_weight) * 100) buckets.append({ "weight": bucket_weight, "actions": [ {"type": "SET_FIELD", "field": "ip_dst", "value": server.ip}, {"type": "SET_FIELD", "field": "tcp_dst", "value": server.port}, {"type": "OUTPUT", "port": self._get_server_port(server)} ] }) # Create or update SELECT group group_id = self._get_group_id(service) for switch_id in self._get_ingress_switches(): self.controller.install_group( switch_id=switch_id, group_id=group_id, group_type="SELECT", buckets=buckets ) # Install flow rule using the group self.controller.install_flow( switch_id=switch_id, priority=1000, match={ "ip_dst": service.vip, "tcp_dst": service.vip_port, }, actions=[{"type": "GROUP", "group_id": group_id}] ) # Install reverse NAT for responses for server in healthy_servers: self.controller.install_flow( switch_id=switch_id, priority=1000, match={ "ip_src": server.ip, "tcp_src": server.port, }, actions=[ {"type": "SET_FIELD", "field": "ip_src", "value": service.vip}, {"type": "SET_FIELD", "field": "tcp_src", "value": service.vip_port}, {"type": "OUTPUT", "port": "NORMAL"} ] ) def _on_health_change(self, service_name: str, server_ip: str, healthy: bool): """Handle server health state change.""" service_key = None for key, service in self.services.items(): if service.name == service_name: service_key = key break if not service_key: return service = self.services[service_key] for server in service.servers: if server.ip == server_ip: old_state = server.state server.state = (ServerState.HEALTHY if healthy else ServerState.UNHEALTHY) if old_state != server.state: 
print(f"Server {server_ip} state: {old_state} -> {server.state}") # Reinstall rules with updated server list self._install_lb_rules(service) break def select_server_reactive( self, service: VirtualService, client_ip: str, flow_tuple: tuple ) -> Optional[Server]: """ Select server for reactive (per-flow) load balancing. Called when first packet triggers Packet-In. """ healthy_servers = [ s for s in service.servers if s.state == ServerState.HEALTHY ] if not healthy_servers: return None if service.algorithm == "round_robin": return self._round_robin(service.name, healthy_servers) elif service.algorithm == "weighted_round_robin": return self._weighted_round_robin(service.name, healthy_servers) elif service.algorithm == "least_connections": return self._least_connections(healthy_servers) elif service.algorithm == "ip_hash": return self._ip_hash(client_ip, healthy_servers) else: return healthy_servers[0] def _round_robin( self, service_name: str, servers: List[Server] ) -> Server: """Simple round-robin selection.""" counter = self.round_robin_counters.get(service_name, 0) selected = servers[counter % len(servers)] self.round_robin_counters[service_name] = counter + 1 return selected def _weighted_round_robin( self, service_name: str, servers: List[Server] ) -> Server: """Weighted round-robin based on server weights.""" # Expand server list by weights weighted_list = [] for server in servers: weighted_list.extend([server] * server.weight) counter = self.round_robin_counters.get(service_name, 0) selected = weighted_list[counter % len(weighted_list)] self.round_robin_counters[service_name] = counter + 1 return selected def _least_connections(self, servers: List[Server]) -> Server: """Select server with fewest active connections.""" return min(servers, key=lambda s: s.active_connections) def _ip_hash(self, client_ip: str, servers: List[Server]) -> Server: """Consistent hashing based on client IP.""" hash_value = int(hashlib.md5(client_ip.encode()).hexdigest(), 16) return servers[hash_value % len(servers)] def drain_server(self, service_name: str, server_ip: str): """ Gracefully remove server from pool. Existing connections continue; new connections go elsewhere. 
""" for service in self.services.values(): if service.name == service_name: for server in service.servers: if server.ip == server_ip: server.state = ServerState.DRAINING # Update LB rules to exclude this server self._install_lb_rules(service) print(f"Server {server_ip} draining - " f"no new connections") return def _install_service_unavailable(self, service: VirtualService): """Install rules when no servers are available.""" for switch_id in self._get_ingress_switches(): # Option 1: Send ICMP unreachable # Option 2: Redirect to maintenance page # Option 3: Simply drop (worst UX) self.controller.install_flow( switch_id=switch_id, priority=1000, match={ "ip_dst": service.vip, "tcp_dst": service.vip_port, }, actions=[ # Send to controller for custom response {"type": "OUTPUT", "port": "CONTROLLER"} ] ) def _get_group_id(self, service: VirtualService) -> int: """Generate unique group ID for service.""" return hash(f"{service.vip}:{service.vip_port}") & 0xFFFFFFFF def _get_server_port(self, server: Server) -> int: """Get switch port for server.""" return self.controller.get_host_port(server.ip) def _get_ingress_switches(self) -> List[str]: """Get switches where LB rules should be installed.""" return self.controller.get_all_switches()Most SDN load balancing uses packet header hashing for consistency—the same 5-tuple always hashes to the same server. For true session persistence across multiple connections (e.g., user shopping session), you need application-layer cookies or client IP tracking, which typically requires controller involvement or integration with Layer 7 components.
The choice of load balancing algorithm significantly impacts distribution quality and server utilization.
1. Round Robin:
Distribute requests sequentially across servers:
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
Best for: Homogeneous servers, similar request costs
Limitation: Ignores server capacity differences and current load
2. Weighted Round Robin:
Distribute proportionally to assigned weights:
Server A (weight 5): Gets 5 out of 10 requests
Server B (weight 3): Gets 3 out of 10 requests
Server C (weight 2): Gets 2 out of 10 requests
Best for: Heterogeneous server capacities
Limitation: Weights are static; doesn't adapt to actual load
3. IP Hash:
Hash client IP to determine server:
hash(client_ip) mod server_count → server_index
Best for: Session persistence without cookies
Limitation: Distribution may be uneven; server changes disrupt persistence
4. Least Connections:
Route to server with fewest active connections:
Server A: 150 connections
Server B: 120 connections ← Select this one
Server C: 180 connections
Best for: Variable request durations
Limitation: Requires tracking connection state
5. Weighted Least Connections:
Combine connection count with server capacity:
Score = active_connections / weight
Server A: 150/5 = 30 ← Lowest score, select this
Server B: 120/3 = 40
Server C: 80/2 = 40
Best for: Heterogeneous servers with variable load
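The server load-balancing example earlier implements round robin, weighted round robin, least connections, and IP hash; weighted least connections is a small extension, sketched here against the same `Server` dataclass fields (`active_connections`, `weight`).

```python
def weighted_least_connections(servers):
    """Pick the server with the lowest connections-per-weight score."""
    return min(servers, key=lambda s: s.active_connections / max(s.weight, 1))
```

With the worked numbers above (150/5, 120/3, 80/2), this returns Server A, whose score of 30 is the lowest.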
6. Adaptive/Response-Time:
Route based on server response times:
Server A: avg response 50ms
Server B: avg response 30ms ← Fastest, prefer this
Server C: avg response 45ms
Best for: Optimizing user experience
Limitation: Requires active monitoring
| Algorithm | State Required | SDN Implementation | Best Use Case |
|---|---|---|---|
| Round Robin | Counter only | Group bucket rotation | Homogeneous, uniform load |
| Weighted RR | Counter + weights | Weighted SELECT group | Heterogeneous servers |
| IP Hash | None | Hash-based bucket selection | Session persistence needed |
| Least Connections | Per-server counter | Controller decision | Long-lived connections |
| Weighted Least Conn | Counters + weights | Controller decision | Variable capacity + load |
| Response Time | Latency metrics | Controller + monitoring | Latency-sensitive apps |
Adaptive algorithms require feedback from servers or traffic analysis:
Controller-Based Adaptive LB: the controller collects load signals (active connections, CPU, response times), recomputes server weights, and pushes updated SELECT group buckets to the switches.
Challenges: metrics arrive with a delay, and reacting too aggressively causes oscillation as traffic sloshes between servers.
Best Practices: smooth metrics over a window, cap how much a weight can change per update, and rate-limit group updates sent to switches. A sketch of this loop follows.
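The following is a minimal sketch of that controller-based loop, assuming the same hypothetical controller API (`install_group`) and bucket format used in the examples above; the metric source, smoothing factor, and weight formula are illustrative, not prescriptive.

```python
def recompute_adaptive_weights(controller, switch_ids, group_id,
                               servers, metrics, prev_weights, smoothing=0.5):
    """Turn average response times into smoothed SELECT-group bucket weights.

    servers:      list of (server_ip, out_port) tuples
    metrics:      dict server_ip -> average response time in ms
    prev_weights: dict server_ip -> last pushed weight (used for damping)
    """
    buckets = []
    for ip, port in servers:
        rtt_ms = max(metrics.get(ip, 100.0), 1.0)   # missing metric -> neutral value, avoid /0
        raw = 1000.0 / rtt_ms                       # faster server -> larger share
        weight = smoothing * prev_weights.get(ip, raw) + (1 - smoothing) * raw
        prev_weights[ip] = weight
        buckets.append({
            "weight": max(int(weight), 1),
            "actions": [{"type": "SET_FIELD", "field": "ip_dst", "value": ip},
                        {"type": "OUTPUT", "port": port}],
        })

    # Push the updated buckets to every ingress switch
    for switch_id in switch_ids:
        controller.install_group(switch_id=switch_id, group_id=group_id,
                                 group_type="SELECT", buckets=buckets)
    return prev_weights
```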
When servers are added or removed, simple modulo hashing redistributes many connections unnecessarily. Consistent hashing algorithms minimize redistribution—only connections to the changed server are affected. SDN implementations should use consistent hashing for production deployments where server pool changes are common.
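To make the difference concrete, here is a minimal consistent-hash ring: each server is hashed to many virtual points on a ring, and a client maps to the next point clockwise, so removing one server only remaps the clients that pointed at its points. This is an illustrative sketch; production implementations typically use stronger hash functions and bounded-load variants.

```python
import bisect
import hashlib

def _h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Maps clients to servers so that adding/removing a server moves few clients."""

    def __init__(self, servers, vnodes=100):
        # Each server contributes `vnodes` points on the ring for better balance
        self.ring = sorted((_h(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def server_for(self, client_ip: str) -> str:
        # First ring point at or after the client's hash, wrapping around at the end
        idx = bisect.bisect(self.keys, _h(client_ip)) % len(self.keys)
        return self.ring[idx][1]

# Removing one of three servers remaps roughly a third of clients,
# instead of nearly all of them as with "hash mod N".
ring = ConsistentHashRing(["10.1.1.10", "10.1.1.11", "10.1.1.12"])
print(ring.server_for("192.0.2.7"))
```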
Beyond server load balancing, SDN enables intelligent distribution of traffic across network paths.
When multiple paths exist between source and destination, SDN can distribute traffic:
Use Cases:
1. ECMP Enhancement:
Traditional ECMP uses static hashing. SDN enables:
Traditional ECMP:
hash(5-tuple) → always same path
No awareness of path conditions
SDN-Enhanced:
Monitor path utilization
Adjust hash bucket assignments
Move flows from congested to available paths
2. Flowlet-Based Distribution:
Large flows (elephants) cause congestion when ECMP hashes several of them onto the same path.
Flowlet switching exploits natural gaps in flows:
Flow A: [burst] [gap] [burst] [gap] [burst]
↓
During gaps, reassign to less-utilized path
No packet reordering within bursts
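A hedged sketch of flowlet detection is shown below: if the gap since a flow's last packet exceeds a threshold (chosen to be larger than the delay difference between candidate paths), the next burst can be moved to a different path without risking reordering. The path-selection callback is a placeholder for whatever utilization-aware chooser the controller or switch provides.

```python
import time

FLOWLET_GAP_S = 0.005  # 5 ms: must exceed the delay difference between candidate paths

class FlowletSwitcher:
    """Reassigns a flow to a new path only during gaps between bursts."""

    def __init__(self, pick_least_loaded_path):
        self.pick_path = pick_least_loaded_path   # callable returning a path id
        self.last_seen = {}                       # flow 5-tuple -> (timestamp, path)

    def path_for_packet(self, flow_tuple, now=None):
        now = time.monotonic() if now is None else now
        last = self.last_seen.get(flow_tuple)
        if last is None or now - last[0] > FLOWLET_GAP_S:
            path = self.pick_path()    # gap seen: safe to re-route the next burst
        else:
            path = last[1]             # mid-burst: stay on the same path to avoid reordering
        self.last_seen[flow_tuple] = (now, path)
        return path
```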
3. Traffic Matrix Awareness:
Controller knows aggregate demand between all pairs:
DC1 → DC2: 80 Gbps demand
DC1 → DC3: 40 Gbps demand
DC2 → DC3: 60 Gbps demand
Paths:
DC1-DC2: Path A (100G), Path B (100G)
DC1-DC3: Path C (50G), Path D (50G)
DC2-DC3: Path E (100G)
Optimize: Place DC1→DC2 traffic on both paths evenly
Use only Path C for DC1→DC3 (no need for D)
Reserve E capacity for DC2→DC3 burst
"""SDN Path Load BalancingDistributes traffic across multiple network paths based on utilization""" from dataclasses import dataclassfrom typing import Dict, List, Tupleimport heapq @dataclassclass NetworkPath: path_id: str hops: List[str] # List of switch IDs capacity_gbps: float current_utilization: float = 0.0 @property def available_bandwidth(self) -> float: return self.capacity_gbps * (1 - self.current_utilization) class PathLoadBalancer: """ SDN path load balancing application. Distributes flows across multiple paths based on capacity and utilization. """ def __init__(self, controller, topology_manager): self.controller = controller self.topology = topology_manager self.path_flows: Dict[str, List[dict]] = {} # path_id -> flows def compute_paths( self, source: str, destination: str, k: int = 3 ) -> List[NetworkPath]: """ Compute K diverse paths between source and destination. """ all_paths = self.topology.get_k_shortest_paths(source, destination, k) return [ NetworkPath( path_id=f"{source}-{destination}-{i}", hops=path, capacity_gbps=self._get_path_capacity(path), current_utilization=self._get_path_utilization(path) ) for i, path in enumerate(all_paths) ] def distribute_flow( self, source: str, destination: str, flow_demand_gbps: float, flow_id: str ) -> List[Tuple[NetworkPath, float]]: """ Distribute a flow across paths based on available capacity. Returns list of (path, allocated_bandwidth) tuples. """ paths = self.compute_paths(source, destination) # For small flows, use single best path if flow_demand_gbps < 1.0: best_path = max(paths, key=lambda p: p.available_bandwidth) self._install_single_path(flow_id, best_path) return [(best_path, flow_demand_gbps)] # For large flows, split across paths return self._split_flow_across_paths(flow_id, flow_demand_gbps, paths) def _split_flow_across_paths( self, flow_id: str, demand: float, paths: List[NetworkPath] ) -> List[Tuple[NetworkPath, float]]: """ Split large flow across multiple paths proportionally. """ allocations = [] remaining_demand = demand # Sort paths by available bandwidth sorted_paths = sorted( paths, key=lambda p: p.available_bandwidth, reverse=True ) # Allocate proportionally to available bandwidth total_available = sum(p.available_bandwidth for p in sorted_paths) if total_available < demand: # Not enough capacity - allocate what we can print(f"Warning: Insufficient capacity for flow {flow_id}") for path in sorted_paths: if remaining_demand <= 0: break allocation = min( path.available_bandwidth, remaining_demand * (path.available_bandwidth / total_available) ) if allocation > 0: allocations.append((path, allocation)) remaining_demand -= allocation # Install split flow rules self._install_split_flow(flow_id, allocations) return allocations def _install_single_path(self, flow_id: str, path: NetworkPath): """Install flow rules for single-path routing.""" hops = path.hops for i, switch_id in enumerate(hops[:-1]): next_hop = hops[i + 1] out_port = self.topology.get_port_to_neighbor(switch_id, next_hop) self.controller.install_flow( switch_id=switch_id, priority=500, match=self._get_flow_match(flow_id), actions=[{"type": "OUTPUT", "port": out_port}], cookie=hash(f"{flow_id}-{path.path_id}") ) def _install_split_flow( self, flow_id: str, allocations: List[Tuple[NetworkPath, float]] ): """ Install flow rules to split traffic across paths. Uses weighted group buckets. 
""" # Get first switch (ingress) first_hops = set(alloc[0].hops[0] for alloc in allocations) for ingress_switch in first_hops: # Create weighted group for splitting buckets = [] total_allocation = sum(a[1] for a in allocations) for path, bandwidth in allocations: if path.hops[0] != ingress_switch: continue weight = int((bandwidth / total_allocation) * 100) next_hop = path.hops[1] out_port = self.topology.get_port_to_neighbor( ingress_switch, next_hop ) buckets.append({ "weight": weight, "actions": [ # Optionally set path identifier in DSCP/MPLS {"type": "OUTPUT", "port": out_port} ] }) group_id = hash(f"{flow_id}-split") & 0xFFFFFFFF self.controller.install_group( switch_id=ingress_switch, group_id=group_id, group_type="SELECT", buckets=buckets ) self.controller.install_flow( switch_id=ingress_switch, priority=500, match=self._get_flow_match(flow_id), actions=[{"type": "GROUP", "group_id": group_id}] ) # Install path rules for subsequent hops for path, _ in allocations: for i, switch_id in enumerate(path.hops[1:-1], start=1): next_hop = path.hops[i + 1] out_port = self.topology.get_port_to_neighbor( switch_id, next_hop ) self.controller.install_flow( switch_id=switch_id, priority=500, match=self._get_flow_match(flow_id), actions=[{"type": "OUTPUT", "port": out_port}] ) def rebalance_paths(self): """ Periodically rebalance flows across paths based on current utilization. Move flows from congested to available paths. """ for (src, dst), flows in self._get_active_flows().items(): paths = self.compute_paths(src, dst) # Check for imbalance utilizations = [p.current_utilization for p in paths] max_util = max(utilizations) min_util = min(utilizations) if max_util - min_util > 0.3: # 30% imbalance threshold # Identify flows on congested path congested_path = max(paths, key=lambda p: p.current_utilization) flows_on_congested = self.path_flows.get( congested_path.path_id, [] ) if flows_on_congested: # Move smallest flow to least-loaded path flow_to_move = min( flows_on_congested, key=lambda f: f['demand'] ) new_path = min(paths, key=lambda p: p.current_utilization) self._move_flow(flow_to_move['id'], new_path) def _get_path_capacity(self, path: List[str]) -> float: """Get minimum link capacity along path (bottleneck).""" min_capacity = float('inf') for i in range(len(path) - 1): link_capacity = self.topology.get_link_capacity(path[i], path[i+1]) min_capacity = min(min_capacity, link_capacity) return min_capacity def _get_path_utilization(self, path: List[str]) -> float: """Get maximum link utilization along path (bottleneck).""" max_util = 0.0 for i in range(len(path) - 1): link_util = self.topology.get_link_utilization(path[i], path[i+1]) max_util = max(max_util, link_util) return max_util def _get_flow_match(self, flow_id: str) -> Dict: """Get OpenFlow match for flow.""" # In practice, look up flow definition return {"cookie": hash(flow_id)} def _get_active_flows(self) -> Dict: """Get currently active flows by source-destination pair.""" return {} def _move_flow(self, flow_id: str, new_path: NetworkPath): """Move flow to new path.""" passTCP performance degrades significantly with packet reordering. When splitting flows across paths with different latencies, ensure flow-level (not packet-level) splitting, or use flowlet-based techniques that only redistribute during natural flow gaps. Never split a single TCP connection across paths with different delays.
Load balancing effectiveness depends on accurate health information. SDN integrates with health checking and service discovery systems.
Active Probing:
Controller (or dedicated health checker) probes servers:
Health Check Configuration:
Target: 10.1.1.10:443
Protocol: HTTPS
Path: /health
Interval: 5 seconds
Timeout: 2 seconds
Unhealthy threshold: 3 failures
Healthy threshold: 2 successes
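As an illustration, here is a minimal active prober matching the configuration above, using Python's standard `http.client`. It reports state changes through the same `(ip, healthy)` callback shape the load-balancer example expects; treat it as a sketch rather than production health-checking code (no TLS tuning, no jitter, sequential probes).

```python
import http.client
import time

def probe_once(ip, port=443, path="/health", timeout=2.0) -> bool:
    """Single HTTPS health probe; any 2xx response counts as healthy."""
    try:
        conn = http.client.HTTPSConnection(ip, port, timeout=timeout)
        conn.request("GET", path)
        ok = 200 <= conn.getresponse().status < 300
        conn.close()
        return ok
    except (OSError, http.client.HTTPException):
        return False

def health_check_loop(targets, callback, interval=5, fall=3, rise=2):
    """Report a state change only after `fall` consecutive failures or `rise` successes."""
    fails = {ip: 0 for ip in targets}
    oks = {ip: 0 for ip in targets}
    healthy = {ip: True for ip in targets}   # assume healthy until probes say otherwise
    while True:
        for ip in targets:
            if probe_once(ip):
                oks[ip] += 1
                fails[ip] = 0
                if not healthy[ip] and oks[ip] >= rise:
                    healthy[ip] = True
                    callback(ip, True)    # e.g. feeds SDNServerLoadBalancer._on_health_change
            else:
                fails[ip] += 1
                oks[ip] = 0
                if healthy[ip] and fails[ip] >= fall:
                    healthy[ip] = False
                    callback(ip, False)
        time.sleep(interval)
```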
Response-Based Detection:
Monitor actual traffic for failure indicators:
Indicators:
- TCP RST responses
- Connection timeouts
- HTTP 5xx error rates
- Response latency spikes
In dynamic environments (Kubernetes, cloud), servers appear and disappear:
Event-Driven Updates:
1. Service discovery detects new pod: api-server-xyz at 10.244.1.50
2. Discovery system notifies SDN controller
3. Controller adds server to pool
4. Controller updates flow rules/group buckets
5. New server immediately receives traffic
Common Integrations: Kubernetes Endpoints/EndpointSlice watches, Consul or etcd service catalogs, and cloud provider APIs that publish instance lifecycle events. The sketch below shows how such an event maps onto pool updates.
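To make the event-driven flow concrete, the sketch below glues a discovery watch to the load-balancer class from the earlier example. The event shape (`added`/`removed` with an IP and port) is a generic stand-in for what a Kubernetes EndpointSlice watch or a Consul blocking query would deliver; reaching into `_install_lb_rules` is a shortcut acceptable only in a sketch.

```python
def on_discovery_event(lb, service_name, event):
    """Translate a service-discovery event into a server-pool update.

    lb:    the SDNServerLoadBalancer instance from the earlier example
    event: {"type": "added" | "removed", "ip": "10.244.1.50", "port": 443}
    """
    service = next(s for s in lb.services.values() if s.name == service_name)

    if event["type"] == "added":
        # New backend appeared (e.g. a pod was scheduled): add it and republish rules
        service.servers.append(Server(ip=event["ip"], port=event["port"], weight=1))
        lb._install_lb_rules(service)
    elif event["type"] == "removed":
        # Planned removal: drain instead of deleting so in-flight connections finish
        lb.drain_server(service_name, event["ip"])
```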
When removing a server (maintenance, scaling down), abrupt removal drops connections:
Graceful Drain Process:
1. Mark server as 'draining' in service discovery
2. SDN controller receives drain notification
3. Controller removes server from LB group (no new connections)
4. Existing flow rules remain for ongoing connections
5. Monitor active connection count on server
6. When all connections complete (or timeout), fully remove
7. Delete remaining flow rules for that server
This ensures zero connection drops during planned maintenance.
Before drain:
Group 100: [Server1: 33%] [Server2: 33%] [Server3: 33%]
Drain Server2:
Group 100: [Server1: 50%] [Server3: 50%]
Existing flows to Server2: Unchanged (continue working)
New flows: Cannot reach Server2 (removed from group)
After connections drain:
Delete flow rules mentioning Server2
Server2 can be safely removed
Health checks can run in the SDN controller, on dedicated health checker infrastructure, or distributed across switches (limited capability). Controller-based health checking is simplest but adds controller load. For large deployments, dedicated health checking infrastructure with event notification to the controller scales better.
SDN transforms load balancing from appliance-based to software-defined, distributed across the switching fabric. Let's consolidate the key concepts:
- Distribution logic runs on commodity switches under controller-defined policy, not in dedicated appliances; scaling means adding switch capacity.
- Rules can be installed proactively (hash-based, line rate), reactively (per-flow, flexible), or as a hybrid, with OpenFlow SELECT groups and weighted buckets as the main switch-level primitive.
- Server load balancing maps a VIP to a server pool via destination rewriting plus reverse NAT, using algorithms from round robin through adaptive, response-time-based selection.
- Path load balancing extends ECMP with utilization awareness, flowlet switching, and traffic-matrix-driven placement.
- Health checking, service discovery integration, and graceful drain keep the pool accurate and allow zero-drop maintenance.
What's Next:
With load balancing covered, we'll explore QoS Management—how SDN enables fine-grained Quality of Service control including traffic classification, priority queuing, rate limiting, and bandwidth guarantees.
You now understand how SDN enables sophisticated load balancing without dedicated appliances. By distributing load balancing logic across the switching fabric under centralized control, SDN achieves the flexibility of software with the performance of hardware switching. This architecture scales naturally with the network rather than requiring additional specialized equipment.