Software-Defined Networking Applications

Level: Advanced

Duration: 120 mins

Topic: SDN Applications

Network Monitoring: Comprehensive Visibility Through SDN

The Visibility Revolution

Traditional network monitoring feels like observing a city through scattered security cameras—each device provides a local view, and operators must mentally stitch together fragmented perspectives to understand what's happening. SNMP polls return stale data. NetFlow samples miss the details. Correlating events across devices requires expensive external tools that never quite achieve real-time visibility.

SDN fundamentally transforms network monitoring. With programmatic access to every switch, the controller becomes a comprehensive monitoring platform. It can query statistics on demand, install measurement rules dynamically, and correlate data across the entire network in real time. Monitoring isn't bolted on; it's built into the architecture.

This integrated visibility powers everything else SDN enables: traffic engineering requires knowing current utilization; security applications need traffic analysis; troubleshooting demands flow-level tracing. Network monitoring in SDN isn't just about observability—it's the sensory system that enables intelligent control.

What You Will Learn

By the end of this page, you will understand SDN's monitoring architecture including statistics collection mechanisms, flow-level visibility, traffic sampling and analysis, real-time measurement systems, and how monitoring data feeds back into control decisions. You'll explore both OpenFlow-native monitoring and integration with external systems.

SDN Monitoring Architecture

SDN's monitoring capabilities stem from its fundamental architecture—the separation of control and data planes creates natural instrumentation points.

Monitoring Data Sources

1. OpenFlow Statistics:

Every OpenFlow switch maintains counters that the controller can query:

Port statistics: Bytes, packets, drops, errors per port
Flow statistics: Bytes, packets, duration per flow rule
Table statistics: Active entries, lookups, matches per table
Queue statistics: Bytes, packets, drops per queue
Group statistics: Bytes, packets per group bucket
Meter statistics: Bytes, packets processed per meter band

2. Packet-In Messages:

When a packet hits the table-miss entry or a rule whose action outputs to the controller (for example, an explicit sampling rule), the switch sends the packet, or just its leading bytes, to the controller. This provides direct traffic visibility.

3. Port Status Notifications:

Switches asynchronously notify controllers of port state changes, link failures, and configuration modifications.

4. Auxiliary Connections:

OpenFlow 1.3+ supports auxiliary connections for high-volume data like sampled traffic, separate from the main control channel.

Controller as Monitoring Platform

The controller aggregates monitoring data from all switches, providing:

Unified view: Single point for network-wide statistics
Correlation: Connect events across devices and time
Computation: Derive metrics (utilization, loss rates) from raw counters
Alerting: Detect anomalies and trigger responses
Historical storage: Retain data for trend analysis
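
To make the Computation item concrete, here is a minimal sketch of deriving utilization and drop rate from two snapshots of a port's cumulative counters. The PortSample layout and the derive_port_metrics helper are illustrative assumptions, not part of OpenFlow or any controller API, and the sketch assumes dropped packets are not already included in tx_packets.

```python
from dataclasses import dataclass


@dataclass
class PortSample:
    """One snapshot of a port's cumulative counters (hypothetical layout)."""
    timestamp: float  # seconds
    tx_bytes: int
    tx_packets: int
    tx_dropped: int


def derive_port_metrics(prev: PortSample, curr: PortSample, capacity_bps: float) -> dict:
    """Turn two cumulative snapshots into rates for the interval between them."""
    dt = curr.timestamp - prev.timestamp
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    tx_bps = (curr.tx_bytes - prev.tx_bytes) * 8 / dt
    sent = curr.tx_packets - prev.tx_packets
    dropped = curr.tx_dropped - prev.tx_dropped
    offered = sent + dropped  # assumption: drops are counted separately from sent packets
    return {
        "tx_bps": tx_bps,
        "utilization_pct": 100 * tx_bps / capacity_bps,
        "drop_rate": dropped / offered if offered else 0.0,
    }


prev = PortSample(timestamp=0.0, tx_bytes=0, tx_packets=0, tx_dropped=0)
curr = PortSample(timestamp=10.0, tx_bytes=2_500_000_000, tx_packets=2_000_000, tx_dropped=500)
metrics = derive_port_metrics(prev, curr, capacity_bps=10_000_000_000)
print(metrics)
```

With 2.5 GB sent in 10 seconds on a 10 Gbps link, this reports 2 Gbps, or 20% utilization.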

Polling vs. Push

OpenFlow primarily uses pull-based statistics (controller requests, switch responds). For real-time monitoring, controllers poll frequently—but this creates overhead. Modern approaches include push-based telemetry (streaming), in-band network telemetry (INT), and switch-local sampling to reduce control plane load while maintaining visibility.
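
A rough back-of-the-envelope model shows why polling frequency matters. The function below is a sketch; the 100-byte size per flow entry is an assumed average chosen for illustration (real flow statistics entries vary in length with the match and instructions), and polling_load is a hypothetical helper name.

```python
def polling_load(num_switches: int, flows_per_switch: int,
                 poll_interval_s: float, bytes_per_flow_entry: int = 100):
    """Estimate (requests per second, reply bits per second) for full flow-stats polling."""
    requests_per_s = num_switches / poll_interval_s
    reply_bytes_per_s = (num_switches * flows_per_switch * bytes_per_flow_entry) / poll_interval_s
    return requests_per_s, reply_bytes_per_s * 8


for interval in (1.0, 10.0):
    reqs, bps = polling_load(200, 5000, interval)
    print(f"interval={interval:>4}s  requests/s={reqs:.0f}  reply traffic={bps / 1e6:.0f} Mbps")
```

Under these assumptions, polling 200 switches with 5,000 flows each every second costs about 800 Mbps of reply traffic; a 10-second interval cuts that to 80 Mbps at the price of staler data.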

Flow-Level Visibility

One of SDN's most powerful monitoring capabilities is flow-level visibility—the ability to track individual conversations through the network.

Per-Flow Statistics

Every flow rule maintains counters:

Flow Rule:
  Match: ip_src=10.1.1.0/24, ip_dst=10.2.2.0/24, tcp_dst=443
  Actions: Output(port=3)
  Counters:
    - packet_count: 1,547,832
    - byte_count: 2,147,483,648
    - duration_sec: 3600
    - duration_nsec: 500000000

Derived Metrics:

Throughput: byte_count × 8 / duration ≈ 4.77 Mbps (about 596 KB/s)
Packet rate: packet_count / duration = 430 pps
Average packet size: byte_count / packet_count = 1,387 bytes

Dynamic Flow Installation for Measurement

The controller can install temporary rules purely for measurement:

Use Case: Measure traffic between specific hosts

1. Install high-priority rule matching specific flow
2. Set action to: Forward + count (same as existing path)
3. Periodically read counters
4. Remove rule when measurement complete

This enables on-demand deep visibility without permanent overhead.

sdn-flow-monitoring (Python)
"""
SDN Flow Monitoring: Programmatic Per-Flow Statistics Collection
Demonstrates dynamic flow measurement installation and analysis
"""
 
from dataclasses import dataclass
from typing import Dict, List, Optional
from datetime import datetime, timedelta
import time
 
@dataclass
class FlowStats:
    """Statistics for a single flow rule"""
    match: Dict[str, str]
    packet_count: int
    byte_count: int
    duration_sec: int
    duration_nsec: int = 0
 
    @property
    def duration_total_sec(self) -> float:
        return self.duration_sec + (self.duration_nsec / 1_000_000_000)
 
    @property
    def throughput_bps(self) -> float:
        if self.duration_total_sec == 0:
            return 0
        return (self.byte_count * 8) / self.duration_total_sec
 
    @property
    def packet_rate(self) -> float:
        if self.duration_total_sec == 0:
            return 0
        return self.packet_count / self.duration_total_sec
 
    @property
    def avg_packet_size(self) -> float:
        if self.packet_count == 0:
            return 0
        return self.byte_count / self.packet_count
 
 
class FlowMonitor:
    """
    SDN Flow Monitoring System
    Provides per-flow visibility through OpenFlow statistics
    """
 
    def __init__(self, controller_connection):
        self.controller = controller_connection
        self.flow_history: Dict[str, List[FlowStats]] = {}
        self.active_measurements: Dict[str, dict] = {}
 
    def get_all_flow_stats(self, switch_id: str) -> List[FlowStats]:
        """
        Query all flow statistics from a switch.
        OpenFlow OFPMP_FLOW (Multipart Flow Stats Request)
        """
        # In real implementation, this sends OpenFlow message
        # and parses response
        response = self.controller.send_stats_request(
            switch_id=switch_id,
            stats_type="FLOW",
            match={}  # Empty match = all flows
        )
        
        return [
            FlowStats(
                match=flow['match'],
                packet_count=flow['packet_count'],
                byte_count=flow['byte_count'],
                duration_sec=flow['duration_sec'],
                duration_nsec=flow['duration_nsec']
            )
            for flow in response['flows']
        ]
 
    def install_measurement_flow(
        self,
        switch_id: str,
        src_ip: str,
        dst_ip: str,
        protocol: Optional[str] = None,
        dst_port: Optional[int] = None,
        measurement_id: Optional[str] = None
    ) -> str:
        """
        Install a high-priority flow rule for measurement.
        The rule matches specific traffic and forwards normally,
        but allows us to track counters for this specific flow.
        """
        measurement_id = measurement_id or f"measure_{int(time.time())}"
 
        match = {
            "ip_src": src_ip,
            "ip_dst": dst_ip,
        }
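        if protocol:
            match["ip_proto"] = protocol
        if dst_port is not None:
            # A port match only makes sense with an explicit transport protocol
            if protocol not in ("TCP", "UDP"):
                raise ValueError("dst_port requires protocol 'TCP' or 'UDP'")
            match["tcp_dst" if protocol == "TCP" else "udp_dst"] = dst_port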
 
        # Get existing forwarding action for this traffic
        # (We want to measure without changing forwarding behavior)
        existing_action = self._get_existing_action(switch_id, match)
        
        # Install measurement rule at high priority
        self.controller.install_flow(
            switch_id=switch_id,
            priority=65000,  # High priority to ensure match
            match=match,
            actions=existing_action,  # Same forwarding as before
            idle_timeout=0,  # Don't expire
            hard_timeout=0,
            cookie=hash(measurement_id) & 0xFFFFFFFFFFFFFFFF
        )
 
        self.active_measurements[measurement_id] = {
            "switch_id": switch_id,
            "match": match,
            "installed_at": datetime.now(),
            "samples": []
        }
 
        return measurement_id
 
    def sample_measurement(self, measurement_id: str) -> Optional[FlowStats]:
        """
        Collect current statistics for a measurement flow.
        """
        if measurement_id not in self.active_measurements:
            return None
 
        measurement = self.active_measurements[measurement_id]
        
        stats = self.controller.get_flow_stats(
            switch_id=measurement["switch_id"],
            match=measurement["match"]
        )
 
        if stats:
            flow_stats = FlowStats(
                match=measurement["match"],
                packet_count=stats['packet_count'],
                byte_count=stats['byte_count'],
                duration_sec=stats['duration_sec'],
                duration_nsec=stats['duration_nsec']
            )
            measurement["samples"].append({
                "timestamp": datetime.now(),
                "stats": flow_stats
            })
            return flow_stats
 
        return None
 
    def compute_interval_stats(
        self,
        measurement_id: str,
        interval_seconds: int = 60
    ) -> Dict:
        """
        Compute statistics over the last interval.
        Uses delta between samples for accurate interval metrics.
        """
        if measurement_id not in self.active_measurements:
            return {}
 
        samples = self.active_measurements[measurement_id]["samples"]
        if len(samples) < 2:
            return {"error": "Need at least 2 samples"}
 
        # Find samples spanning the interval
        now = datetime.now()
        interval_start = now - timedelta(seconds=interval_seconds)
 
        relevant_samples = [
            s for s in samples 
            if s["timestamp"] >= interval_start
        ]
 
        if len(relevant_samples) < 2:
            return {"error": "Insufficient samples in interval"}
 
        first = relevant_samples[0]["stats"]
        last = relevant_samples[-1]["stats"]
        
        time_delta = (
            relevant_samples[-1]["timestamp"] - 
            relevant_samples[0]["timestamp"]
        ).total_seconds()
 
        byte_delta = last.byte_count - first.byte_count
        packet_delta = last.packet_count - first.packet_count
 
        return {
            "interval_seconds": time_delta,
            "bytes_transferred": byte_delta,
            "packets_transferred": packet_delta,
            "throughput_bps": (byte_delta * 8) / time_delta if time_delta else 0,
            "packet_rate_pps": packet_delta / time_delta if time_delta else 0,
            "avg_packet_size": byte_delta / packet_delta if packet_delta else 0
        }
 
    def remove_measurement(self, measurement_id: str):
        """Remove measurement flow rule and clean up."""
        if measurement_id not in self.active_measurements:
            return
 
        measurement = self.active_measurements[measurement_id]
        
        self.controller.delete_flow(
            switch_id=measurement["switch_id"],
            cookie=hash(measurement_id) & 0xFFFFFFFFFFFFFFFF
        )
 
        del self.active_measurements[measurement_id]
 
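    def _get_existing_action(self, switch_id: str, match: Dict) -> List:
        """Query existing forwarding action for traffic matching pattern."""
        # Placeholder: a real implementation queries the flow tables to find
        # the current action and returns a list like [{"type": "OUTPUT", "port": 3}].
        # Raising here prevents silently installing a rule with actions=None.
        raise NotImplementedError("flow-table lookup is controller-specific")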
 
 
# Demonstration of network-wide monitoring
class NetworkWideMonitor:
    """
    Aggregates monitoring across all switches for network-wide view.
    """
 
    def __init__(self, controller):
        self.controller = controller
        self.switch_monitors: Dict[str, FlowMonitor] = {}
 
    def get_network_utilization(self) -> Dict[str, float]:
        """
        Collect port utilization across all switches.
        Returns link utilization as percentage.
        """
        utilization = {}
 
        for switch_id in self.controller.get_all_switches():
            port_stats = self.controller.get_port_stats(switch_id)
 
            for port in port_stats:
                link_id = f"{switch_id}:{port['port_no']}"
                
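                # Calculate utilization from byte counters
                # Assuming we have previous sample and link capacity.
                # Note: OpenFlow's ofp_port.curr_speed is expressed in kbps; this
                # sketch assumes the controller wrapper has already converted it to bps.
                capacity_bps = port.get('curr_speed', 10_000_000_000)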
                
                # In real implementation, compute delta from previous sample
                current_bps = self._compute_rate(
                    switch_id, port['port_no'], port['tx_bytes']
                )
                
                utilization[link_id] = (current_bps / capacity_bps) * 100
 
        return utilization
 
    def detect_elephant_flows(
        self,
        threshold_bytes: int = 10_000_000  # 10MB
    ) -> List[Dict]:
        """
        Identify large flows across the network.
        Elephant flows are candidates for special handling.
        """
        elephants = []
 
        for switch_id in self.controller.get_all_switches():
            flows = self.controller.get_flow_stats(switch_id)
 
            for flow in flows:
                if flow['byte_count'] >= threshold_bytes:
                    elephants.append({
                        "switch": switch_id,
                        "match": flow['match'],
                        "bytes": flow['byte_count'],
                        "packets": flow['packet_count'],
                        "duration": flow['duration_sec']
                    })
 
        # Sort by size, largest first
        return sorted(elephants, key=lambda x: x['bytes'], reverse=True)
 
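    def _compute_rate(self, switch_id, port_no, current_bytes):
        """Compute rate in bits per second from the counter delta."""
        now = time.time()
        # Create the per-port history lazily so __init__ needs no changes
        if not hasattr(self, "_last_port_sample"):
            self._last_port_sample = {}
        previous = self._last_port_sample.get((switch_id, port_no))
        self._last_port_sample[(switch_id, port_no)] = (now, current_bytes)
        if previous is None:
            return 0.0  # First sample: no interval to measure yet
        prev_time, prev_bytes = previous
        elapsed = now - prev_time
        if elapsed <= 0 or current_bytes < prev_bytes:
            return 0.0  # Clock glitch, counter wrap, or reset: skip this interval
        return (current_bytes - prev_bytes) * 8 / elapsed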

Counter Wraparound

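OpenFlow counters are finite: they are unsigned 64-bit fields that wrap around with no overflow indicator. At 100 Gbps, a 64-bit byte counter takes about 47 years to wrap, and packet counters advance even more slowly, so a 64-bit wrap is rarely a practical concern. Narrower counters are a different matter: a 32-bit byte counter, such as SNMP's classic interface counters, wraps in roughly 3.4 seconds at 10 Gbps. Robust monitoring implementations should therefore handle current < previous explicitly and distinguish a wrap from a counter reset caused by a rule being reinstalled or a switch restarting.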
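
The wrap-versus-reset check can be sketched as follows. This is a minimal illustration, not a standard algorithm: counter_delta and its max_plausible parameter are hypothetical names, and the plausibility bound (the most the counter could have grown in one polling interval) is something the caller must supply.

```python
from typing import Optional


def counter_delta(previous: int, current: int, bits: int = 64,
                  max_plausible: Optional[int] = None) -> Optional[int]:
    """Return the increase between two readings, allowing for one wrap.

    Returns None when the implied increase exceeds max_plausible, which
    suggests the counter was reset rather than wrapped.
    """
    if current >= previous:
        return current - previous
    # current < previous: assume exactly one wrap of a `bits`-wide counter
    wrapped = current + (1 << bits) - previous
    if max_plausible is not None and wrapped > max_plausible:
        return None  # implausibly large jump: treat as a reset, discard the interval
    return wrapped


print(counter_delta(100, 250))                                   # normal increase
print(counter_delta(2**32 - 10, 5, bits=32))                     # 32-bit wrap
print(counter_delta(5_000_000, 12, max_plausible=10**12))        # reset detected
```

A caller would typically derive max_plausible from link speed times polling interval, so that a wrap on a fast link is accepted while a reset to near zero is rejected.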

Traffic Sampling and Analysis

While flow statistics provide aggregate metrics, sometimes deeper packet-level analysis is required. SDN enables sophisticated sampling strategies.

Sampling Approaches

1. sFlow Integration:

Many OpenFlow switches also support sFlow—a hardware-based sampling technology:

Random 1-in-N packet sampling
Samples sent to external collector
Minimal performance impact
Industry standard analysis tools

2. OpenFlow Packet-In Sampling:

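Controller-directed sampling using OpenFlow. The rule below is pseudocode: the core OpenFlow specification defines no probabilistic Sample action, so real deployments rely on switch extensions (Open vSwitch, for example, offers a sample action) or approximate sampling by sending a narrow slice of traffic to the controller: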

Flow Rule:
  Match: ip_dst=0.0.0.0/0  (all traffic)
  Actions:
    - Sample(probability=0.001)  # 1 in 1000 packets
    - Forward(normal)            # Continue processing

3. Mirror Port Configuration:

SDN can dynamically configure port mirroring:

Mirror specific flows to analysis port
Time-bounded mirroring for troubleshooting
Selective mirroring based on traffic characteristics

In-Band Network Telemetry (INT)

Modern switches support INT—embedding measurement metadata directly in packets:

How INT Works:

1. Source adds INT header requesting telemetry
2. Each switch along the path adds its metadata:
   - Switch ID
   - Ingress/egress port
   - Queue depth
   - Timestamp
3. Destination/collector extracts and processes metadata

Benefits:

Hop-by-hop latency measurement
Queue depth visibility at each hop
Path verification (did packet take expected route?)
Minimal additional overhead (metadata in packet)
No controller polling required
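
As a sketch of what a collector might do with the extracted metadata, the code below summarizes a per-hop metadata stack. HopMetadata is a hypothetical, simplified record rather than the INT wire format, and the residence-time calculation assumes each switch stamps both ingress and egress times from its own clock.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HopMetadata:
    """Simplified per-hop record (hypothetical; not the INT wire format)."""
    switch_id: str
    ingress_ts_ns: int
    egress_ts_ns: int
    queue_depth: int


def summarize_path(hops: List[HopMetadata]) -> dict:
    """Report the path taken, time spent inside each switch, and the deepest queue."""
    residence = {h.switch_id: h.egress_ts_ns - h.ingress_ts_ns for h in hops}
    deepest = max(hops, key=lambda h: h.queue_depth)
    return {
        "path": [h.switch_id for h in hops],
        "residence_ns": residence,
        "deepest_queue": deepest.switch_id,
    }


def path_matches(hops: List[HopMetadata], expected: List[str]) -> bool:
    """Path verification: did the packet visit exactly the expected switches in order?"""
    return [h.switch_id for h in hops] == expected


hops = [
    HopMetadata("leaf-1", 1_000, 1_500, queue_depth=2),
    HopMetadata("spine-1", 3_000, 9_000, queue_depth=40),
    HopMetadata("leaf-3", 10_000, 10_400, queue_depth=1),
]
summary = summarize_path(hops)
print(summary)
print(path_matches(hops, ["leaf-1", "spine-1", "leaf-3"]))
```

In this made-up trace the packet spends 6,000 ns inside spine-1, which also reports the deepest queue, pointing at that hop as the source of latency.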

Traffic Visibility Approaches Comparison
Approach	Visibility	Overhead	Use Case
Flow Statistics	Aggregate counters per rule	Low (polling)	Utilization, throughput monitoring
sFlow Sampling	Packet headers (sampled)	Very Low	Traffic analysis, DDoS detection
Packet-In	Full packets (selective)	High	Deep inspection, unknown flows
Port Mirroring	Full packets (mirrored)	Medium	Troubleshooting, forensics
INT	Per-hop metadata in-band	Low	Latency, path verification

Real-Time Traffic Classification

SDN controllers can perform real-time traffic classification using sampling data:

Classification Hierarchy:

Protocol identification: HTTP, HTTPS, DNS, custom protocols
Application detection: Video streaming, VoIP, web browsing, backup
Behavior analysis: Interactive, bulk transfer, periodic, bursty
Anomaly detection: Port scans, DDoS, data exfiltration

Controller Actions Based on Classification:

Route video traffic through high-bandwidth paths
Prioritize VoIP in QoS queues
Rate-limit or block detected attacks
Alert operators to unusual patterns

The feedback loop—monitor → classify → act → monitor—enables adaptive network behavior that traditional networks cannot achieve.
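
As one small instance of the anomaly-detection tier, the sketch below flags sources that touch an unusually large number of distinct destination ports in a batch of sampled headers. The tuple layout, the find_port_scanners name, and the threshold of 100 are assumptions for illustration; production detectors also account for time windows and sampling rate.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# Each sampled header is (src_ip, dst_ip, dst_port); a hypothetical minimal layout
SampledHeader = Tuple[str, str, int]


def find_port_scanners(samples: Iterable[SampledHeader],
                       min_distinct_targets: int = 100) -> List[str]:
    """Return sources seen contacting at least min_distinct_targets (host, port) pairs."""
    targets_by_src = defaultdict(set)
    for src_ip, dst_ip, dst_port in samples:
        targets_by_src[src_ip].add((dst_ip, dst_port))
    return sorted(
        src for src, targets in targets_by_src.items()
        if len(targets) >= min_distinct_targets
    )


# One host probing 150 ports, and one busy but ordinary HTTPS client
samples = [("10.0.0.9", "10.0.1.1", port) for port in range(1, 151)]
samples += [("10.0.0.5", "10.0.1.1", 443)] * 300
scanners = find_port_scanners(samples)
print(scanners)
```

Counting distinct targets rather than packets is what separates the scanner from the heavy HTTPS client, which sends more packets but only ever reaches one port.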

Sampling Rate Trade-offs

Higher sampling rates provide better accuracy but increase processing load. For elephant flow detection, even 1-in-10,000 sampling often suffices—large flows will be sampled frequently. For security analysis requiring detection of low-rate attacks, higher sampling or flow-based detection is necessary. Match sampling strategy to detection requirements.
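
The trade-off can be quantified under a simple model. Assuming each packet is sampled independently with probability 1/N, a flow of n packets yields at least one sample with probability 1 - (1 - 1/N)^n, and n/N samples on average. The helper names below are illustrative.

```python
def detection_probability(flow_packets: int, sampling_n: int) -> float:
    """Probability that 1-in-N sampling captures at least one packet of the flow."""
    return 1.0 - (1.0 - 1.0 / sampling_n) ** flow_packets


def expected_samples(flow_packets: int, sampling_n: int) -> float:
    """Average number of sampled packets from the flow."""
    return flow_packets / sampling_n


for packets in (100, 10_000, 1_000_000):
    p = detection_probability(packets, 10_000)
    avg = expected_samples(packets, 10_000)
    print(f"{packets:>9} packets: P(seen)={p:.4f}, expected samples={avg:g}")
```

At 1-in-10,000, a million-packet elephant flow is all but certain to be seen (about 100 samples on average), while a 100-packet flow is caught only about 1% of the time, which is why low-rate attacks call for denser sampling or flow-based detection.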

Network-Wide Event Correlation

The controller's unified view enables correlation across devices that traditional monitoring tools struggle to achieve.

Cross-Device Flow Tracing

Trace a flow's path through the network:

Flow: src=10.1.1.5, dst=10.3.3.8, tcp/443

Switch 1 (Leaf-1):
  Ingress port 5, matched flow rule #47
  Egress port 49 (uplink to Spine-1)
  Packets: 15,234 | Bytes: 21,457,892

Switch 2 (Spine-1):
  Ingress port 1, matched flow rule #112
  Egress port 24 (downlink to Leaf-3)
  Packets: 15,234 | Bytes: 21,457,892  ← No loss

Switch 3 (Leaf-3):
  Ingress port 49, matched flow rule #83
  Egress port 12 (server port)
  Packets: 15,230 | Bytes: 21,450,000  ← 4 packets lost here!

Root Cause Identification:

By correlating counters along the path, we can identify:

Where loss occurs (between which switches)
Whether drops are ingress or egress
Queue depth at time of drops
Other competing flows
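
Locating the lossy segment amounts to comparing packet counts hop by hop, as in this minimal sketch. The locate_loss helper is hypothetical, and the approach assumes the counters were read close enough together that packets still in flight do not masquerade as loss.

```python
from typing import List, Tuple


def locate_loss(hop_counters: List[Tuple[str, int]]) -> List[Tuple[str, str, int]]:
    """Compare each hop's packet count with the next hop's.

    hop_counters lists (switch_name, packet_count) in path order.
    Returns (upstream, downstream, packets_lost) for every segment with loss.
    """
    findings = []
    for (up_name, up_pkts), (down_name, down_pkts) in zip(hop_counters, hop_counters[1:]):
        lost = up_pkts - down_pkts
        if lost > 0:
            findings.append((up_name, down_name, lost))
    return findings


# Packet counts from the trace above
path = [("Leaf-1", 15_234), ("Spine-1", 15_234), ("Leaf-3", 15_230)]
losses = locate_loss(path)
print(losses)
```

For the trace above, the only segment reported is Spine-1 to Leaf-3 with 4 packets lost, matching the manual reading of the counters.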

Temporal Correlation

Link events and performance:

Timeline:
10:23:45 - Link Spine-1:port-12 flaps down
10:23:45 - Controller receives port-down notification
10:23:46 - ECMP rehash redistributes traffic
10:23:46 - Link Spine-2:port-8 utilization spikes to 98%
10:23:47 - Queue drops detected on Spine-2
10:23:48 - Controller installs load-balancing adjustment
10:23:49 - Utilization normalizes, drops stop

The controller correlates the link failure → traffic shift → congestion → remediation sequence automatically.
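
A simple way to approximate this kind of temporal correlation is to collect the events that follow a trigger within a fixed window. The sketch below does only that; the event tuple layout, the correlate_after name, and the 3-second window are assumptions, and co-occurrence in time suggests rather than proves causation.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, str, str]  # (timestamp, kind, detail); hypothetical layout


def correlate_after(events: List[Event], trigger_kind: str,
                    window: timedelta) -> List[Tuple[Event, List[Event]]]:
    """For each trigger event, collect the later events that fall within the window."""
    ordered = sorted(events, key=lambda e: e[0])
    chains = []
    for i, event in enumerate(ordered):
        if event[1] != trigger_kind:
            continue
        followers = [e for e in ordered[i + 1:] if e[0] - event[0] <= window]
        chains.append((event, followers))
    return chains


def at(second: int) -> datetime:
    # The date is arbitrary; only the time of day comes from the timeline above
    return datetime(2024, 1, 1, 10, 23, second)


timeline = [
    (at(45), "port_down", "Spine-1:port-12"),
    (at(46), "utilization_spike", "Spine-2:port-8 at 98%"),
    (at(47), "queue_drops", "Spine-2"),
    (at(48), "rule_installed", "load-balancing adjustment"),
    (at(49), "utilization_normal", "Spine-2:port-8"),
]
chains = correlate_after(timeline, "port_down", timedelta(seconds=3))
for trigger, followers in chains:
    print(trigger[1], "->", [kind for _, kind, _ in followers])
```

A real correlator would also use topology, so that only events on links affected by the failed port are attributed to it.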

Multi-Layer Correlation

Correlating network events with application metrics:

Network: 100ms latency spike between DC1 and DC2
Application: Database query latency increased 100ms
Correlation: Network latency directly impacted app performance

SDN controllers can expose APIs for application monitoring tools to correlate network and application metrics.

Correlation Capabilities

• Spatial correlation: Trace flows across multiple switches to identify where issues occur
• Temporal correlation: Connect events across time to identify cause-effect relationships
• Multi-flow correlation: Identify competing flows contributing to congestion
• Multi-layer correlation: Connect network events to application performance impact
• Topology correlation: Understand how topology changes affect traffic patterns

Central Point of Analysis

In traditional networks, achieving this correlation requires shipping data from every device to external analytics platforms, then reconstructing network state. SDN's controller already has this unified view—correlation becomes a natural capability rather than an expensive integration project.

Monitoring for Control Decisions

Monitoring data drives the control loop—the feedback mechanism that enables SDN's intelligent network management.

The Control Loop

┌─────────────────────────────────────────────┐
│                 OBSERVE                      │
│  Collect statistics, sample traffic,         │
│  receive notifications                       │
└──────────────────────┬──────────────────────┘
                       ▼
┌─────────────────────────────────────────────┐
│                 ANALYZE                      │
│  Detect anomalies, identify patterns,        │
│  correlate events                            │
└──────────────────────┬──────────────────────┘
                       ▼
┌─────────────────────────────────────────────┐
│                 DECIDE                       │
│  Select response: reroute, rate-limit,       │
│  alert, or take no action                    │
└──────────────────────┬──────────────────────┘
                       ▼
┌─────────────────────────────────────────────┐
│                 ACT                          │
│  Install/modify flow rules, update config,   │
│  notify operators                            │
└──────────────────────┬──────────────────────┘
                       │
                       └──────── OBSERVE ───────┐
                                                │
                       (continuous loop)        │

Monitoring-Driven Applications

1. Adaptive Traffic Engineering:

Input: Link utilization exceeds threshold
Analysis: Identify flows causing congestion
Decision: Reroute some flows to alternate paths
Action: Update flow rules
Feedback: Monitor that congestion resolved

2. Automatic Failure Response:

Input: Port-down notification received
Analysis: Identify affected flows and paths
Decision: Compute new paths avoiding failed link
Action: Install updated flow rules
Feedback: Verify traffic flowing on new paths

3. Security Response:

Input: Traffic sampling detects scanning pattern
Analysis: Classify as port scan/reconnaissance
Decision: Block source, alert security team
Action: Install drop rule for source IP
Feedback: Verify attack traffic stopped

sdn-adaptive-control (Python)
"""
SDN Adaptive Control Loop
Demonstrates monitoring-driven network optimization
"""
 
from dataclasses import dataclass
from typing import Dict, List, Optional
from enum import Enum
import time
 
class ActionType(Enum):
    REROUTE = "reroute"
    RATE_LIMIT = "rate_limit"
    BLOCK = "block"
    ALERT = "alert"
    NO_ACTION = "no_action"
 
@dataclass
class ControlDecision:
    action: ActionType
    target: str  # Flow ID, switch ID, or IP
    parameters: Dict
    reason: str
 
class AdaptiveController:
    """
    Implements monitoring-driven adaptive control loop.
    """
 
    def __init__(self, network_monitor, path_computer, flow_manager):
        self.monitor = network_monitor
        self.path_computer = path_computer
        self.flow_manager = flow_manager
        
        # Thresholds for decisions
        self.congestion_threshold = 0.85  # 85% utilization
        self.elephant_threshold_bytes = 100_000_000  # 100MB
        self.scan_threshold_ports = 100  # ports per second
 
    def control_loop_iteration(self):
        """
        Single iteration of the control loop.
        Called periodically (e.g., every second).
        """
        # OBSERVE
        observations = self._collect_observations()
        
        # ANALYZE
        issues = self._analyze_observations(observations)
        
        # DECIDE
        decisions = self._make_decisions(issues)
        
        # ACT
        for decision in decisions:
            self._execute_decision(decision)
 
    def _collect_observations(self) -> Dict:
        """Gather current network state."""
        return {
            "link_utilization": self.monitor.get_network_utilization(),
            "elephant_flows": self.monitor.detect_elephant_flows(
                self.elephant_threshold_bytes
            ),
            "traffic_anomalies": self.monitor.detect_anomalies(),
            "port_events": self.monitor.get_recent_port_events(),
            "timestamp": time.time()
        }
 
    def _analyze_observations(self, obs: Dict) -> List[Dict]:
        """Analyze observations to identify issues requiring action."""
        issues = []
 
        # Check for congested links
        for link_id, utilization in obs["link_utilization"].items():
            if utilization > self.congestion_threshold * 100:
                issues.append({
                    "type": "congestion",
                    "link": link_id,
                    "utilization": utilization,
                    "severity": "high" if utilization > 95 else "medium"
                })
 
        # Check for elephant flows on congested paths
        for elephant in obs["elephant_flows"]:
            # Determine if elephant is on congested link
            elephant_path = self.flow_manager.get_flow_path(
                elephant["match"]
            )
            for link in elephant_path:
                if obs["link_utilization"].get(link, 0) > 80:
                    issues.append({
                        "type": "elephant_on_congested_path",
                        "flow": elephant,
                        "congested_link": link
                    })
 
        # Check for security anomalies
        for anomaly in obs["traffic_anomalies"]:
            if anomaly["type"] == "port_scan":
                issues.append({
                    "type": "security_threat",
                    "threat_type": "port_scan",
                    "source": anomaly["source_ip"],
                    "ports_per_second": anomaly["rate"]
                })
 
        return issues
 
    def _make_decisions(self, issues: List[Dict]) -> List[ControlDecision]:
        """Determine appropriate response to each issue."""
        decisions = []
 
        for issue in issues:
            if issue["type"] == "congestion":
                # Find flows that can be rerouted
                reroutable = self._find_reroutable_flows(issue["link"])
                if reroutable:
                    decisions.append(ControlDecision(
                        action=ActionType.REROUTE,
                        target=reroutable[0]["flow_id"],
                        parameters={
                            "from_link": issue["link"],
                            # _execute_decision reads the "new_path" key
                            "new_path": self._compute_alternate_path(
                                reroutable[0]
                            )
                        },
                        reason=f"Relieve congestion on {issue['link']}"
                    ))
 
            elif issue["type"] == "elephant_on_congested_path":
                alt_path = self.path_computer.compute_constrained_path(
                    source=issue["flow"]["match"]["ip_src"],
                    dest=issue["flow"]["match"]["ip_dst"],
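                    # Average rate in bytes per second; max() avoids dividing
                    # by zero for flows less than one second old
                    required_bandwidth=issue["flow"]["bytes"] /
                                      max(issue["flow"]["duration"], 1),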
                    avoid_links=[issue["congested_link"]]
                )
                if alt_path:
                    decisions.append(ControlDecision(
                        action=ActionType.REROUTE,
                        target=self._flow_id(issue["flow"]["match"]),
                        parameters={"new_path": alt_path},
                        reason="Move elephant flow off congested link"
                    ))
 
            elif issue["type"] == "security_threat":
                decisions.append(ControlDecision(
                    action=ActionType.BLOCK,
                    target=issue["source"],
                    parameters={"duration": 3600},  # 1 hour
                    reason=f"Port scan detected: {issue['ports_per_second']} ports/s"
                ))
                decisions.append(ControlDecision(
                    action=ActionType.ALERT,
                    target="security_team",
                    parameters={
                        "threat": issue,
                        "action_taken": "blocked"
                    },
                    reason="Notify security team of threat"
                ))
 
        return decisions
 
    def _execute_decision(self, decision: ControlDecision):
        """Execute a control decision."""
        print(f"Executing: {decision.action.value} for {decision.target}")
        print(f"  Reason: {decision.reason}")
 
        if decision.action == ActionType.REROUTE:
            self.flow_manager.reroute_flow(
                flow_id=decision.target,
                new_path=decision.parameters["new_path"]
            )
        elif decision.action == ActionType.BLOCK:
            self.flow_manager.install_block_rule(
                source_ip=decision.target,
                duration=decision.parameters["duration"]
            )
        elif decision.action == ActionType.ALERT:
            self._send_alert(
                team=decision.target,
                details=decision.parameters
            )
 
    def _find_reroutable_flows(self, link: str) -> List[Dict]:
        """Find flows on a link that could use alternate paths."""
        pass
 
    def _compute_alternate_path(self, flow: Dict) -> List[str]:
        """Compute alternate path for flow."""
        pass
 
    def _flow_id(self, match: Dict) -> str:
        """Generate unique ID for flow match."""
        return f"{match.get('ip_src', '*')}_{match.get('ip_dst', '*')}"
 
    def _send_alert(self, team: str, details: Dict):
        """Send alert to operations team."""
        pass

Avoiding Oscillation

Rapid control reactions to monitoring data can cause oscillation—rerouting traffic that then causes congestion elsewhere, triggering another reroute. Production systems implement damping (minimum time between changes), hysteresis (different thresholds for action vs. return), and holistic optimization to ensure stable convergence.
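
Damping and hysteresis can be combined in a few lines, as in the sketch below. The CongestionTrigger class is hypothetical; the 85% trigger level echoes the congestion threshold used in the control loop code, while the 70% clear level and 30-second minimum gap are assumed values for illustration.

```python
from typing import Optional


class CongestionTrigger:
    """Decides when to reroute, using hysteresis and a minimum gap between actions."""

    def __init__(self, high: float = 0.85, low: float = 0.70,
                 min_interval_s: float = 30.0):
        self.high = high                  # act when utilization rises above this
        self.low = low                    # consider the link recovered below this
        self.min_interval_s = min_interval_s
        self.congested = False
        self.last_action_time: Optional[float] = None

    def update(self, utilization: float, now: float) -> bool:
        """Return True when a reroute should be issued for this reading."""
        if self.congested:
            # Hysteresis: stay in the congested state until well below the trigger level
            if utilization < self.low:
                self.congested = False
            return False
        if utilization <= self.high:
            return False
        # Damping: refuse to act again too soon after the previous action
        if (self.last_action_time is not None
                and now - self.last_action_time < self.min_interval_s):
            return False
        self.congested = True
        self.last_action_time = now
        return True


trigger = CongestionTrigger()
readings = [(0, 0.90), (5, 0.88), (10, 0.80), (15, 0.65), (20, 0.90), (40, 0.90)]
decisions = [trigger.update(util, now) for now, util in readings]
print(decisions)
```

The run acts at t=0, ignores the readings while the link stays above 70%, suppresses the spike at t=20 because only 20 seconds have passed since the last action, and acts again at t=40.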

Summary: Network Monitoring in SDN

SDN transforms network monitoring from a distributed data-collection challenge into an integrated capability of the control plane. Let's consolidate the key concepts:

Key Takeaways

•Integrated monitoring architecture — The controller naturally aggregates statistics from all switches, providing unified visibility
•Per-flow visibility — OpenFlow counters enable tracking individual conversations through the network
•Dynamic measurement — Install temporary rules for on-demand deep visibility without permanent overhead
•Multiple sampling approaches — sFlow, Packet-In, INT provide different trade-offs between visibility and overhead
•Network-wide correlation — Trace flows across devices, correlate events temporally, identify root causes
•Monitoring drives control — The observe-analyze-decide-act loop enables adaptive network behavior
•Closed-loop automation — Monitoring data triggers automatic responses: rerouting, rate-limiting, blocking

What's Next:

With monitoring foundations established, we'll explore Security Applications—how SDN's visibility and programmability enable sophisticated network security including dynamic access control, micro-segmentation, and real-time threat response.

Page Complete

You now understand how SDN provides comprehensive network visibility through integrated monitoring. This visibility—complete, correlated, and actionable—is the foundation for all intelligent SDN applications. Without knowing what's happening in the network, no amount of programmability matters.

