If flow tables are the heart of OpenFlow switches, the controller is the brain of the SDN network. The communication between controller and switches forms the control loop—the essential feedback mechanism through which network intelligence translates into packet handling behavior.
This control loop is not a simple one-way command stream. It's a rich, bidirectional conversation: switches report events and request guidance; controllers respond with configuration and policies; both sides maintain connection health and negotiate capabilities. The efficiency, reliability, and scalability of this communication fundamentally determine what SDN can achieve.
Understanding controller communication patterns is essential for designing responsive control logic, debugging connection and flow-installation problems, and scaling the control plane as the network grows.
This page explores controller-switch communication in exhaustive depth: connection lifecycle, reactive versus proactive patterns, asynchronous message handling, multi-controller architectures, and strategies for scalability. By the end, you'll understand the control plane dynamics that bring SDN networks to life.
By completing this page, you will understand: the complete OpenFlow connection lifecycle, reactive flow installation and its performance implications, proactive flow installation for predictable latency, PACKET_IN handling patterns and optimization strategies, multi-controller architectures (master/slave, equal), controller high availability mechanisms, and scalability limits and mitigation strategies.
Every OpenFlow session begins with a well-defined connection sequence. Understanding this lifecycle is essential for debugging connection issues and implementing robust SDN applications.
Phase 1: TCP Connection Establishment
OpenFlow runs over TCP (or TLS). The switch initiates the connection to a configured controller address, by default on TCP port 6653 (older deployments use 6633), and retries with backoff if the controller is unreachable.
Phase 2: TLS Negotiation (Optional but Recommended)
If TLS is configured, the TLS handshake follows TCP establishment: the switch verifies the controller's certificate, the controller can require and verify a switch certificate (mutual authentication), and all subsequent OpenFlow messages travel over the encrypted channel.
Phase 3: OpenFlow Version Negotiation
Both sides exchange HELLO messages containing their supported versions. Key behaviors: each side advertises the highest version it supports (optionally with a version bitmap listing every supported version); the negotiated version is the lower of the two advertised versions, or the highest version common to both bitmaps when bitmaps are exchanged; if there is no common version, the connection is terminated with a HELLO_FAILED error.
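To make the bitmap case concrete, here is a minimal sketch of the negotiation rule (an illustrative helper, not part of any controller framework; the bitmap values are hypothetical):

```python
def negotiate_version(local_bitmap, peer_bitmap):
    """Pick the highest OpenFlow wire version both peers advertise.

    Bit N set in a HELLO version bitmap means "wire version N supported"
    (e.g., bit 0x04 = OpenFlow 1.3). Returns None when no version is shared,
    in which case the connection is closed with a HELLO_FAILED error.
    """
    common = local_bitmap & peer_bitmap
    return common.bit_length() - 1 if common else None

# Controller speaks 1.0 (0x01) and 1.3 (0x04); switch speaks 1.3 and 1.4 (0x05)
controller_bitmap = (1 << 0x01) | (1 << 0x04)
switch_bitmap = (1 << 0x04) | (1 << 0x05)
assert negotiate_version(controller_bitmap, switch_bitmap) == 0x04  # OpenFlow 1.3
```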
Phase 4: Feature Discovery
The controller sends a FEATURES_REQUEST; the switch responds with a FEATURES_REPLY describing its datapath ID, packet buffering, flow table count, and capability flags:
```c
/* Switch features returned by FEATURES_REPLY */
struct ofp_switch_features {
    struct ofp_header header;
    uint64_t datapath_id;   /* Unique switch identifier (DPID).
                               Often based on switch MAC address */
    uint32_t n_buffers;     /* Packets switch can buffer for PACKET_OUT
                               (0 = no buffering, send full packet) */
    uint8_t  n_tables;      /* Number of flow tables supported */
    uint8_t  auxiliary_id;  /* Connection ID (0 = main, 1+ = auxiliary) */
    uint8_t  pad[2];

    /* Capabilities bitmap */
    uint32_t capabilities;  /* OFPC_FLOW_STATS:   Flow statistics supported
                             * OFPC_TABLE_STATS:  Table statistics supported
                             * OFPC_PORT_STATS:   Port statistics supported
                             * OFPC_GROUP_STATS:  Group statistics supported
                             * OFPC_IP_REASM:     Can reassemble IP fragments
                             * OFPC_QUEUE_STATS:  Queue statistics supported
                             * OFPC_PORT_BLOCKED: Switch can block ports (STP) */
    uint32_t reserved;
};
```
Phase 5: Initial Configuration
The controller typically performs: a SET_CONFIG message to set fragment handling and how many bytes of each packet accompany a PACKET_IN (miss_send_len), removal of stale flows left over from any previous controller session, installation of a table-miss entry that matches its operating model (send-to-controller for reactive designs, drop for strictly proactive ones), and a port description request to learn the switch's ports.
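A minimal Ryu-style sketch of such startup configuration (the specific choices here, such as a 128-byte miss_send_len, clearing all tables, and a send-to-controller table miss, are illustrative assumptions rather than a required recipe):

```python
def configure_new_switch(self, datapath):
    """Typical first actions after FEATURES_REPLY (illustrative)."""
    ofproto = datapath.ofproto
    parser = datapath.ofproto_parser

    # Set fragment handling and how many bytes accompany a PACKET_IN
    datapath.send_msg(parser.OFPSetConfig(
        datapath, ofproto.OFPC_FRAG_NORMAL, 128))

    # Remove any flows left over from a previous controller session
    datapath.send_msg(parser.OFPFlowMod(
        datapath, table_id=ofproto.OFPTT_ALL, command=ofproto.OFPFC_DELETE,
        out_port=ofproto.OFPP_ANY, out_group=ofproto.OFPG_ANY,
        match=parser.OFPMatch()))

    # Install a table-miss entry that punts unmatched packets to the controller
    actions = [parser.OFPActionOutput(ofproto.OFPP_CONTROLLER,
                                      ofproto.OFPCML_NO_BUFFER)]
    inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS, actions)]
    datapath.send_msg(parser.OFPFlowMod(
        datapath, priority=0, match=parser.OFPMatch(), instructions=inst))

    # Learn the switch's ports
    datapath.send_msg(parser.OFPPortDescStatsRequest(datapath, 0))
```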
Phase 6: Steady State
Once configured, the connection enters steady state: periodic ECHO_REQUEST/ECHO_REPLY exchanges verify liveness in both directions, the switch reports asynchronous events (PACKET_IN, PORT_STATUS, FLOW_REMOVED), and the controller pushes flow modifications and polls statistics as needed.
Connection Failure Handling
If the connection fails, the switch keeps trying to reconnect. While disconnected it operates in one of two configured modes: fail-secure, where existing flow entries remain in effect but packets destined for the controller are dropped, or fail-standalone, where the switch falls back to behaving like a traditional standalone switch.
The datapath_id (DPID) uniquely identifies each switch. It's typically derived from the switch's base MAC address. The controller uses DPID to correlate messages, maintain per-switch state, and build the network topology. DPID collisions (rare but possible with VM-based switches) cause serious confusion. Always verify DPID uniqueness in your network.
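For illustration, a DPID is a 64-bit value whose lower 48 bits conventionally carry a MAC address and whose upper 16 bits are implementer-defined (for example, an instance or VLAN number); a small, hypothetical helper to decompose one:

```python
def split_dpid(dpid):
    """Split a 64-bit datapath ID into (MAC-style lower 48 bits, upper 16 bits)."""
    mac_bits = dpid & 0xFFFFFFFFFFFF     # lower 48 bits: usually the base MAC
    upper = (dpid >> 48) & 0xFFFF        # upper 16 bits: implementer-defined
    mac = ':'.join(f"{(mac_bits >> s) & 0xFF:02x}" for s in range(40, -8, -8))
    return mac, upper

# Example: two virtual switches cloned from the same image may report the
# same DPID; comparing the decomposed values makes collisions easy to spot.
assert split_dpid(0x00001A2B3C4D5E6F) == ("1a:2b:3c:4d:5e:6f", 0)
```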
Reactive flow installation means the controller installs flow entries in response to traffic. When a packet matches no flow entry, the switch sends it to the controller via PACKET_IN. The controller computes the appropriate handling, installs relevant flows via FLOW_MOD, and (optionally) instructs the switch to forward the buffered packet via PACKET_OUT.
The Reactive Pattern
Reactive Pattern Advantages
Reactive installation needs no advance knowledge of traffic: flow entries are created only for traffic that actually appears (conserving table space), the controller sees the first packet of every new flow (enabling fine-grained, per-flow policy), and behavior adapts automatically as traffic patterns change.
Reactive Pattern Disadvantages
The first packet of every flow pays a round trip to the controller, adding latency; the controller's PACKET_IN processing rate caps the network-wide flow setup rate; bursts of new flows (or deliberate floods of unmatched packets) can overwhelm the control channel; and a controller outage blocks all new flows.
Reactive installation is appropriate for: (1) Learning environments where traffic patterns are unknown, (2) Low-rate control plane traffic (management, monitoring), (3) Exception handling for unusual traffic, (4) Small networks where controller can handle the load. Avoid reactive for high-volume, latency-sensitive production traffic.
PACKET_IN Handling Best Practices
```python
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.lib.packet import ethernet, packet
from ryu.ofproto import ofproto_v1_3
import time


class ReactiveL2Switch(app_manager.RyuApp):
    """
    Efficient reactive L2 learning switch.
    Demonstrates best practices for PACKET_IN handling.
    """
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Per-switch MAC table: {dpid: {mac: port}}
        self.mac_to_port = {}

        # Rate limiting to prevent controller overload
        self.packet_in_count = {}
        self.MAX_PACKET_IN_PER_SECOND = 1000

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in_handler(self, ev):
        """Handle PACKET_IN messages from switches."""
        msg = ev.msg
        dp = msg.datapath
        dpid = dp.id
        ofproto = dp.ofproto
        parser = dp.ofproto_parser

        # Rate limit check - protect controller from flooding
        if not self._check_rate_limit(dpid):
            self.logger.warning(f"Rate limit exceeded for switch {dpid}")
            return

        # Extract packet info
        in_port = msg.match['in_port']
        pkt = packet.Packet(msg.data)
        eth = pkt.get_protocols(ethernet.ethernet)[0]

        # Ignore LLDP (topology discovery handled separately)
        if eth.ethertype == 0x88cc:
            return

        src_mac = eth.src
        dst_mac = eth.dst

        # Initialize MAC table for this switch
        self.mac_to_port.setdefault(dpid, {})

        # LEARN: Record source MAC → ingress port mapping
        self.mac_to_port[dpid][src_mac] = in_port

        # FORWARD: Lookup destination MAC
        if dst_mac in self.mac_to_port[dpid]:
            out_port = self.mac_to_port[dpid][dst_mac]
        else:
            out_port = ofproto.OFPP_FLOOD  # Unknown destination

        actions = [parser.OFPActionOutput(out_port)]

        # INSTALL FLOW: Only if destination is known (avoid flooding flows)
        if out_port != ofproto.OFPP_FLOOD:
            match = parser.OFPMatch(
                in_port=in_port,
                eth_dst=dst_mac,
                eth_src=src_mac
            )
            # Install bidirectional flows with timeouts
            self._add_flow(dp, priority=10, match=match, actions=actions,
                           idle_timeout=60, hard_timeout=300)

            # Also install reverse flow
            reverse_match = parser.OFPMatch(
                in_port=out_port,
                eth_dst=src_mac,
                eth_src=dst_mac
            )
            reverse_actions = [parser.OFPActionOutput(in_port)]
            self._add_flow(dp, priority=10, match=reverse_match,
                           actions=reverse_actions,
                           idle_timeout=60, hard_timeout=300)

        # PACKET_OUT: Forward the buffered/received packet
        buffer_id = msg.buffer_id
        if buffer_id == ofproto.OFP_NO_BUFFER:
            # Full packet in message - include data
            out = parser.OFPPacketOut(
                datapath=dp,
                buffer_id=ofproto.OFP_NO_BUFFER,
                in_port=in_port,
                actions=actions,
                data=msg.data
            )
        else:
            # Packet buffered on switch - reference buffer
            out = parser.OFPPacketOut(
                datapath=dp,
                buffer_id=buffer_id,
                in_port=in_port,
                actions=actions
            )
        dp.send_msg(out)

    def _add_flow(self, datapath, priority, match, actions,
                  idle_timeout=0, hard_timeout=0):
        """Install a flow entry with proper instructions."""
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser

        inst = [parser.OFPInstructionActions(
            ofproto.OFPIT_APPLY_ACTIONS, actions)]

        mod = parser.OFPFlowMod(
            datapath=datapath,
            priority=priority,
            match=match,
            instructions=inst,
            idle_timeout=idle_timeout,
            hard_timeout=hard_timeout,
            flags=ofproto.OFPFF_SEND_FLOW_REM  # Notify on expiry
        )
        datapath.send_msg(mod)

    def _check_rate_limit(self, dpid):
        """Simple rate limiting per switch."""
        current_time = int(time.time())

        if dpid not in self.packet_in_count:
            self.packet_in_count[dpid] = (current_time, 1)
            return True

        last_time, count = self.packet_in_count[dpid]
        if current_time > last_time:
            self.packet_in_count[dpid] = (current_time, 1)
            return True

        if count < self.MAX_PACKET_IN_PER_SECOND:
            self.packet_in_count[dpid] = (current_time, count + 1)
            return True

        return False  # Rate limit exceeded
```
Proactive flow installation means the controller installs flows before traffic arrives.
The controller computes the necessary flows based on network topology, policy requirements, or traffic engineering objectives, then pushes them to switches in advance.
The Proactive Pattern
Proactive Pattern Advantages
Packets never wait on the controller, so forwarding latency is low and predictable; the data plane keeps working even if the controller is temporarily unreachable; per-flow controller load disappears; and aggregate rules (prefixes, port ranges) keep flow tables compact.
Proactive Pattern Disadvantages
The controller must know topology and policy in advance; installed rules consume table space whether or not matching traffic ever arrives; per-flow visibility is reduced because the controller never sees individual flows; and every policy or topology change requires recomputing and pushing rules.
Proactive installation is ideal for: (1) Known, stable topologies (data centers), (2) Aggregate policies expressed as IP prefixes or port ranges, (3) Latency-sensitive traffic requiring consistent performance, (4) High-throughput environments where controller bottleneck is unacceptable. Google's B4 WAN is famously proactive—traffic engineering is precomputed and pushed to all switches.
Proactive Installation Patterns
```python
class ProactiveRouter:
    """
    Proactive L3 router that computes and installs all paths at startup.
    """

    def __init__(self):
        self.topology = None      # Network graph
        self.switches = {}        # dpid -> datapath
        self.routing_table = {}   # (src_net, dst_net) -> path

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_features_handler(self, ev):
        """Handle new switch connections."""
        dp = ev.msg.datapath
        self.switches[dp.id] = dp

        # Install table-miss with drop (secure default)
        self._install_table_miss_drop(dp)

        # Check if all expected switches are connected
        if self._all_switches_connected():
            self._compute_and_install_all_routes()

    def _compute_and_install_all_routes(self):
        """Compute shortest paths and install flows proactively."""
        # Define network prefixes and their attachment points
        networks = {
            "10.0.1.0/24": {"switch": 1, "port": 1},
            "10.0.2.0/24": {"switch": 2, "port": 1},
            "10.0.3.0/24": {"switch": 3, "port": 1},
            "10.0.4.0/24": {"switch": 3, "port": 2},
        }

        # Compute all-pairs shortest paths
        for src_net, src_info in networks.items():
            for dst_net, dst_info in networks.items():
                if src_net == dst_net:
                    continue

                path = self._compute_shortest_path(
                    src_info["switch"], dst_info["switch"]
                )

                # Install flows along the path
                self._install_path_flows(
                    path, src_net, dst_net,
                    src_info["port"], dst_info["port"]
                )

    def _install_path_flows(self, path, src_net, dst_net,
                            src_port, dst_port):
        """Install forwarding rules along a computed path."""
        for i, switch_id in enumerate(path):
            dp = self.switches[switch_id]
            parser = dp.ofproto_parser
            ofproto = dp.ofproto

            # Determine output port for this hop
            if i == len(path) - 1:
                # Last switch - output to destination network
                out_port = dst_port
            else:
                # Intermediate switch - output to next switch
                next_switch = path[i + 1]
                out_port = self._get_port_to_neighbor(switch_id, next_switch)

            # Match on destination prefix
            match = parser.OFPMatch(
                eth_type=0x0800,
                ipv4_dst=(dst_net.split('/')[0],
                          self._prefix_to_mask(int(dst_net.split('/')[1])))
            )

            actions = [
                # Rewrite MACs (simplified - would lookup actual next-hop)
                parser.OFPActionDecNwTtl(),
                parser.OFPActionOutput(out_port)
            ]

            inst = [parser.OFPInstructionActions(
                ofproto.OFPIT_APPLY_ACTIONS, actions)]

            # Install with high timeout (proactive flows are persistent)
            mod = parser.OFPFlowMod(
                datapath=dp,
                priority=100,
                match=match,
                instructions=inst,
                idle_timeout=0,   # Never expire
                hard_timeout=0
            )
            dp.send_msg(mod)

        # Use BARRIER to confirm all flows installed
        for switch_id in path:
            dp = self.switches[switch_id]
            barrier = dp.ofproto_parser.OFPBarrierRequest(dp)
            dp.send_msg(barrier)

    def _prefix_to_mask(self, prefix_len):
        """Convert prefix length to netmask string."""
        mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
        return '.'.join([str((mask >> i) & 0xFF) for i in [24, 16, 8, 0]])

    def _install_table_miss_drop(self, datapath):
        """Secure default: drop unmatched traffic."""
        parser = datapath.ofproto_parser
        match = parser.OFPMatch()
        mod = parser.OFPFlowMod(
            datapath=datapath,
            priority=0,
            match=match,
            instructions=[]   # No instructions = drop
        )
        datapath.send_msg(mod)
```
Hybrid Approaches
Most production SDN deployments combine reactive and proactive patterns:
| Traffic Type | Pattern | Rationale |
|---|---|---|
| Infrastructure routing | Proactive | Known, stable, latency-sensitive |
| VM-to-VM within datacenter | Proactive | Computed from VM placement |
| External/Internet traffic | Proactive aggregates | IP prefix-based policies |
| Unknown/exception traffic | Reactive | Handled by controller for policy decision |
| Security scanning | Reactive | Custom handling per detection |
The key insight is that proactive handles the common case while reactive handles exceptions. This minimizes controller load while retaining flexibility.
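A sketch of how this split can look in a Ryu-style application (the prefix table, priorities, and 128-byte truncation are illustrative assumptions):

```python
def setup_hybrid_pipeline(self, datapath, known_prefixes):
    """Proactive rules for the common case, reactive table-miss for exceptions."""
    ofproto = datapath.ofproto
    parser = datapath.ofproto_parser

    # Proactive: aggregate routes for known prefixes, e.g. {"10.0.1.0/24": 2}
    for prefix, out_port in known_prefixes.items():
        addr, plen = prefix.split('/')
        mask_int = (0xFFFFFFFF << (32 - int(plen))) & 0xFFFFFFFF
        mask = '.'.join(str((mask_int >> s) & 0xFF) for s in (24, 16, 8, 0))
        match = parser.OFPMatch(eth_type=0x0800, ipv4_dst=(addr, mask))
        actions = [parser.OFPActionOutput(out_port)]
        inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS, actions)]
        datapath.send_msg(parser.OFPFlowMod(
            datapath, priority=100, match=match, instructions=inst))

    # Reactive: everything else goes to the controller at lowest priority,
    # truncated to 128 bytes to limit control-channel load
    actions = [parser.OFPActionOutput(ofproto.OFPP_CONTROLLER, 128)]
    inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS, actions)]
    datapath.send_msg(parser.OFPFlowMod(
        datapath, priority=0, match=parser.OFPMatch(), instructions=inst))
```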
Switches send asynchronous messages to controllers without explicit request. These messages notify the controller of events requiring attention. Efficient asynchronous message handling is critical for responsive SDN networks.
Asynchronous Message Types
| Message | Trigger | Controller Action |
|---|---|---|
| PACKET_IN | Packet matches table-miss or send-to-controller action | Decide packet handling, potentially install flows |
| FLOW_REMOVED | Flow entry expired or was deleted | Update controller state, potentially reinstall |
| PORT_STATUS | Port state changed (up/down, speed, config) | Update topology, recompute affected paths |
| ROLE_STATUS (OF 1.4+) | Controller role changed | Adjust behavior for new role |
| TABLE_STATUS (OF 1.4+) | Table configuration changed | Update table feature knowledge |
PORT_STATUS Handling
Port state changes are critical for topology maintenance:
```python
class TopologyManager:
    """Manages network topology based on port status events."""

    def __init__(self):
        self.port_status = {}   # {dpid: {port_no: status}}
        self.links = {}         # Discovered links

    @set_ev_cls(ofp_event.EventOFPPortStatus, MAIN_DISPATCHER)
    def port_status_handler(self, ev):
        """Handle port state changes."""
        msg = ev.msg
        dp = msg.datapath
        reason = msg.reason
        port = msg.desc
        ofproto = dp.ofproto

        if reason == ofproto.OFPPR_ADD:
            self.logger.info(f"Port added: {dp.id}:{port.port_no}")
            self._handle_port_add(dp.id, port)
        elif reason == ofproto.OFPPR_DELETE:
            self.logger.info(f"Port deleted: {dp.id}:{port.port_no}")
            self._handle_port_delete(dp.id, port)
        elif reason == ofproto.OFPPR_MODIFY:
            self.logger.info(f"Port modified: {dp.id}:{port.port_no}")
            self._handle_port_modify(dp.id, port)

    def _handle_port_add(self, dpid, port):
        """New port available - may enable new links."""
        self.port_status.setdefault(dpid, {})[port.port_no] = {
            'state': port.state,
            'config': port.config,
            'name': port.name
        }
        # Trigger LLDP to discover if link exists
        self._send_lldp_packet(dpid, port.port_no)

    def _handle_port_delete(self, dpid, port):
        """Port removed - links through this port are down."""
        if dpid in self.port_status:
            self.port_status[dpid].pop(port.port_no, None)

        # Find and remove affected links
        affected_links = self._find_links_through_port(dpid, port.port_no)
        for link in affected_links:
            self._handle_link_down(link)

    def _handle_port_modify(self, dpid, port):
        """Port state changed - check if link is affected."""
        old_state = self.port_status.get(dpid, {}).get(port.port_no, {}).get('state')
        new_state = port.state

        # Update stored state
        self.port_status.setdefault(dpid, {})[port.port_no] = {
            'state': new_state,
            'config': port.config,
            'name': port.name
        }

        # Check for link state transition
        OFPPS_LINK_DOWN = 1   # Port link is down
        was_up = old_state is not None and not (old_state & OFPPS_LINK_DOWN)
        is_up = not (new_state & OFPPS_LINK_DOWN)

        if was_up and not is_up:
            # Link went DOWN
            self.logger.warning(f"Link down: {dpid}:{port.port_no}")
            affected_links = self._find_links_through_port(dpid, port.port_no)
            for link in affected_links:
                self._handle_link_down(link)
        elif not was_up and is_up:
            # Link came UP
            self.logger.info(f"Link up: {dpid}:{port.port_no}")
            self._send_lldp_packet(dpid, port.port_no)

    def _handle_link_down(self, link):
        """React to link failure - recompute affected routes."""
        self.logger.warning(f"Handling link failure: {link}")

        # Remove link from topology
        self.links.pop(link, None)

        # Recompute routes that used this link
        affected_flows = self._find_flows_using_link(link)
        for flow in affected_flows:
            # Compute new path avoiding failed link
            new_path = self._compute_alternate_path(flow)
            if new_path:
                # Install new path
                self._install_path_flows(new_path, flow)
                # Delete old flows using failed link
                self._delete_old_flows(flow)
            else:
                self.logger.error(f"No alternate path for {flow}")
```
FLOW_REMOVED Handling
Flow removal notifications enable state synchronization:
```python
@set_ev_cls(ofp_event.EventOFPFlowRemoved, MAIN_DISPATCHER)
def flow_removed_handler(self, ev):
    """Handle flow expiration notifications."""
    msg = ev.msg
    dp = msg.datapath
    ofproto = dp.ofproto

    # Extract flow identification
    match = msg.match
    cookie = msg.cookie
    priority = msg.priority

    # Extract reason
    if msg.reason == ofproto.OFPRR_IDLE_TIMEOUT:
        reason = "idle_timeout"
    elif msg.reason == ofproto.OFPRR_HARD_TIMEOUT:
        reason = "hard_timeout"
    elif msg.reason == ofproto.OFPRR_DELETE:
        reason = "delete"
    elif msg.reason == ofproto.OFPRR_GROUP_DELETE:
        reason = "group_delete"
    elif msg.reason == ofproto.OFPRR_METER_DELETE:
        reason = "meter_delete"
    else:
        reason = "unknown"

    # Extract statistics
    duration_sec = msg.duration_sec
    duration_nsec = msg.duration_nsec
    packet_count = msg.packet_count
    byte_count = msg.byte_count

    self.logger.info(
        f"Flow removed from {dp.id}: cookie={cookie}, "
        f"reason={reason}, duration={duration_sec}.{duration_nsec}s, "
        f"packets={packet_count}, bytes={byte_count}"
    )

    # Update controller state
    self._update_flow_statistics(dp.id, cookie, packet_count, byte_count)

    # Handle based on reason
    if reason == "idle_timeout":
        # Flow expired due to inactivity - may reinstall if needed
        if self._should_reinstall_flow(cookie):
            self._reinstall_flow(dp, match, priority, cookie)
    elif reason == "hard_timeout":
        # Flow reached absolute timeout - planned expiration
        self._handle_planned_expiration(cookie)
    elif reason == "delete":
        # Flow was explicitly deleted - someone changed policy
        pass  # Controller already knows about this
```
Controllers can filter which async messages they receive via SET_ASYNC. This is useful in multi-controller setups where slave controllers may not need all event types. Reducing unnecessary messages improves controller efficiency.
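In Ryu, for example, an OpenFlow 1.3 SET_ASYNC request looks roughly like the sketch below (the particular mask choices are illustrative):

```python
def limit_async_messages(self, datapath):
    """Ask the switch to send only the async messages this controller needs."""
    ofproto = datapath.ofproto
    parser = datapath.ofproto_parser

    # Each mask list is [mask while master/equal, mask while slave],
    # with one bit per event reason.
    packet_in_mask = 1 << ofproto.OFPR_NO_MATCH            # table-miss only
    port_status_mask = (1 << ofproto.OFPPR_ADD |
                        1 << ofproto.OFPPR_DELETE |
                        1 << ofproto.OFPPR_MODIFY)
    flow_removed_mask = (1 << ofproto.OFPRR_IDLE_TIMEOUT |
                         1 << ofproto.OFPRR_HARD_TIMEOUT)

    req = parser.OFPSetAsync(
        datapath,
        [packet_in_mask, 0],                  # no PACKET_IN while slave
        [port_status_mask, port_status_mask],  # always hear about ports
        [flow_removed_mask, 0])
    datapath.send_msg(req)
```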
Production SDN deployments rarely rely on a single controller. Multi-controller architectures provide high availability, load distribution, and geographical distribution. OpenFlow includes mechanisms to coordinate multiple controllers.
Controller Roles
OpenFlow 1.2+ defines three controller roles:
| Role | Send Commands | Receive Async | Use Case |
|---|---|---|---|
| MASTER | Yes | Yes | Active controller with full control |
| SLAVE | No (read-only) | PORT_STATUS only (by default) | Standby for failover, monitoring |
| EQUAL | Yes | Yes | All controllers equal (default) |
Master-Slave Architecture
In master-slave mode, one controller is MASTER (active) while others are SLAVE (standby): the master programs the switches and receives their asynchronous events, while each slave keeps its own connection to every switch, monitors liveness, and stands ready to request mastership if the master fails.
Generation IDs for Split-Brain Prevention
When the master fails, how do we ensure exactly one new master? OpenFlow uses generation IDs: every MASTER or SLAVE role request carries a 64-bit generation_id that is incremented each time mastership is reassigned (typically by an external election or coordination service). Each switch remembers the largest generation_id it has seen and rejects role requests that carry an older one as stale.
This creates a total ordering of mastership claims, preventing split-brain scenarios where two controllers both think they're master.
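The staleness test is a wraparound-safe 64-bit comparison; a minimal sketch of the rule a switch applies when it receives a MASTER or SLAVE role request:

```python
def is_stale_generation(new_gen, cached_gen):
    """True if new_gen is older than cached_gen under 64-bit wraparound arithmetic.

    Stale requests are rejected with a ROLE_REQUEST_FAILED / STALE error,
    so a recovered ex-master cannot reclaim mastership with an outdated
    generation ID.
    """
    MASK = (1 << 64) - 1
    diff = (new_gen - cached_gen) & MASK
    return diff != 0 and diff >= (1 << 63)   # difference negative as signed int64

assert not is_stale_generation(5, 4)   # newer claim accepted
assert is_stale_generation(4, 5)       # older claim rejected as stale
```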
```python
class ControllerHAManager:
    """
    Manages controller high availability with master election.
    Uses generation IDs to ensure consistent mastership.
    """

    def __init__(self, controller_id):
        self.controller_id = controller_id
        self.generation_id = 0
        self.role = 'SLAVE'
        self.switches = {}   # dpid -> datapath

    def become_master(self, new_generation_id):
        """Attempt to become master with given generation ID."""
        if new_generation_id <= self.generation_id:
            self.logger.warning("Cannot use lower generation ID")
            return False

        self.generation_id = new_generation_id

        # Send ROLE_REQUEST to all switches
        for dpid, dp in self.switches.items():
            self._send_role_request(dp, 'MASTER', new_generation_id)

        return True

    def _send_role_request(self, datapath, role, generation_id):
        """Send role change request to switch."""
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser

        role_map = {
            'MASTER': ofproto.OFPCR_ROLE_MASTER,
            'SLAVE': ofproto.OFPCR_ROLE_SLAVE,
            'EQUAL': ofproto.OFPCR_ROLE_EQUAL
        }

        role_request = parser.OFPRoleRequest(
            datapath,
            role=role_map[role],
            generation_id=generation_id
        )
        datapath.send_msg(role_request)

    @set_ev_cls(ofp_event.EventOFPRoleReply, MAIN_DISPATCHER)
    def role_reply_handler(self, ev):
        """Handle role change confirmation."""
        msg = ev.msg
        dp = msg.datapath
        ofproto = dp.ofproto

        role_names = {
            ofproto.OFPCR_ROLE_MASTER: 'MASTER',
            ofproto.OFPCR_ROLE_SLAVE: 'SLAVE',
            ofproto.OFPCR_ROLE_EQUAL: 'EQUAL'
        }

        new_role = role_names.get(msg.role, 'UNKNOWN')
        gen_id = msg.generation_id

        self.logger.info(
            f"Role confirmed for switch {dp.id}: {new_role} (gen={gen_id})"
        )

        if new_role == 'MASTER':
            self.role = 'MASTER'
            self._on_become_master(dp.id)

    def _on_become_master(self, dpid):
        """Actions to take when becoming master of a switch."""
        self.logger.info(f"Now MASTER of switch {dpid}")

        # Re-sync state - ensure our view matches switch reality
        dp = self.switches[dpid]
        self._request_full_state_sync(dp)

    def _request_full_state_sync(self, datapath):
        """Request full switch state to synchronize controller view."""
        parser = datapath.ofproto_parser

        # Request port descriptions
        req = parser.OFPPortDescStatsRequest(datapath, 0)
        datapath.send_msg(req)

        # Request all flows
        match = parser.OFPMatch()
        req = parser.OFPFlowStatsRequest(datapath, 0,
                                         datapath.ofproto.OFPTT_ALL,
                                         datapath.ofproto.OFPP_ANY,
                                         datapath.ofproto.OFPG_ANY,
                                         0, 0, match)
        datapath.send_msg(req)

        # Request group descriptions
        req = parser.OFPGroupDescStatsRequest(datapath, 0)
        datapath.send_msg(req)
```
The hardest part of multi-controller architectures is state synchronization between controllers. When the master changes, the new master needs the current network state. Options: (1) Shared external database (e.g., Cassandra, Redis), (2) Controller-to-controller replication, (3) Re-read state from switches. Each has trade-offs in complexity, latency, and consistency.
Distributed Controller Architectures
For very large networks, hierarchical or partitioned controller architectures may be used:
Hierarchical: Root controller coordinates regional controllers, each managing a subset of switches. Used for geographically distributed networks.
Partitioned (EQUAL): Multiple controllers operate independently on different switch subsets. Used for load distribution in large flat networks.
Replicated: All controllers are fully synchronized replicas. Highest availability but most complex consistency requirements.
The controller is a potential bottleneck in SDN architectures. Understanding scalability limits and mitigation strategies is essential for production deployment.
Controller Performance Metrics
| Metric | Definition | Typical Range |
|---|---|---|
| Throughput | PACKET_IN messages processed per second | 10K - 1M msgs/sec |
| Latency | Time from PACKET_IN to FLOW_MOD installation | 1 - 100 ms |
| Flow setup rate | New flows installed per second | 10K - 100K flows/sec |
| Switch capacity | Maximum switches per controller | 100 - 1000 switches |
| Table capacity | Flow entries managed across all switches | 100K - 10M entries |
Scalability Bottlenecks
Common bottlenecks include: (1) the controller's PACKET_IN processing capacity, (2) the switch's flow programming rate, since TCAM updates are comparatively slow, (3) the control channel itself, where bulky statistics replies can delay urgent messages on a single TCP connection, and (4) controller CPU and memory for maintaining network-wide state as switch and flow counts grow.
OpenFlow 1.3+ supports auxiliary connections—multiple parallel TCP connections between switch and controller. Use them to separate: (1) Main connection for control messages, (2) Auxiliary for PACKET_IN (can handle bursts without blocking control), (3) Auxiliary for statistics (bulk data without impacting reactivity). This naturally parallelizes the controller-switch communication.
Benchmarking Controller Performance
"""Controller performance testing using Cbench or similar tools.Example configuration and expected results.""" # Cbench command for throughput testing# -c: controller address# -p: controller port # -m: milliseconds per test# -l: loops# -s: number of simulated switches# -M: MAC addresses per switch (simulated hosts)# -t: throughput vs latency mode CBENCH_THROUGHPUT = """cbench -c 127.0.0.1 -p 6653 \ -m 10000 -l 10 \ -s 16 -M 1000 \ -t""" # Expected results for common controllers:# (These are approximate - actual results depend on hardware) CONTROLLER_BENCHMARKS = { "NOX (C++)": { "throughput": "30K-50K responses/sec", "latency": "~3ms per response", "notes": "Original reference implementation" }, "Floodlight (Java)": { "throughput": "100K-500K responses/sec", "latency": "~1ms per response", "notes": "Good for medium deployments" }, "Ryu (Python)": { "throughput": "10K-30K responses/sec", "latency": "~5ms per response", "notes": "Good for learning, limited for production" }, "ONOS (Java)": { "throughput": "1M+ responses/sec (clustered)", "latency": "<1ms per response", "notes": "Carrier-grade, horizontal scaling" }, "OpenDaylight (Java)": { "throughput": "500K-1M responses/sec (clustered)", "latency": "~1ms per response", "notes": "Highly extensible, enterprise focus" }} # Capacity planning formula (simplified)def estimate_controller_needs( switches: int, new_flows_per_sec: float, topology_change_rate: float) -> dict: """ Estimate controller requirements based on network characteristics. """ # PACKET_IN load (reactive flows) packet_in_rate = new_flows_per_sec # Topology updates (PORT_STATUS, LLDP) topology_rate = switches * topology_change_rate # Statistics polling (assuming 5-second interval) stats_rate = switches * 0.2 # 1 request per 5 seconds per switch total_message_rate = packet_in_rate + topology_rate + stats_rate # Estimate cores needed (rough: 50K msg/sec per core) cores_needed = max(2, int(total_message_rate / 50000) + 1) # Estimate memory (rough: 10KB per switch + 100 bytes per flow) estimated_flows = new_flows_per_sec * 60 # 60-second flow lifetime memory_mb = (switches * 10 + estimated_flows * 0.1) / 1024 return { "message_rate": total_message_rate, "cores": cores_needed, "memory_mb": max(512, memory_mb), "recommendation": "single" if total_message_rate < 100000 else "clustered" }OpenFlow messages are processed asynchronously by default—the controller sends a FLOW_MOD and doesn't wait for confirmation. But sometimes we need guarantees about message processing order and completion.
The BARRIER Mechanism
BARRIER_REQUEST creates a synchronization point: the switch must finish processing every message it received before the barrier before it processes anything received after it, and it must send a BARRIER_REPLY once that earlier work is complete.
This enables: ordering guarantees between dependent messages (for example, creating a group before installing flows that reference it), confirmation that a batch of flow installations has actually been processed before dependent actions are taken elsewhere, and attribution of any resulting errors to a specific batch.
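In Ryu this is a simple request/reply pair; a minimal sketch (the surrounding bookkeeping is illustrative):

```python
@set_ev_cls(ofp_event.EventOFPBarrierReply, MAIN_DISPATCHER)
def barrier_reply_handler(self, ev):
    """All messages sent before the corresponding barrier have been processed."""
    dp = ev.msg.datapath
    self.logger.info("Barrier reply from switch %s (xid=%s)", dp.id, ev.msg.xid)
    # Safe to proceed with work that depended on the earlier FLOW_MODs,
    # e.g. steering traffic onto the path that was just installed.

def install_then_confirm(self, datapath, flow_mods):
    """Send a batch of FLOW_MODs followed by a barrier that confirms them."""
    for mod in flow_mods:
        datapath.send_msg(mod)
    datapath.send_msg(datapath.ofproto_parser.OFPBarrierRequest(datapath))
```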
Bundles (OpenFlow 1.4+)
Bundles provide true atomicity—multiple messages either all succeed or all fail:
```python
def atomic_path_update(self, datapath, old_flows, new_flows):
    """
    Atomically replace old path flows with new path flows.
    Uses OpenFlow 1.4+ bundles for all-or-nothing semantics.
    """
    ofproto = datapath.ofproto
    parser = datapath.ofproto_parser

    bundle_id = int(time.time())   # Unique bundle ID

    # Step 1: Open bundle
    open_request = parser.OFPBundleCtrlMsg(
        datapath,
        bundle_id=bundle_id,
        type_=ofproto.OFPBCT_OPEN_REQUEST,
        flags=ofproto.OFPBF_ATOMIC,   # Atomic semantics
        properties=[]
    )
    datapath.send_msg(open_request)

    # Step 2: Add delete messages for old flows
    for flow in old_flows:
        del_msg = parser.OFPFlowMod(
            datapath,
            command=ofproto.OFPFC_DELETE,
            match=flow['match'],
            priority=flow['priority']
        )
        bundle_add = parser.OFPBundleAddMsg(
            datapath,
            bundle_id=bundle_id,
            flags=ofproto.OFPBF_ATOMIC,
            message=del_msg,
            properties=[]
        )
        datapath.send_msg(bundle_add)

    # Step 3: Add install messages for new flows
    for flow in new_flows:
        add_msg = parser.OFPFlowMod(
            datapath,
            command=ofproto.OFPFC_ADD,
            match=flow['match'],
            priority=flow['priority'],
            instructions=flow['instructions']
        )
        bundle_add = parser.OFPBundleAddMsg(
            datapath,
            bundle_id=bundle_id,
            flags=ofproto.OFPBF_ATOMIC,
            message=add_msg,
            properties=[]
        )
        datapath.send_msg(bundle_add)

    # Step 4: Commit bundle (atomically applies all messages)
    commit_request = parser.OFPBundleCtrlMsg(
        datapath,
        bundle_id=bundle_id,
        type_=ofproto.OFPBCT_COMMIT_REQUEST,
        flags=ofproto.OFPBF_ATOMIC,
        properties=[]
    )
    datapath.send_msg(commit_request)

    # If commit fails, all changes are rolled back and the controller
    # receives an ERROR message indicating the failure
```
OpenFlow 1.4+ bundles can be scheduled for a specific time. This enables network-wide coordinated changes—all switches apply new flows at exactly the same moment. Essential for maintenance windows or coordinated traffic engineering changes.
Controller-switch communication is the lifeblood of SDN. Understanding these patterns enables you to build responsive, reliable, and scalable software-defined networks.
What's Next:
With controller communication patterns understood, we'll complete our OpenFlow exploration with OpenFlow switches—the hardware and software implementations that bring this protocol to life. You'll learn about switch architectures, performance characteristics, and selection criteria for production deployments.
You now understand the complete controller-switch communication model—from connection establishment through reactive/proactive flow installation, multi-controller coordination, and scalability strategies. This knowledge enables you to architect robust SDN control planes. Next, we examine OpenFlow switch implementations.