This chapter describes the configuration needed to support High Availability (HA) for controllers that manage OpenFlow switches. HA is configured by creating region configurations in the controllers using the REST APIs provided by the Role Orchestration Service (ROS).
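The following Python sketch models a region definition in memory. It shows how a region's controllers and devices relate to each other. It is an illustrative data model only, not the ROS REST payload. The class `Region`, its field names, and the use of IP addresses and datapath IDs as identifiers are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """Hypothetical in-memory model of one region definition.

    This is NOT the ROS REST payload; field names are illustrative only.
    """

    master: str                  # configured master controller (assumed: IP address)
    slaves: tuple[str, ...]      # configured slave (backup) controllers, in order
    devices: frozenset[str] = field(default_factory=frozenset)  # assumed: datapath IDs

    def __post_init__(self) -> None:
        # A controller holds exactly one configured role within a region.
        if self.master in self.slaves:
            raise ValueError("the master cannot also be listed as a slave")
        if len(set(self.slaves)) != len(self.slaves):
            raise ValueError("a slave controller is listed more than once")

    def members(self) -> tuple[str, ...]:
        """Every controller named in the region, master first."""
        return (self.master, *self.slaves)

    def configured_role(self, controller: str) -> str:
        """Role the controller is configured for, and is restored to on failback."""
        if controller == self.master:
            return "master"
        if controller in self.slaves:
            return "slave"
        raise KeyError(f"{controller} is not a member of this region")
```

For example, `Region(master="10.0.0.1", slaves=("10.0.0.2", "10.0.0.3"))` describes a region with one master and two backups. `configured_role("10.0.0.3")` returns `"slave"` for the second backup.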
Putting the region configurations in place in a controller team ensures seamless failover and failback among the configured controllers for the network devices specified in a region. That is, when a master controller experiences a fault, the ROS ensures that a slave controller immediately assumes the master role over the network devices for which the failed controller was master. Once the failed controller recovers and rejoins the team, the ROS restores that controller's configured role with respect to the network devices in its regions. If the controller was configured to operate as the master in a region, it is restored to the master role. If it was configured to operate in the slave role, it resumes operation in the slave role.
Once the region definition(s) are in place, the ROS ensures that a master controller is always available to the respective network device(s), even if the configured master fails or the communication channel between the controller and the network device(s) is disrupted.
NOTE: All region configuration operations (create, update, refresh, and delete) using the REST API require that every controller specified in the region, including the master controller and all slave controllers, be in an active state. If any controller in the region is in a "down" state, region configuration operations are disallowed.
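The precondition in the NOTE amounts to an all-members-active check. The sketch below is hypothetical: `ControllerState`, `RegionOperationRejected`, and `check_region_operation` are illustrative names and are not part of the ROS API. It also assumes that a controller with no reported state counts as down.

```python
from enum import Enum


class ControllerState(Enum):
    ACTIVE = "active"
    DOWN = "down"


class RegionOperationRejected(Exception):
    """Raised when a region operation is attempted while a member controller is not active."""


# The four region configuration operations named in the NOTE.
REGION_OPERATIONS = {"create", "update", "refresh", "delete"}


def check_region_operation(
    operation: str,
    members: list[str],
    states: dict[str, ControllerState],
) -> None:
    """Reject the operation unless every controller in the region is active.

    Assumption: a controller with no known state is treated as down.
    """
    if operation not in REGION_OPERATIONS:
        raise ValueError(f"unknown region operation: {operation!r}")
    not_active = [
        c for c in members
        if states.get(c, ControllerState.DOWN) is not ControllerState.ACTIVE
    ]
    if not_active:
        raise RegionOperationRejected(
            f"{operation} disallowed; controllers not active: {', '.join(not_active)}"
        )
```

Raising an exception, instead of returning a flag, mirrors the rule that the whole operation is disallowed rather than partially applied.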
The ROS performs failover in response to either of the following events:
Controller failure: The ROS detects a controller failure in a team through notifications from the teaming subsystem. If the ROS determines that the failed controller instance was the master for any region, it immediately elects one of the backup (slave) controllers to assume the master role over the affected region.
Device disconnect: The ROS instance in a controller is notified of a communication failure with network device(s) through Controller Service notifications. It immediately communicates with all ROS instances in the team to determine whether the affected network device(s) are still connected to any of the backup (slave) controllers in the team. If so, it elects one of those slaves to assume the master role over the affected network device(s).
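The two failover triggers differ mainly in which slaves are eligible to take over. The following Python sketch illustrates the election under several assumptions. The function names are hypothetical. The election policy (first eligible slave in configured order) is assumed, not documented ROS behavior. The `connections` mapping stands in for the connectivity information that the ROS instances exchange.

```python
from collections.abc import Mapping, Sequence, Set
from typing import Optional


def _first_eligible(
    slaves: Sequence[str],
    active: Set[str],
    connections: Mapping[str, Set[str]],
    required_devices: Set[str],
) -> Optional[str]:
    # Assumed policy: walk the slaves in configured order and pick the first
    # active one that is still connected to every required device.
    for slave in slaves:
        if slave in active and required_devices <= connections.get(slave, frozenset()):
            return slave
    return None


def elect_on_controller_failure(slaves: Sequence[str], active: Set[str]) -> Optional[str]:
    """Controller failure: the teaming subsystem reported the master as failed.

    No device requirement is applied here; any active slave is eligible.
    """
    return _first_eligible(slaves, active, {}, frozenset())


def elect_on_device_disconnect(
    slaves: Sequence[str],
    active: Set[str],
    affected_devices: Set[str],
    connections: Mapping[str, Set[str]],
) -> Optional[str]:
    """Device disconnect: the master lost its channel to the affected devices.

    `connections` maps each controller to the devices it still reaches,
    as reported by the ROS instances in the team (hypothetical structure).
    """
    return _first_eligible(slaves, active, connections, affected_devices)
```

A return value of `None` corresponds to the case where no backup controller can take over. For a device disconnect, this means that no slave is still connected to the affected device(s).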
When the configured master recovers from a failure and rejoins the team, or when the connection between the disconnected device(s) and the original master is restored, the ROS initiates a failback operation that restores the master role to the master specified in the region definition.
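Failback can be sketched as recomputing roles from the region definition. In this hypothetical Python sketch, `roles_after_recovery` and its parameters are illustrative names. The sketch also assumes a rule for when failback is safe: the configured master reclaims the master role only when it is active and connected to every device in the region. That rule is an assumption, not documented ROS behavior.

```python
from collections.abc import Mapping, Sequence, Set


def roles_after_recovery(
    configured_master: str,
    configured_slaves: Sequence[str],
    current_master: str,
    active: Set[str],
    connections: Mapping[str, Set[str]],
    region_devices: Set[str],
) -> dict[str, str]:
    """Return the role of each active region member once failback is evaluated.

    Assumption: failback happens only when the configured master is active
    again AND connected to every device in the region; otherwise the
    controller that took over during failover keeps the master role.
    """
    master_ready = configured_master in active and region_devices <= connections.get(
        configured_master, frozenset()
    )
    master = configured_master if master_ready else current_master
    members = (configured_master, *configured_slaves)
    # Every other active member returns to (or stays in) the slave role.
    return {c: ("master" if c == master else "slave") for c in members if c in active}
```

Recomputing roles from the region definition, rather than reversing the last failover step, keeps the result correct even if several failovers happened while the configured master was away.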
NOTE: Examples of cURL commands in this guide use the