Distributed Coordination Service

The Distributed Coordination Service provides the building blocks to achieve high availability in the HPE VAN SDN Controller environment.

The Distributed Coordination Service is a private message system bus between teamed controllers and is available to internal applications using Java APIs only. This guide includes examples of Java applications using this service.

The Distributed Coordination Service includes:

  • Publish Subscribe Service

  • Distributed maps

  • Distributed locks

  • Peer monitoring

Application view of the Distributed Coordination Service

Application view of coordination service in a three-controller team illustrates the communication between the controller applications and the HPE VAN SDN Controller Distributed Coordination infrastructure. Each of the large boxes represents an instance of a controller in a three-controller team. The two applications, App-1 and App-2, are installed on each controller instance and coordinate with each other and with the controller by using the controller's Distributed Coordination Service.

Application view of coordination service in a three-controller team

Publish Subscribe Service

In a publish/subscribe message service, senders of messages, called publishers, do not send messages directly to specific receivers, called subscribers. Instead, published messages are characterized into classes, without knowledge of what subscribers there might be, if any. Similarly, subscribers register to receive all messages of one or more message classes, without knowledge of what publishers there might be, if any.

The Publish Subscribe Service for the controller provides a way for applications to communicate within a controller team. It enables applications on different controllers to register for various types of bus messages, and then to send and receive those messages without having to handle delivery failures or out-of-order delivery themselves.

When an application publishes a message, every subscriber to that message type on an active team member is notified, regardless of which controller in the team it runs on.
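The register/publish/notify flow can be illustrated with a single-process sketch. The class and method names below (`MessageBus`, `subscribe`, `publish`) are hypothetical and are not the controller's Publish Subscribe API. The sketch only models subscribers registering for a message type and every subscriber to that type being notified; it has no network transport, team membership, or failure handling.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical, single-JVM model of a publish/subscribe bus.
// It is NOT the HPE VAN SDN Controller API.
class MessageBus {
    // Subscribers are grouped by message type (the "message class").
    private final Map<String, List<Consumer<String>>> subscribers =
            new ConcurrentHashMap<>();

    // A subscriber registers for a message type without knowing the publishers.
    void subscribe(String messageType, Consumer<String> handler) {
        subscribers.computeIfAbsent(messageType, t -> new CopyOnWriteArrayList<>())
                   .add(handler);
    }

    // A publisher sends to a message type without knowing the subscribers.
    // Returns the number of subscribers that were notified.
    int publish(String messageType, String payload) {
        List<Consumer<String>> handlers =
                subscribers.getOrDefault(messageType, List.of());
        for (Consumer<String> handler : handlers) {
            handler.accept(payload);
        }
        return handlers.size();
    }
}
```

In this sketch `publish` delivers synchronously on the caller's thread, and a message type with no subscribers is simply dropped; a team-wide service would instead deliver asynchronously to subscribers on every active controller.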

For more information, see "Using the Publish Subscribe service."

Message order and global ordering

In a typical implementation of global ordering, the designated reference node propagates the messages in the order they are received, so you cannot use global ordering to determine the order in which messages were generated.
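The reference-node approach described above is commonly called a sequencer. The sketch below models it in one process with hypothetical names (`Sequencer`, `Subscriber`); it is not a description of how the controller implements global ordering. It shows that every subscriber sees the same sequence, and that the sequence reflects arrival order at the reference node rather than the order in which messages were generated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical subscriber that records messages in delivery order.
class Subscriber {
    final List<String> received = new ArrayList<>();

    void deliver(String message) {
        received.add(message);
    }
}

// Hypothetical sequencer standing in for the designated reference node.
// It numbers messages in the order they ARRIVE and forwards each one to
// every subscriber before handling the next, so all subscribers see the
// same sequence.
class Sequencer {
    private final List<Subscriber> subscribers = new ArrayList<>();
    private long nextSeq = 0;

    synchronized void register(Subscriber s) {
        subscribers.add(s);
    }

    // Returns the global sequence number assigned to the message.
    synchronized long submit(String message) {
        long seq = nextSeq++;
        for (Subscriber s : subscribers) {
            s.deliver(message);
        }
        return seq;
    }
}
```

Because every message passes through a single node, that node adds latency and can become a bottleneck, which is the performance cost of enabling global ordering.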

Enabling global ordering can affect both performance and the order in which messages are received by each subscriber:

  • Enabling global ordering ensures that all messages from all publishers are received by all subscribers in the same order, but performance can be degraded.

  • Disabling global ordering maximizes performance, but two different subscribers could receive messages from different publishers in different orders.

  • For the controller, global ordering is disabled by default to maximize performance, but the Publish Subscribe service provides global ordering for specific message types.

Limiting global ordering to specific message types lets the Publish Subscribe service maximize performance. Even when global ordering is disabled, the service ensures that messages from the same publisher are received in the order they were sent. For example:

  • Let A and B be message publishers (sources).

  • Let R and W be message subscribers (receivers).

  • Assume A sends messages a1 a2 a3 in that order.

  • Assume B sends messages b1 b2 b3 in that order.

  • Let a1 b1 a2 a3 b2 b3 be the sequence of messages received by R.

With or without global ordering, the Publish Subscribe service ensures that every subscriber receives messages in an order where:

  • a1 arrives before a2

  • a2 arrives before a3

  • b1 arrives before b2

  • b2 arrives before b3

If global ordering is enabled, W receives messages in the same order as received by R.

If global ordering is disabled, W might or might not receive messages in the same order as R, but the Publish Subscribe service ensures that the messages from each source arrive in the order that source sent them. For example, the sequences W could receive include, but are not limited to, the following:

  • a1 b1 a2 a3 b2 b3

  • b1 a1 b2 a2 b3 a3

  • a1 a2 a3 b1 b2 b3

  • a1 b1 b2 a2 b3 a3
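The per-source guarantee can be expressed as a check over a received sequence: for each publisher, its messages must appear in increasing sequence-number order. The sketch below assumes, as in the example, that each message is named by a one-character source followed by a sequence number (such as `a2`); the class `SourceOrder` and the method `preservesSourceOrder` are hypothetical helpers.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SourceOrder {
    // Returns true if, for every source, messages appear in increasing
    // sequence-number order. Message names are assumed to be a one-character
    // source id followed by a sequence number, for example "a1" or "b3".
    static boolean preservesSourceOrder(List<String> received) {
        Map<Character, Integer> lastSeen = new HashMap<>();
        for (String msg : received) {
            char source = msg.charAt(0);
            int seq = Integer.parseInt(msg.substring(1));
            Integer previous = lastSeen.get(source);
            if (previous != null && seq <= previous) {
                return false; // a later message from this source arrived first
            }
            lastSeen.put(source, seq);
        }
        return true;
    }
}
```

All four sequences listed above pass this check, while a sequence such as a2 a1 b1 a3 b2 b3 fails it because a2 arrives before a1.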

Distributed map

A Distributed Map is a decentralized distributed system that provides a lookup service similar to a hash table: (key, value) pairs are stored in the Distributed Map, and any participating node can efficiently retrieve the value associated with a given key. Responsibility for maintaining the mapping from keys to values is distributed among the nodes in such a way that a change in the set of participants causes minimal disruption. This allows a Distributed Map to scale to very large numbers of nodes and to handle continual node arrivals, departures, and failures.
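A common technique for keeping disruption minimal when membership changes is consistent hashing. The sketch below is a generic consistent-hash ring and is not a description of how the controller's distributed maps assign keys to nodes. The class name `HashRing`, the number of virtual nodes per member, and the hash function (`String.hashCode` passed through the MurmurHash3 32-bit finalizer) are illustrative assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Generic consistent-hash ring (illustrative only).
class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // MurmurHash3 32-bit finalizer, used to spread String.hashCode values.
    private static int hash(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Each node is placed at several points on the ring to even out the load.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.putIfAbsent(hash(node + "#" + i), node);
        }
    }

    // Removes only the points that belong to this node.
    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i), node);
        }
    }

    // A key belongs to the first node point at or after the key's hash,
    // wrapping around to the start of the ring.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in ring");
        }
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

With plain modulo assignment (hash mod number of nodes), growing from three to four nodes would move about three quarters of the keys. With the ring, only keys that now fall on the new node's points move, and all of them move to the new node.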

For more information, see "Implementing a distributed map."

Distributed lock

Protecting access to shared resources becomes increasingly important in a distributed environment. A lock is a synchronization primitive that ensures that only one thread at a time can access a critical section. The distributed locks offered by the Coordination Service provide an implementation of locks for distributed environments, where the competing threads can run either in the same JVM or in different JVMs.
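The usage pattern for such a lock can be sketched without the real API. In the sketch below, the class `LockTable` and its methods are hypothetical, and a `ConcurrentHashMap` stands in for state that a real coordination service would replicate across the team. Each owner is identified by a string that combines controller and thread (for example `ctl-1/t1`), so threads in the same JVM and in different JVMs are treated alike.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a team-wide lock store. A real distributed
// lock keeps this state replicated across controllers; here a
// ConcurrentHashMap plays that role so the sketch runs in one JVM.
class LockTable {
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    // Atomically claims the named lock for the owner; false if already held.
    boolean tryLock(String lockName, String owner) {
        return owners.putIfAbsent(lockName, owner) == null;
    }

    // Releases the lock only if the caller currently holds it.
    boolean unlock(String lockName, String owner) {
        return owners.remove(lockName, owner);
    }

    // Runs the critical section only if the lock is acquired, and always
    // releases the lock afterwards, even if the section throws.
    boolean runExclusively(String lockName, String owner, Runnable criticalSection) {
        if (!tryLock(lockName, owner)) {
            return false;
        }
        try {
            criticalSection.run();
        } finally {
            unlock(lockName, owner);
        }
        return true;
    }
}
```

The sketch shows only mutual exclusion and the acquire/try/finally/release pattern. It is not reentrant and does not handle a holder whose controller fails while holding the lock, which a production distributed lock must address.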

For more information, see "Implementing a distributed lock."

Peer monitoring

In distributed environments such as controller teams, components, services, and applications running in a controller have peers that run on the other controllers in the team. A given application, component, or service might need to coordinate with its peers, and therefore must be aware of the status of each peer.

One approach to this problem is to define a status for an entire node, but this approach does not work for components that are initialized after the layer that manages the node's status. The node might be reported as initialized even though a component or application that its peer depends on has not finished initializing.

To avoid this initialization problem, you can use the peer monitoring service and APIs to monitor the status of each specific peer, rather than relying on the less accurate approach of monitoring entire nodes.
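The difference between node-level and peer-level status can be sketched with a small tracker keyed by (node, component). The class `PeerStatusTracker`, its `Status` values, and its methods are hypothetical and are not the peer monitoring API. In the usage below, a node-level entry can read READY while the specific peer component is still initializing, and only the per-component check reflects that.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-peer status tracker: status is kept per (node, component)
// rather than per node, so a component can wait for the specific peer it
// depends on instead of trusting a node-wide "initialized" flag.
class PeerStatusTracker {
    enum Status { UNKNOWN, INITIALIZING, READY, DOWN }

    private final Map<String, Status> statuses = new ConcurrentHashMap<>();

    private static String key(String node, String component) {
        return node + "/" + component;
    }

    // Called when a peer reports, or is observed to change, its status.
    void update(String node, String component, Status status) {
        statuses.put(key(node, component), status);
    }

    Status statusOf(String node, String component) {
        return statuses.getOrDefault(key(node, component), Status.UNKNOWN);
    }

    // True only when the named component is READY on every listed node.
    boolean peersReady(String component, String... nodes) {
        for (String node : nodes) {
            if (statusOf(node, component) != Status.READY) {
                return false;
            }
        }
        return true;
    }
}
```

For example, after `update("ctl-2", "node", READY)` and `update("ctl-2", "app-1", INITIALIZING)`, `peersReady("app-1", "ctl-2")` is false even though the node-level entry is READY.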

For more information, see "Implementing peer monitoring."