Baselines for dynamic thresholds for monitors

Many agents monitor values, such as network traffic rates, that can vary throughout the day.

Monitors usually contain rules to generate an alert—and possibly execute other actions—when something like the rate of incoming packets exceeds a certain value. This value is commonly referred to as a threshold.

It is difficult for a script writer to choose a threshold value that is appropriate for all switches in all networks. For example:

  • The rate of incoming traffic for one switch might be relatively high, even under normal circumstances.

  • On a different switch on a different network, that same incoming traffic rate might be considered abnormally high, and the network administrator would want an alert to be generated in those circumstances.

  • If the script writer chooses a threshold based on the lower incoming traffic rate, the agent monitoring the high-traffic switch either generates numerous alerts or remains in an alert state most of the time. Because the threshold is lower than the traffic rate that is normal for that switch, the network administrator considers these alerts "false positives."
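
The scenario in the list above can be illustrated with a small, standalone Python sketch. The traffic samples, the threshold value, and the count_alerts helper are all invented for illustration and are not part of any monitoring API.

```python
# Hypothetical packets-per-second samples; the numbers are invented.
busy_switch = [9200, 9500, 9100, 9800, 9400]   # normal load for this switch
quiet_switch = [300, 280, 350, 320, 2500]      # last sample is a genuine spike

FIXED_THRESHOLD = 1000  # chosen with the quiet switch in mind


def count_alerts(samples, threshold):
    """Return how many samples exceed the threshold."""
    return sum(1 for rate in samples if rate > threshold)


busy_alerts = count_alerts(busy_switch, FIXED_THRESHOLD)
quiet_alerts = count_alerts(quiet_switch, FIXED_THRESHOLD)
print(busy_alerts, quiet_alerts)
```

With a single fixed threshold, every sample from the busy switch raises an alert even though its traffic is normal, while the quiet switch alerts only on the real spike.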

The Baseline function provides a way to specify thresholds that are appropriate to the network conditions on the switch. When an agent is created and enabled, it spends a specified amount of time "learning" about the data it is monitoring before it sets thresholds that are calculated based on what it learned.

In addition, these thresholds are dynamic. The agent continues to learn about the data it monitors and the Baseline function adjusts the thresholds accordingly. For example, if the lower-traffic switch starts to get consistently higher incoming traffic rates, the Baseline function adjusts the thresholds to reflect the newly learned rates.
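
The learn-then-adjust behavior can be sketched in Python as follows. This is only an illustration of the idea: the BaselineSketch class, its parameters, and the choice of an exponentially weighted mean plus a multiple of the deviation are assumptions, not the algorithms the Baseline function actually uses.

```python
import math


class BaselineSketch:
    """Hypothetical baseline learner; not the product's Baseline function."""

    def __init__(self, learning_samples=5, alpha=0.2, k=3.0):
        self.learning_samples = learning_samples  # samples seen before a threshold is set
        self.alpha = alpha  # weight given to each new sample
        self.k = k          # deviations above the mean used for the threshold
        self.count = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, value):
        """Fold one new sample into the running mean and variance."""
        self.count += 1
        if self.count == 1:
            self.mean = float(value)
            return
        delta = value - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)

    @property
    def learning(self):
        return self.count < self.learning_samples

    def threshold(self):
        """Return None while learning, else mean + k standard deviations."""
        if self.learning:
            return None
        return self.mean + self.k * math.sqrt(self.var)


baseline = BaselineSketch()
for rate in [300, 310, 290, 305]:
    baseline.update(rate)
during_learning = baseline.threshold()   # still learning, so no threshold yet

baseline.update(295)
early = baseline.threshold()             # first learned threshold

for _ in range(30):                      # traffic becomes consistently higher
    baseline.update(900)
late = baseline.threshold()              # threshold has followed the new rate
```

In this sketch the threshold is unavailable until enough samples have been seen, and it then drifts upward as the observed rate rises, which mirrors the lower-traffic switch example above.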

If desired, the script writer can specify default thresholds that are used to determine when to set and clear alerts while the agent is in a learning state. Otherwise, the agent does not generate alerts while it is learning about the data.
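
A minimal Python sketch of this fallback logic is shown here. The function names and the use of None to represent "still learning" or "no default supplied" are assumptions made for illustration, and the sketch models only the condition for setting an alert, not for clearing it.

```python
def effective_threshold(learned, default=None):
    """Pick the threshold to evaluate against.

    learned: threshold from the baseline, or None while still learning.
    default: optional fallback from the script writer, or None if not supplied.
    """
    return learned if learned is not None else default


def should_alert(value, learned, default=None):
    """Return True if the value exceeds whichever threshold applies."""
    threshold = effective_threshold(learned, default)
    if threshold is None:
        # Still learning and no default supplied: stay silent.
        return False
    return value > threshold


silent_while_learning = should_alert(1200, learned=None)
default_used = should_alert(1200, learned=None, default=1000)
learned_wins = should_alert(1200, learned=1500, default=1000)
```

Once a learned threshold exists, it takes precedence over the default in this sketch.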

The methods used to determine the baseline from which to calculate the thresholds—both during the initial learning state and over time—depend on the algorithm selected in the Baseline function.