Management Module failover overview

There are two types of Management Module (MM) failover:
  • Controlled failover: The user triggers this type of failover by rebooting the Active MM or running the redundancy switchover command.

  • Uncontrolled failover: This type of failover is triggered by unexpected events, such as a crash on the Active MM or hot removal of the Active MM.

In a dual MM chassis, the Standby MM detects the failover in one of the following ways:
  • A mailbox interrupt is received from the Active MM to indicate takeover. This interrupt can be sent for both controlled and uncontrolled failover, except for hot removal.

  • Hot removal of the Active MM is detected.

  • The Standby MM receives no heartbeat from the Active MM for more than 10 seconds.
    NOTE:

    If the Active MM stops responding but neither of the first two methods detects it, heartbeat loss detection catches the failure.
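The three detection paths can be modeled as a single decision function that falls back to heartbeat loss only when the other two paths stay silent. This is a hypothetical Python sketch: `detect_failover`, its parameters, and the `Detection` enum are illustrative assumptions, not hpe-rdntmgmtd internals. Only the 10-second threshold comes from the text above.

```python
from enum import Enum
from typing import Optional

# Hypothetical model of the three detection paths; names are illustrative,
# not hpe-rdntmgmtd internals.
HEARTBEAT_TIMEOUT_S = 10  # heartbeat-loss threshold stated in the text


class Detection(Enum):
    MAILBOX_INTERRUPT = "mailbox interrupt"
    HOT_REMOVAL = "hot removal"
    HEARTBEAT_LOSS = "heartbeat loss"


def detect_failover(mailbox_interrupt: bool,
                    active_removed: bool,
                    seconds_since_heartbeat: float) -> Optional[Detection]:
    """Return the detection path that fires first, or None if no failover is detected."""
    if mailbox_interrupt:
        # Sent by the Active MM for controlled or uncontrolled failover
        # (not sent for hot removal).
        return Detection.MAILBOX_INTERRUPT
    if active_removed:
        return Detection.HOT_REMOVAL
    if seconds_since_heartbeat > HEARTBEAT_TIMEOUT_S:
        # Fallback that catches an unresponsive Active MM missed by the first two paths.
        return Detection.HEARTBEAT_LOSS
    return None
```

For example, a hung Active MM that sends no interrupt and is still seated is reported as `HEARTBEAT_LOSS` only once the heartbeat gap exceeds 10 seconds.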

Failover requirements:
  • The Standby MM must be present to trigger a failover. An Unassigned MM will never trigger a failover.

  • The Redundant Management Daemon (hpe-rdntmgmtd) is responsible for triggering failover from the Standby MM.

  • When a failover is triggered, the Standby MM takes over and becomes Active, and the old Active MM is rebooted.
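The failover requirements reduce to a role check followed by a role swap. The sketch below is a hypothetical model: the `Role` enum, the `ManagementModule` dataclass, and `trigger_failover` are illustrative names assumed for this example, not part of hpe-rdntmgmtd.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    ACTIVE = "Active"
    STANDBY = "Standby"
    UNASSIGNED = "Unassigned"


@dataclass
class ManagementModule:
    name: str          # illustrative label, e.g. "MM1"
    role: Role
    rebooting: bool = False


def trigger_failover(local: ManagementModule, peer: ManagementModule) -> bool:
    """Hypothetical failover step run on the local MM; returns True if it took over."""
    if local.role is not Role.STANDBY:
        # Only the Standby MM triggers failover; an Unassigned MM never does.
        return False
    local.role = Role.ACTIVE  # the Standby MM takes over and becomes Active
    peer.rebooting = True     # the old Active MM is rebooted
    return True
```

Calling `trigger_failover` on an Unassigned MM leaves both modules unchanged, which mirrors the rule that an Unassigned MM never triggers a failover.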

Standby recovery requirements:
  • The Active MM must be present to trigger a recovery.

  • The Redundant Management Daemon (hpe-rdntmgmtd) is responsible for triggering recovery from the Active MM.

  • When recovery is triggered, the Active MM reboots the nonresponsive Standby MM.

Failover or recovery is triggered under any of the following conditions:

Condition: Heartbeat lost from Active MM:
  • The failover monitor thread on the Standby MM will increment the heartbeat failed count.

  • The hpe-rdntmgmtd daemon on the Standby MM will:
    • Detect the failover condition when the heartbeat fail count increases past the maximum of 10, and trigger failover.

    • Initiate reboot of the Active MM.

  • The old Active MM joins as the Standby MM after it reboots.
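The failover monitor's counting rule can be sketched as a small counter class. This is a hypothetical model: the class and method names are illustrative, and resetting the count when a heartbeat arrives is an assumption not stated in the text. Only the maximum of 10 comes from the description above.

```python
class FailoverMonitor:
    """Hypothetical model of the failover monitor thread on the Standby MM."""

    MAX_FAILURES = 10  # maximum heartbeat fail count stated in the text

    def __init__(self) -> None:
        self.failed_count = 0

    def record(self, heartbeat_received: bool) -> bool:
        """Record one heartbeat check; return True once failover should be triggered."""
        if heartbeat_received:
            # Assumption: a received heartbeat resets the count (not stated in the text).
            self.failed_count = 0
        else:
            self.failed_count += 1
        # "Increasing past the maximum" is modeled as strictly greater than 10.
        return self.failed_count > self.MAX_FAILURES
```

Under this model, ten consecutive misses leave the count at the maximum, and the eleventh pushes it past 10 and triggers failover.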

Condition: Heartbeat lost from Standby MM:
  • The recover monitor thread on the Active MM will increment the heartbeat failed count.

  • The hpe-rdntmgmtd daemon on the Active MM will:
    • Detect the recovery condition when the heartbeat fail count increases past the maximum of 7, and trigger recovery.

    • Initiate reboot of the Standby MM.

  • The Standby MM rejoins as the Standby MM after it reboots.
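The recover monitor follows the same counting idea but runs only on the Active MM and uses a lower maximum of 7. The sketch below is hypothetical: `RecoverMonitor`, the role strings, and both count resets (on a received heartbeat and after a reboot is issued) are assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_STANDBY_FAILURES = 7  # maximum heartbeat fail count for recovery, per the text


@dataclass
class RecoverMonitor:
    """Hypothetical model of the recover monitor thread."""
    local_role: str          # illustrative role label: "Active", "Standby", or "Unassigned"
    failed_count: int = 0

    def record(self, standby_heartbeat_received: bool) -> bool:
        """Return True when the Active MM should reboot the nonresponsive Standby MM."""
        if self.local_role != "Active":
            return False  # recovery is triggered only from the Active MM
        if standby_heartbeat_received:
            self.failed_count = 0  # assumption: a received heartbeat resets the count
        else:
            self.failed_count += 1
        if self.failed_count > MAX_STANDBY_FAILURES:
            self.failed_count = 0  # assumption: count restarts once the reboot is issued
            return True
        return False
```

With these assumptions, the eighth consecutive missed heartbeat triggers a Standby reboot, and an MM that is not Active never triggers recovery no matter how many heartbeats it misses.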

Condition: Planned reboot of Active MM:
  • A planned reboot on the Active MM will send a failover command to the Standby MM.

  • The hpe-rdntmgmtd daemon on the Standby MM will:
    • Process this command and perform a failover immediately instead of waiting for the failover monitor to detect it using heartbeats.

    • Initiate reboot of the Active MM.

  • The old Active MM joins as the Standby MM after it reboots.

Condition: Removal of Active MM:
  • Removal of the Active MM from Slot 1 triggers the hpe-rdntmgmtd daemon on the Standby MM to initiate failover immediately instead of waiting for the failover monitor to detect it using heartbeats.

  • The old Active MM joins as the Standby MM after it reboots.

Condition: Crash on Active MM:
  • A crash on the Active MM is handled by the crash handler, which sends a failover command to the Standby MM.

  • The hpe-rdntmgmtd daemon on the Standby MM will:
    • Process this command and perform failover immediately instead of waiting for the failover monitor to detect it using heartbeats.

    • Initiate reboot of the Active MM.

  • The old Active MM joins as the Standby MM after it reboots.

Condition: redundancy switchover command:
  • The user runs the redundancy switchover command on the Active MM.

  • This action will send a takeover signal to the Standby MM and reboot the Active MM.

  • The hpe-rdntmgmtd daemon on the Standby MM processes this takeover signal and performs failover immediately.

  • The old Active MM joins as the Standby MM after it reboots.
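The four event-driven conditions (planned reboot, removal, crash, and switchover) share one pattern: the Standby MM is notified directly and fails over without waiting for heartbeat loss. The hypothetical table below summarizes them; the `Event` and `Handling` names and all step strings are descriptive labels assumed for this sketch, not actual daemon messages. The text does not say what reboots a removed Active MM, so that entry is left as `None`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Event(Enum):
    PLANNED_REBOOT = "planned reboot of Active MM"
    REMOVAL = "removal of Active MM"
    CRASH = "crash on Active MM"
    SWITCHOVER = "redundancy switchover command"


@dataclass(frozen=True)
class Handling:
    notice: str                 # how the Standby MM learns of the event
    reboot_step: Optional[str]  # how the old Active MM is rebooted (None: not stated)


# Descriptive summary of the conditions above; strings are labels, not daemon messages.
HANDLING = {
    Event.PLANNED_REBOOT: Handling("Standby MM receives a failover command",
                                   "Standby MM initiates reboot of the old Active MM"),
    Event.REMOVAL: Handling("Standby MM detects hot removal of the Active MM", None),
    Event.CRASH: Handling("Standby MM receives a failover command from the crash handler",
                          "Standby MM initiates reboot of the old Active MM"),
    Event.SWITCHOVER: Handling("Standby MM receives a takeover signal",
                               "old Active MM reboots itself"),
}


def failover_steps(event: Event) -> list[str]:
    """Ordered summary of an event-driven failover (hypothetical)."""
    handling = HANDLING[event]
    steps = [handling.notice,
             "Standby MM fails over immediately, without waiting for heartbeat loss"]
    if handling.reboot_step is not None:
        steps.append(handling.reboot_step)
    steps.append("old Active MM rejoins as Standby after reboot")
    return steps
```

Keeping the conditions in a single lookup table makes the shared pattern visible: only the notification path and the source of the reboot differ between events.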

Why did my second MM not take over after the Active MM failed?

The second MM does not take over if it is not Standby-Ready.
NOTE:

The second MM must be elected as Standby and in a ready state before failover. If not, a double fault occurs and the second MM will not take over.
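The takeover rule combines two conditions: the second MM must be elected Standby, and it must be ready. The sketch below is a hypothetical model. `Role`, `can_take_over`, and `on_active_failure` are illustrative names, and representing "ready" as a single boolean is a simplification.

```python
from enum import Enum


class Role(Enum):
    ACTIVE = "Active"
    STANDBY = "Standby"
    UNASSIGNED = "Unassigned"


def can_take_over(role: Role, standby_ready: bool) -> bool:
    """Hypothetical check: the second MM must be elected Standby and be ready."""
    return role is Role.STANDBY and standby_ready


def on_active_failure(second_mm_role: Role, standby_ready: bool) -> str:
    """Describe what happens to the second MM when the Active MM fails."""
    if can_take_over(second_mm_role, standby_ready):
        return "failover: second MM becomes Active"
    # Not elected Standby, or Standby but not ready: double fault, no takeover.
    return "double fault: second MM does not take over"
```

Under this model, a second MM that is elected Standby but not yet ready produces a double fault, just as an Unassigned MM does.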