Troubleshooting stacking

Troubleshooting OOBM and split stack issues

If all OOBM ports in the stack are in the same VLAN, you can use the show oobm commands to view the current state of all switches. For example, if you have a four-member chain and member 4 fails or loses power, a stack split occurs with an active fragment on members 1-2-3 and an inactive fragment on member 4.

There is one IP address for the active fragment, which can be statically set by assigning an IP address to the global OOBM port.

If the stack splits, you can connect to the active fragment using the global OOBM IP address and then enter the show oobm discovery command to see whether this active fragment has discovered any other members that are connected through the OOBM LAN.

In the following four-member chain example, the user connected using the global IP address of 10.0.11.49, logged on, and entered the show stacking command.

Displaying stacking member status

HP Stack 2920#: show stacking

Stack  ID          : 00011cc1-de4d87c0

MAC  Address       : 1cc1de-4d87e5
Stack  Topology    : Chain
Stack  Status      : Fragment
Active Uptime      : 0d 0h 5m
Software  Version  : WB.15.11.0000x

Mbr
ID  Mac Address    Model                                 Pri  Status
--- ------------- -------------------------------------- --- ---------------
1   1cc1de-4d87c0  HP J9727A 2920-24G-PoE+-2SFP+ Switch  128  Commander
2   1cc1de-4dc740  HP J9727A 2920-24G-PoE+-2SFP+ Switch  128  Member
3   1cc1de-4dbd40  HP J9726A 2920-24G-2SFP+ Switch       128  Standby
4   1cc1de-4da900  HP J9728A 2920-48G-4SFP+ Switch       200  Missing

The user then entered the show oobm discovery command to see if the members were discoverable using OOBM.

Displaying oobm discovery

HP Stack 2920#:  show oobm discovery

Active Stack Fragment(discovered) IP Address  : 10.0.11.49

Mbr          
ID   Mac Address   Status
--- -------------- ----------
1   1cc1de-4d87c0  Commander
2   1cc1de-4dc740  Member
3   1cc1de-4dbd40  Standby

Inactive Stack Fragment(discovered) IP Address  : 10.0.10.98

Mbr      
ID  Mac Address    Status
--- -------------- ----------
4   1cc1de-4da900  Commander

Member 4 is up but is an inactive fragment with an addressable IP address that can be used to connect to it.
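When the discovery output has to be checked repeatedly, it can help to turn it into a data structure. The following is a minimal Python sketch, not an HP tool; it assumes the output layout shown in the example above, and the names parse_oobm_discovery, HEADER, and MEMBER are illustrative only.

```python
import re

# Sample copied from the "Displaying oobm discovery" example in this section.
SAMPLE = """\
Active Stack Fragment(discovered) IP Address  : 10.0.11.49

Mbr
ID   Mac Address   Status
--- -------------- ----------
1   1cc1de-4d87c0  Commander
2   1cc1de-4dc740  Member
3   1cc1de-4dbd40  Standby

Inactive Stack Fragment(discovered) IP Address  : 10.0.10.98

Mbr
ID  Mac Address    Status
--- -------------- ----------
4   1cc1de-4da900  Commander
"""

# Assumed line formats, inferred only from the sample output above.
HEADER = re.compile(
    r"^(Active|Inactive) Stack Fragment\(discovered\) IP Address\s*:\s*(\S+)"
)
MEMBER = re.compile(r"^(\d+)\s+([0-9a-f]{6}-[0-9a-f]{6})\s+(\w+)")


def parse_oobm_discovery(text):
    """Return a list of fragments, each with its state, IP address, and members."""
    fragments = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            fragments.append(
                {"state": header.group(1), "ip": header.group(2), "members": []}
            )
            continue
        member = MEMBER.match(line)
        if member and fragments:
            # Member rows belong to the most recent fragment header.
            fragments[-1]["members"].append(
                {
                    "id": int(member.group(1)),
                    "mac": member.group(2),
                    "status": member.group(3),
                }
            )
    return fragments


if __name__ == "__main__":
    for frag in parse_oobm_discovery(SAMPLE):
        print(frag["state"], frag["ip"], [m["id"] for m in frag["members"]])
```

The IP address of each inactive fragment in the result is the address to use with the telnet command when connecting to that fragment.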

Connecting to a stack fragment

HP Stack 2920#: telnet 10.0.10.98 oobm

HP J9728A 2920-48G-4SFP+ Switch
Software revision WB.15.11.0000x

Copyright (C) 1991-2013 Hewlett-Packard Development Company, L.P.

RESTRICTED RIGHTS LEGEND
Confidential computer software. Valid license from HP required for possession,
use or copying. Consistent with FAR 12.211 and 12.212, Commercial
Computer Software, Computer Software Documentation, and Technical
Data for Commercial Items are licensed to the U.S. Government under
vendor's standard commercial license.
HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
20555 State Highway 249, Houston, TX 77070

Enter the show stacking command.

Displaying missing stack members

HP Stack 2920#: show stacking

Stack  Topology      : Chain
Stack  Status        : Fragment Inactive
Uptime               : 0d 0h 7m
Software  Version    : WB.15.11.0000x


Mbr
ID  Mac Address   Model                                  Pri  Status
--- ------------- -------------------------------------- --- ---------------
1   1cc1de-4d87c0 HP J9727A 2920-24G-PoE+-2SFP+ Switch   128  Missing
2   1cc1de-4dc740 HP J9727A 2920-24G-PoE+-2SFP+ Switch   128  Missing
3   1cc1de-4dbd40 HP J9726A 2920-24G-2SFP+ Switch        128  Missing
4   1cc1de-4da900 HP J9728A 2920-48G-4SFP+ Switch        200  Commander

Confirm by entering the show oobm discovery command. Member 4 is down.

Confirming stack member 4 is down

HP Stack 2920#: show oobm discovery

Inactive Stack Fragment(discovered) IP Address: 10.0.10.98

Mbr  
ID   Mac Address   Status
--- -------------- ----------
4   1cc1de-4da900 Commander

Active Stack Fragment(discovered) IP Address: 10.0.11.49

Mbr  
ID   Mac Address   Status
--- -------------- ----------
1   1cc1de-4d87c0  Commander
2   1cc1de-4dc740  Member
3   1cc1de-4dbd40  Member

OOBM and active and inactive fragments

When two fragments are the same size and OOBM is enabled and configured for all members of the stack, the fragment that contains the Commander becomes Active and the other fragment becomes Inactive. However, if OOBM is not enabled and configured for all members of the stack, each fragment remains Active. To resolve this issue, enable and configure OOBM on all members.

The following behaviors were observed with a two-member 2920 stack in a chain configuration when the stacking cable was removed to simulate a failure that results in two fragments of identical size:

HP Stack 2920# show stack
 
Stack ID         : 01002c59-e51c5bc0
 
MAC Address      : 2c59e5-1c5be3
Stack Topology   : Chain
Stack Status     : Active
Split Policy     : One-Fragment-Up
Uptime           : 0d 0h 13m
Software Version : WB.15.16.0005
 
 Mbr
 ID  Mac Address   Model                                  Pri Status
 --- ------------- -------------------------------------- --- ---------------
 1   2c59e5-1c5bc0 HP J9726A 2920-24G Switch              128 Commander
 2   d4c9ef-a49e00 HP J9726A 2920-24G Switch              128 Standby

HP Stack 2920# show stack stack-ports
  Member Stacking Port State    Peer Member Peer Port
  ------ ------------- -------- ----------- ---------
  1      1             Up       2           2
  1      2             Down     0           0
  2      1             Down     0           0
  2      2             Up       1           1

Results without OOBM configured:

From physical console of Member 1 (original stack commander):

HP Stack 2920# show stack
 
Stack ID         : 01002c59-e51c5bc0
 
MAC Address      : 2c59e5-1c5be3
Stack Topology   : Chain
Stack Status     : Fragment Active
Split Policy     : One-Fragment-Up
Uptime           : 0d 0h 09m
Software Version : WB.15.16.0005
 
 Mbr
 ID  Mac Address   Model                                  Pri Status
 --- ------------- -------------------------------------- --- ---------------
 1   2c59e5-1c5bc0 HP J9726A 2920-24G Switch              128 Commander
 2   d4c9ef-a49e00 HP J9726A 2920-24G Switch              128 Missing

From physical console of Member 2:

HP Stack 2920# show stack
 
Stack ID         : 01002c59-e51c5bc0
 
MAC Address      : 2c59e5-1c5be3
Stack Topology   : Chain
Stack Status     : Fragment Active
Split Policy     : One-Fragment-Up
Uptime           : 0d 0h 9m
Software Version : WB.15.16.0005
 
 Mbr
 ID  Mac Address   Model                                  Pri Status
 --- ------------- -------------------------------------- --- ---------------
 1   2c59e5-1c5bc0 HP J9726A 2920-24G Switch              128 Missing
 2   d4c9ef-a49e00 HP J9726A 2920-24G Switch              128 Commander

Note that both switches are Commanders and both Fragments remain Active.

Results with OOBM configured:

OOBM is now enabled with both oobm ports connected to the same network segment:

HP Stack 2920# sh oobm ip
 
  IPv4 Status    : Enabled
  IPv4 Default Gateway : 10.238.255.254
 
        |                                     Address  Interface
Member | IP Config IP Address/Prefix Length  Status   Status
------ + --------- ------------------------- -------- ---------
Global | manual    10.238.123.37/16          Active   Up
1      | manual    10.238.123.37/16          Active   Up
2      | manual    10.238.123.39/16          Active   Up


From console of member 1:

HP Stack 2920# show stack
 
Stack ID         : 01002c59-e51c5bc0
 
MAC Address      : 2c59e5-1c5be3
Stack Topology   : Chain
Stack Status     : Fragment Active
Split Policy     : One-Fragment-Up
Uptime           : 0d 0h 16m
Software Version : WB.15.16.0005
 
 Mbr
 ID  Mac Address   Model                                  Pri Status
 --- ------------- -------------------------------------- --- ---------------
 1   2c59e5-1c5bc0 HP J9726A 2920-24G Switch              128 Commander
 2   d4c9ef-a49e00 HP J9726A 2920-24G Switch              128 Missing


From console of member 2:

HP Stack 2920# show stack
 
Stack ID         : 01002c59-e51c5bc0
 
MAC Address      : 2c59e5-1c5be3
Stack Topology   : Chain
Stack Status     : Fragment Inactive
Split Policy     : One-Fragment-Up
Uptime           : 497d 1h 57m
Software Version : WB.15.16.0005
 
 Mbr
 ID  Mac Address   Model                                  Pri Status
 --- ------------- -------------------------------------- --- ---------------
 1   2c59e5-1c5bc0 HP J9726A 2920-24G Switch              128 Missing
 2   d4c9ef-a49e00 HP J9726A 2920-24G Switch              128 Commander

Note that both switches are Commanders, but that the stack fragment containing member 2 is listed as "Fragment Inactive". This is the correct behavior and why it is important that OOBM is enabled and configured for all members of the stack.
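The rule described in this section can be summarized in a few lines of Python. This is a sketch of the documented behavior for the equal-size split only, not the switch's actual election code; the function name and the has_original_commander key are hypothetical.

```python
def fragment_states(fragments, oobm_on_all_members):
    """Return the expected status string for each fragment after an equal-size split.

    fragments: list of dicts with a boolean 'has_original_commander' key
    (a hypothetical structure used only for this sketch).
    oobm_on_all_members: True if OOBM is enabled and configured on every member.
    """
    states = []
    for fragment in fragments:
        if not oobm_on_all_members:
            # Per the text: without OOBM on all members, each fragment stays Active.
            states.append("Fragment Active")
        elif fragment["has_original_commander"]:
            # With OOBM, the fragment holding the Commander stays Active.
            states.append("Fragment Active")
        else:
            states.append("Fragment Inactive")
    return states


if __name__ == "__main__":
    split = [
        {"members": [1], "has_original_commander": True},
        {"members": [2], "has_original_commander": False},
    ]
    print("without OOBM:", fragment_states(split, oobm_on_all_members=False))
    print("with OOBM:   ", fragment_states(split, oobm_on_all_members=True))
```

The two calls correspond to the "Results without OOBM configured" and "Results with OOBM configured" captures above.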

Using fault recovery/troubleshooting tools

Stacking provides tools and logging information to aid in troubleshooting problems specific to stacking. Problems may include:

  • Installation/deployment issues

    • Problems with initial stack creation

    • Problems with adding or removing members

    • Booting an existing stack

  • Stacking failures encountered while running an existing stack

The tools used in troubleshooting problems are:

  • Event Log

  • Show commands

    • Show stacking

    • Show system

    • Show boot history

  • Show tech

  • LEDs

Troubleshooting installation and deployment issues

Installation and deployment issues include the initial deployment or creation of a stack, as well as adding additional members or removing members from a stack.

Problem:

When using the Deterministic method, one or more of the statically provisioned members did not join the stack.

Possible reasons for a switch not joining a stack:

  • The switch being added is already a member of another stack and has a different stack ID.

  • The maximum number of switches is already configured.

  • The switch being added has been statically provisioned. The MAC address matches, but the switch type does not.

  • There is a problem with the stack cable.

  • The stack cables are connected in a way that creates an unsupported topology.

  • Stack module failure.

Solution:

Perform a diagnostic fingerprint.

Troubleshooting issues when adding or removing members in the stack

Problem:

Cannot add a new switch to an existing stack.

Solution:

Identify root cause. Possible reasons for a member not joining an existing stack are:

  • The switch being added has already been a member of another stack and has a different stack ID.

  • The maximum number of switches is already configured.

  • The switch being added has been statically provisioned, but switch type and MAC address in the configuration do not match the switch being added.

  • There is a problem with the stack cable.

  • There is a problem with the stack physical cabling (illegal topology).

Problem:

The entire stack does not come up after a boot.

Solution:

There are several reasons why all members do not join the stack:

  • There is a problem with the stack cable.

  • Physical cabling was changed.

  • Stack booted on incorrect configuration.

  • One or more of the switches has a hardware problem (for example, bad power supply, bad stacking module, corrupt flash).

Problem:

One or more of the members keeps rebooting and does not join the stack.

Possible reasons:

  • An unresponsive member.

  • Heartbeat loss–a stack that has a member no longer in the stack or a member failing after joining the stack.

  • Illegal topology.

Problem:

After initial boot sequence, the activity and Link LEDs of an interface are not on and the ports are not passing traffic.

Solutions:

  • Identify the “inactive fragment” and provide alternatives for recovery.

  • Verify that all OOBMs are connected so that there is uninterrupted access.

Problem:

After a reboot, the selected Commander or Standby is not the expected switch.

Solutions:

Check to see if the log files provide a reason why the Commander and Standby were chosen and which rule they matched.

Troubleshooting a strictly provisioned, mismatched MAC address

When switches are strictly provisioned, it is possible to enter an incorrect type or incorrect MAC address. If this occurs, the switch does not match the intended configuration entry and stacking attempts to add this switch as a new “plug-and-go” switch. If the stacking configuration already has 10 switches, then the “plug-and-go” fails.

The following example shows a stack with 3 members. There is a new J9728A switch that is supposed to be member 4; however, the MAC address was mis-typed, and there is an “opening” for a plug-and-go at member 4. It will join as member 4.

Displaying a stack with 3 members

This shows the stack before boot.

HP Stack 2920(stacking)#: show stacking

Stack ID         : 00031cc1-de4d87c0
MAC Address      : 1cc1de-4dc765
Stack Topology   : Ring
Stack Status     : Active
Uptime           : 0d 0h 56m
Software Version : WB.15.11.0000x

Mbr
ID  Mac Address   Model                                 Pri Status
--- ------------- ------------------------------------- --- --------------
1   1cc1de-4d87c0 HP J9727A 2920-24G-PoE+-2SFP+ Switch  200 Standby
2   1cc1de-4dc740 HP J9727A 2920-24G-PoE+-2SFP+ Switch  128 Commander
3   1cc1de-4dbd40 HP J9726A 2920-24G Switch             128 Member

Displaying a member joining the stack

This shows that, after booting, the switch is joined as member 4.

HP Stack 2920(config)#: show stacking

Stack ID         : 00031cc1-de4d87c0
MAC Address      : 1cc1de-4dc765
Stack Topology   : Ring
Stack Status     : Active
Uptime           : 0d 1h 11m
Software Version : WB.15.11.0000x

Mbr
ID  Mac Address   Model                                  Pri Status
--- ------------- -------------------------------------- --- -------------
1   1cc1de-4d87c0 HP J9727A 2920-24G-PoE+-2SFP+ Switch   200 Standby
2   1cc1de-4dc740 HP J9727A 2920-24G-PoE+-2SFP+ Switch   128 Commander
3   1cc1de-4dbd40 HP J9726A 2920-24G Switch              128 Member
4   1cc1de-4d79c0 HP J9728A 2920-48G-4SFP+ Switch        128 Member

To correct this issue:

  1. Write down the correct MAC address.

  2. Remove the member that was added using plug-and-go with the strictly provisioned, mismatched MAC address as shown in the following example.

  3. Update the strictly provisioned entry with the correct MAC address.

  4. Boot the switch.

Removing a member and updating the entry with a MAC address

HP Stack 2920(config)#: stacking member 4 remove reboot

The specified stack member will be removed from the stack and
its configuration will be erased. The resulting configuration
will be saved. The stack member will be rebooted and join as
a new member. Continue [y/n]? 

Y

HP Stack 2920(config)#: stacking member 4 type J9728A mac-address

Showing that member 4 joined the stack

This shows that member 4 has joined the stack.

HP Stack 2920(config)#: show stacking

Stack ID         : 00031cc1-de4d87c0
MAC Address      : 1cc1de-4dc765
Stack Topology   : Ring
Stack Status     : Active
Uptime           : 0d 1h 19m
Software Version : WB.15.11.0000x

Mbr
ID  Mac Address   Model                                  Pri Status
--- ------------- -------------------------------------- --- ---------------
1   1cc1de-4d87c0 HP J9727A 2920-24G-PoE+-2SFP+ Switch   200  Standby
2   1cc1de-4dc740 HP J9727A 2920-24G-PoE+-2SFP+ Switch   128  Commander
3   1cc1de-4dbd40 HP J9726A 2920-24G-2SFP+ Switch        128  Member
4   1cc1de-4d79c0 HP J9728A 2920-48G-4SFP+ Switch        175  Member

Troubleshooting a mismatched stack-ID

Displaying a stack with 2 unjoined switches

This stack example has two members and two more members that were strictly provisioned, following the initial install deterministic method.

HP Stack 2920#: show stack

Stack ID         : 00031cc1-de4d87c0
MAC Address      : 1cc1de-4dc765
Stack Topology   : Chain
Stack Status     : Active
Uptime           : 0d 0h 2m
Software Version : WB.15.11.0000x

Mbr
ID  Mac Address   Model                                 Pri Status
--- -----------   ------------------------------------- --- -----------
1   1cc1de-4d87c0 HP J9727A 2920-24G-PoE+-2SFP+ Switch  200 Standby
2   1cc1de-4dc740 HP J9729A 2920-24G-PoE+-2SFP+ Switch  128 Commander
3   1cc1de-4dbd40 HP J9726A 2920-24G-2SFP+ Switch       128 Not Joined
4   1cc1de-4d79c0 HP J9728A 2920-48G-4SFP+ Switch       175 Not Joined

When switch 3 is powered on, it does not join the stack.

Logging output

HP Stack 2920 #: show logging -r -s
I 10/02/00 00:46:56 02558 chassis: ST1-STBY: Stack port 3 is now on-line.
I 10/02/00 00:46:56 02558 chassis: ST2-CMDR: Stack port 2 is now on-line.

The stack ports for the new switch appear online, but the show stacking command shows that the switch has not been recognized.

Displaying the switch is not recognized

HP Stack 2920(config)#: show stacking stack-ports member 1,2

Member 1

Member Stacking Port  State Peer Member   Peer Port
------ -------------- ----- ------------- -------
1      1              Down  0             0
1      2              Up    2             1
1      3              Down  0             0
1      4              Down  0             0

Member 2

Member Stacking Port  State Peer Member   Peer Port
------ -------------- ----- ------------- -------
2      1              Up    1             2
2      2              Up    0             0
2      3              Down  0             0
2      4              Down  0             0

The show stacking command does not show that the member is “Not Joined.”
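In the stack-ports output above, member 2 port 2 is Up but reports peer member 0, which is the sign that a cable is connected to a switch the stack does not recognize. A short Python sketch can flag such ports automatically. It assumes the column layout shown above; parse_stack_ports and unrecognized_peers are illustrative names, not part of any HP software.

```python
import re

# Rows copied from the "Displaying the switch is not recognized" example.
SAMPLE = """\
Member Stacking Port  State Peer Member   Peer Port
------ -------------- ----- ------------- -------
1      1              Down  0             0
1      2              Up    2             1
1      3              Down  0             0
1      4              Down  0             0
2      1              Up    1             2
2      2              Up    0             0
2      3              Down  0             0
2      4              Down  0             0
"""

# member, stacking port, state, peer member, peer port
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(Up|Down|Disabled)\s+(\d+)\s+(\d+)\s*$")


def parse_stack_ports(text):
    """Parse 'show stacking stack-ports' rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            member, port, state, peer_member, peer_port = match.groups()
            rows.append(
                {
                    "member": int(member),
                    "port": int(port),
                    "state": state,
                    "peer_member": int(peer_member),
                    "peer_port": int(peer_port),
                }
            )
    return rows


def unrecognized_peers(rows):
    """Return (member, port) pairs whose link is Up but whose peer is member 0."""
    return [
        (r["member"], r["port"])
        for r in rows
        if r["state"] == "Up" and r["peer_member"] == 0
    ]


if __name__ == "__main__":
    print(unrecognized_peers(parse_stack_ports(SAMPLE)))
```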

A log file indicates that a “topo /hello” was seen from a switch that was not part of the current stack ID. The console of the switch that should have been member 3 shows the following example output.

Output from the “not joined” switch

HP Stack 2920#: show stacking

Stack ID         : 00011cc1-de4dbd40
MAC Address      : 1cc1de-4dbd64
Stack Topology   : Unknown
Stack Status     : Active
Uptime           : 0d 0h 1m
Software Version : WB.15.11.0000x

Mbr
ID  Mac Address   Model                             Pri Status
--- ------------- --------------------------------- --- -------------------
1   1cc1de-4dbd40 HP J9726A 2920-24G-2SFP+ Switch   128 Commander

The output is different if you have an inactive fragment, because this switch can retain the configuration from an old stack. In this case, it might be inactive and show “Missing” switches from the old configuration. The stack ID value does not match the stack ID of the HP Stack 2920.

Stacking factory reset

HP Stack 2920#: stacking factory-reset
Configuration will be deleted and device rebooted,continue [y/n]? 


Y

To join this switch to the other stack, execute the stacking factory-reset command to erase all of the stale stacking configuration information. This command automatically reboots the switch and on its subsequent boot, the switch is able to join the new stack.

Troubleshooting a strictly provisioned, mismatched type

When the MAC address matches a strictly provisioned configuration, it either matches the configured type and succeeds, or it does not match the type and fails. This is because the MAC address is unique and you cannot have duplicate MAC addresses.

The log messages indicate that this was the type of failure. The information in the log message helps you correct the configuration.
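The matching rules described in the strictly provisioned sections can be sketched as a small decision function. This is an illustration of the documented behavior only, not the stacking implementation; the Entry class, join_outcome, and MAX_MEMBERS are hypothetical names, and the limit of 10 is the figure quoted in this guide's text.

```python
from dataclasses import dataclass

# Limit quoted in this guide; treat it as an assumption for other platforms.
MAX_MEMBERS = 10


@dataclass
class Entry:
    """One strictly provisioned or active member entry (hypothetical model)."""
    member_id: int
    switch_type: str
    mac: str


def join_outcome(entries, mac, switch_type, max_members=MAX_MEMBERS):
    """Describe what happens when a switch with this MAC and type tries to join."""
    for entry in entries:
        if entry.mac == mac:
            if entry.switch_type == switch_type:
                return f"joins as provisioned member {entry.member_id}"
            # MAC matches but the type does not: the join fails.
            return (
                f"fails: member {entry.member_id} is provisioned as "
                f"{entry.switch_type}, but the switch is {switch_type}"
            )
    # No MAC match: the switch is treated as a new plug-and-go member.
    if len(entries) >= max_members:
        return "fails: maximum number of switches reached"
    return "joins as a new plug-and-go member"


if __name__ == "__main__":
    stack = [
        Entry(1, "J9727A", "1cc1de-4d87c0"),
        Entry(2, "J9727A", "1cc1de-4dc740"),
        Entry(3, "J9726A", "1cc1de-4dbd40"),
        Entry(4, "J9726A", "1cc1de-4da900"),
    ]
    print(join_outcome(stack, "1cc1de-4da900", "J9728A"))
    print(join_outcome(stack, "1cc1de-4d79c0", "J9728A"))
```

The first call models a type mismatch on a matching MAC address; the second models a switch whose MAC address matches no entry and therefore joins through plug-and-go while a slot is free.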

The switch that fails to join automatically reboots. Execute the show stacking command to view the mis-configured entry.

Displaying the mis-configured entry

HP Stack 2920(config)#: show stacking

Stack ID         : 00011cc1-de4d87c0
MAC Address      : 1cc1de-4d87e5
Stack Topology   : Chain
Stack Status     : Active
Uptime           : 4d 0h 2m
Software Version : WB.15.11.0000x

Mbr
ID  Mac Address   Model                                 Pri Status
--- ------------- ------------------------------------- --- ----------------
1   1cc1de-4d87c0 HP J9727A 2920-24G-PoE+-2SFP+ Switch  128 Commander
2   1cc1de-4dc740 HP J9727A 2920-24G-PoE+-2SFP+ Switch  128 Standby
3   1cc1de-4dbd40 HP J9726A 2920-24G-2SFP+ Switch       128 Member
4   1cc1de-4da900 HP J9726A 2920-24G-2SFP+ Switch       128 Not Joined

The configuration entry for member 4 matches a J9726A switch that will be added; however, it will fail because it is configured as a J9728A switch.

The following example shows the log entries with the failure to join the stack.

Log entries displaying stacking failures

W 10/06/00 03:24:37 03255 stacking: ST2-STBY: Provisioned switch with Member ID 4 
  removed due to loss of communication
I 10/06/00 03:24:37 02558 chassis: ST2-STBY: Stack port 4 is now on-line.
I 10/06/00 03:24:35 02558 chassis: ST4-MMBR: Stack port 2 is now on-line.
W 10/06/00 03:24:35 03274 stacking: ST1-CMDR: Member 4 (1cc1de-4da900) 
  cannot join stack due to incorrect product id: J9278A

You cannot re-type the configuration command with the same MAC address, member ID, and a different J-number. You must remove the configuration and then reconfigure this switch member entry.

Removing a stack member and reconfiguring

HP Stack 2920(config)#: stacking member 5 remove

The specified stack member configuration will be erased. The
resulting configuration will be saved. Continue [y/n]? 

Y

HP Stack 2920(config)#: stacking member 5 type J9726A mac 1cc1de-4da900
This will save the current configuration. Continue [y/n]? 

Y

HP Stack 2920(config)#: show stacking

Stack ID         : 00011cc1-de4d87c0
MAC Address      : 1cc1de-4d87e5
Stack Topology   : Ring
Stack Status     : Active
Uptime           : 4d 0h 35m
Software Version : WB.15.11.0000x

Mbr
ID  Mac Address   Model                                 Pri Status
--- ------------- ------------------------------------- --- ---------------
1   1cc1de-4d87c0 HP J9727A 2920-24G-PoE+-2SFP+ Switch  128 Commander
2   1cc1de-4dc740 HP J9727A 2920-24G-PoE+-2SFP+ Switch  128 Standby
3   1cc1de-4dbd40 HP J9726A 2920-24G-2SFP+ Switch       128 Member
4   1cc1de-4da900 HP J9728A 2920-48G-4SFP+ Switch       128 Not Joined

Boot the switch with the matching MAC/Type.

Displaying joined stack members

HP Stack 2920(config)#: show stacking

Stack ID         : 00011cc1-de4d87c0
MAC Address      : 1cc1de-4d87e5
Stack Topology   : Ring
Stack Status     : Active
Uptime           : 4d 0h 50m
Software Version : WB.15.11.0000x

Mbr
ID  Mac Address   Model                                 Pri Status
--- ------------- ------------------------------------- --- ----------------
1   1cc1de-4d87c0 HP J9727A 2920-24G-PoE+-2SFP+ Switch  128 Commander
2   1cc1de-4dc740 HP J9727A 2920-24G-PoE+-2SFP+ Switch  128 Standby
3   1cc1de-4dbd40 HP J9726A 2920-24G-2SFP+ Switch       128 Member
4   1cc1de-4d79c0 HP J9728A 2920-48G-4SFP+ Switch       175 Member

Troubleshooting maximum stack members exceeded

This failure can happen if you have an active stack that has already reached its maximum number of members. It can also happen when the maximum number of switches is reached with a combination of active members and strictly provisioned members.

Since one of the suggested initial deployment techniques is a deterministic method using strictly provisioned entries, this failure example demonstrates what occurs if the maximum number of members is reached by strictly provisioning ten members. At least one of these configuration entries has an incorrect MAC address. As in the mismatched MAC address example, the stack attempts a “plug-and-go” to add the switch; however, because the maximum number of members has already been reached, the switch cannot join the stack.

The following example shows the show stacking output before the switch attempts to join.

Displaying stack members before the join

HP Stack 2920(config)#: show stacking

Stack ID         : 00031cc1-de4d87c0
MAC Address      : 1cc1de-4dc765
Stack Topology   : Ring
Stack Status     : Active
Uptime           : 0d 1h 27m
Software Version : WB.15.11.0000x

Mbr
ID  Mac Address   Model                                  Pri Status
--- ------------- -------------------------------------- --- ---------------
1   1cc1de-4d87c0 HP J9727A 2920-24G-PoE+-2SFP+ Switch   200 Standby
2   1cc1de-4dc740 HP J9727A 2920-24G-PoE+-2SFP+ Switch   128 Commander
3   1cc1de-4dbd40 HP J9726A 2920-24G Switch              128 Member
4   1cc1de-4d79c0 HP J9728A 2920-48G Switch              175 Member
5   1cc1de-000005 HP J9728A 2920-48G Switch              128 Not Joined

When a switch that does not match the MAC addresses attempts to join, that switch reboots when the maximum configuration is detected. The active stack logs the following:

W 10/07/00 06:01:11 03253 stacking: ST3-CMDR: Maximum number of switches in the stack has been reached. 
 Cannot add 1cc1de-4da900 type J9728A

The failure can be due to one of the strictly provisioned entries being incorrect. To correct this entry, reboot the switch. If there are already 10 switches in the stack, you cannot add additional switches at this time.
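Before adding a switch, it can be useful to count how many member slots are already taken, because strictly provisioned entries that show "Not Joined" count toward the limit as well. The Python sketch below assumes the show stacking table layout used in this guide and the limit of 10 quoted in the text; the function names are illustrative.

```python
import re

# Member rows copied from the "Displaying stack members before the join" example.
SAMPLE = """\
Mbr
ID  Mac Address   Model                                  Pri Status
--- ------------- -------------------------------------- --- ---------------
1   1cc1de-4d87c0 HP J9727A 2920-24G-PoE+-2SFP+ Switch   200 Standby
2   1cc1de-4dc740 HP J9727A 2920-24G-PoE+-2SFP+ Switch   128 Commander
3   1cc1de-4dbd40 HP J9726A 2920-24G Switch              128 Member
4   1cc1de-4d79c0 HP J9728A 2920-48G Switch              175 Member
5   1cc1de-000005 HP J9728A 2920-48G Switch              128 Not Joined
"""

# id, MAC, model, priority, status (status values are those seen in this guide)
ROW = re.compile(
    r"^\s*(\d+)\s+([0-9a-f]{6}-[0-9a-f]{6})\s+(.*?)\s+(\d+)\s+"
    r"(Commander|Standby|Member|Missing|Not Joined)\s*$"
)


def parse_members(text):
    """Parse the member table of 'show stacking' into a list of dicts."""
    members = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            members.append(
                {
                    "id": int(match.group(1)),
                    "mac": match.group(2),
                    "model": match.group(3),
                    "priority": int(match.group(4)),
                    "status": match.group(5),
                }
            )
    return members


def free_slots(text, max_members=10):
    """Slots left for plug-and-go; every configured entry uses one slot."""
    return max_members - len(parse_members(text))


if __name__ == "__main__":
    not_joined = [m["id"] for m in parse_members(SAMPLE) if m["status"] == "Not Joined"]
    print("free slots:", free_slots(SAMPLE), "not joined:", not_joined)
```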

Troubleshooting a bad cable

Bad cables can cause the stack port to flap or go down completely. If there are an excessive number of port flaps, the port is disabled and the following log message appears:

W 10/06/00 23:23:16 02560 chassis: ST4-CMDR: Stack port 1 disabled due to excessive errors. Check cable.
  To reenable use 'stacking member 4 stack-port 1 enable'.

When this occurs, the show stacking stack-ports command shows the port with a status of “Disabled”.

Displaying a disabled stack port

HP Stack 2920#: show stacking stack-ports

Member   Stacking  Port State  Peer Member Peer Port
------------------------------------------------------
1          1         Up           5            2
1          2         Up           2            1
1          3         Up           3            3
1          4         Up           4            3
2          1         Up           1            2
2          2         Up           3            1
2          3         Up           4            4
2          4         Up           5            3
3          1         Up           2            2
3          2         Down         0            0
3          3         Up           1            3
3          4         Up           5            4
4          1         Disabled     0            0
4          2         Up           5            1
4          3         Up           1            4
4          4         Up           2            3
5          1         Up           4            2
5          2         Up           1            1
5          3         Up           2            4
5          4         Up           3            4
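When several ports have been disabled, the re-enable commands can be derived from the log rather than typed by hand. The following Python sketch assumes the log message format shown above and builds the command string that the log message itself suggests; reenable_commands is an illustrative name, and the commands should only be issued after the cable has been checked.

```python
import re

# Log line copied from this section (shown here on one logical line).
LOG = (
    "W 10/06/00 23:23:16 02560 chassis: ST4-CMDR: Stack port 1 disabled "
    "due to excessive errors. Check cable."
)

# Accept either a hyphen or an en dash between the member tag and the role.
DISABLED = re.compile(
    r"chassis: ST(\d+)[-\u2013]\w+: Stack port (\d+) disabled due to excessive errors"
)


def reenable_commands(log_text):
    """Return one re-enable command per disabled stack port found in the log."""
    return [
        f"stacking member {member} stack-port {port} enable"
        for member, port in DISABLED.findall(log_text)
    ]


if __name__ == "__main__":
    for command in reenable_commands(LOG):
        print(command)
```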

If the cable failure is persistent, the port remains in the Down state. The logs show each transition.

I 10/07/00 06:01:15 02559 chassis: ST4-STBY: Stack port 3 is now off-line.
I 10/07/00 06:01:16 02559 chassis: ST3-CMDR: Stack port 3 is now off-line.
I 10/07/00 06:01:16 02559 chassis: ST2-MMBR: Stack port 1 is now off-line.
I 10/07/00 06:01:15 02559 chassis: ST5-MMBR: Stack port 2 is now off-line.
I 10/07/00 06:01:15 02558 chassis: ST2-MMBR: Stack port 1 is now on-line.
I 10/07/00 06:01:12 02558 chassis: ST5-MMBR: Stack port 2 is now on-line.
I 10/07/00 06:01:10 02558 chassis: ST4-STBY: Stack port 3 is now on-line.
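Counting the on-line and off-line transitions per port is a quick way to see which link is flapping. The sketch below is one possible approach in Python, assuming the log format shown above; transition_counts is an illustrative name.

```python
import re
from collections import Counter

# Log lines copied from this section.
LOG = """\
I 10/07/00 06:01:15 02559 chassis: ST4-STBY: Stack port 3 is now off-line.
I 10/07/00 06:01:16 02559 chassis: ST3-CMDR: Stack port 3 is now off-line.
I 10/07/00 06:01:16 02559 chassis: ST2-MMBR: Stack port 1 is now off-line.
I 10/07/00 06:01:15 02559 chassis: ST5-MMBR: Stack port 2 is now off-line.
I 10/07/00 06:01:15 02558 chassis: ST2-MMBR: Stack port 1 is now on-line.
I 10/07/00 06:01:12 02558 chassis: ST5-MMBR: Stack port 2 is now on-line.
I 10/07/00 06:01:10 02558 chassis: ST4-STBY: Stack port 3 is now on-line.
"""

TRANSITION = re.compile(
    r"chassis: ST(\d+)-\w+: Stack port (\d+) is now (on|off)-line"
)


def transition_counts(log_text):
    """Count state transitions per (member, stack port)."""
    counts = Counter()
    for member, port, _state in TRANSITION.findall(log_text):
        counts[(int(member), int(port))] += 1
    return counts


if __name__ == "__main__":
    for (member, port), count in sorted(transition_counts(LOG).items()):
        print(f"member {member} port {port}: {count} transition(s)")
```

A port with many transitions in a short period is a candidate for the cable checks described in this section.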

The following example shows member 3, port 2, which should be connected to member 4, port 1. The ports are down because the cable is bad or disconnected.

Displaying that two ports are down due to a bad connection

HP Stack 2920#: show stacking stack-ports

Member   Stacking  Port State  Peer Member Peer Port
------------------------------------------------------
1          1         Up           5            2
1          2         Up           2            1
1          3         Up           3            3
1          4         Up           4            3
2          1         Up           1            2
2          2         Up           3            1
2          3         Up           4            4
2          4         Up           5            3
3          1         Up           2            2
3          2         Down         0            0
3          3         Up           1            3
3          4         Up           5            4
4          1         Down         0            0
4          2         Up           5            1
4          3         Up           1            4
4          4         Up           2            3
5          1         Up           4            2
5          2         Up           1            1
5          3         Up           2            4
5          4         Up           3            4

The solution in both cases is to ensure that the cable is firmly connected at both ends. If the problem continues, replace the cable. It is possible that there could be a problem with the stack port itself. To confirm this possibility, install a known good cable to see if that cable also fails.

The port state is not UP until both ends of the cable are connected and the cable has been validated as a genuine HP cable.

To view the statistics on the physical port, execute the show tech command in member-context 4. The following examples show the types of information displayed.

Displaying show tech output

Port Number : 1                    State : Available
Last Event : Available             Start Req : 1
NE Present : 1                     HPID Good : 1
HPID Fails : 0                     FE Present : 1
Rem Dev Rdy : 1
ESSI Link : 1                      ESSI Good : 1
ESSI Fails : 0                     ESSI TX En : 1
ICL Good : 1                       ICL Enabled : 1
LP Local RDY: 1                    LP Rem RDY : 1
LP DONE : 1                        ICL FailCnt : 0 (10 second interval)
ICL FailCnt : 0 (10 minute interval)
NE Presence HW : 1
FE Presence HW : 1
Rem Dev Rdy HW : 1
Local Dev Rdy HW : 1
Asserted NE Presence HW : 1
Asserted FE Presence HW : 1
Asserted Rem Dev Rdy HW : 1
Phy Frame Errors : 0
Invalid Status Errors : 0
Invalid Packet Type Errors : 0
Incomplete Packet Errors : 0
Checksum Errors : 0
ESSI Flow Out This Port (HW) : 0x2
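The port statistics are printed as "name : value" pairs, sometimes two per line. A small Python sketch can pull out the pairs and list any failure or error counter that is not zero. It assumes the layout shown above, including at least two spaces between the two columns; the function names are illustrative and the meaning of each counter is not interpreted here.

```python
import re

# A few lines copied from the "Displaying show tech output" example.
SAMPLE = """\
HPID Fails : 0                     FE Present : 1
ESSI Fails : 0                     ESSI TX En : 1
LP DONE : 1                        ICL FailCnt : 0 (10 second interval)
ICL FailCnt : 0 (10 minute interval)
Phy Frame Errors : 0
Checksum Errors : 0
"""


def parse_pairs(text):
    """Return (name, value) tuples; names may repeat, so a list is used."""
    pairs = []
    for line in text.splitlines():
        # Columns are assumed to be separated by two or more spaces.
        for cell in re.split(r"\s{2,}", line.strip()):
            if ":" in cell:
                name, value = cell.split(":", 1)
                pairs.append((name.strip(), value.strip()))
    return pairs


def nonzero_error_counters(pairs):
    """Return the failure/error counters whose leading number is not zero."""
    flagged = []
    for name, value in pairs:
        if re.search(r"Fails|FailCnt|Errors", name) and value:
            first = value.split()[0]
            if first.isdigit() and int(first) != 0:
                flagged.append((name, value))
    return flagged


if __name__ == "__main__":
    print(nonzero_error_counters(parse_pairs(SAMPLE)))
```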

Displaying trace information for a port

Trace for Port 1
[ 0] [Info ] Start Request Received (Empty) [0]
[ 1] [Info ] Waiting for Stack Module Good (Empty) [0]
[ 2] [Info ] Stack Module Good Received (Empty) [0]
[ 3] [Info ] Cable Insertion Detected (Empty) [1]
[ 4] [Info ] Re-enable NE Present Int [487]
[ 5] [Info ] Starting Cable HPID Validation (Inserted) [488]
[ 6] [Info ] Skipping Cable HPID Validation (Inserted) [488]
[ 7] [Info ] Far End Insertion Detected (Valid) [988]
[ 8] [Info ] Polling for ESSI phy link up (Valid) [988]
[ 9] [Info ] ESSI Link Up [9988]
[10] [Info ] ESSI Link Good (Valid) [9988]
[11] [Info ] ESSI Linked at 9988 ms [9988]
[12] [Info ] Remote Device Ready Detected (Valid) [10898]
[13] [Info ] ICL Change Request Enable (Cable Ready) [10898]
[14] [Info ] Detected Remote Ready Drop. (Cable Ready) [12651]
[15] [Info ] ICL Good. Behind. Partner ready. (Cable Ready) [12988]
[16] [Info ] ICL GOOD received at 2091 ms [12988]
[17] [Info ] Partner LP ready. (Cable Ready) [13980]
[18] [Info ] Set Device Ready. (Cable Ready) [13987]
[19] [Info ] ESSI Link Verfied [13988]
[20] [Info ] ESSI Able to Transmit (Cable Ready) [13988]
[21] [Info ] ESSI Verified at 3091 ms [13988]
[22] [Info ] Cable Available (Available) [13988]
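To see where a port spends the most time during bring-up, the gaps between consecutive trace entries can be computed. The Python sketch below assumes that the trailing bracketed number on each line is an elapsed time in milliseconds, which is suggested by entries such as "ESSI Linked at 9988 ms [9988]" but is not stated in this guide; the function names are illustrative.

```python
import re

# A few consecutive entries copied from the "Trace for Port 1" example.
TRACE = """\
[ 6] [Info ] Skipping Cable HPID Validation (Inserted) [488]
[ 7] [Info ] Far End Insertion Detected (Valid) [988]
[ 8] [Info ] Polling for ESSI phy link up (Valid) [988]
[ 9] [Info ] ESSI Link Up [9988]
[10] [Info ] ESSI Link Good (Valid) [9988]
[12] [Info ] Remote Device Ready Detected (Valid) [10898]
"""

# index, level, message, trailing timestamp (assumed to be milliseconds)
ENTRY = re.compile(r"^\[\s*(\d+)\]\s+\[(\w+)\s*\]\s+(.*?)\s+\[(\d+)\]\s*$")


def parse_trace(text):
    """Return a list of (message, timestamp) tuples in trace order."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            entries.append((match.group(3), int(match.group(4))))
    return entries


def slowest_step(entries):
    """Return (message, gap) for the entry preceded by the longest gap."""
    worst = None
    for (_, prev_ts), (message, ts) in zip(entries, entries[1:]):
        gap = ts - prev_ts
        if worst is None or gap > worst[1]:
            worst = (message, gap)
    return worst


if __name__ == "__main__":
    print(slowest_step(parse_trace(TRACE)))
```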

Troubleshooting when a switch crashes and reboots

Although the switch software is highly reliable, a switch in the stack can experience a software issue that causes a crash and reboot of that switch. The crash can occur in the software running on the management CPU or in the software running on the CPUs in the interfaces. In either case, crash information is generated and the switch is rebooted.

Resiliency of the stack is determined by the stacking topology; however, in all cases, the interfaces/ports on the switch that crashes are brought down and a reboot of that switch occurs.

The following table describes how the stack reacts to the crashing switch, depending on what role the switch had when the crash occurred. The assumption in this table is that the topology is a resilient topology (that is, a ring).

Stacking role Description
Commander
  • The standby takes over as the new Commander.

  • A new standby is elected.

  • Crashing switch writes core file to local stable storage.

  • Crashing switch reboots and joins the stack.

  • Core file and crash information for this switch is available from the Commander.

Standby
  • A new standby is elected.

  • Crashing switch writes core file to local stable storage.

  • Crashing switch reboots and joins the stack.

  • Core file and crash information for this switch is available from the Commander.

Member
  • Crashing switch writes core file to local stable storage.

  • Crashing switch reboots and joins the stack.

  • Core file and crash information for this switch is available from the Commander.

After a switch crashes, you can collect data to help HP understand why the crash occurred. The information is a combination of crash data, crash log, and core-dump files. The show tech command displays logs of events that happened right before the crash.

Troubleshooting an unresponsive reboot

An unresponsive reboot occurs when a member does not respond to an update packet.

Reboot output

 
/* SSM_SWITCH_LOST_EVENT */
// 30 States -> Initial !Discovery !Bid for Cmdr !Become cmdr !
/* Switch Lost */{ssmIgnore ,ssmMbrRmvMbr ,ssmMbrRmvMbr ,ssmMbrRmvCmdr ,
// States -> Cmdr chas wait!Cmdr RFS start! Commander ! Cmdr Merge !
/* Switch Lost */ ssmMbrRmvCmdr ,ssmMbrRmvCmdr ,ssmMbrRmvCmdr ,ssmMbrRmvCmdr ,
// States -> Wait for chas !wait for type !stby RFS start!stby RFS sync !
/* Switch lost */ ssmMbrRmvMbr ,ssmMbrRmvMbr ,ssmMbrRmvMbr ,ssmMbrRmvMbr ,
// States -> Standby !Mbr RFS wait ! Member !Pass through !
/* Switch Lost */ ssmMbrRmvStby ,ssmMbrRmvMbr ,ssmMbrRmvMbr ,ssmIllegal },

Troubleshooting an unexpected Commander or Standby switch selection

When a switch stack is established and a boot/reboot of the stack occurs, the Commander and Standby are selected based on the configured switch priority. Other rules in the election process can override this priority.

Displaying the running configuration with priority

HP Switch(config)#: show running-config

; hpStack Configuration Editor; Created on release #:WB.15.11.0000x
; Ver #:01:00:01

hostname "HP Stack 2920"
stacking
member 1 type "J9726A" mac-address 1cc1de-4d87c0
member 2 type "J9726A" mac-address 1cc1de-4dc740
member 3 type "J9728A" mac-address 1cc1de-4dbd40
member 3 priority 200
member 4 type "J9729A" mac-address 1cc1de-4d79c0
member 4 priority 175

On a boot of the stack, member 3 becomes the Commander and member 4 becomes the Standby, based on priority. If this were a chain with member 1 at one end of the chain and member 4 at the other end, the number of hops between switches would also be part of the election process.
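The priority part of the election can be illustrated with a short Python sketch. It models priority only: the other election rules mentioned above (such as hop count in a chain) are deliberately left out, the tie-break on lowest member ID is an assumption made so the result is deterministic, and members without a priority line are assumed to use 128, the value shown for them in the show stacking outputs in this guide.

```python
DEFAULT_PRIORITY = 128  # assumed default, as displayed in show stacking outputs

# Priorities from the running configuration above; members 1 and 2 have no
# explicit priority line, so the assumed default is used for them.
PRIORITIES = {1: DEFAULT_PRIORITY, 2: DEFAULT_PRIORITY, 3: 200, 4: 175}


def elect_by_priority(priorities):
    """Return (commander_id, standby_id) using configured priority only.

    Ties are broken by lowest member ID, which is an assumption of this
    sketch and not a documented rule.
    """
    ranked = sorted(priorities, key=lambda member: (-priorities[member], member))
    return ranked[0], ranked[1]


if __name__ == "__main__":
    commander, standby = elect_by_priority(PRIORITIES)
    print(f"Commander: member {commander}, Standby: member {standby}")
```

If the elected roles on a real stack differ from what this priority-only model predicts, the log files are the place to look for the rule that overrode the priority.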