Failure of worker node due to MTU mismatch

Symptom

Scale-in of the last RHCOS worker node fails, and the autodeployment exits.

Cause

A mismatch between the MTU size of the RHEL worker node and that of the data switches causes the added RHEL worker node to enter a tainted, non-schedulable state.

Action
  1. Log in to the data switch and verify the MTU size of the bond interface for the FLR NIC of the RHEL worker node by running the following command:
    net show interface
  2. Log in to the worker nodes and run the following command to check the MTU on the bond0 interface:
    ip address
  3. If the MTU sizes do not match, manually set the MTU of the bond interface on the switch using the following commands:
    net add bond <bond_interface> mtu 9216
    net pending
    net commit
    
    NOTE:

    The MTU size must be set to 9216 on both the data switch ports connected to the RHEL worker node and the ports of the worker node itself.
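
The worker-node check in step 2 can also be scripted. The following is a minimal sketch, not part of the documented procedure: the function names parse_mtu and check_mtu are hypothetical, and it assumes the iproute2 `ip` utility is available on the worker node. It reads the MTU of an interface from the one-line output of `ip -o link show` and compares it with the expected value of 9216.

```shell
#!/bin/sh
# Expected MTU, taken from the note above.
EXPECTED_MTU=9216

# Hypothetical helper: print the value that follows the "mtu" keyword
# in one line of `ip -o link show` output. Prints nothing if absent.
parse_mtu() {
    printf '%s\n' "$1" |
        awk '{ for (i = 1; i < NF; i++) if ($i == "mtu") { print $(i + 1); exit } }'
}

# Hypothetical helper: compare an interface's MTU with EXPECTED_MTU.
# Returns 0 on match, 1 on mismatch, 2 if the interface cannot be read.
check_mtu() {
    iface=$1
    line=$(ip -o link show dev "$iface" 2>/dev/null) || {
        echo "$iface: interface not found"
        return 2
    }
    mtu=$(parse_mtu "$line")
    if [ "$mtu" = "$EXPECTED_MTU" ]; then
        echo "$iface: MTU $mtu matches"
    else
        echo "$iface: MTU ${mtu:-unknown}, expected $EXPECTED_MTU"
        return 1
    fi
}
```

After sourcing the script on a worker node, run `check_mtu bond0`. A nonzero return status means the MTU is not 9216 or the interface could not be read, and the switch-side value should then be verified as described in step 1.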