Installing image service on RHCOS nodes is not complete

Symptom

The nps show --service baremetal command reports an installation status of failed or Not done while the image service is being installed on RHCOS nodes.

  1. Do the following:

    1. Check that the image service pod (image-service-0) is in the Running state using the following command:

      kubectl get pods -n nps
    2. If the image service pod is not running, check the /var/nps/logs/nps_error.log file.

  2. Check the baremetal service status using the command:

    nps show --service baremetal
  3. If the image-service-0 pod is running and the installation status is Not done, use Solution 1.

  4. If the installation status is failed, use Solution 2.
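
The decision in the steps above can be sketched as a small shell helper. This is only an illustration: choose_solution and show_recent_errors are hypothetical names and are not part of the nps CLI. Because the layout of the nps show output is not described here, the installation status is passed in by hand instead of being parsed. The kubectl jsonpath query in the usage comment is one possible way to read the pod phase.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the nps CLI.

# Print the last lines of the NPS error log.
# $1: log file path (defaults to the path named in the steps above)
show_recent_errors() {
    log_file="${1:-/var/nps/logs/nps_error.log}"
    if [ -r "$log_file" ]; then
        tail -n 20 "$log_file"
    else
        echo "Cannot read $log_file" >&2
        return 1
    fi
}

# Map the pod phase and installation status to the solution to follow.
# $1: pod phase of image-service-0 (for example, Running)
# $2: installation status as read from nps show --service baremetal
choose_solution() {
    pod_phase="$1"
    # Compare case-insensitively, since the status may appear as failed or FAILED.
    status=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$status" in
        failed)
            echo "Solution 2" ;;
        "not done")
            if [ "$pod_phase" = "Running" ]; then
                echo "Solution 1"
            else
                echo "Check nps_error.log"
            fi ;;
        *)
            echo "Not covered by this procedure" ;;
    esac
}

# Example usage (on a host where kubectl and nps are available):
#   phase=$(kubectl get pod image-service-0 -n nps -o jsonpath='{.status.phase}')
#   choose_solution "$phase" "Not done"
```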

Solution 1
Action
  1. Check whether the FLR NIC ports of the corresponding iLOs are in the connected state. If they are not, check the physical connectivity between the FLR NIC port and the data switch port.
  2. If auto deploy was used for the installation and the status of the auto deploy command is FAILED, run the nps autodeploy command again after connecting the ports.

    If you are prompted to enter Y or N, press Y to resume the auto deploy command. Otherwise, press N to terminate the command, fix the issue, and restart the nps autodeploy command.

  3. If the manual procedure was used for the installation, do the following after connecting the ports:
    1. To install the operating system through the image service, trigger a one-time PXE boot for the failed servers using the following command:
      nps baremetal -a temp-pxeboot -sl <iLO IP1, iLO IP2> -l debug

      Wait until the operating system is installed.

    2. Verify the baremetal installation status again using the command:
      nps show --service baremetal
Solution 2
Action
  1. Run the following command:
    nps show --data servers --node <iLO IP of failed server>
  2. Note the host_ipaddress value under custom_data in the output of step 1.

    This is the OCP IP address that will be assigned to the host.

  3. Ping the host_ipaddress.
  4. If the ping receives no response, log in to the iLO console and check whether the operating system is installed.
  5. If the operating system is not installed, make sure the physical drives are cleaned properly. See HPE DL server cleanup procedure.
  6. If the installation was done through auto deploy, run the nps autodeploy command again after cleaning up the physical drives.
  7. If the installation was done through the manual procedure, do the following after cleaning up the physical drives:
    1. To install the operating system through the image service, trigger a one-time PXE boot for the failed servers using the following command:
      nps baremetal -a temp-pxeboot -sl <iLO IP1, iLO IP2> -l debug

      Wait until the operating system is installed.

    2. Verify the baremetal installation status again using the command:
      nps show --service baremetal
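
The reachability check in steps 3 and 4 of Solution 2 could be scripted roughly as follows. This is a minimal sketch: check_host is a hypothetical helper name, and the address in the usage comment is a placeholder from the documentation range, not a value from any real deployment. The host_ipaddress value is still read manually from the nps show --data servers output.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of the nps CLI.

# Ping the host and report whether to continue with the iLO console check.
# $1: host_ipaddress value noted from the custom_data section
check_host() {
    host_ip="$1"
    if [ -z "$host_ip" ]; then
        echo "usage: check_host <host_ipaddress>" >&2
        return 2
    fi
    # Send three echo requests; ping exits non-zero when none are answered.
    if ping -c 3 "$host_ip" >/dev/null 2>&1; then
        echo "$host_ip responds to ping"
    else
        echo "$host_ip does not respond; check the OS installation from the iLO console"
        return 1
    fi
}

# Example usage (placeholder address):
#   check_host 192.0.2.10
```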