Overcloud installation fails

Symptom
Overcloud installation fails with one of the following error messages:
  • Stack overcloud UPDATE_FAILED
    Heat Stack update failed.
    2018-08-18 07:09:20Z [overcloud-ObjectStorageServiceChain-vqzfry76iufs-ServiceChain-6yzaquuzslkd.20]: 
    UPDATE_IN_PROGRESS  state changed
    2018-08-18 07:09:20Z [overcloud-ComputeOvsDpdkSriovServiceChain-bna227axlowz-ServiceChain-ry6h52mffgup.11]: 
    UPDATE_IN_PROGRESS  state changed
    2018-08-18 07:09:20Z [overcloud-ComputeServiceChain-63rgq3rwdncp-ServiceChain-taben5ugmets.16]: 
    UPDATE_IN_PROGRESS  state changed
    2018-08-18 07:09:22Z [overcloud-ObjectStorageServiceChain-vqzfry76iufs-ServiceChain-6yzaquuzslkd.12]: 
    UPDATE_IN_PROGRESS  state changed
    2018-08-18 07:09:23Z [overcloud]: UPDATE_FAILED  Resource UPDATE failed: 
    DBConnectionError: resources.ControllerServiceChain: (pymysql.err.OperationalError) 
    (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
    2018-08-18 07:09:25Z [overcloud-ComputeServiceChain-63rgq3rwdncp-ServiceChain-taben5ugmets.34]: 
    UPDATE_IN_PROGRESS  state changed
    2018-08-18 07:09:28Z [overcloud-ComputeServiceChain-63rgq3rwdncp-ServiceChain-taben5ugmets.15]: 
    UPDATE_IN_PROGRESS  state changed
    2018-08-18 07:09:28Z [overcloud-ComputeServiceChain-63rgq3rwdncp-ServiceChain-taben5ugmets.35]: 
    UPDATE_COMPLETE  state changed
    2018-08-18 07:09:29Z [overcloud-CephStorageServiceChain-2twbtmsnygnc-ServiceChain-vgcpxjsxcm5u.11]: 
    UPDATE_IN_PROGRESS  state changed
    2018-08-18 07:09:30Z [overcloud]: UPDATE_FAILED  Resource UPDATE failed: 
    DBConnectionError: resources.ObjectStorageServiceChain: (pymysql.err.OperationalError) 
    (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
    2018-08-18 07:09:31Z [overcloud-ComputeOvsDpdkSriovServiceChain-bna227axlowz-ServiceChain-ry6h52mffgup.35]: 
    UPDATE_IN_PROGRESS  state changed
    2018-08-18 07:09:32Z [overcloud-CephstorageServiceChain-xc6yzdnpnnss-ServiceChain-smibz2nb4jcz.25]: 
    UPDATE_IN_PROGRESS  state changed
    2018-08-18 07:09:33Z [overcloud-ComputeServiceChain-63rgq3rwdncp-ServiceChain-taben5ugmets.23]: 
    UPDATE_IN_PROGRESS  state changed
    2018-08-18 07:09:33Z [overcloud-CephstorageServiceChain-xc6yzdnpnnss-ServiceChain-smibz2nb4jcz.36]: 
    UPDATE_IN_PROGRESS  state changed
    
    Stack overcloud UPDATE_FAILED 
    
    overcloud.ControllerServiceChain:
      resource_type: OS::TripleO::Services
      physical_resource_id: d1080322-1ece-4cf8-9882-dff12382aca0
      status: UPDATE_FAILED
      status_reason: |
        DBConnectionError: resources.ControllerServiceChain: (pymysql.err.OperationalError) 
        (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
    overcloud.ObjectStorageServiceChain:
      resource_type: OS::TripleO::Services
      physical_resource_id: 2b68738e-4d59-4cd1-bcf3-390c6af51f6c
      status: UPDATE_FAILED
      status_reason: |
        DBConnectionError: resources.ObjectStorageServiceChain: (pymysql.err.OperationalError) 
        (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
    
  • Stack overcloud CREATE_FAILED

  • "CREATE failed: Error: resources.NetworkDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1"
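
To tell these symptoms apart quickly, the final stack status can be extracted from the installation log with grep. The following is a minimal sketch, not part of the NPS Toolkit: the function name overcloud_final_status is hypothetical, and it assumes that the log is /home/stack/overcloud_install.log on the Undercloud VM and that status lines have the form "Stack overcloud <STATUS>", as in the messages above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an NPS Toolkit command): print the last
# "Stack overcloud <STATUS>" status found in an Overcloud installation log.
# Assumption: default log path is /home/stack/overcloud_install.log.
overcloud_final_status() {
    local log="${1:-/home/stack/overcloud_install.log}"
    # Extract every "Stack overcloud <STATUS>" fragment, keep the last one,
    # and print only its third word (the status itself).
    grep -oE 'Stack overcloud [[:upper:]_]+' "$log" | tail -n 1 | awk '{print $3}'
}
```

If the function prints CREATE_FAILED, the failure matches the second symptom above; if it prints nothing, the log contains no stack status line.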

IMPORTANT:

The following resolutions apply only to Overcloud installation failures. Do not use them for Overcloud scaling failures (Ceph storage expansion, compute expansion, or HCI).

Solution 1
Action

If the Overcloud installation fails and /home/stack/overcloud_install.log contains the "Stack overcloud CREATE_FAILED" error message, delete the Overcloud and install it again.

  1. Delete the Overcloud using the following command:
    nps deploy -s vim_overcloud -a overcloud-delete
  2. Run the following command and ensure that the status of the delete operation is OVERCLOUD_DELETED:
    [root@npsvm rhosp]# nps show --data vim --node overcloud
  3. To confirm that the Overcloud is deleted, log in to the Undercloud VM and run the following commands:
    source stackrc
    openstack stack list

    Verify that the Overcloud stack is not displayed.

  4. Check the status of all the nodes by running the following command, and ensure that every node is in the available state.
    openstack baremetal node list
  5. Install the Overcloud again using the following command:
    nps deploy -s vim_overcloud -a overcloud-install
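
The checks in steps 3 and 4 can also be scripted on the Undercloud VM. This is a minimal sketch with hypothetical function names (overcloud_stack_absent, all_nodes_available); both read command output on stdin. It assumes the standard OpenStack client output options "-f value" and "-c <column>", and the column names "Stack Name" and "Provisioning State".

```shell
#!/usr/bin/env bash
# Hypothetical helpers for steps 3 and 4; both read command output on stdin.

# Succeeds (exit 0) when no stack named exactly "overcloud" is listed.
# Expected input: output of  openstack stack list -f value -c "Stack Name"
overcloud_stack_absent() {
    ! grep -qx 'overcloud'
}

# Succeeds when at least one node is listed and every node is "available".
# Expected input: output of
#   openstack baremetal node list -f value -c "Provisioning State"
all_nodes_available() {
    local state count=0
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS= read -r state || [ -n "$state" ]; do
        [ -z "$state" ] && continue
        count=$((count + 1))
        if [ "$state" != "available" ]; then
            echo "Node not available: $state" >&2
            return 1
        fi
    done
    # Treat an empty node list as a failure, not a success.
    [ "$count" -gt 0 ]
}
```

For example, after "source stackrc", run: openstack baremetal node list -f value -c "Provisioning State" | all_nodes_available && echo "All nodes available". Treating an empty list as a failure avoids reporting success when no nodes are registered.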
Solution 2
Action
  1. Log in to the Undercloud VM, navigate to /home/stack, and check the log file "overcloud_install.log".
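
Rather than paging through the whole log, the NetworkDeployment failure can be located with grep. This is a minimal sketch; the function name find_network_deployment_failures is hypothetical, and it assumes that the log path is /home/stack/overcloud_install.log and that the failure text matches the message quoted below.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, with line numbers, every log line that reports
# the NetworkDeployment failure. Exits non-zero if no such line exists.
find_network_deployment_failures() {
    local log="${1:-/home/stack/overcloud_install.log}"
    # -F: match the text literally; -n: prefix each match with its line number.
    grep -nF 'resources.NetworkDeployment: Deployment to server failed' "$log"
}
```

The line numbers make it easy to open the log at the matching position and read the surrounding lines to identify the affected Controller node.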

If the message "CREATE_FAILED Resource CREATE failed: Error: resources.NetworkDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1" appears for any of the Overcloud Controller nodes, perform the following steps:

  1. Log in to the iLO console of each affected Overcloud Controller node as the "root" user.
  2. Check the detailed logs using the following command:

    journalctl -u os-collect-config

  3. Review the logs carefully and check whether the message "not identified as an active nic" is displayed for any interface.
  4. If this message is found, exit journalctl and check the names of the interfaces on the server by running the "ip a" command.
  5. Get the interface names for the management (FLR) NICs and data (PCI) NICs.
  6. Log in to the Undercloud VM and navigate to the "templates" directory:

    cd /home/stack/templates

  7. Navigate to the "nic-configs" directory (/home/stack/templates/nic-configs).
  8. Edit the "controller.yaml" file and replace the management (FLR) NIC and data (PCI) NIC port names with the actual interface names noted in step 5:
    1. The management (FLR) NIC port names are listed as members of the "Linux bond" type entry named "br-mgmt".

    2. The data (PCI) NIC port names are listed as nested members of the "ovs_bridge" type entry named "br-data".

  9. Save the file.
  10. Delete the failed stack by running the following commands on the Undercloud VM:
    source /home/stack/stackrc
    heat stack-delete overcloud
  11. Log in to the NPS Toolkit VM and run only the following individual command to trigger the Overcloud installation:

    nps deploy -s vim_overcloud -a overcloud-install

    NOTE:

    Because the templates were edited manually, do not run the autodeploy command; it regenerates the templates and overwrites your changes.
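
The journal review in steps 2 to 5 can be narrowed down with standard tools. The following is a minimal sketch, to be run on the Controller node; the function names inactive_nic_messages and interface_names are hypothetical, and the sketch assumes that journalctl accepts --no-pager and that "ip -o link show" prints one interface per line in the form "2: ens1f0: <FLAGS> ...". It supplements, and does not replace, the manual review described above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for steps 2 to 5 of Solution 2; both read stdin.

# Print only the journal lines that contain the inactive-NIC message.
# Expected input: output of  journalctl -u os-collect-config --no-pager
inactive_nic_messages() {
    grep -F 'not identified as an active nic'
}

# Print one interface name per line.
# Expected input: output of  ip -o link show
interface_names() {
    # Field 2 (fields split on ": ") is the name; strip any "@parent"
    # suffix that ip prints for VLAN and similar linked interfaces.
    awk -F': ' '{ sub(/@.*/, "", $2); print $2 }'
}
```

For example:
    journalctl -u os-collect-config --no-pager | inactive_nic_messages
    ip -o link show | interface_names
Compare the printed names with the port names in controller.yaml before editing the file.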