Starting up RHOCP nodes

Prerequisites
  • Ensure that external storage, such as NAS, SAN, or other external devices that must be online, is started before the cluster boots up.

  • Ensure that external applications such as DNS, load balancer, and DHCP are up and running and reachable from the cluster (a quick sanity check follows below).
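
  The following illustrative sketch is one way to confirm these services before powering anything on. Here <cluster_domain> and <loadbalancer_ip> are placeholders, and the DNS names follow the standard RHOCP conventions:

    # Confirm that the DNS server resolves the cluster API and wildcard app records:
    nslookup api.<cluster_domain>
    nslookup test.apps.<cluster_domain>

    # Confirm that the load balancer host is reachable:
    ping -c 3 <loadbalancer_ip>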

NOTE:

If any of the components fail to start, see the restore section in the link provided in Backup and Restore of RHOCP.

Procedure
  1. Start the switches.
  2. Power on all infrastructure services, such as the DHCP server, DNS, and load balancer.
  3. Power on the registry VM (applicable only for disconnected deployments).
    1. Log in to the registry server.
    2. Check the status of docker.
      systemctl status docker
    3. Start the docker service if the status is inactive.
      systemctl start docker
    4. Check the status of the registry container.
      docker ps -a
    5. If the container is in the Exited state, start it with the following command:
      docker start <container_ID>
    6. Validate that the docker registry is working using either of the following commands (a consolidated check script follows this step):
      • docker login -u <username> -p <password> https://<registryvm_fqdn_name>:<port>

        Output:

        Login Succeeded

        OR

      • curl -u <username>:<password> -k https://<registryvm_fqdn_name>:<port>/v2/_catalog
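
        If the registry is healthy, the curl command returns the catalog as JSON, for example: {"repositories":[...]}

      The checks in sub-steps 2 through 6 can also be run as a single script. The following consolidation is illustrative only and uses the same placeholders as the commands above:

        # Start docker if it is not already active:
        sudo systemctl is-active docker || sudo systemctl start docker

        # Start any registry container left in the Exited state:
        ids=$(docker ps -aq --filter status=exited)
        [ -n "$ids" ] && docker start $ids

        # A JSON repository list confirms that the registry is serving requests:
        curl -u <username>:<password> -k https://<registryvm_fqdn_name>:<port>/v2/_catalog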
  4. Power on the Nimble storage nodes.
    1. Power on the disk shelves and wait until all the disk lights are blinking.
    2. Power on the array head shelf.
  5. Power on all master nodes.
    1. Log in to a master node using the iLO IP address.
    2. Open the iLO console.
    3. Click Momentary Press to power on the node and wait until the node comes up.
    4. Repeat the steps for the second and third master nodes.
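
    If SSH access to iLO is enabled in your environment, the nodes can also be powered on without the web console. This is a minimal sketch; <ilo_username> and <ilo_ip> are placeholders for your iLO credentials and address:

      # Log in to the iLO command line, then power on the server:
      ssh <ilo_username>@<ilo_ip>
      power on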
  6. Start static pods on all master nodes.
        # masters is assumed to be a bash array of master node names, for example:
        # masters=(master0 master1 master2)
        # Moving the stopped static pod manifests back causes kubelet to recreate the pods.
        for master in ${masters[@]}
        do
          echo "==== $master ===="
          ssh core@$master 'sudo mv -v $(ls /etc/kubernetes/manifests.stop/*) /etc/kubernetes/manifests/ && sudo rmdir /etc/kubernetes/manifests.stop'
        done
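
    After the manifests are restored, you can optionally confirm on each master that the control-plane static pods came back. This check is illustrative and assumes the same masters array; the pods can take a few minutes to become ready:

        for master in ${masters[@]}
        do
          echo "==== $master ===="
          # List ready pods and look for the etcd and kube-apiserver static pods:
          ssh core@$master 'sudo crictl pods --state ready | grep -E "etcd|apiserver"'
        done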
    
  7. Verify that all master nodes are in the Ready state in the oc get nodes command output (a polling sketch follows this step).
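
    If the masters take a few minutes to report Ready, a simple polling loop can watch for the transition. This sketch is illustrative only; it selects nodes by the default master role label:

      # Poll until no master node reports NotReady:
      while oc get nodes -l node-role.kubernetes.io/master | grep -q NotReady
      do
        sleep 10
      done
      oc get nodes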
  8. Power on all worker nodes.
    1. Log in to a worker node using the iLO IP address.
    2. Open the iLO console.
    3. Click Momentary Press to power on the node and wait until the node comes up.
    4. Repeat the steps for the rest of the worker nodes in the cluster.
  9. Check the cluster status.
    1. Log in to the cluster as kubeadmin using one of the following methods:
      • export KUBECONFIG=/var/nps/ISO/ign_config/auth/kubeconfig
      • oc login -u kubeadmin -p <token-key> --server=https://api.<cluster_domain>:6443
      • Access the cluster using the temporary authentication information (tmp-admin).
        export KUBECONFIG=$(pwd)/tmpadmin-kubeconfig
        oc get nodes

        Here, pwd is the directory where the tmpadmin-kubeconfig file was created.
    2. Run the following commands to check the status of the cluster operators and nodes:
      oc get co
      oc get nodes

      Wait a few seconds until all the worker nodes are in the Ready state.

    3. Run the following command to verify that all the default pods and daemons are in either Running or Completed status (a filter for spotting exceptions follows this step):
      oc get pods --all-namespaces
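
    To spot exceptions quickly, you can filter out the healthy states. This illustrative one-liner prints nothing when every pod is either Running or Completed:

      oc get pods --all-namespaces --no-headers | grep -vE 'Running|Completed'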
  10. Remove the temporary authentication information.
    NOTE:

    Remove the temporary admin only if you can successfully access the cluster with kubeadmin or tmp-admin and the cluster is up and running.

    oc adm policy remove-cluster-role-from-user cluster-admin -z tmp-admin -n default
    oc delete sa tmp-admin -n default
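
    To confirm the cleanup, check that the service account is gone. Once tmp-admin has been deleted, the following command is expected to fail with a NotFound error:

      oc get sa tmp-admin -n default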