BGP Graceful-Restart and high availability

The BGP Graceful-Restart (GR) feature allows a BGP speaker to express its ability to preserve forwarding state during:
  • BGP protocol (or daemon) restart

  • Management Module (MM) switchover

BGP GR is enabled by default, and the GR capability is advertised to peers in the BGP OPEN message. BGP initiates the graceful-restart process when an MM switchover occurs, and it also acts as a GR-aware device when a peer restarts.

A GR-aware device (a device operating in GR helper mode) is notified that the peer router is transitioning and takes appropriate actions based on the configured or default timers.

When a BGP restart happens on the peer router or when an MM switchover occurs, the routes currently held in the forwarding table are marked as stale but remain in use. The forwarding state is thus preserved, because the control plane and the forwarding plane operate independently.

On the restarting peer, where the switchover occurred, BGP on the newly active MM starts to establish sessions with all configured peers. BGP on the non-restarting side sees new connection requests arriving while its existing session is still in the Established state. This indicates to the non-restarting peer that its neighbor has restarted. At this point, the restarting peer sends the GR capability with the Restart bit set to 1 and the Forwarding State bit set to 1.
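
The Restart and Forwarding State bits described above can be illustrated with a minimal Python sketch of the GR capability layout defined in RFC 4724 (capability code 64). The function names, the 120-second restart time, and the IPv4 unicast address family (AFI 1, SAFI 1) are assumptions chosen for illustration. They are not taken from this product's implementation or defaults.

```python
import struct

GR_CAPABILITY_CODE = 64  # Graceful Restart capability code (RFC 4724)


def encode_gr_capability(restart_bit, restart_time, families):
    """Build the capability TLV: code, length, then the GR value.

    families is a list of (afi, safi, forwarding_preserved) tuples.
    """
    if not 0 <= restart_time <= 0x0FFF:
        raise ValueError("restart time must fit in 12 bits")
    # First 16 bits: 4 flag bits (Restart bit is the most significant),
    # followed by the 12-bit Restart Time in seconds.
    header = (0x8000 if restart_bit else 0) | restart_time
    value = struct.pack("!H", header)
    for afi, safi, forwarding_preserved in families:
        # Per-family flags octet: Forwarding State bit is the top bit.
        flags = 0x80 if forwarding_preserved else 0
        value += struct.pack("!HBB", afi, safi, flags)
    return struct.pack("!BB", GR_CAPABILITY_CODE, len(value)) + value


def decode_gr_capability(data):
    """Return (restart_bit, restart_time, families) from a capability TLV."""
    code, length = struct.unpack("!BB", data[:2])
    if code != GR_CAPABILITY_CODE:
        raise ValueError("not a Graceful Restart capability")
    (header,) = struct.unpack("!H", data[2:4])
    families = []
    for offset in range(4, 2 + length, 4):
        afi, safi, flags = struct.unpack("!HBB", data[offset:offset + 4])
        families.append((afi, safi, bool(flags & 0x80)))
    return bool(header & 0x8000), header & 0x0FFF, families


# Hypothetical values: Restart bit 1, 120 s restart time, IPv4 unicast
# with Forwarding State bit 1.
wire = encode_gr_capability(True, 120, [(1, 1, True)])
print(wire.hex())
print(decode_gr_capability(wire))
```

A helper that decodes this capability with the Restart bit set knows that the peer has restarted, and the per-family Forwarding State bit tells it whether the peer kept its forwarding entries for that address family.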

The non-restarting peer:
  • Cleans up old BGP sessions and marks all the routes in the BGP table that are received from the restarting peer as stale

  • Purges all stale routes after the Restart Time expires (if the restarting peer never re-establishes the BGP session)

  • Sends an initial routing table update, followed by an End-of-RIB (EoR)
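
As an illustration of the End-of-RIB marker mentioned in the list above, the following Python sketch builds the marker for IPv4 unicast as RFC 4724 defines it: a BGP UPDATE message of minimum length, with no withdrawn routes and no path attributes. The function names are hypothetical, and other address families use a different encoding that this sketch does not cover.

```python
import struct

BGP_MARKER = b"\xff" * 16  # 16-octet marker field, all ones
BGP_MSG_UPDATE = 2         # BGP message type code for UPDATE


def ipv4_unicast_eor():
    """Return the End-of-RIB marker for IPv4 unicast as raw bytes."""
    # Withdrawn Routes Length = 0 and Total Path Attribute Length = 0.
    body = struct.pack("!HH", 0, 0)
    # Total length covers the marker, the 2-octet length field,
    # the 1-octet type field, and the body.
    length = len(BGP_MARKER) + 3 + len(body)
    return BGP_MARKER + struct.pack("!HB", length, BGP_MSG_UPDATE) + body


def is_ipv4_unicast_eor(message):
    """True if the raw BGP message is the IPv4 unicast End-of-RIB marker."""
    return message == ipv4_unicast_eor()


eor = ipv4_unicast_eor()
print(len(eor), is_ipv4_unicast_eor(eor))
```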

The restarting peer:
  • Delays best-path calculation until after receiving EoR from all peers

  • Generates updates for its peers and sends the EoR marker after the initial table is sent
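
The restarting peer's behavior above can be sketched as a small state tracker: best-path calculation is held back until every peer has sent its EoR, after which updates and the speaker's own EoR go out. This is a simplified Python model. The class name, the action strings, and the peer addresses are all hypothetical. RFC 4724 also bounds the wait with a selection deferral timer, which this sketch omits.

```python
class RestartingSpeaker:
    """Tracks which peers still owe an End-of-RIB after a restart."""

    def __init__(self, peers):
        self.pending_eor = set(peers)
        self.best_path_done = False

    def on_eor(self, peer):
        """Record an EoR from a peer and return the actions it triggers."""
        self.pending_eor.discard(peer)
        if self.pending_eor or self.best_path_done:
            # Still waiting on other peers, or already converged.
            return []
        self.best_path_done = True
        # Once all peers have sent EoR: select routes, advertise them,
        # then signal the end of the initial table with our own EoR.
        return ["run-best-path", "send-updates", "send-eor"]


# Hypothetical peer addresses for illustration.
speaker = RestartingSpeaker(["192.0.2.1", "192.0.2.2"])
print(speaker.on_eor("192.0.2.1"))
print(speaker.on_eor("192.0.2.2"))
```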

The non-restarting peers:
  • Receive the routing updates from the restarting peer

  • Remove stale marking for any refreshed route

  • Purge any remaining stale routes after EoR is received from the restarting peer or the stale path timer expires
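
The helper-side handling of stale routes described above can be modeled in a few lines of Python. This is an illustrative sketch only: the class, the method names, and the example prefixes are assumptions, and a real implementation would drive `purge_stale` from the EoR event, the Restart Time, or the stale path timer.

```python
class HelperRib:
    """Minimal model of a GR helper's routes learned from peers."""

    def __init__(self):
        self.routes = {}  # prefix -> {"peer": str, "stale": bool}

    def learn(self, peer, prefix):
        """Install or refresh a route; a refresh clears the stale mark."""
        self.routes[prefix] = {"peer": peer, "stale": False}

    def peer_restarted(self, peer):
        """Mark every route from the restarting peer as stale but keep it."""
        for route in self.routes.values():
            if route["peer"] == peer:
                route["stale"] = True

    def purge_stale(self, peer):
        """Remove routes from the peer that are still stale.

        Called when EoR arrives from the peer or when a timer expires.
        Returns the sorted list of purged prefixes.
        """
        stale = [prefix for prefix, route in self.routes.items()
                 if route["peer"] == peer and route["stale"]]
        for prefix in stale:
            del self.routes[prefix]
        return sorted(stale)


# Hypothetical peer and prefixes for illustration.
rib = HelperRib()
for prefix in ("10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"):
    rib.learn("192.0.2.1", prefix)
rib.peer_restarted("192.0.2.1")
rib.learn("192.0.2.1", "10.0.1.0/24")  # refreshed after the restart
rib.learn("192.0.2.1", "10.0.2.0/24")  # refreshed after the restart
print(rib.purge_stale("192.0.2.1"))    # EoR received: drop what is still stale
```

Routes stay installed while marked stale, which is what keeps traffic flowing during the restart. Only the prefixes that the restarting peer does not advertise again are removed.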