Load Balancer Member Respawning
As a cloud operator, whenever a load balancer member node fails, I want the load balancer to stop directing traffic to the failed member and a new member to be spawned in its place.
Fault class
Hardware failure
Software error
Network failure
OpenStack projects used
OpenStack Aodh (telemetry alarm service)
OpenStack Heat (orchestration)
OpenStack Octavia (load balancer as a service)
Remediation class
Reactive
Fault detection
From the Octavia admin guide:
Octavia will use the health information from the underlying load balancing application to determine the health of members. This information will be streamed to the Octavia database and made available via the status tree or other API methods.
In addition, an Aodh alarm is defined to detect load balancer member node failure and to trigger an alarm action that notifies Heat. The loadbalancer_member_health alarm rule type was added to Aodh in April 2019, and at the time of writing a patch is under review to add a Heat resource for creating this alarm type automatically via Heat templates. It is intended to update this document later with sample Heat templates.
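To make the detection step more concrete, the following is a minimal sketch of how a failed member could be picked out of an Octavia status tree once it has been fetched and parsed into a Python dictionary. The function name failed_members and the sample payload are hypothetical and written by hand for illustration. The nesting assumed here (statuses, loadbalancer, listeners, pools, members) and the ERROR operating status should be checked against the Octavia API reference for the release in use.

```python
from typing import Any, Dict, List


def failed_members(status_tree: Dict[str, Any]) -> List[str]:
    """Return the IDs of members whose operating_status is ERROR.

    Assumes the tree is nested as
    statuses -> loadbalancer -> listeners -> pools -> members,
    optionally with pools listed directly under the load balancer too.
    """
    lb = status_tree.get("statuses", {}).get("loadbalancer", {})

    # Collect pools from both possible locations in the tree.
    pools = list(lb.get("pools", []))
    for listener in lb.get("listeners", []):
        pools.extend(listener.get("pools", []))

    failed: List[str] = []
    seen = set()
    for pool in pools:
        for member in pool.get("members", []):
            member_id = member.get("id")
            if member_id in seen:
                continue  # the same pool may appear in both places
            seen.add(member_id)
            if member.get("operating_status") == "ERROR":
                failed.append(member_id)
    return failed


# Hand-written sample payload, not captured from a real deployment.
SAMPLE_TREE = {
    "statuses": {
        "loadbalancer": {
            "id": "lb-1",
            "operating_status": "DEGRADED",
            "listeners": [
                {
                    "id": "listener-1",
                    "pools": [
                        {
                            "id": "pool-1",
                            "members": [
                                {"id": "member-1", "operating_status": "ONLINE"},
                                {"id": "member-2", "operating_status": "ERROR"},
                            ],
                        }
                    ],
                }
            ],
        }
    }
}

if __name__ == "__main__":
    print(failed_members(SAMPLE_TREE))
```

In the use case described here this check is performed by the Aodh alarm rather than by operator code; the sketch only illustrates what information the status tree exposes.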
Inputs, decision-making, and remediation
Octavia’s built-in behavior automatically stops directing traffic to the unresponsive member node.
Heat receives the Aodh alarm regarding the unresponsive member node and, according to the behavior defined in the stack template, spawns a new instance to replace it.
Octavia detects when the new member node is operational and begins directing some traffic to the new node.
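The three steps above can be illustrated with a small in-memory model. This is a toy sketch, not code from Octavia, Aodh, or Heat: the classes Member and Pool and the function respawn_failed are hypothetical, and modelling a freshly spawned member as OFFLINE until it passes a health check is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Member:
    name: str
    operating_status: str = "ONLINE"


@dataclass
class Pool:
    members: List[Member] = field(default_factory=list)

    def traffic_targets(self) -> List[str]:
        # Step 1: only members reported ONLINE receive traffic.
        return [m.name for m in self.members if m.operating_status == "ONLINE"]


def respawn_failed(pool: Pool, new_names: Iterator[str]) -> List[str]:
    """Step 2: replace each ERROR member with a newly spawned one.

    Returns the names of the members that were replaced. New members
    start as OFFLINE (an assumption of this model) and carry no traffic.
    """
    replaced: List[str] = []
    updated: List[Member] = []
    for member in pool.members:
        if member.operating_status == "ERROR":
            replaced.append(member.name)
            updated.append(Member(next(new_names), "OFFLINE"))
        else:
            updated.append(member)
    pool.members = updated
    return replaced


if __name__ == "__main__":
    pool = Pool([Member("web-0"), Member("web-1")])

    pool.members[1].operating_status = "ERROR"   # the fault occurs
    print(pool.traffic_targets())                # failed member is skipped

    print(respawn_failed(pool, iter(["web-2"])))  # alarm-driven replacement

    pool.members[1].operating_status = "ONLINE"  # Step 3: health check passes
    print(pool.traffic_targets())                # new member now gets traffic
```

In a real deployment each step is carried out by a different service: Octavia handles the first and third, while the replacement in the second is driven by the Aodh alarm and the Heat stack template.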