The deletion policy helps users control the election of victim nodes when a cluster is about to be shrunk; in other words, when the size of a cluster is to be decreased, which node(s) should be removed first.
Currently, this policy is applicable to clusters of all profile types, and it is enforced whenever the cluster's size is about to be reduced.
Below is a typical spec for a deletion policy:
type: senlin.policy.deletion
version: 1.1
properties:
  criteria: OLDEST_FIRST
  destroy_after_deletion: false
  grace_period: 30
  reduce_desired_capacity: true
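Assuming the spec above is saved to a local file, the policy can be created and attached to a cluster roughly as sketched below. The file name, policy name and cluster name are illustrative only:

# Create a deletion policy object from the spec file (names are examples)
openstack cluster policy create --spec-file deletion_policy.yaml dp01
# Attach the policy to an existing cluster
openstack cluster policy attach --policy dp01 my_cluster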
The valid values for the "criteria" property include:

OLDEST_FIRST
  always select node(s) which were created earlier than other nodes.
YOUNGEST_FIRST
  always select node(s) which were created more recently instead of those created earlier.
OLDEST_PROFILE_FIRST
  compare the profile used by each individual node and select the node(s) whose profiles were created earlier than others.
RANDOM
  randomly select node(s) from the cluster for deletion. This is the default criterion if omitted.

Note
There is an implicit rule when electing victim nodes: the Senlin engine always ranks nodes which are not in ACTIVE state before others.
Several actions can trigger a deletion policy. Some of them may already carry a list of candidates to remove, e.g. CLUSTER_DEL_NODES or NODE_DELETE; others may only carry a number of nodes to remove, e.g. CLUSTER_SCALE_IN or CLUSTER_RESIZE. For actions that already have a list of candidates, the deletion policy will respect the action inputs. The election of victims only happens when no such candidates have been identified.
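As a sketch of the two cases (the cluster and node names below are hypothetical), the first command only specifies a count of nodes, so the deletion policy elects the victims; the second names the node to remove, so the policy respects that input:

# CLUSTER_SCALE_IN: only a count is given, the deletion policy picks the victims
openstack cluster shrink --count 2 my_cluster
# CLUSTER_DEL_NODES: explicit candidates are given, the policy respects them
openstack cluster members del --nodes node-0123 my_cluster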
There are cases where you don't want the node(s) removed from a cluster to be destroyed. Instead, you prefer them to become "orphan" nodes so that in the future you can quickly add them back to the cluster without having to create new nodes.
If this is your situation, you may want to set destroy_after_deletion to false. The Senlin engine won't delete the node(s) after removing them from the cluster.
The default behavior is to delete (destroy) the node(s) after they are deprived of their cluster membership.
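When you later want to reuse such an orphan node, it can be added back to the cluster, for example along the lines below (the node and cluster names are hypothetical):

# Re-add a previously removed (orphan) node to the cluster
openstack cluster members add --nodes node-0123 my_cluster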
Another common scenario is to grant a node a period of time to shut down gracefully. Even if a node doesn't have built-in logic to perform a graceful shutdown, granting it some extra time may still help ensure the resources it was using are properly released.
The default value for the grace_period property is 0, which means the node deletion happens as soon as a node is removed from the cluster. You can customize this value according to your needs. Note that the grace period will be granted to all node(s) deleted. When setting this value to a large number, make sure it does not exceed the typical timeout value for action execution; otherwise, the node deletion will fail.
In most cases, users would anticipate the "desired_capacity" of a cluster to be reduced when nodes are removed from it. Since the victim selection algorithm always picks nodes in non-ACTIVE status over ACTIVE ones, you can actually remove erroneous nodes by taking advantage of this rule.
For example, suppose there are 4 nodes in a cluster and 2 of them are known to be in an inactive status. You can use the command openstack cluster members del to remove the bad nodes. If you have a deletion policy attached to the cluster, you get a chance to tell the Senlin engine that you don't want to change the capacity of the cluster; instead, you only want the bad nodes removed. With the help of other cluster health related commands, you can quickly recover the cluster to a healthy status. You don't have to change the desired capacity of the cluster to a smaller value and then change it back.
If this is your use case, you can set reduce_desired_capacity to false in the policy spec. The cluster's desired capacity won't be changed after cluster membership is modified.
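The command below sketches the scenario above; the node and cluster names are hypothetical, and the attached deletion policy is assumed to have reduce_desired_capacity set to false:

# Remove the two inactive nodes; because the attached deletion policy has
# reduce_desired_capacity set to false, the cluster's desired_capacity stays unchanged
openstack cluster members del --nodes bad-node-1,bad-node-2 my_cluster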
If there is a need to receive notification of a node deletion, you can specify a lifecycle hook in the deletion policy:
type: senlin.policy.deletion
version: 1.1
properties:
  hooks:
    type: 'zaqar'
    timeout: 120
    params:
      queue: 'my_queue'
The valid values for the type property are:

zaqar
  send message to a zaqar queue. The name of the zaqar queue must be specified in the queue property.
webhook
  send message to a webhook URL. The URL of the webhook must be specified in the url property.

The timeout property specifies the number of seconds to wait before the actual node deletion happens. This timeout can be preempted by calling the complete lifecycle hook API.
Note
Hooks of type webhook will be supported in a future version. Currently only hooks of type zaqar are supported.
With the help of the Region Placement Policy, you will be able to distribute a cluster's nodes into different regions as instructed. However, when you are removing nodes from more than one region, the same distribution rule has to be respected as well.
When there is a region placement policy in effect, the deletion policy will first determine the number of nodes to be removed from each region. Then in each region, the policy performs a victim election based on the criteria you specified in the policy spec.
Similarly, when there is a zone placement policy attached to the cluster in question, nodes in the cluster may get distributed across a few availability zones based on a preset algorithm.
The deletion policy, when triggered, will first determine the number of nodes to be removed from each availability zone. Then it proceeds to elect victim nodes within each availability zone based on the criteria specified in the policy spec.
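As a minimal sketch (the policy and cluster names are hypothetical, and both policy objects are assumed to exist already), a placement policy and a deletion policy can be attached to the same cluster:

# Attach a placement policy and a deletion policy to the same cluster
openstack cluster policy attach --policy zone_placement_policy my_cluster
openstack cluster policy attach --policy deletion_policy my_cluster

In that case the placement policy decides how nodes are distributed, while the deletion policy performs the per-region or per-zone victim election described above.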