Clusters¶
Clusters are first-class citizens in Senlin service design. A cluster is defined as a collection of homogeneous objects. The “homogeneous” here means that the objects managed (aka. Nodes) have to be instantiated from the same “profile type”.
A cluster can contain zero or more nodes. Senlin provides REST APIs for users to create, retrieve, update, delete clusters. Using these APIs, a user can manage the node membership of a cluster.
A cluster is owned by a user (the owner), and it is accessible from within the Keystone project (tenant) which is the default project of the user.
A cluster has the following timestamps when instantiated:
init_at
: the timestamp when a cluster object is initialized in the Senlin database, but the actual cluster creation has not yet started;created_at
: the timestamp when the cluster object is created, i.e. theCLUSTER_CREATE
action has completed;updated_at
: the timestamp when the cluster was last updated.
Cluster Statuses¶
A cluster can have one of the following statuses during its lifecycle:
INIT
: the cluster object has been initialized, but not created yet;ACTIVE
: the cluster is created and providing service;CREATING
: the cluster creation action is still on going;ERROR
: the cluster is still providing services, but there are things going wrong that needs human intervention;CRITICAL
: the cluster is not operational, it may or may not be providing services as expected. Senlin cannot recover it from its current status. The best way to deal with this cluster is to delete it and then re-create it if needed.DELETING
: the cluster deletion is ongoing;WARNING
: the cluster is operational, but there are some warnings detected during past operations. In this case, human involvement is suggested but not required.UPDATING
: the cluster is being updated.
Along with the status
property, Senlin provides a status_reason
property for users to check what is the cause of the cluster’s current status.
To avoid frequent databases accesses, a cluster object has a runtime data
property named rt
which is a Python dictionary. The property caches the
profile referenced by the cluster, the list of nodes in the cluster and the
policies attached to the cluster. The runtime data is not directly visible to
users. It is merely a convenience for cluster operations.
Creating A Cluster¶
When creating a cluster, the Senlin API will verify whether the request
carries a body with valid, sufficient information for the engine to complete
the creation job. The following fields are required in a map named cluster
in the request JSON body:
name
: the name of the cluster to be created;profile
: the name or ID or short-ID of a profile to be used;desired_capacity
: the desired number of nodes in the cluster, which is treated also as the initial number of nodes to be created.
The following optional fields can be provided in the cluster
map in the
JSON request body:
min_size
: the minimum number of nodes inside the cluster, default value is 0;max_size
: the maximum number of nodes inside the cluster, default value is -1, which means there is no upper limit on the number of nodes;timeout
: the maximum number of seconds to wait for the cluster to become ready, i.e.ACTIVE
.metadata
: a list of key-value pairs to be associated with the cluster.dependents
: A dict contains dependency information between nova server/ heat stack cluster and container cluster. The container node’s id will be stored in ‘dependents’ property of its host cluster.
The max_size
and the min_size
fields, when specified, will be checked
against each other by the Senlin API. The API also checks if the specified
desired_capacity
falls out of the range [min_size
, max_size
]. If
any verification failed, a HTTPBadRequest
exception is thrown and the
cluster creation request is rejected.
A cluster creation request is then forwarded to the Senlin RPC engine for processing, where the engine creates an Action for the request and queues it for any worker threads to execute. Once the action is queued, the RPC engine returns the current cluster properties in a map to the API. Along with these properties, the engine also returns the UUID of the Action that will do the real job of cluster creation. A user can check the status of the action to determine whether the cluster has been successfully completed or failed.
Listing Clusters¶
Clusters in the current project can be queried using some query parameters. None of these parameters is required. By default, the Senlin API will return all clusters that are not deleted.
When listing clusters, the following query parameters can be specified, individually or combined:
filters
: a map containing key-value pairs for matching. Records that fail to match the criteria will be filtered out. The valid keys in this map include:name
: name of clusters to list, can be a string or a list of strings;status
: status of clusters, can be a string or a list of strings;
limit
: a number that restricts the maximum number of records to be returned from the query. It is useful for displaying the records in pages where the page size can be specified as the limit.marker
: A string that represents the last seen UUID of clusters in previous queries. This query will only return results appearing after the specified UUID. This is useful for displaying records in pages.sort
: A string to enforce sorting of the results. It accepts a list of known property names of a cluster as sorting keys separated by commas. Each sorting key can optionally have either:asc
or:desc
appended to the key for controlling the sorting direction.global_project
: A boolean indicating whether cluster listing should be done in a tenant-safe way. When this value is specified as False (the default), only clusters from the current project that match the other criteria will be returned. When this value is specified as True, clusters that matching all other criteria would be returned, no matter in which project a cluster was created. Only a user with admin privilege is permitted to do a global listing.
Getting a Cluster¶
When a user wants to check the details about a specific cluster, he or she can specify one of the following keys for query:
cluster UUID: Clusters are queried strictly based on the UUID given. This is the most precise query supported.
cluster name: Senlin allows multiple clusters to have the same name. It is user’s responsibility to avoid name conflicts if needed. The output may be the details of a cluster if the cluster name is unique, or else Senlin will return a message telling users that multiple clusters found matching the specified name.
short ID: Considering that UUID is a long string not so convenient to input, Senlin supports a short version of UUIDs for query. Senlin engine will use the provided string as a prefix to attempt a matching in the database. When the “ID” is long enough to be unique, the details of the matching cluster is returned, or else Senlin will return an error message indicating that more than one cluster matching the short ID have been found.
Senlin engine service will try the above three ways in order to find a match in database.
In the returned result, Senlin injects a list of node IDs for nodes in the cluster. It also injects the name of the profile used by the cluster. These are all for user’s convenience.
Updating A Cluster¶
A cluster can be updated upon user’s requests. In theory, all properties of a cluster could be updated/changed. However, some update operations are light -weight ones, others are heavy weight ones. This is because the semantics of properties differ a lot from each other. Currently, cluster profile related changes and cluster size related changes are heavy weight because they may induce a chain of operations on the cluster. Updating other properties are light weight operations.
In the JSON body of a cluster_update
request, users can specify new values
for the following properties:
name
: new cluster name;profile_id
: ID or name or short ID of a profile object to use;metadata
: a list of key-value pairs to be associated with the cluster, this dict will be merged with the existing key-value pairs based on keys.desired_capacity
: new desired size for the cluster;min_size
: new lower bound for the cluster size;max_size
: new upper bound for the cluster size.timeout
: new timeout value for the specified cluster.profile_only
: a boolean value indicating whether cluster will be only updated with profile.
Update Cluster’s Profile¶
When profile_id
is specified, the request will be interpreted as a
wholistic update to all nodes across the cluster. The targeted use case is to
do a cluster wide system upgrade. For example, replacing glance images used by
the cluster nodes when new kernel patches have been applied or software
defects have been fixed.
When receiving such an update request, the Senlin engine will check if the new profile referenced does exist and whether the new profile has the same profile type as that of the existing profile. Exceptions will be thrown if any verification has failed and thus the request is rejected.
After the engine has validated the request, an Action of CLUSTER_UPDATE
is
created and queued internally for execution. Later on, when a worker thread
picks up the action for execution, it will first lock the whole cluster and
mark the cluster status as UPDATING
. It will then fork NODE_UPDATE
actions per node inside the cluster, which are in turn queued for execution.
Other worker threads will pick up the node level update action for execution
and mark the action as completed/failed. When all these node level updates are
completed, the CLUSTER_UPDATE
operation continues and marks the cluster as
ACTIVE
again.
Senlin also provides a parameter profile_only
for this action, so that any
newly created nodes will use the new profile, but existing nodes should not be
changed.
The cluster update operation may take a long time to complete, depending on the response time from the underlying profile operations. Note also, when there is a update policy is attached to the cluster and enabled, the update operation may be split into several batches so that 1) there is a minimum number of nodes remained in service at any time; 2) the pressure on the underlying service is controlled.
Update Cluster Size Properties¶
When either one of the desired_capacity
, min_size
and max_size
property is specified in the CLUSTER_UPDATE
request, it may lead to a
resize operation on the cluster.
The Senlin API will do a preliminary validation upon the new property values.
For example, if both min_size
and max_size
are specified, they have to
be integers and the value for max_size
is greater than the value for
min_size
, unless the value of max_size
is -1 which means the upper
bound of cluster size is unlimited.
When the request is then received by the Senlin engine, the engine first
retrieves the cluster properties from the database and do further
cross-verifications between the new property values and the current values.
For example, it is treated as an invalid request if a user has specified value
for min_size
but no value for max_size
, however the new min_size
is greater than the existing max_size
of the cluster. In this case, the
user has to provide a valid max_size
to override the existing value, or
he/she has to lower the min_size
value so that the request becomes
acceptable.
Once the cross-verification has passed, Senlin engine will calculate the new
desired_capacity
and adjust the size of the cluster if deemed necessary.
For example, when the cluster size is below the new min_size
, new nodes
will be created and added to the cluster; when the cluster size is above the
new max_size
, some nodes will be removed from the cluster. If the
desired_capacity
is set and the property value falls between the new range
of cluster size, Senlin tries resize the cluster to the desired_capacity
.
When the size of the cluster is adjusted, Senlin engine will check if there are relevant policies attached to the cluster so that the engine will add and/or remove nodes in a predictable way.
Update Other Cluster Properties¶
The update to other cluster properties is relatively straightforward. Senlin engine simply verifies the data types when necessary and override the existing property values in the database.
Note that in the cases where multiple properties are specified in a single
CLUSTER_UPDATE
request, some will take a longer time to complete than
others. Any mixes of update properties are acceptable to the Senlin API and
the engine.
Cluster Actions¶
A cluster object supports the following asynchronous actions:
add_nodes
: add a list of nodes into the target cluster;del_nodes
: remove the specified list of nodes from the cluster;replace_nodes
: replace the specified list of nodes in the cluster;resize
: adjust the size of the cluster;scale_in
: explicitly shrink the size of the cluster;scale_out
: explicitly enlarge the size of the cluster.policy_attach
: attach a policy object to the cluster;policy_detach
: detach a policy object from the cluster;policy_update
: modify the settings of a policy that is attached to the cluster.
The scale_in
and the scale_out
actions are subject to change in future.
We recommend using the unified CLUSTER_RESIZE
action for cluster size
adjustments.
Software or a user can trigger a cluster_action
API to issue an action
for Senlin to perform. In the JSON body of these requests, Senlin will verify
if the top-level key contains one of the above actions. When no valid action
name is found or more than one action is specified, the API will return error
messages to the caller and reject the request.
Adding Nodes to a Cluster¶
Senlin API provides the add_nodes
action for user to add some existing
nodes into the specified cluster. The parameter for this action is interpreted
as a list in which each item is the UUID, name or short ID of a node.
When receiving an add_nodes
action request, the Senlin API only validates
if the parameter is a list and if the list is empty. After this validation,
the request is forwarded to the Senlin engine for processing.
The Senlin engine will examine nodes in the list one by one and see if any of the following conditions is true. Senlin engine rejects the request if so.
Any node from the list is not in
ACTIVE
state?Any node from the list is still member of another cluster?
Any node from the list is not found in the database?
Number of nodes to add is zero?
When this phase of validation succeeds, the request is translated into a
CLUSTER_ADD_NODES
builtin action and queued for execution. The engine
returns to the user an action UUID for checking.
When the action is picked up by a worker thread for execution, Senlin checks
if the profile type of the nodes to be added matches that of the cluster.
Finally, a number of NODE_JOIN
action is forked and executed from the
CLUSTER_ADD_NODES
action. When NODE_JOIN
actions complete, the
CLUSTER_ADD_NODES
action returns with success.
In the cases where there are load-balancing policies attached to the cluster,
the CLUSTER_ADD_NODES
action will save the list of UUIDs of the new nodes
into the action’s data
field so that those policies could update the
associated resources.
Deleting Nodes from a Cluster¶
Senlin API provides the del_nodes
action for user to delete some existing
nodes from the specified cluster. The parameter for this action is interpreted
as a list in which each item is the UUID, name or short ID of a node.
When receiving a del_nodes
action request, the Senlin API only validates
if the parameter is a list and if the list is empty. After this validation,
the request is forwarded to the Senlin engine for processing.
The Senlin engine will examine nodes in the list one by one and see if any of the following conditions is true. Senlin engine rejects the request if so.
Any node from the list cannot be found from the database?
Any node from the list is not member of the specified cluster?
Number of nodes to delete is zero?
When this phase of validation succeeds, the request is translated into a
CLUSTER_DEL_NODES
builtin action and queued for execution. The engine
returns to the user an action UUID for checking.
When the action is picked up by a worker thread for execution, Senlin forks a
number of NODE_DELETE
actions and execute them asynchronously. When all
forked actions complete, the CLUSTER_DEL_NODES
returns with a success.
In the cases where there are load-balancing policies attached to the cluster,
the CLUSTER_DEL_NODES
action will save the list of UUIDs of the deleted
nodes into the action’s data
field so that those policies could update the
associated resources.
If a deletion policy with hooks property is attached to the cluster, the
CLUSTER_DEL_NODES
action will create the CLUSTER_DEL_NODES
actions
in WAITING_LIFECYCLE_COMPLETION
status which does not execute them. It
also sends the lifecycle hook message to the target specified in the
deletion policy. If the complete lifecylcle API is called for a
CLUSTER_DEL_NODES
action, it will be executed. If all the
CLUSTER_DEL_NODES
actions are not executed before the hook timeout
specified in the deletion policy is reached, the remaining
CLUSTER_DEL_NODES
actions are moved into READY
status and scheduled
for execution. When all actions complete, the CLUSTER_DEL_NODES
returns with a success.
Note also that by default Senlin won’t destroy the nodes that are deleted
from the cluster. It simply removes the nodes from the cluster so that they
become orphan nodes.
Senlin also provides a parameter destroy_after_deletion
for this action
so that a user can request the deleted node(s) to be destroyed right away,
instead of becoming orphan nodes.
Replacing Nodes in a Cluster¶
Senlin API provides the replace_nodes
action for user to replace some existing
nodes in the specified cluster. The parameter for this action is interpreted
as a dict in which each item is the node-pair{OLD_NODE:NEW_NODE}. The key OLD_NODE
is the UUID, name or short ID of a node to be replaced, and the value NEW_NODE is
the UUID, name or short ID of a node as replacement.
When receiving a replace_nodes
action request, the Senlin API only validates
if the parameter is a dict and if the dict is empty. After this validation,
the request is forwarded to the Senlin engine for processing.
The Senlin engine will examine nodes in the dict one by one and see if all of the following conditions is true. Senlin engine accepts the request if so.
All nodes from the list can be found from the database.
All replaced nodes from the list are the members of the specified cluster.
All replacement nodes from the list are not the members of any cluster.
The profile types of all replacement nodes match that of the specified cluster.
The statuses of all replacement nodes are ACTIVE.
When this phase of validation succeeds, the request is translated into a
CLUSTER_REPLACE_NODES
builtin action and queued for execution. The engine
returns to the user an action UUID for checking.
When the action is picked up by a worker thread for execution, Senlin forks a
number of NODE_LEAVE
and related NODE_JOIN
actions, and execute them
asynchronously. When all forked actions complete, the CLUSTER_REPLACE_NODES
returns with a success.
Resizing a Cluster¶
In addition to the cluster_update
request, Senlin provides a dedicated API
for adjusting the size of a cluster, i.e. cluster_resize
. This operation
is designed for the auto-scaling and manual-scaling use cases.
Below is a list of API parameters recognizable by the Senlin API when parsing
the JSON body of a cluster_resize
request:
adjustment_type
: type of adjustment to be performed where the value should be one of the followings:EXACT_CAPACITY
: the adjustment is about the targeted size of the cluster;CHANGE_IN_CAPACITY
: the adjustment is about the number of nodes to be added or removed from the cluster and this is the default setting;CHANGE_IN_PERCENTAGE
: the adjustment is about a relative percentage of the targeted cluster.
This field is mandatory.
number
: adjustment number whose value will be interpreted base on the value ofadjustment_type
. This field is mandatory.min_size
: the new lower bound for the cluster size;max_size
: the new upper bound for the cluster size;min_step
: the minimum number of nodes to be added or removed when theadjustment_type
is set toCHANGE_IN_PERCENTAGE
and the absolute value computed is less than 1;strict
: a boolean value indicating whether the service should do a best-effort resizing operation even if the request cannot be fully met.
For example, the following request is about increasing the size of the cluster by 20% and Senlin can try a best-effort if the calculated size is greater than the upper limit of the cluster size:
{
"adj_type": "CHANGE_IN_PERCENTAGE",
"number": "20",
"strict": False,
}
When Senlin API receives a cluster_resize
request, it first validates the
data type of the values and the sanity of the value collection. For example,
you cannot specify a min_size
greater than the current upper bound (i.e.
the max_size
property of the cluster) if you are not providing a new
max_size
that is greater than the min_size
.
After the request is forwarded to the Senlin engine, the engine will further
validates the parameter values against the targeted cluster. When all
validations pass, the request is converted into a CLUSTER_RESIZE
action
and queued for execution. The API returns the cluster properties and the UUID
of the action at this moment.
When executing the action, Senlin will analyze the request parameters and determine the operations to be performed to meet user’s requirement. The corresponding cluster properties are updated before the resize operation is started.
Scaling in/out a Cluster¶
As a convenience method, Senlin provides the scale_out
and the scale_in
action API for clusters. With these two APIs, a user can request a cluster to
be resized by the specified number of nodes.
The scale_out
and the scale_in
APIs both take a parameter named
count
which is a positive integer. The integer parameter is optional, and
it specifies the number of nodes to be added or removed if provided. When it
is omitted from the request JSON body, Senlin engine will check if the cluster
has any relevant policies attached that will decide the number of nodes to be
added or removed respectively. The Senlin engine will use the outputs from
these policies as the number of nodes to create (or delete) if such policies
exist. When the request does contain a count
parameter and there are
policies governing the scaling arguments, the count
parameter value may
be overridden/ignored.
When a scale_out
or a scale_in
request is received by the Senlin
engine, a CLUSTER_SCALE_OUT
or a CLUSTER_SCALE_IN
action is then
created and queued for execution after some validation of the parameter value.
A worker thread picks up the action and execute it. The worker will check if
there are outputs from policy checkings. For CLUSTER_SCALE_OUT
actions,
the worker checks if the policies checked has left a count
key in the
dictionary named creation
from the action’s runtime data
attribute.
The worker will use such a count
value for node creation. For a
CLUSTER_SCALE_OUT
action, the worker checks if the policies checked has
left a count
key in the dictionary named deletion
from the action’s
runtime data
attribute. The worker will use such a count
value for
node deletion.
Note that both scale_out
and scale_in
actions will adjust the
desired_capacity
property of the target cluster.
Cluster Policy Bindings¶
Senlin API provides the following action APIs for managing the binding relationship between a cluster and a policy:
policy_attach
: attach a policy to a cluster;policy_detach
: detach a policy from a cluster;policy_update
: update the properties of the binding between a cluster and a policy.
Attaching a Policy to a Cluster¶
Once a policy is attached (bound) to a cluster, it will be enforced when related actions are performed on that cluster, unless the policy is (temporarily) disabled on the cluster.
When attaching a policy to a cluster, the following properties can be specified:
enabled
: a boolean indicating whether the policy should be enabled on the cluster once attached. Default is True. When specified, it will override the default setting for the policy.
Upon receiving the policy_attach
request, the Senlin engine will perform
some validations then translate the request into a CLUSTER_ATTACH_POLICY
action and queue the action for execution. The action’s UUID is then returned
to Senlin API and finally the requestor.
When the engine executes the action, it will try find if the policy is already attached to the cluster. This checking was not done previously because the engine must ensure that the cluster has been locked before this checking, or else there might be race conditions.
The engine calls the policy’s attach
method when attaching the policy and
record the binding into database if the attach
method returns a positive
response.
Currently, Senlin does not allow two policies of the same type to be attached to the same cluster. This constraint may be relaxed in future, but for now, it is checked and enforced before a policy gets attached to a cluster.
Policies attached to a cluster are cached at the target cluster as part of its
runtime rt
data structure. This is an optimization regarding DB queries.
Detaching a Policy from a Cluster¶
Once a policy is attached to a cluster, it can be detached from the cluster at
user’s request. The only parameter required for the policy_detach
action
API is policy_id
, which can be the UUID, the name or the short ID of the
policy.
Upon receiving a policy_detach
request, the Senlin engine will perform
some validations then translate the request into a CLUSTER_DETACH_POLICY
action and queue the action for execution. The action’s UUID is then returned
to Senlin API and finally the requestor.
When the Senlin engine executes the CLUSTER_DETACH_POLICY
action, it will
try find if the policy is already attached to the cluster. This checking was
not done previously because the engine must ensure that the cluster has been
locked before this checking, or else there might be race conditions.
The engine calls the policy’s detach
method when detaching the policy from
the cluster and then removes the binding record from database if the
detach
method returns a True value.
Policies attached to a cluster are cached at the target cluster as part of its
runtime rt
data structure. This is an optimization regarding DB queries.
The CLUSTER_DETACH_POLICY
action will invalidate the cache when detaching
a policy from a cluster.
Updating a Policy on a Cluster¶
When a policy is attached to a cluster, there are some properties pertaining to the binding. These properties can be updated as long as the policy is still attached to the cluster. The properties that can be updated include:
enabled
: a boolean value indicating whether the policy should be enabled or disabled. There are cases where some policies have to be temporarily disabled when other manual operations going on.
Upon receiving the policy_update
request, Senlin API performs some basic
validations on the parameters passed.
Senlin engine translates the policy_update
request into an action
CLUSTER_UPDATE_POLICY
and queue it for execution. The UUID of the action
is then returned to Senlin API and eventually the requestor.
During execution of the CLUSTER_UPDATE_POLICY
action, Senlin engine
simply updates the binding record in the database and returns.