All Sahara Cluster operations are performed in multiple steps. A Cluster object has a Status attribute which changes when Sahara finishes one step of an operation and starts another one. A Cluster object also has a Status description attribute, which is updated whenever a Cluster error occurs.
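Both attributes are exposed in the Cluster representation returned by the Sahara REST API, so a client can read them directly. The sketch below assumes a v1.1 endpoint on the default port, a valid Keystone token, and a known Cluster id; all of these are placeholders to adapt to your deployment:

    import requests

    # Placeholder values; substitute your own deployment details.
    SAHARA_URL = "http://controller:8386/v1.1/<project-id>"  # assumed Sahara endpoint layout
    TOKEN = "<keystone-token>"                                # a valid Keystone token
    CLUSTER_ID = "<cluster-id>"

    def get_cluster_status(cluster_id):
        """Fetch a Cluster and return its status and status description."""
        resp = requests.get(
            f"{SAHARA_URL}/clusters/{cluster_id}",
            headers={"X-Auth-Token": TOKEN},
        )
        resp.raise_for_status()
        cluster = resp.json()["cluster"]
        return cluster["status"], cluster.get("status_description", "")

    status, description = get_cluster_status(CLUSTER_ID)
    print(f"status: {status}")
    if description:
        print(f"status description: {description}")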
Before performing any operations with the OpenStack environment, Sahara validates user input (the Validating step). If any validation fails during Cluster creation, the Cluster object is still kept in the database, but with an Error status. If validation fails while scaling an Active Cluster, the Cluster keeps its Active status. In both cases the status description will contain error messages describing the reason of the failure.
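Because validation failures and later provisioning errors are all surfaced through these two attributes, a client usually polls until the Cluster settles into a stable status. The helper below is a minimal sketch building on the hypothetical get_cluster_status() function above; the set of stable statuses is illustrative rather than exhaustive:

    import time

    STABLE_STATUSES = {"Active", "Error"}  # illustrative terminal statuses

    def wait_for_stable_status(cluster_id, poll_interval=10, timeout=3600):
        """Poll the Cluster until it reaches a stable status or the timeout expires."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            status, description = get_cluster_status(cluster_id)
            print(f"current status: {status}")
            if status in STABLE_STATUSES:
                if description:
                    print(f"status description: {description}")
                return status
            time.sleep(poll_interval)
        raise TimeoutError(f"Cluster {cluster_id} did not stabilize within {timeout} seconds")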
The InfraUpdating status means that the Provisioning plugin is performing some infrastructure updates. In the Spawning step that follows, Sahara asks OpenStack to create the required VMs and Volumes. It takes some time for OpenStack to schedule all of these resources, so Sahara waits until all of the VMs are in an Active state.
In the Waiting step, Sahara waits while the VMs' operating systems boot up and all internal infrastructure components, such as networks and volumes, are attached and ready to use.
In the Preparing step, Sahara prepares the Cluster for starting. This includes generating the /etc/hosts file, or changing the /etc/resolv.conf file if you use the Designate service, so that all instances can reach each other by hostname. Sahara also updates the authorized_keys file on each VM so that the VMs can communicate without passwords.
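To make the hostname-resolution part concrete, the sketch below renders /etc/hosts-style entries from a list of instances. The instance data and the helper function are purely illustrative and are not Sahara's actual implementation:

    # Illustrative only: the general shape of the /etc/hosts entries that let
    # instances reach each other by hostname. Not Sahara's actual code.
    instances = [
        {"ip": "10.0.0.11", "fqdn": "demo-master-001.novalocal", "hostname": "demo-master-001"},
        {"ip": "10.0.0.12", "fqdn": "demo-worker-001.novalocal", "hostname": "demo-worker-001"},
    ]

    def render_etc_hosts(instances):
        """Return /etc/hosts content mapping each instance IP to its names."""
        lines = ["127.0.0.1 localhost"]
        lines += [f'{i["ip"]} {i["fqdn"]} {i["hostname"]}' for i in instances]
        return "\n".join(lines) + "\n"

    print(render_etc_hosts(instances))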
During the Configuring step, Sahara pushes service configurations to the VMs. Both XML- and JSON-based configuration files and environment variables are set in this step.
In the Starting step, Sahara starts the Hadoop services on the Cluster's VMs.
The Active status means that the Cluster has started successfully and is ready to run EDP Jobs.
When scaling or shrinking a Cluster, Sahara again begins with Validating: it checks the scale/shrink request for validity. The Plugin method called to perform Plugin-specific checks is different from the validation method used during creation.
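For reference, a scale request is submitted against the Cluster itself. The payload below follows the general shape of Sahara's cluster scaling request, resizing an existing Node Group and adding a new one, but the endpoint, names, counts, and template id are placeholders carried over from the earlier sketch, so treat it as an outline to adapt rather than a definitive call:

    import requests

    # Reusing the placeholder endpoint, token, and Cluster id from the earlier sketch.
    SAHARA_URL = "http://controller:8386/v1.1/<project-id>"
    TOKEN = "<keystone-token>"
    CLUSTER_ID = "<cluster-id>"

    # Example payload: grow the "worker" Node Group and add a new Node Group
    # based on an existing Node Group Template. All values are placeholders.
    scale_request = {
        "resize_node_groups": [
            {"name": "worker", "count": 4},
        ],
        "add_node_groups": [
            {
                "name": "secondary-worker",
                "node_group_template_id": "<node-group-template-id>",
                "count": 2,
            },
        ],
    }

    resp = requests.put(
        f"{SAHARA_URL}/clusters/{CLUSTER_ID}",
        headers={"X-Auth-Token": TOKEN},
        json=scale_request,
    )
    resp.raise_for_status()
    print(resp.json()["cluster"]["status"])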
In the Scaling step, Sahara performs database operations, updating all affected existing Node Groups and creating new ones to join the existing Node Groups.
The Adding Instances status is similar to Spawning in Cluster creation. Sahara adds the required number of VMs to the existing Node Groups and creates the new Node Groups.
The Configuring status here is similar to Configuring in Cluster creation. New instances are configured in the same manner as the existing ones. The VMs already in the Cluster are also updated with a new /etc/hosts or /etc/resolv.conf file.
During Decommissioning, Sahara stops Hadoop services on the VMs that will be removed from the Cluster. Decommissioning a Data Node may take some time because Hadoop rearranges data replicas around the Cluster, so that no data is lost when that Data Node is deleted.
Once scaling completes, the Cluster returns to the same Active status as after Cluster creation.
Deleting is the only step that releases all of the Cluster's resources and removes it from the database.
In extreme cases the regular “Deleting” step may hang. Sahara APIv2 introduces the ability to force delete a Cluster. This prevents deleting from hanging but comes with the risk of orphaned resources.
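As a rough illustration, a force delete in APIv2 is a DELETE request that carries a force flag. The endpoint layout and the body shape shown here are assumptions to verify against the API reference for your release; all values are placeholders:

    import requests

    # Placeholders; check your service catalog for the actual APIv2 endpoint.
    SAHARA_V2_URL = "http://controller:8386/v2"  # assumed APIv2 endpoint
    TOKEN = "<keystone-token>"
    CLUSTER_ID = "<cluster-id>"

    # Assumed request shape: DELETE with a JSON body carrying the force flag.
    resp = requests.delete(
        f"{SAHARA_V2_URL}/clusters/{CLUSTER_ID}",
        headers={"X-Auth-Token": TOKEN},
        json={"force": True},
    )
    resp.raise_for_status()
    print("force delete accepted:", resp.status_code)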
If Cluster creation fails, the Cluster enters the Error state. This status means that the Cluster may not be able to perform any operations normally, and it will stay in the database until it is manually deleted. The reason for the failure can be found in the sahara logs; the status description will also contain information about the error.
If an error occurs during the Adding Instances operation, Sahara will first try to roll back the operation. If a rollback is impossible or itself fails, the Cluster will also go into the Error state. If the rollback succeeds, the Cluster returns to the Active state and the status description will contain a short message about the reason the Adding Instances operation failed.
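A client can therefore tell a clean scale apart from a rolled-back one by inspecting the status description even when the Cluster ends up Active again. The snippet below is a small sketch reusing the hypothetical get_cluster_status() helper from earlier:

    # Reuses the hypothetical get_cluster_status() helper defined earlier.
    status, description = get_cluster_status(CLUSTER_ID)

    if status == "Error":
        print("Cluster is in the Error state:", description)
    elif status == "Active" and description:
        # A rolled-back scaling operation leaves the Cluster Active but records
        # the reason for the Adding Instances failure in the status description.
        print("Scaling was rolled back:", description)
    elif status == "Active":
        print("Cluster is Active")
    else:
        print("Cluster is still in a transitional state:", status)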