Available Plugins

In this section we present all the plugins that are shipped along with Watcher. If you want to know which plugins your Watcher services have access to, you can use the Guru Meditation Reports to display them.

Goals

airflow_optimization

AirflowOptimization

This goal is used to optimize the airflow within a cloud infrastructure.

dummy

Dummy

Reserved goal that is used for testing purposes.

noisy_neighbor

NoisyNeighborOptimization

This goal is used to identify and migrate a Noisy Neighbor: a low-priority VM that degrades the performance of a high-priority VM, in terms of IPC, by over-utilizing the Last Level Cache.

server_consolidation

ServerConsolidation

This goal is for efficient usage of compute server resources in order to reduce the total number of servers.

thermal_optimization

ThermalOptimization

This goal is used to balance the temperature across different servers.

unclassified

Unclassified

This goal is used to ease the development process of a strategy. Because it contains no actual indicator specification, it can be used whenever a strategy has yet to be formally associated with an existing goal. If the goal to achieve has been identified but no implementation is available, this goal can also be used as a transitional stage.

workload_balancing

WorkloadBalancing

This goal is used to evenly distribute workloads across different servers.

Scoring Engines

dummy_scorer

Sample Scoring Engine implementing simplified workload classification.

Typically, a scoring engine would be implemented using machine learning techniques. For example, for a workload classification problem the solution could consist of the following steps:

  1. Define the problem to solve: we want to detect the workload on a machine based on collected metrics such as power consumption, temperature, CPU load, memory usage, disk usage, network usage, etc.
  2. The workloads could be predefined, e.g. IDLE, CPU-INTENSIVE, MEMORY-INTENSIVE, IO-BOUND, … Or we could let the ML algorithm find the workloads based on the learning data provided. This decision determines the kind of learning algorithm used (supervised vs. unsupervised learning).
  3. Collect metrics from sample servers (learning data).
  4. Define the analytical model, pick ML framework and algorithm.
  5. Apply learning data to the data model. Once taught, the data model becomes a scoring engine and can start doing predictions or classifications.
  6. Wrap the scoring engine in a class like this one, so that it has a standard interface and can be used inside Watcher.

This class is a greatly simplified version of the above model. The goal is to provide an example of how such a class could be implemented and used in Watcher, without adding dependencies such as machine learning frameworks (which can be quite heavy) or over-complicating its internal implementation, which would distract from the overall picture.

That said, this class implements workload classification “manually” (in plain Python code) and is not intended to be used in production.
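As an illustration of step 6, here is a minimal sketch of a rule-based classifier wrapped behind a small interface. The class name, method names, metric keys and thresholds are all assumptions made for this example; they are not taken from Watcher's code.

```python
import json


class SimpleWorkloadScorer:
    """Hypothetical rule-based workload classifier (not Watcher's own class)."""

    # Labels the scorer can return, in index order.
    WORKLOADS = ["IDLE", "CPU-INTENSIVE", "MEMORY-INTENSIVE", "IO-BOUND"]

    def get_name(self):
        return "simple_workload_scorer"

    def get_metainfo(self):
        # Metadata lets a caller translate the numeric result into a label.
        return json.dumps({"workloads": self.WORKLOADS})

    def calculate_score(self, features):
        """Classify a JSON-encoded metrics dict; return a JSON-encoded index list."""
        metrics = json.loads(features)
        cpu = metrics.get("cpu_load", 0.0)
        mem = metrics.get("memory_usage", 0.0)
        disk = metrics.get("disk_usage", 0.0)
        # Thresholds are illustrative assumptions, not tuned values.
        if cpu >= 80.0:
            label = "CPU-INTENSIVE"
        elif mem >= 80.0:
            label = "MEMORY-INTENSIVE"
        elif disk >= 80.0:
            label = "IO-BOUND"
        else:
            label = "IDLE"
        return json.dumps([self.WORKLOADS.index(label)])


scorer = SimpleWorkloadScorer()
result = json.loads(scorer.calculate_score(json.dumps({"cpu_load": 95.0})))
labels = json.loads(scorer.get_metainfo())["workloads"]
print(labels[result[0]])
```

Exchanging JSON strings keeps the interface independent of whatever model sits behind it, which is one plausible reason to wrap a trained model this way.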

Scoring Engine Containers

dummy_scoring_container

Sample Scoring Engine container returning a list of scoring engines.

Please note that it can be used in dynamic scenarios: the returned list might contain instances based on some external configuration (e.g. stored in a database). For these scoring engines to become discoverable in the Watcher API and Watcher CLI, a database re-sync is required, which can be done with the watcher-sync tool, for example.
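The dynamic scenario described above could look roughly like the following sketch, in which a container rebuilds its list of engines from external configuration on every call. All class and method names here are hypothetical, and the configuration source is simulated with an in-memory list rather than a database.

```python
class ConstantScorer:
    """Hypothetical scoring engine that always returns a fixed value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def calculate_score(self, features):
        return self.value


class ConfigDrivenContainer:
    """Hypothetical container building engines from external configuration."""

    def __init__(self, load_config):
        # load_config is any callable returning a list of {"name", "value"}
        # dicts, for example a function that reads rows from a database.
        self._load_config = load_config

    def get_scoring_engine_list(self):
        # The list is rebuilt on each call, so it follows configuration changes.
        return [ConstantScorer(row["name"], row["value"])
                for row in self._load_config()]


config = [{"name": "always_one", "value": 1.0}]
container = ConfigDrivenContainer(lambda: config)
names_before = [e.name for e in container.get_scoring_engine_list()]

config.append({"name": "always_two", "value": 2.0})
names_after = [e.name for e in container.get_scoring_engine_list()]
print(names_before, names_after)
```

In this sketch the second engine exists as soon as the configuration changes, but nothing else learns about it, which mirrors why a re-sync step is needed before new engines become discoverable.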

Strategies

basic

Basic offline consolidation using live migration

dummy

Dummy strategy used for integration testing via Tempest

Description

This strategy does not provide any useful optimization. Its only purpose is to be used by Tempest tests.

Requirements

<None>

Limitations

Do not use in production.

Spec URL

<None>

dummy_with_resize

Dummy strategy used for integration testing via Tempest

Description

This strategy does not provide any useful optimization. Its only purpose is to be used by Tempest tests.

Requirements

<None>

Limitations

Do not use in production.

Spec URL

<None>

dummy_with_scorer

A dummy strategy using dummy scoring engines.

This is a dummy strategy demonstrating how to work with scoring engines. One scoring engine predicts the workload type of a machine based on telemetry data; the other simply calculates the average value of the elements in a given list. The results are then passed to the NOP action.

The strategy presents the whole workflow:

  • Get a reference to a scoring engine
  • Prepare input data (features) for score calculation
  • Perform score calculation
  • Use the scorer’s metadata for results interpretation
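The four workflow steps can be sketched as follows. The scorer class, the ENGINES registry and the metadata layout are assumptions made for illustration; the real strategy obtains its engines through Watcher's own mechanisms.

```python
import json


class AverageScorer:
    """Hypothetical scorer averaging a JSON-encoded list of numbers."""

    def get_metainfo(self):
        return json.dumps({"unit": "percent"})

    def calculate_score(self, features):
        values = json.loads(features)
        return json.dumps([sum(values) / len(values)])


# Hypothetical registry standing in for however engines are looked up.
ENGINES = {"average": AverageScorer()}

# 1. Get a reference to a scoring engine.
scorer = ENGINES["average"]
# 2. Prepare input data (features) for score calculation.
features = json.dumps([10.0, 20.0, 60.0])
# 3. Perform score calculation.
score = json.loads(scorer.calculate_score(features))[0]
# 4. Use the scorer's metadata to interpret the result.
unit = json.loads(scorer.get_metainfo())["unit"]
message = "average load: {:.1f} {}".format(score, unit)
print(message)
```

A message built this way is the kind of value that could then be handed to the nop action as its message parameter.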

noisy_neighbor

Warning

No documentation found in noisy_neighbor = watcher.decision_engine.strategy.strategies.noisy_neighbor:NoisyNeighbor

outlet_temperature

[PoC] Outlet temperature control using live migration

Description

It is a migration strategy based on the outlet temperature of compute hosts. It generates solutions to move a workload whenever a server’s outlet temperature is higher than the specified threshold.
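A minimal sketch of the threshold idea follows, assuming outlet temperatures have already been collected per host. The function name, the data layout, and the rules for picking the VM and the destination are illustrative simplifications, not the strategy's actual selection logic.

```python
def plan_single_migration(hosts, threshold):
    """Return (vm, source, destination), or None if nothing should move.

    hosts maps a host name to {"outlet_temp": float, "vms": [str, ...]}.
    """
    hot = [h for h, d in hosts.items()
           if d["outlet_temp"] > threshold and d["vms"]]
    cool = [h for h, d in hosts.items() if d["outlet_temp"] <= threshold]
    if not hot or not cool:
        return None
    # Relieve the hottest host first and target the coolest one.
    source = max(hot, key=lambda h: hosts[h]["outlet_temp"])
    destination = min(cool, key=lambda h: hosts[h]["outlet_temp"])
    # Taking the first VM is an arbitrary simplification for this sketch.
    return hosts[source]["vms"][0], source, destination


hosts = {
    "node-1": {"outlet_temp": 35.0, "vms": ["vm-a", "vm-b"]},
    "node-2": {"outlet_temp": 22.0, "vms": ["vm-c"]},
    "node-3": {"outlet_temp": 28.0, "vms": []},
}
plan = plan_single_migration(hosts, threshold=30.0)
print(plan)
```

Only one migration is returned per call, which matches planning a single virtual machine migration at a time and re-evaluating on the next audit.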

Requirements

  • Hardware: All compute hosts should support IPMI and PTAS technology
  • Software: The Ceilometer component ceilometer-agent-ipmi must be running on each compute host, and the Ceilometer API must be able to report the hardware.ipmi.node.outlet_temperature telemetry successfully.
  • You must have at least 2 physical compute hosts to run this strategy.

Limitations

  • This is a proof of concept that is not meant to be used in production
  • We cannot forecast how many servers should be migrated. This is the reason why we only plan a single virtual machine migration at a time. So it’s better to use this algorithm with CONTINUOUS audits.
  • It assumes that live migrations are possible

Spec URL

https://github.com/openstack/watcher-specs/blob/master/specs/mitaka/approved/outlet-temperature-based-strategy.rst

uniform_airflow

[PoC] Uniform Airflow using live migration

Description

It is a migration strategy based on the airflow of physical servers. It generates solutions to move a VM whenever a server’s airflow is higher than the specified threshold.

Requirements

  • Hardware: compute nodes with NodeManager 3.0 support
  • Software: The Ceilometer component ceilometer-agent-compute must be running on each compute node, and the Ceilometer API must be able to report the “airflow, system power, inlet temperature” telemetry successfully.
  • You must have at least 2 physical compute nodes to run this strategy

Limitations

  • This is a proof of concept that is not meant to be used in production.
  • We cannot forecast how many servers should be migrated. This is the reason why we only plan a single virtual machine migration at a time. So it’s better to use this algorithm with CONTINUOUS audits.
  • It assumes that live migrations are possible.

vm_workload_consolidation

VM Workload Consolidation Strategy

workload_balance

[PoC] Workload balance using live migration

Description

It is a migration strategy based on the VM workload of physical servers. It generates solutions to move a workload whenever a server’s CPU utilization (%) is higher than the specified threshold. The VM to be moved should bring the host’s workload close to the average workload of all compute nodes.
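One way to read the selection rule is sketched below. It assumes each VM's contribution to host CPU utilization is known and additive, which is a simplification; the function name and the sample figures are hypothetical.

```python
def pick_vm_to_move(host_vm_util, cluster_avg):
    """Pick the VM whose removal leaves the host closest to the average.

    host_vm_util maps a VM name to the CPU utilization (%) it adds to the host.
    """
    host_util = sum(host_vm_util.values())
    return min(
        host_vm_util,
        key=lambda vm: abs((host_util - host_vm_util[vm]) - cluster_avg),
    )


cluster = {
    "node-1": {"vm-a": 45.0, "vm-b": 30.0, "vm-c": 15.0},
    "node-2": {"vm-d": 20.0},
    "node-3": {"vm-e": 40.0},
}
threshold = 80.0
totals = {node: sum(vms.values()) for node, vms in cluster.items()}
cluster_avg = sum(totals.values()) / len(totals)

# Hosts above the threshold are candidates for offloading one VM.
overloaded = [node for node, util in totals.items() if util > threshold]
choice = pick_vm_to_move(cluster[overloaded[0]], cluster_avg)
print(overloaded, cluster_avg, choice)
```

Here node-1 totals 90% against a cluster average of 50%, and removing vm-a leaves it at 45%, the closest of the three options.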

Requirements

  • Hardware: compute nodes should use the same physical CPUs
  • Software: The Ceilometer component ceilometer-agent-compute must be running on each compute node, and the Ceilometer API must be able to report the “cpu_util” telemetry successfully.
  • You must have at least 2 physical compute nodes to run this strategy

Limitations

  • This is a proof of concept that is not meant to be used in production
  • We cannot forecast how many servers should be migrated. This is the reason why we only plan a single virtual machine migration at a time. So it’s better to use this algorithm with CONTINUOUS audits.
  • It assumes that live migrations are possible

workload_stabilization

Workload Stabilization control using live migration

Actions

change_node_power_state

Compute node power on/off

By using this action, you will be able to power a compute node on or off.

The action schema is:

schema = Schema({
 'resource_id': str,
 'state': str,
})

The resource_id references an Ironic node ID (the list of available Ironic nodes is returned by this command: ironic node-list). The state value should be either on or off.

change_nova_service_state

Disables or enables the nova-compute service, deployed on a host

By using this action, you will be able to update the state of a nova-compute service. A disabled nova-compute service cannot be selected by the nova scheduler for future server deployments.

The action schema is:

schema = Schema({
 'resource_id': str,
 'state': str,
})

The resource_id references a nova-compute service name (the list of available nova-compute services is returned by this command: nova service-list --binary nova-compute). The state value should be either ONLINE or OFFLINE.

migrate

Migrates a server to a destination nova-compute host

This action will allow you to migrate a server to another destination compute host. Migration type ‘live’ can only be used for migrating active VMs. Migration type ‘cold’ can be used for migrating non-active VMs as well as active VMs, which will be shut down while migrating.

The action schema is:

schema = Schema({
 'resource_id': str,  # should be a UUID
 'migration_type': str,  # choices -> "live", "cold"
 'destination_node': str,
 'source_node': str,
})

The resource_id is the UUID of the server to migrate. The source_node and destination_node parameters are, respectively, the source and destination compute hostnames (the list of available compute hosts is returned by this command: nova service-list --binary nova-compute).
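The constraints noted in the schema comments (a UUID resource_id and a migration_type of "live" or "cold") can be checked with the standard library alone, as in the sketch below. The helper name and sample values are hypothetical, and treating all four keys as required is an assumption of this example rather than something the schema above states.

```python
import uuid

MIGRATION_TYPES = ("live", "cold")
REQUIRED_KEYS = ("resource_id", "migration_type",
                 "destination_node", "source_node")


def validate_migrate_params(params):
    """Return a list of problems found in a migrate action's parameters."""
    problems = []
    for key in REQUIRED_KEYS:
        if not isinstance(params.get(key), str):
            problems.append("%s must be a string" % key)
    if problems:
        return problems
    try:
        uuid.UUID(params["resource_id"])
    except ValueError:
        problems.append("resource_id must be a UUID")
    if params["migration_type"] not in MIGRATION_TYPES:
        problems.append("migration_type must be 'live' or 'cold'")
    return problems


good = {
    "resource_id": "6ae05517-a512-462d-9d83-90c313b5a8ff",
    "migration_type": "live",
    "destination_node": "compute-2",
    "source_node": "compute-1",
}
bad = dict(good, resource_id="not-a-uuid", migration_type="warm")
print(validate_migrate_params(good))
print(validate_migrate_params(bad))
```

Returning every problem at once, instead of raising on the first, makes it easier to report all invalid parameters of an action in a single pass.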

nop

Logs a message

The action schema is:

schema = Schema({
 'message': str,
})

The message is the actual message that will be logged.

resize

Resizes a server with specified flavor.

This action will allow you to resize a server to another flavor.

The action schema is:

schema = Schema({
 'resource_id': str,  # should be a UUID
 'flavor': str,  # should be either ID or Name of Flavor
})

The resource_id is the UUID of the server to resize. The flavor is the ID or name of the target flavor (Nova’s resize() function accepts either).

sleep

Makes the executor of the action plan wait for a given duration

The action schema is:

schema = Schema({
 'duration': float,
})

The duration is expressed in seconds.

Workflow Engines

taskflow

Taskflow as a workflow engine for Watcher

Full documentation on taskflow is available at http://docs.openstack.org/developer/taskflow/

Planners

weight

Weight planner implementation

This implementation builds actions with parents in accordance with weights. A set of actions having a higher weight will be scheduled before the other ones. There are two config options to set: action_weights and parallelization.
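The interplay of the two options can be sketched as below: action types are ordered by descending weight, and actions of one type are grouped into batches no larger than that type's parallelization value, each batch depending on the one before it. The function, the data layout and the sample values are assumptions for illustration; the planner's real graph construction may differ.

```python
def build_batches(actions, action_weights, parallelization):
    """Group actions into ordered batches; each batch follows the previous one.

    actions is a list of (action_id, action_type) tuples.
    """
    # Higher weight first; sorted() is stable, so input order is kept per type.
    ordered = sorted(actions, key=lambda a: action_weights[a[1]], reverse=True)
    batches = []
    for action_id, action_type in ordered:
        last = batches[-1] if batches else None
        if (last and last["type"] == action_type
                and len(last["ids"]) < parallelization[action_type]):
            last["ids"].append(action_id)
        else:
            batches.append({"type": action_type, "ids": [action_id]})
    return batches


actions = [("a1", "migrate"), ("a2", "nop"), ("a3", "migrate"),
           ("a4", "migrate"), ("a5", "sleep")]
action_weights = {"migrate": 3, "sleep": 1, "nop": 0}   # sample values
parallelization = {"migrate": 2, "sleep": 1, "nop": 1}  # sample values
batches = build_batches(actions, action_weights, parallelization)
print([b["ids"] for b in batches])
```

With these sample values, the three migrations are split into a batch of two followed by a batch of one, and the lower-weight sleep and nop actions come afterwards.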

Limitations

  • This planner requires the action_weights and parallelization config options to be well tuned.

workload_stabilization

Workload Stabilization planner implementation

This implementation comes with basic rules based on a set of weighted action types. An action having a lower weight will be scheduled before the other ones. The set of action types can be specified by ‘weights’ in watcher.conf. You need to associate a different weight with each available action in the configuration file; otherwise you will get an error when a new action is referenced in the solution produced by a strategy.
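The ordering rule and the error case can be sketched as follows. The function name, the exception type and the sample weights are hypothetical; only the "lower weight first" rule and the need for a weight per action type come from the description above.

```python
def order_actions(actions, weights):
    """Order (action_id, action_type) tuples by ascending configured weight."""
    missing = sorted({t for _, t in actions if t not in weights})
    if missing:
        # Mirrors the error case: an action type with no configured weight.
        raise KeyError("no weight configured for: " + ", ".join(missing))
    return sorted(actions, key=lambda a: weights[a[1]])


weights = {"nop": 0, "sleep": 1, "migrate": 2}  # sample values
plan = order_actions(
    [("a1", "migrate"), ("a2", "nop"), ("a3", "sleep")], weights)
print([action_id for action_id, _ in plan])
```

Checking for missing weights before sorting means the failure names every unconfigured action type at once, instead of stopping at the first one encountered.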

Limitations

  • This is a proof of concept that is not meant to be used in production

Cluster Data Model Collectors

compute

Nova cluster data model collector

The Nova cluster data model collector creates an in-memory representation of the resources exposed by the compute service.

Creative Commons Attribution 3.0 License

Except where otherwise noted, this document is licensed under Creative Commons Attribution 3.0 License.