Outlet Temperature Based Strategy
https://blueprints.launchpad.net/watcher/+spec/outlet-temperature-based-strategy
Outlet (exhaust air) temperature is a new thermal telemetry metric that can be used to measure a server’s thermal and workload status.
This spec proposes a new Watcher migration strategy based on the outlet temperature of servers. When the outlet temperature of a source server reaches a configurable threshold, the strategy migrates workloads to servers in good thermal condition (those with the lowest outlet temperature).
Note: “server” in this document means “hypervisor”.
Problem description
In current data center infrastructure, the cooling air supplied to servers can vary from one server to another. When a server is overloaded or its supply air is too hot, the outlet temperature telemetry can be used to detect the problem. To keep the server in a reliable thermal condition, some of its workloads should be migrated to other servers with safer thermal conditions.
Use Cases
As an administrator, I want to be able to trigger an audit that controls the temperature and performs workload balancing,
in order to:
Reduce the total power consumption spent on cooling.
Increase the lifespan of the data center, since cooling effectiveness is a first-order factor in it.
Project Priority
Not relevant, because Watcher is not yet part of the Big Tent.
Proposed change
Watcher already provides a decision framework, so this strategy should be a new class that extends the base strategy class.
Set the threshold in two steps: hard-coded first, then configurable through the audit template.
Create a new Python class that extends the “BaseStrategy” class.
Use the Ceilometer client to get the outlet temperature metrics of hypervisors.
Use the Nova objects framework to get the free CPU/memory/disk of hypervisors.
Implement an algorithm that detects whether the outlet temperature threshold has been reached and chooses the migration target server. It filters the viable targets according to the free resource information of hypervisors obtained in the previous step.
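The threshold handling and target selection described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the strategy's actual implementation: outlet temperatures and free resources are passed in as plain dictionaries (in the real strategy they would come from Ceilometer and the Nova objects framework), and the names `DEFAULT_OUTLET_THRESHOLD`, `Resources`, `resolve_threshold`, `split_hosts` and `choose_target`, the 35 °C default and the `threshold` audit parameter key are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Step 1 of the threshold plan: a hard-coded default.
# Hypothetical value, in degrees Celsius.
DEFAULT_OUTLET_THRESHOLD = 35.0


@dataclass
class Resources:
    """Free resources of a hypervisor, or resources requested by an instance."""
    vcpus: int
    memory_mb: int
    disk_gb: int

    def fits_in(self, free: "Resources") -> bool:
        """True if these requested resources fit into ``free``."""
        return (self.vcpus <= free.vcpus
                and self.memory_mb <= free.memory_mb
                and self.disk_gb <= free.disk_gb)


def resolve_threshold(audit_params: Optional[dict] = None) -> float:
    """Step 2: a hypothetical ``threshold`` audit parameter overrides the default."""
    if audit_params and "threshold" in audit_params:
        return float(audit_params["threshold"])
    return DEFAULT_OUTLET_THRESHOLD


def split_hosts(outlet_temp: Dict[str, float],
                threshold: float) -> Tuple[List[str], List[str]]:
    """Split hypervisors into overheated sources and candidate targets.

    Sources have reached the threshold (hottest first); targets are below it
    (coolest first), so the first viable target is also the coolest one.
    """
    sources = sorted((h for h, t in outlet_temp.items() if t >= threshold),
                     key=lambda h: outlet_temp[h], reverse=True)
    targets = sorted((h for h, t in outlet_temp.items() if t < threshold),
                     key=lambda h: outlet_temp[h])
    return sources, targets


def choose_target(demand: Resources, targets: List[str],
                  free: Dict[str, Resources]) -> Optional[str]:
    """Return the coolest target with enough free CPU/memory/disk, or None."""
    for host in targets:
        if demand.fits_in(free[host]):
            return host
    return None
```

For example, with outlet temperatures of 41.0, 28.5 and 24.0 °C on three hypervisors and a 35 °C threshold, only the first is a source; the 24.0 °C host is tried first, and if it lacks free vCPUs for the instance, the 28.5 °C host is chosen instead. Sorting targets coolest-first keeps the filtering step a simple first-fit scan.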
Alternatives
No alternatives.
Data model impact
None
REST API impact
None
Security impact
None
Notifications impact
None
Other end user impact
None
Performance Impact
Querying metrics from the Ceilometer database used to cause performance issues, which is one of the reasons it was rarely used in production environments. These issues may now be solved thanks to an abstraction layer that makes it easy to change the underlying metrics storage backend. There is also a performance issue when querying the Nova database for CPU usage metrics.
Other deployer impact
None
Developer impact
None
Implementation
Assignee(s)
- Primary assignee:
<junjie-huang>
Work Items
A function that uses the Ceilometer client to get the outlet temperature of hypervisors.
A function that filters servers by basic Nova metrics (free CPU/memory/disk).
Rewrite the execute function to add the algorithm that detects whether the outlet temperature threshold has been reached, chooses the target hypervisor, and generates the action plan.
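The last work item ends with generating an action plan. The sketch below assumes the overheated source and the destination have already been chosen; it picks one instance on the source that fits into the destination's free resources and wraps it in a single migration action. The selection heuristic (move the largest fitting instance, by vCPUs then memory, to shed as much load as possible in one migration) and the action dictionary layout (`action_type`, `input_parameters`, `migration_type`, and so on) are hypothetical illustrations, not Watcher's actual action schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Instance:
    """An instance running on the overheated source hypervisor."""
    uuid: str
    vcpus: int
    memory_mb: int
    disk_gb: int


def fits(instance: Instance, free: Dict[str, int]) -> bool:
    """True if the instance fits into the destination's free resources."""
    return (instance.vcpus <= free["vcpus"]
            and instance.memory_mb <= free["memory_mb"]
            and instance.disk_gb <= free["disk_gb"])


def generate_action_plan(source: str, destination: str,
                         instances: List[Instance],
                         dest_free: Dict[str, int]) -> List[dict]:
    """Build a one-action plan moving one instance from source to destination.

    Returns an empty plan when no instance on the source fits the destination.
    """
    candidates = [i for i in instances if fits(i, dest_free)]
    if not candidates:
        return []
    # Hypothetical heuristic: the largest instance that still fits.
    chosen = max(candidates, key=lambda i: (i.vcpus, i.memory_mb))
    return [{
        "action_type": "migrate",  # hypothetical action name
        "input_parameters": {
            "resource_id": chosen.uuid,
            "source_node": source,
            "destination_node": destination,
            "migration_type": "live",
        },
    }]
```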
Dependencies
Testing
Unit tests and functional tests. The functional tests will use a fake metrics set.
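One way to use a fake metrics set is to inject the metrics source into the strategy, so tests can replace Ceilometer with fixed samples. This is a minimal sketch assuming such injection; `FakeOutletMetrics`, its `mean_outlet_temperature` method and the `overheated` helper are hypothetical names, and only `unittest` comes from the standard library.

```python
import unittest
from typing import Dict, List


class FakeOutletMetrics:
    """Stand-in for the Ceilometer-backed collector, fed with fixed samples."""

    def __init__(self, samples: Dict[str, List[float]]):
        self._samples = samples

    def mean_outlet_temperature(self, hypervisor: str) -> float:
        """Average of the recorded outlet temperature samples for a hypervisor."""
        values = self._samples[hypervisor]
        return sum(values) / len(values)


def overheated(metrics, hypervisors: List[str], threshold: float) -> List[str]:
    """Hypervisors whose mean outlet temperature has reached the threshold."""
    return [h for h in hypervisors
            if metrics.mean_outlet_temperature(h) >= threshold]


class OutletThresholdTest(unittest.TestCase):
    def test_only_hot_node_is_reported(self):
        fake = FakeOutletMetrics({"node-1": [38.0, 40.0, 42.0],
                                  "node-2": [25.0, 27.0]})
        self.assertEqual(overheated(fake, ["node-1", "node-2"], 35.0),
                         ["node-1"])
```

Because the fake returns fixed samples, the test is deterministic and needs no running Ceilometer or metrics backend; it can be run with `python -m unittest`.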
Documentation Impact
Documentation explaining how to use this new optimization strategy.
References
http://www.intel.com/content/www/us/en/servers/ipmi/ipmi-home.html
History
None