Configurable Power Manager
Configurable Power Manager targets containerized applications that need power profiles applied individually, per core and/or per application.
StarlingX can regulate the frequency of the entire processor. However, this control is organized by core classification, distinguishing only between application and platform cores. Consequently, if a user needs to control an individual core, such as Core 10 in a 24-core CPU, the adjustment must be applied to all cores collectively. Containerized workloads need finer control: each container should receive the power configuration it requires, which means providing a specific power configuration to each core or group of cores.
With Configurable Power Manager, it is possible to control, per core, the acceptable frequency range (minimum and maximum frequency), the behavior of the core within this range (governor), and which power levels (c-states) the core can access, as well as the behavior of the system for workloads with known intervals/demands.
To encapsulate the dependencies, images, profiles and configurations, Power Manager is delivered as a StarlingX Application.
System Requirements
- Support and enable the BIOS functionality to delegate c-states and p-states control to the operating system
- intel_pstate and acpi_cpufreq drivers
- intel_idle or acpi_idle module
- 3rd and 4th Generation Intel® Xeon® Scalable Processors
Dependencies
The installer looks for the NFD (Node Feature Discovery) application on the cluster. Intel recommends its use, so if NFD is not installed, the installation fails until you either install NFD or create the user_override parameter nfd-required with the value False, which allows Power Manager to be installed without the NFD application.
The following example shows how to override the nfd-required parameter:
(keystone_admin)]$ system helm-override-update --set nfd-required=False kubernetes-power-manager kubernetes-power-manager intel-power
Power Manager Installation
The installation follows the standard procedure to install a StarlingX application.
The application tgz file is located at /usr/local/share/applications/helm/kubernetes-power-manager-<VERSION>.tgz. Upload and apply it:
(keystone_admin)]$ system application-upload /usr/local/share/applications/helm/kubernetes-power-manager-<VERSION>.tgz
(keystone_admin)]$ system application-apply kubernetes-power-manager
The namespace, service accounts, RBAC rules and CRDs in Kubernetes are all provided by the Power Manager project.
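+----------------------------+---------------------------------------------------+
| Resource Type              | Resource Names                                    |
+----------------------------+---------------------------------------------------+
| Namespace                  | intel-power                                       |
+----------------------------+---------------------------------------------------+
| Service Account            | intel-power-node-agent                            |
|                            | intel-power-operator                              |
+----------------------------+---------------------------------------------------+
| Role                       | operator-custom-resource-definitions-role         |
+----------------------------+---------------------------------------------------+
| RoleBinding                | operator-custom-resource-definitions-role-binding |
+----------------------------+---------------------------------------------------+
| Cluster Role               | operator-nodes                                    |
|                            | manager-role                                      |
|                            | node-agent-cluster-resources                      |
+----------------------------+---------------------------------------------------+
| Cluster Role Binding       | operator-nodes-binding                            |
|                            | node-agent-cluster-resources-binding              |
+----------------------------+---------------------------------------------------+
| Custom Resource Definition | cstates.power.intel.com                           |
|                            | powerconfigs.power.intel.com                      |
|                            | powernodes.power.intel.com                        |
|                            | powerpods.power.intel.com                         |
|                            | powerprofiles.power.intel.com                     |
|                            | powerworkloads.power.intel.com                    |
|                            | timeofdaycronjobs.power.intel.com                 |
|                            | timeofdays.power.intel.com                        |
|                            | uncores.power.intel.com                           |
+----------------------------+---------------------------------------------------+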
The installation deploys the manager container (the Kubernetes Operator) of Kubernetes Power Manager, which monitors and starts the Power Manager agent on selected nodes. Only one instance of the operator runs in the cluster, preferably on one of the control plane nodes.
The installation also publishes the power configuration profile to the cluster. This resource exposes the standard power profiles of the Intel Power Optimization Library. The default power profiles are: performance, balance-performance, and balance-power.
Power Manager creates the available configurations. To customize the application, apply your modifications via helm-override. For an example of a customization, see User Defined Settings.
Label Assignment
A Kubernetes label controls which hosts the Power Manager agent runs on. The operator (manager) listens for changes in hosts and, when it detects the label, starts the agent container on that host.
The agent is responsible for monitoring and applying the power configurations described by Custom Resources (c-states, Power Profiles, Power Workloads, etc.) or in the Pod specifications.
Important
In the kubelet configuration file, cpuManagerPolicy has to be set to “static”, and reservedSystemCPUs must be set to the desired value:
(keystone_admin)]$ system host-label-assign --overwrite <HOSTNAME> kube-cpu-mgr-policy=static
To create the label, run the command below, specifying the host where the agent must be deployed:
(keystone_admin)]$ system host-label-assign <HOSTNAME> power-management=enabled
Note
This command is only accepted if the max_cpu_mhz_configured parameter is disabled. Do not enable both at the same time.
Once the label is applied, the following tasks will be automatically performed:
Default CPU c-states
During the installation process, default c-state levels are configured. By default, platform cores can access the available levels up to C6, while application cores can access levels up to C1.
This configuration is performed automatically on each node and is based on the levels available in the processor. If the target levels do not exist, the application will choose to maintain only C0 on the application cores, and the lowest available level on the platform cores.
Default CPU Frequency (p-state)
CPU p-state management can be controlled either through power profiles applied to containers or through a shared profile that manages CPU cores individually or in groups.
By default, all CPU cores use the full frequency range available, with the CPU governor in performance mode.
Two resources will be deployed on Kubernetes: Shared profile and Shared workload profile.
To create a custom profile, use the parameters in the YAML file below:
apiVersion: power.intel.com/v1
kind: PowerProfile
metadata:
  name: profile-name
  namespace: intel-power
spec:
  name: profile-name
  max: <HOST-MAX-CPU-FREQ> # Maximum core frequency supported
  min: <HOST-MIN-CPU-FREQ> # Minimum core frequency supported
  epp: performance
  governor: performance
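As an illustration only, the same manifest can be assembled programmatically. The sketch below mirrors the fields of the YAML above; the helper name, the example profile name, the MHz values and the min/max sanity check are assumptions made for this example and are not part of Power Manager.

```python
import json


def build_power_profile(name, min_mhz, max_mhz,
                        governor="performance", epp="performance",
                        namespace="intel-power"):
    """Return a PowerProfile manifest as a plain dict.

    Field names mirror the YAML above. The range check is an
    assumption added for this sketch, not a documented rule.
    """
    if min_mhz > max_mhz:
        raise ValueError(f"min ({min_mhz}) must not exceed max ({max_mhz})")
    return {
        "apiVersion": "power.intel.com/v1",
        "kind": "PowerProfile",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "name": name,
            "max": max_mhz,
            "min": min_mhz,
            "epp": epp,
            "governor": governor,
        },
    }


# Hypothetical profile name and frequencies, for illustration only.
profile = build_power_profile("low-latency", min_mhz=2000, max_mhz=2000)
print(json.dumps(profile, indent=2))
```

The printed JSON carries the same fields as the YAML form, so it can be reviewed or stored alongside other manifests.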
Shared Profile
This resource specifies the minimum and maximum core frequencies and the CPU governor for each host in the cluster. Assigning the label to a host triggers the creation of this profile, which applies the minimum and maximum frequencies supported; the CPU governor will always be performance.
Note
In real-time systems, the minimum and maximum frequencies are the same on all cores (min = max). This is standard behavior for real-time systems, and different configurations will affect the maximum frequency.
Shared Workload Profile
This resource binds the Shared Profile to CPU cores on the host. Once the label is created on the host, the created workload points to the Shared Profile and selects all available CPU cores except the platform cores, which are listed in the reservedCPUs parameter.
Note
The CPU p-state of the platform cores is managed through the reservedProfile parameter.
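The core selection made by the Shared Workload Profile amounts to a set difference between all cores on the host and the reserved platform cores. The following is a minimal sketch of that idea; the function name and the 192-CPU host are hypothetical and chosen only to match the reservedCPUs values used in the examples on this page.

```python
def shared_pool_cores(all_cores, reserved_cpus):
    """Return the cores left for the shared pool: every core not reserved."""
    reserved = set(reserved_cpus)
    return [core for core in all_cores if core not in reserved]


# Hypothetical host with 192 logical CPUs; cores 0 and 96 are platform cores.
pool = shared_pool_cores(range(192), reserved_cpus=[0, 96])
print(len(pool), pool[:3])
```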
Node Agent Pod
The Pod Controller watches for pods. When a pod is created, the Pod Controller checks whether the pod is in the Guaranteed quality of service class, that is, whether it uses exclusive cores taken out of the shared pool. Guaranteed is the only Kubernetes QoS class that can do this. For more details see https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/. The Pod Controller then examines the pod to determine which Power Profile has been requested and creates or updates the appropriate Power Workload.
Note
The requests and the limits must have a matching number of cores on a container-by-container basis. Currently, Power Manager only supports a single Power Profile per pod. If two profiles are requested in different containers, the pod gets created, but the cores are not tuned.
This will only work if the pods use isolcpus.
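The conditions above can be checked against a pod manifest before it is deployed. The sketch below is not the Pod Controller's implementation; it only restates the documented rules (matching requests and limits per container, whole CPUs, one Power Profile per pod). It assumes that profiles are requested as extended resources named `power.intel.com/<profile>`, as in the upstream Kubernetes Power Manager project, and the function and container names are hypothetical.

```python
POWER_PREFIX = "power.intel.com/"  # assumption: resource prefix used upstream


def check_pod(pod):
    """Return (tunable, reason) for a pod manifest given as a dict."""
    profiles = set()
    for container in pod["spec"]["containers"]:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        # Guaranteed QoS: cpu and memory limits are set and equal the requests
        # (an omitted request defaults to the limit in Kubernetes).
        for key in ("cpu", "memory"):
            if key not in limits or requests.get(key, limits[key]) != limits[key]:
                return False, f"{container['name']}: {key} request/limit mismatch"
        # Exclusive cores require a whole number of CPUs.
        if not str(limits["cpu"]).isdigit():
            return False, f"{container['name']}: cpu must be a whole number"
        for key in requests:
            if key.startswith(POWER_PREFIX):
                profiles.add(key[len(POWER_PREFIX):])
    if not profiles:
        return False, "no Power Profile requested"
    if len(profiles) > 1:
        return False, f"more than one Power Profile requested: {sorted(profiles)}"
    return True, f"profile {profiles.pop()}"


def container(name, cpu, profile):
    """Build a hypothetical container spec with identical requests and limits."""
    res = {"cpu": cpu, "memory": "200Mi", POWER_PREFIX + profile: cpu}
    return {"name": name, "resources": {"requests": dict(res), "limits": dict(res)}}


single = {"spec": {"containers": [container("app", "2", "performance")]}}
mixed = {"spec": {"containers": [container("a", "2", "performance"),
                                 container("b", "1", "balance-power")]}}
print(check_pod(single))
print(check_pod(mixed))
```

The second pod would still be created by Kubernetes; the check only reports that its cores would not be tuned.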
Exclude kernel parameter
When you apply the power-management label, the intel_idle.max_cstate parameter is removed from the kernel arguments.
Note
This change takes effect after reboot. Until then, the host retains the current behavior and Power Manager manages the CPU c-states using the acpi_idle driver, which may not expose all c-states supported by the processor. After reboot, ensure that all overrides are applied.
User Defined Settings
You can override the auto-generated settings using the user_override functionality of the Power Manager application. It allows you to customize the settings on a per-host basis for:
Shared Profile [section sharedProfile]:
- governor: CPU governor
- max: Maximum CPU frequency
- min: Minimum CPU frequency
- reservedCPUs: List of CPU cores to which the profile is not applied (platform cores)
- reservedProfile: The profile to apply to platform cores
c-states Profile [section cstatesProfile]:
- sharedPoolCStates: List of CPU c-states for all application cores and their status (on/off)
- individualCoreCStates: List of individual CPU cores (the platform cores in the example below), each with its list of CPU c-states and their status (on/off)
See the example below to configure host controller-0. This setting will override the CPU governor and maximum CPU frequency in Shared Profile and disable C6 state for the platform cores (0,96) and enable C6 state for all application cores through the c-state Profile.
sharedProfile:
  controller-0:
    governor: powersave
    max: 2000
cstatesProfile:
  controller-0:
    individualCoreCStates:
      "0":
        C6: false
      "96":
        C6: false
    sharedPoolCStates:
      C6: true
Applying these user_overrides generates a new configuration (combined_overrides) by merging the user's definitions over the auto-generated configuration. You can also view both configurations individually: the configuration auto-generated by Power Manager in the system_overrides section and the user configuration in the user_overrides section, as shown below.
(keystone_admin)]$ system helm-override-show kubernetes-power-manager kubernetes-power-manager intel-power
+--------------------+---------------------------------------------------------------------+
| Property           | Value                                                               |
+--------------------+---------------------------------------------------------------------+
| attributes         | enabled: true                                                       |
|                    |                                                                     |
| combined_overrides | cstatesProfile:                                                     |
|                    |   controller-0:                                                     |
|                    |     individualCoreCStates:                                          |
|                    |       "0":                                                          |
|                    |         C1: true                                                    |
|                    |         C1E: true                                                   |
|                    |         C6: false                                                   |
|                    |         POLL: true                                                  |
|                    |       "96":                                                         |
|                    |         C1: true                                                    |
|                    |         C1E: true                                                   |
|                    |         C6: false                                                   |
|                    |         POLL: true                                                  |
|                    |     sharedPoolCStates:                                              |
|                    |       C1: true                                                      |
|                    |       C1E: false                                                    |
|                    |       C6: true                                                      |
|                    |       POLL: true                                                    |
|                    | sharedProfile:                                                      |
|                    |   controller-0:                                                     |
|                    |     governor: powersave                                             |
|                    |     max: 2000                                                       |
|                    |     min: 800                                                        |
|                    |     reservedCPUs: '[0, 96]'                                         |
|                    |     reservedProfile: performance                                    |
|                    |     shared: true                                                    |
|                    |                                                                     |
|                    |                                                                     |
| name               | kubernetes-power-manager                                            |
| namespace          | intel-power                                                         |
| system_overrides   | cstatesProfile:                                                     |
|                    |   controller-0:                                                     |
|                    |     individualCoreCStates:                                          |
|                    |       '0': {C1: true, C1E: true, C6: true, POLL: true}              |
|                    |       '96': {C1: true, C1E: true, C6: true, POLL: true}             |
|                    |     sharedPoolCStates: {C1: true, C1E: false, C6: false, POLL: true}|
|                    | sharedProfile:                                                      |
|                    |   controller-0:                                                     |
|                    |     governor: performance                                           |
|                    |     max: 3000                                                       |
|                    |     min: 800                                                        |
|                    |     reservedCPUs: [0, 96]                                           |
|                    |     reservedProfile: performance                                    |
|                    |     shared: true                                                    |
| user_overrides     | cstatesProfile:                                                     |
|                    |   controller-0:                                                     |
|                    |     individualCoreCStates:                                          |
|                    |       "0":                                                          |
|                    |         C6: false                                                   |
|                    |       "96":                                                         |
|                    |         C6: false                                                   |
|                    |     sharedPoolCStates:                                              |
|                    |       C6: true                                                      |
|                    | sharedProfile:                                                      |
|                    |   controller-0:                                                     |
|                    |     governor: powersave                                             |
|                    |     max: 2000                                                       |
|                    |                                                                     |
+--------------------+---------------------------------------------------------------------+
This final configuration will be published into Kubernetes as a Shared Profile and c-state Profile when you reapply the application.
(keystone_admin)]$ system application-apply kubernetes-power-manager
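The relationship between system_overrides, user_overrides and combined_overrides is that of a recursive dictionary merge in which user values win. The following sketch reproduces the controller-0 values from the output above; the `deep_merge` helper is hypothetical and is not the code StarlingX uses, it only illustrates the merge semantics.

```python
import copy


def deep_merge(base, override):
    """Return a copy of base with override merged in, key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # descend into sections
        else:
            merged[key] = copy.deepcopy(value)  # user value replaces the default
    return merged


system_overrides = {
    "cstatesProfile": {"controller-0": {
        "individualCoreCStates": {
            "0": {"C1": True, "C1E": True, "C6": True, "POLL": True},
            "96": {"C1": True, "C1E": True, "C6": True, "POLL": True},
        },
        "sharedPoolCStates": {"C1": True, "C1E": False, "C6": False, "POLL": True},
    }},
    "sharedProfile": {"controller-0": {
        "governor": "performance", "max": 3000, "min": 800,
        "reservedCPUs": [0, 96], "reservedProfile": "performance", "shared": True,
    }},
}

user_overrides = {
    "cstatesProfile": {"controller-0": {
        "individualCoreCStates": {"0": {"C6": False}, "96": {"C6": False}},
        "sharedPoolCStates": {"C6": True},
    }},
    "sharedProfile": {"controller-0": {"governor": "powersave", "max": 2000}},
}

combined = deep_merge(system_overrides, user_overrides)
print(combined["sharedProfile"]["controller-0"])
```

Only the keys named in user_overrides change; values such as min and reservedProfile carry over from the auto-generated configuration.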
It is also possible (and optional) to add c-states for a specific profile. To do this, add the exclusivePoolCstates tag. The example below includes c-states for the performance profile:
sharedProfile:
  controller-0:
    governor: powersave
    max: 2000
cstatesProfile:
  controller-0:
    individualCoreCStates:
      "0":
        C6: false
      "96":
        C6: false
    sharedPoolCStates:
      C6: true
    exclusivePoolCstates:
      performance:
        C6: true
Power Manager offers other features, such as Uncore Frequency and Time of Day. Their settings must be deployed directly to the cluster using the procedures described in the Power Manager documentation at https://github.com/intel/kubernetes-power-manager.
Inconsistent Settings
When using the application, configure frequency and power profiles with caution. Inconsistent settings may leave pods in an undesired power state, whether due to the partial application of settings (only c-states or only p-states) or to settings not being applied at all (pod deployed without power settings).
Power Manager Uninstall
To uninstall the application, use the standard commands for removing a StarlingX application:
(keystone_admin)]$ system application-remove kubernetes-power-manager
(keystone_admin)]$ system application-delete kubernetes-power-manager
The uninstall process shuts down the containers (manager and all agents) and removes all configurations deployed to Kubernetes related to Power Manager, including the namespace intel-power. The NFD application is not uninstalled, even if it was installed as a dependency of Power Manager; this avoids disrupting other applications that use it. The power-management label must be removed manually:
(keystone_admin)]$ system host-label-remove <HOSTNAME> power-management
Note
While the label is assigned to a host, the intel_idle.max_cstate kernel parameter will not be restored on that host and the max_cpu_mhz_configured parameter will remain disabled.