Resource Providers - Compute Node Inventory

https://blueprints.launchpad.net/nova/+spec/compute-node-inventory-newton

As we move towards a system for generic tracking of all quantitative resources in the system using the resource providers modeling system, we need to transition the object model and database schema for a compute node to store inventory information in the resource provider inventories table instead of the compute_nodes table. This spec outlines the part of this transition process that deals with capacity of resources on a compute node – the inventory records.

Problem description

Long-term, we would like to be able to add new types of resources (see the resource-classes blueprint) to the system and do so without requiring invasive database schema changes. In order to move to this more generic modeling of quantitative resources and capacity records (see resource-providers blueprint) we must transition the storage of inventory information from where that information currently resides to the new inventories table in the resource providers modeling system.

Use Cases

As a deployer, I wish to add new classes of resources to my system and do so without any downtime caused by database schema migrations.

Proposed change

The two major components of this spec are the alignment of the underlying database schema and the changes needed to the nova.objects.ComputeNode object model to read and write inventory/capacity information from the inventories table instead of the compute_nodes table.

Alignment of database schema

To align the underlying database storage for inventory records, we propose to move the resource usage and capacity fields from their current locations in the database to the new inventories table added in the resource-providers blueprint.

Currently, the Nova database stores inventory records for the following resource classes:

  • vCPUs:

      • compute_nodes.vcpus: Total physical CPU cores on the compute node

      • compute_nodes.vcpus_used: Number of vCPUs allocated to virtual machines running on that compute node

      • compute_nodes.cpu_allocation_ratio: Overcommit ratio for vCPU on the compute node

  • RAM:

      • compute_nodes.memory_mb: Total amount of physical RAM in MB on the compute node

      • compute_nodes.memory_mb_used: Amount of RAM allocated to virtual machines running on that compute node

      • compute_nodes.ram_allocation_ratio: Overcommit ratio for memory on the compute node

      • compute_nodes.free_ram_mb: A calculated field that can go away since its value can be determined by looking at used versus capacity values

  • Disk:

      • compute_nodes.local_gb: Amount of disk storage available to the compute node for storing virtual machine ephemeral disks. Although this is denoted “local” disk storage, if the ephemeral disks actually reside on shared storage, the compute node currently has no way of knowing that this storage is shared with other compute nodes. See the generic-resource-pools and resource-providers blueprints for the solution to this problem

      • compute_nodes.local_gb_used: Amount of disk storage allocated for ephemeral disks of virtual machines running on the compute node. The same problem with shared storage for ephemeral disks applies to this field as well

      • compute_nodes.free_disk_gb: A calculated field that can go away since its value can be determined by looking at used versus capacity values

      • disk_available_least: A field that stores the sum of actual used disk amounts on the local compute node. This information can be stored in the new max_unit field of the inventories table for the DISK_GB resource class

  • PCI devices:

      • pci_stats: Stores summary information about device “pools” (per product_id and vendor_id combination). This information is made redundant by the pci-generate-stats blueprint, which generates a summary view of pool information for PCI devices from the main record table, pci_devices

      • pci_devices: Table that stores all the individual PCI device records, including the status of each device and which instance (if any) the device has been assigned to.

  • NUMA topologies:

      • compute_nodes.numa_topology: Serialized nova.objects.numa.NUMATopology object that represents both the compute node’s NUMA topology and the assigned NUMA topologies for instances on the compute node.
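The two calculated fields in the list above (free_ram_mb and free_disk_gb) do not need their own storage because they follow from capacity and usage. A minimal sketch, assuming "free" is simply the total minus the used amount with no overcommit ratio applied; the helper name free_amount is hypothetical and not part of Nova:

```python
def free_amount(total, used):
    """Return the unused portion of a resource (hypothetical helper).

    Assumes "free" is capacity minus usage, with no allocation ratio
    applied. The result can be negative on an overcommitted node.
    """
    return total - used


# Example: a node with 32768 MB of RAM, 20480 MB of it allocated.
free_ram_mb = free_amount(32768, 20480)
print(free_ram_mb)  # 12288
```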

To recap from the resource-providers blueprint, the schema of the inventories table in the database looks like this:

CREATE TABLE inventories (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    resource_provider_id INT UNSIGNED NOT NULL,
    resource_class_id INT UNSIGNED NOT NULL,
    total INT UNSIGNED NOT NULL,
    min_unit INT UNSIGNED NOT NULL,
    max_unit INT UNSIGNED NOT NULL,
    step_size INT UNSIGNED NOT NULL,
    allocation_ratio FLOAT NOT NULL,
    INDEX (resource_provider_id),
    INDEX (resource_class_id)
);

We propose to consolidate all of the inventory/capacity fields from the above locations into the new inventories table in the following manner:

Remember that all compute nodes are resource providers, but not all resource providers are compute nodes. There is no globally-unique identifier for a compute node within the OpenStack deployment, and we need a globally-unique identifier for the resource provider.

1) (COMPLETED IN MITAKA) We must first add a new uuid field to the compute_nodes table:

ALTER TABLE compute_nodes ADD COLUMN uuid VARCHAR(36) NULL;

Note

The uuid field must be NULL at first, since we will not be generating values in a schema migration script. See below for where we generate UUIDs for each compute node on-demand as each compute node without a UUID specified is read from the database.

Because we do not want to do any data migrations in SQL migration scripts, the following data migrations must happen in the nova.objects.ComputeNode object. We propose a method called _migrate_inventory() that handles the data migration steps and is called from _from_db_object() when certain conditions are met (for instance, the compute node has no UUID field value). The _migrate_inventory() method should first check that all API and conductor nodes have been upgraded to a version that supports the migration, and it should then perform all of its DB writes in a single database transaction so that they are applied atomically.

2) (COMPLETED IN MITAKA) Compute nodes that have no uuid field set should have a new random UUID generated on-demand.
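On-demand generation can be pictured with a small sketch. It assumes a plain dict stands in for a compute node row; the function name ensure_uuid is hypothetical and does not reproduce the actual object code:

```python
import uuid


def ensure_uuid(compute_node):
    """Assign a random UUID to a compute node record that lacks one.

    `compute_node` is a plain dict standing in for a database row.
    Returns True if a new UUID was generated, False otherwise.
    """
    if compute_node.get("uuid") is None:
        compute_node["uuid"] = str(uuid.uuid4())
        return True
    return False


node = {"id": 1, "uuid": None}
ensure_uuid(node)  # generates a UUID the first time the node is read
ensure_uuid(node)  # leaves the existing UUID untouched
```

Because the function is idempotent, it is safe to call every time a record is loaded.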

3) A record must be added to the resource_providers table for each compute node:

INSERT INTO resource_providers (uuid)
SELECT uuid FROM compute_nodes;

4) We need to create the inventory records for each compute node. For each of the resource classes that the compute node provides, we need to store the capacity, min and max unit values, and allocation ratios.

4a) For the vCPU resource class, we would do the following steps for each compute node. Grab the resource class identifier for CPU from the resource_classes table (see resource-classes blueprint).

Insert into the inventories table a record for the CPU resource class with the total, min, max, and allocation ratio. For example:

INSERT INTO inventories (
    resource_provider_id,
    resource_class_id,
    total,
    min_unit,
    max_unit,
    step_size,
    allocation_ratio
)
SELECT
    rp.id,
    $CPU_RESOURCE_CLASS_ID,
    cn.vcpus,
    1,
    cn.vcpus,
    -- step_size is NOT NULL in the schema above; 1 is an assumed value
    1,
    cn.cpu_allocation_ratio
FROM compute_nodes AS cn
    JOIN resource_providers rp
       ON cn.uuid = rp.uuid
WHERE cn.id = $COMPUTE_NODE_ID;

4b) Do the same for the RAM and DISK resource classes. For the DISK resource class, do not perform the INSERT if the compute node uses shared storage for the ephemeral disks.
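Steps 3 through 4b can be sketched end to end in Python. This is only an illustration under stated assumptions: it uses an in-memory SQLite database with a simplified schema rather than the real Nova tables, the resource class identifiers are made up, step_size is assumed to be 1, and max_unit is simply set to the total for every class. The function name migrate_inventory and its shared_storage flag are hypothetical.

```python
import sqlite3

# Hypothetical resource class identifiers; the real values come from the
# resource_classes table described in the resource-classes blueprint.
CPU, RAM, DISK = 0, 1, 2

# Simplified, SQLite-compatible stand-in for the real schema.
SCHEMA = """
CREATE TABLE compute_nodes (
    id INTEGER PRIMARY KEY, uuid TEXT,
    vcpus INT, cpu_allocation_ratio REAL,
    memory_mb INT, ram_allocation_ratio REAL,
    local_gb INT, disk_allocation_ratio REAL
);
CREATE TABLE resource_providers (id INTEGER PRIMARY KEY, uuid TEXT UNIQUE);
CREATE TABLE inventories (
    id INTEGER PRIMARY KEY, resource_provider_id INT, resource_class_id INT,
    total INT, min_unit INT, max_unit INT, step_size INT,
    allocation_ratio REAL
);
"""


def migrate_inventory(conn, node_id, shared_storage=False):
    """Copy one compute node's capacity fields into inventories."""
    with conn:  # commits on success, rolls back if any statement fails
        row = conn.execute(
            "SELECT uuid, vcpus, cpu_allocation_ratio, memory_mb, "
            "ram_allocation_ratio, local_gb, disk_allocation_ratio "
            "FROM compute_nodes WHERE id = ?", (node_id,)).fetchone()
        node_uuid, vcpus, cpu_r, mem, ram_r, disk, disk_r = row
        # Step 3: one resource provider record per compute node.
        rp_id = conn.execute(
            "INSERT INTO resource_providers (uuid) VALUES (?)",
            (node_uuid,)).lastrowid
        records = [(CPU, vcpus, cpu_r), (RAM, mem, ram_r)]
        if not shared_storage:  # step 4b: skip DISK on shared storage
            records.append((DISK, disk, disk_r))
        for rc_id, total, ratio in records:
            # min_unit=1, max_unit=total, step_size=1 (step_size assumed)
            conn.execute(
                "INSERT INTO inventories (resource_provider_id, "
                "resource_class_id, total, min_unit, max_unit, step_size, "
                "allocation_ratio) VALUES (?, ?, ?, 1, ?, 1, ?)",
                (rp_id, rc_id, total, total, ratio))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO compute_nodes VALUES "
             "(1, 'a-uuid', 8, 16.0, 32768, 1.5, 500, 1.0)")
migrate_inventory(conn, 1)
print(conn.execute("SELECT COUNT(*) FROM inventories").fetchone()[0])  # 3
```

Wrapping the writes in a single `with conn:` block means a failure part-way through leaves neither a resource provider record nor partial inventory records behind.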

4c) For the PCI device resource classes (PCI_GENERIC, PCI_SRIOV_PF and PCI_SRIOV_VF), the inventories table records represent the class of resources as a whole, not, for example, individual VFs on an SR-IOV-enabled NIC PF. As such, a single record representing the total amount of each PCI resource class would be added to the inventories table for each compute node that has PCI devices.

For example, let us assume that a compute node has one SR-IOV-enabled NIC, supporting 255 virtual functions (VFs) and not exposing the physical function (PF) for use by a cloud user. We want to limit the number of VFs that any single instance can consume to 8.

We would insert the following into the inventories table:

INSERT INTO inventories (
    resource_provider_id,
    resource_class_id,
    total,
    min_unit,
    max_unit,
    step_size,
    allocation_ratio
)
SELECT
    rp.id,
    $PCI_SRIOV_VF_RESOURCE_CLASS_ID,
    255,
    1,
    8,
    -- step_size is NOT NULL in the schema above; 1 is an assumed value
    1,
    1.0
FROM compute_nodes AS cn
    JOIN resource_providers rp
       ON cn.uuid = rp.uuid
WHERE cn.id = $COMPUTE_NODE_ID;
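To illustrate how such a record might constrain a request, here is a sketch that assumes a requested amount must fall between min_unit and max_unit and must fit within total multiplied by allocation_ratio, less whatever is already consumed. The function request_fits and its used parameter are hypothetical; usage is not stored in the inventories table.

```python
def request_fits(total, min_unit, max_unit, allocation_ratio, used, amount):
    """Return True if `amount` satisfies the inventory constraints."""
    if amount < min_unit or amount > max_unit:
        return False
    capacity = total * allocation_ratio
    return used + amount <= capacity


# The SR-IOV VF record above: total=255, min_unit=1, max_unit=8, ratio=1.0
print(request_fits(255, 1, 8, 1.0, used=0, amount=8))    # True
print(request_fits(255, 1, 8, 1.0, used=0, amount=9))    # False: over max_unit
print(request_fits(255, 1, 8, 1.0, used=250, amount=8))  # False: over capacity
```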

4d) For the NUMA resource classes (NUMA_SOCKETS, NUMA_CORES, NUMA_THREADS and NUMA_MEMORY), create an inventory record for each compute node that exposes NUMA topology resources.

For example, let us assume we have a compute node that exposes 2 NUMA nodes (cells), each with 4 cores and 8 threads. We would set the min_unit and max_unit values of the inventory records to the single-NUMA-cell constraints and the total value to the combined amount of the resource across all cells. So, for instance, for NUMA_CORES, we would set total to 8 (2 cells with 4 cores each), min_unit to 1, and max_unit to 4 (since each cell has 4 cores).
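The arithmetic in this example can be written as a short sketch. The helper numa_inventory is hypothetical; taking max_unit as the largest single cell is an assumption for the case where cells are unevenly sized, which the example here does not cover.

```python
def numa_inventory(per_cell_amounts):
    """Derive inventory values from per-cell amounts of one NUMA resource.

    Returns (total, min_unit, max_unit). total is the sum across cells;
    max_unit is the largest single cell (an assumption for uneven cells);
    min_unit is fixed at 1, as in the example.
    """
    total = sum(per_cell_amounts)
    max_unit = max(per_cell_amounts)
    return total, 1, max_unit


# NUMA_CORES for two cells with 4 cores each.
print(numa_inventory([4, 4]))  # (8, 1, 4)
```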

Note

In the release after the one in which this code merges, a follow-up patch will make the compute_nodes.uuid column non-nullable and add a unique constraint on it.

Changes to ComputeNode object model

To ease the transition from the old-style mechanism for determining inventory/capacity information, we propose modifying the nova.objects.ComputeNode object in the following ways:

1) Have the existing vcpus, memory_mb, local_gb, cpu_allocation_ratio, ram_allocation_ratio, and disk_allocation_ratio fields read from the inventories table with a single query that populates the object’s field values, so that callers are unaware that the storage mechanism has changed behind the scenes. A single SQL query can fetch all of the above fields:

SELECT
    i.resource_class_id,
    i.total,
    i.min_unit,
    i.max_unit,
    i.allocation_ratio
FROM inventories i
  JOIN resource_providers rp
  ON i.resource_provider_id = rp.id
WHERE rp.uuid = $COMPUTE_NODE_UUID;
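One way the rows returned by this query could be mapped back onto the legacy fields is sketched below. The resource class identifiers, the FIELD_MAP table, and the function name are all hypothetical; a plain dict stands in for the ComputeNode object.

```python
# Hypothetical resource class identifiers; the real values come from the
# resource_classes table.
CPU, RAM, DISK = 0, 1, 2

# resource class id -> (capacity field, allocation ratio field)
FIELD_MAP = {
    CPU: ("vcpus", "cpu_allocation_ratio"),
    RAM: ("memory_mb", "ram_allocation_ratio"),
    DISK: ("local_gb", "disk_allocation_ratio"),
}


def fields_from_inventory(rows):
    """Translate inventory rows into legacy ComputeNode field values.

    Each row is (resource_class_id, total, min_unit, max_unit,
    allocation_ratio), matching the column order of the query above.
    Rows for resource classes without a legacy field are ignored.
    """
    fields = {}
    for rc_id, total, _min_unit, _max_unit, ratio in rows:
        if rc_id in FIELD_MAP:
            total_field, ratio_field = FIELD_MAP[rc_id]
            fields[total_field] = total
            fields[ratio_field] = ratio
    return fields


rows = [(CPU, 8, 1, 8, 16.0), (RAM, 32768, 1, 32768, 1.5)]
print(fields_from_inventory(rows))
```

A node on shared storage would have no DISK row, so local_gb and disk_allocation_ratio would simply be absent from the result and need a fallback elsewhere.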

2) The only piece of code that writes changes to the vcpus, memory_mb, local_gb, cpu_allocation_ratio, and ram_allocation_ratio fields of the ComputeNode is in the resource tracker, which sets the field values and calls save() on the ComputeNode object. We can modify the save() method to write any changes to inventory/capacity information to the new inventories table instead of the compute_nodes table.

Note

The object should save capacity information only to the inventories table, and should do so only once all conductor and API nodes have been upgraded to a version that supports the new inventory schema.
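The upgrade check in this note could take a form like the following sketch. Everything here is an assumption made for illustration: the version threshold, the representation of service versions as integers, and the function name are not taken from Nova.

```python
# Hypothetical minimum service version that understands the inventories
# schema; the real threshold is not specified in this document.
INVENTORY_MIN_VERSION = 10


def choose_inventory_target(service_versions):
    """Decide which table save() should write capacity information to.

    `service_versions` is an iterable of integer versions reported by the
    running conductor and API services (an assumed representation). With
    no version information, the old table is chosen to be conservative.
    """
    versions = list(service_versions)
    if versions and min(versions) >= INVENTORY_MIN_VERSION:
        return "inventories"
    return "compute_nodes"


print(choose_inventory_target([10, 11, 10]))  # inventories
print(choose_inventory_target([9, 11]))       # compute_nodes
```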

Alternatives

This is step 3 in an irreversible process that completely changes the way that quantitative things are tracked and claimed in Nova.

Data model impact

No other database schema changes will be required by this blueprint. The work in this blueprint only populates the inventories table that is created in the resource-providers blueprint.

REST API impact

None.

Security impact

None.

Notifications impact

None.

Other end user impact

None.

Performance Impact

None.

Other deployer impact

There will be a database schema migration needed that adds the uuid column to the compute_nodes table.

Developer impact

None.

Implementation

Assignee(s)

Primary assignee:

jaypipes

Other contributors:

cdent, dansmith

Work Items

The following distinct tasks are involved in this spec’s implementation:

  • Create the database schema migration that adds the uuid column to the compute_nodes table

  • Modify nova.objects.ComputeNode.create() to populate the uuid attribute of the compute node, insert a record into the resource_providers table and add any inventory/capacity fields to the inventories table.

  • Add a nova.objects.ComputeNode._migrate_inventory() method to migrate the inventory/capacity fields from compute_nodes to inventories and populate uuid column value if it is None, as it would be if an older nova-compute daemon sent a serialized ComputeNode object model to an updated conductor. The _migrate_inventory() method should also create a record in the resource_providers table for the compute node

  • Modify nova.objects.ComputeNode model to read inventory/capacity information from the inventories table instead of the compute_nodes table

  • Modify nova.objects.ComputeNode model to store changed inventory information (total amount, min and max unit constraints, and allocation ratio) in the inventories table instead of the compute_nodes table

Dependencies

  • resource-classes blueprint implemented

  • resource-providers blueprint implemented

Testing

Full unit, functional, and integration testing of the ComputeNode._migrate_inventory() method that performs the data migration itself.

Documentation Impact

Developer reference documentation only. No user-facing impact is expected from this spec’s implementation.

References

History

Revisions

Release Name    Description

Newton          Introduced