Allocation API¶
https://storyboard.openstack.org/#!/story/2004341
This spec proposes creating an API for allocating nodes for deployment.
Problem description¶
The users of standalone ironic do not have an out-of-the-box means to find a suitable node to deploy onto. The metalsmith project was created to fill this gap in the short term, but it is not suitable for consumer code that is not written in Python. A potential consumer is a K8S provider for standalone ironic.
The API user story is as follows:
Given a resource class and, optionally, a list of required traits, return me an available bare metal node and set instance_uuid on it to mark it as reserved.
Proposed change¶
Overview¶
This RFE proposes a new REST API endpoint /v1/allocations that will
initially allow creating and deleting Allocation resources.
Two implementations of the allocation process are planned:
Via the database, similar to how metalsmith now operates.
Via the Placement service, similar to how nova currently operates.
This spec concentrates on the API design and the first (standalone) case.
Allocation process¶
An allocation happens as follows:
1. An API client sends a POST /v1/allocations request, specifying a resource class and, optionally, traits and candidate node UUIDs.
2. The allocation request is routed to a random available conductor.
3. The conductor creates an Allocation object in the database with state=allocating and conductor_affinity=<host name>.
4. A thread is spawned for the remaining allocation process, and the allocation object is returned to the caller.
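The asynchronous hand-off described above can be sketched as follows. This is a minimal illustration, not the conductor implementation: the in-memory ALLOCATIONS dict stands in for the database, and create_allocation and process_allocation are hypothetical helpers whose bodies are placeholders.

```python
import threading
import uuid

# Stand-in for the allocations table (assumption for this sketch).
ALLOCATIONS = {}


def process_allocation(allocation):
    # Placeholder for the backend-specific node selection.
    allocation["state"] = "active"


def create_allocation(resource_class, hostname, traits=None):
    """Record the allocation, then finish it in a background thread."""
    allocation = {
        "uuid": str(uuid.uuid4()),
        "resource_class": resource_class,
        "traits": list(traits or []),
        "state": "allocating",
        "conductor_affinity": hostname,
    }
    ALLOCATIONS[allocation["uuid"]] = allocation
    # The caller gets the object as it was created, still "allocating".
    snapshot = dict(allocation)
    worker = threading.Thread(target=process_allocation, args=(allocation,))
    worker.start()
    return snapshot, worker
```

The caller receives the allocation in the allocating state and is expected to poll it until it becomes active or error.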
Allocation: database backend¶
The following actions are done by the conductor handling the allocation when the database is used as the backend (the only option in this spec):
1. Fetch the list of nodes from the database with:
   - provision_state=available
   - maintenance=False
   - power_state!=None
     Note
     This is required for compatibility with really old API versions that allow creating nodes directly in the available state.
   - instance_uuid=None
   - resource_class=<requested resource class>
   - uuid in the list of candidate nodes (if provided)
   - requested traits are a subset of node traits
2. If the list is empty, set the allocation’s state to error and last_error to the explanation.
3. Shuffle the list, so that several processes do not try reserving nodes in the same order.
4. Acquire a lock on the first node. If locking succeeds, check that the assumptions about this node are still valid, and reserve it by setting its instance_uuid to the uuid of the allocation. In the same database transaction:
   - set the allocation’s node_id to the node’s ID,
   - set the allocation’s state to active,
   - set the node’s allocation_id to the allocation’s ID,
   - add matched traits to the node’s instance_info.
   Note
   Since the conductor may not have the hardware type for the selected node, we will update TaskManager to avoid constructing the driver object.
5. If something fails on the previous step, proceed to the next node. If no more nodes are left, set the allocation’s state to error and last_error to the explanation.
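The selection procedure above can be sketched in plain Python. This is an illustrative sketch under stated assumptions: nodes and allocations are plain dicts, node locking and the database transaction are omitted, and the helper names (matches, reserve, allocate) are hypothetical.

```python
import random


def matches(node, allocation):
    """Check the node filters listed above."""
    candidates = allocation.get("candidate_nodes")
    return (
        node["provision_state"] == "available"
        and not node["maintenance"]
        and node["power_state"] is not None
        and node["instance_uuid"] is None
        and node["resource_class"] == allocation["resource_class"]
        and (candidates is None or node["uuid"] in candidates)
        and set(allocation["traits"]) <= set(node["traits"])
    )


def reserve(node, allocation):
    """Re-check the node and reserve it (locking omitted in this sketch)."""
    if not matches(node, allocation):
        return False
    node["instance_uuid"] = allocation["uuid"]
    node["allocation_id"] = allocation["id"]
    node["instance_info"]["traits"] = list(allocation["traits"])
    allocation["node_id"] = node["id"]
    allocation["state"] = "active"
    return True


def allocate(allocation, nodes):
    candidates = [node for node in nodes if matches(node, allocation)]
    if not candidates:
        allocation["state"] = "error"
        allocation["last_error"] = "no suitable nodes found"
        return None
    # Shuffle so that concurrent allocations try nodes in different orders.
    random.shuffle(candidates)
    for node in candidates:
        if reserve(node, allocation):
            return node
    allocation["state"] = "error"
    allocation["last_error"] = "could not reserve any candidate node"
    return None
```

The second check inside reserve mirrors the "assumptions are still valid" step: another process may have taken the node between the initial query and the reservation.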
Deallocation: database backend¶
The deallocation process will, in one transaction:
- unset the node’s instance_uuid,
- unset the node’s allocation_id,
- delete the allocation.
The deallocation is triggered either explicitly via the API or when a node is torn
down (at the same time as the node’s instance_uuid and instance_info are
purged).
Note
In the future we might consider supporting sticky allocations which survive node’s tear down. This is outside the scope of this spec.
There is one important difference between using just instance_uuid and
using the allocation API: instance_uuid can be set and unset for active
nodes, while for allocations it will be forbidden. The reason is that with the
future Placement backend removing an allocation would mark the node as
available in Placement.
HA and take over¶
When a conductor restarts, it fetches allocations with conductor_affinity=<host name> and state=allocating,
and starts the allocation procedure for each of them.
If the conductor handling an allocation stops without a replacement, the allocation becomes orphaned. All conductors periodically fetch the list of allocations belonging to dead conductors, and each tries to resume them.
First, it tries to update the conductor_affinity by doing something like:
UPDATE allocations SET conductor_affinity=<new host name> WHERE id=<allocation ID> AND conductor_affinity=<dead host name>
If the query updated 1 row, we know that the new conductor now manages the allocation. Otherwise we know that another conductor took it over.
To avoid rare races with this take over procedure, the normal update will also look like:
UPDATE allocations SET <new values> WHERE id=<allocation ID> AND conductor_affinity=<current host name>
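The conditional update used for take over can be demonstrated with the standard library sqlite3 module. This is only a sketch: the table layout is a simplified assumption, take_over is a hypothetical helper, and the real database layer would differ.

```python
import sqlite3


def take_over(conn, allocation_id, dead_host, new_host):
    """Try to claim an allocation; return True only if this caller won."""
    cursor = conn.execute(
        "UPDATE allocations SET conductor_affinity = ? "
        "WHERE id = ? AND conductor_affinity = ?",
        (new_host, allocation_id, dead_host),
    )
    conn.commit()
    # Exactly one updated row means we now own the allocation.
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE allocations "
    "(id INTEGER PRIMARY KEY, conductor_affinity TEXT, state TEXT)"
)
conn.execute("INSERT INTO allocations VALUES (1, 'dead-host', 'allocating')")

first = take_over(conn, 1, "dead-host", "conductor-a")
second = take_over(conn, 1, "dead-host", "conductor-b")
```

Here the first caller claims the allocation, and the second caller's update matches no rows because conductor_affinity no longer equals the dead host name.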
Alternatives¶
Make each consumer invent their own allocation procedure or use metalsmith.
Write a new service for managing reservations (probably based on metalsmith code base).
Make the API blocking and avoid having states for allocations. Such an approach would result in a simpler API and implementation, but it can be problematic when using an external system, such as Placement, as a backend, since requests to it may block the RPC thread.
Additionally, the asynchronous design will make it easier to introduce a bulk allocation API in the future, if we decide so.
Data model impact¶
Introduce a new database/RPC object Allocation with the following fields:
- id: internal integer ID, not exposed to users.
- uuid: unique UUID of the allocation.
- name: unique name of the allocation, follows the same format as node names.
  Note
  This field is useful, for example, for systems using host names, like metalsmith.
- node_id: reserved node ID (can be null), a foreign key to the nodes table.
- updated_at / created_at: standard update/creation date time fields.
- resource_class: mandatory requested resource class.
- candidate_nodes: list of node UUIDs to choose from (can be null).
  Note
  This allows a caller to pre-filter nodes by arbitrary criteria.
- state: allocation state, see State Machine Impact.
- last_error: last error message.
- conductor_affinity: internal field, specifying which conductor currently handles this allocation.
Introduce a helper table allocation_traits mapping an allocation to its
requested traits (very similar to node_traits).
Update the nodes table with a new foreign key allocation_id. It will be
set to the ID of the corresponding allocation. Unlike instance_uuid, it will
only be set when an allocation is created. If allocation_id is not empty,
instance_uuid will hold the UUID of the corresponding allocation (the
opposite is not necessarily true).
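For orientation, the fields listed above could be pictured as a simple Python dataclass. This is a sketch only; the real object would be an ironic database/RPC object, and the types and defaults here are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Allocation:
    """Illustrative shape of an allocation record (types are assumed)."""
    id: int
    uuid: str
    resource_class: str
    name: Optional[str] = None
    node_id: Optional[int] = None
    candidate_nodes: Optional[List[str]] = None
    # Stored in the allocation_traits helper table in the proposal.
    traits: List[str] = field(default_factory=list)
    state: str = "allocating"
    last_error: Optional[str] = None
    conductor_affinity: Optional[str] = None
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None
```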
State Machine Impact¶
No impact on the node state machine.
This RFE introduces a simple state machine for Allocation objects, consisting of three states:
- allocating: allocation is in progress (initial state).
- active: allocation active.
- error: allocation failed.
In the initial version only the following paths are possible:
- from allocating to active on success.
- from allocating to error on failure.
In the future we may allow moving from error back to allocating to
retry the allocation.
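The allowed paths can be captured in a small transition table. This is a minimal sketch; ALLOWED_TRANSITIONS and transition are hypothetical names, not part of the proposal.

```python
# Transitions permitted in the initial version of the state machine.
ALLOWED_TRANSITIONS = {
    "allocating": {"active", "error"},
    "active": set(),
    "error": set(),
}


def transition(current, target):
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError("cannot move from %s to %s" % (current, target))
    return target
```

Allowing retries later would only require adding allocating to the set of targets for error.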
REST API impact¶
POST /v1/allocations: create an allocation.
The API accepts a JSON object. The following field is mandatory:
- resource_class: requested node’s resource class.
The following optional fields are accepted:
- uuid: to create an allocation with the specific UUID.
- candidate_nodes: to limit the query to one of these nodes.
  Note
  This value is converted from names to UUIDs internally.
- traits: list of requested traits.
- name: allocation name.
The normal response is HTTP 201 Created with the response body being the created allocation representation. An allocation is created in the allocating state.
Error codes:
400 Bad Request if
- any node from candidate_nodes cannot be found (by name or UUID).
- the resource_class value is invalid.
- traits is provided and is not a list of valid trait strings.
- name is not an accepted name.
409 Conflict if
- the provided uuid is not unique or matches instance_uuid of any node.
- the provided name is not unique.
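The checks for the create request could be ordered roughly as in the sketch below. It is deliberately simplified: check_create_request is a hypothetical function, the lookups are plain sets, trait validation only checks for non-empty strings, and name format validation is left out.

```python
from http import HTTPStatus


def check_create_request(body, known_nodes, used_uuids, used_names):
    """Return the status a create request would get (simplified sketch)."""
    resource_class = body.get("resource_class")
    if not isinstance(resource_class, str) or not resource_class:
        return HTTPStatus.BAD_REQUEST
    traits = body.get("traits")
    if traits is not None and not (
        isinstance(traits, list)
        and all(isinstance(trait, str) and trait for trait in traits)
    ):
        return HTTPStatus.BAD_REQUEST
    for node in body.get("candidate_nodes") or []:
        if node not in known_nodes:
            return HTTPStatus.BAD_REQUEST
    # used_uuids is assumed to include node instance_uuid values as well.
    if body.get("uuid") in used_uuids or body.get("name") in used_names:
        return HTTPStatus.CONFLICT
    return HTTPStatus.CREATED
```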
GET /v1/allocations: list allocations.
Parameters:
- fields: list of fields to retrieve for each allocation.
- state: filter allocations by the state.
- resource_class: filter allocations by resource class.
- node: filter allocations by node UUID or name.
Error codes:
400 Bad Request if
- state is invalid.
- resource_class is invalid.
- node does not exist.
- any of the requested fields is invalid.
GET /v1/allocations/<uuid or name>: retrieve an allocation.
Parameters:
- fields: list of fields to retrieve.
Error codes:
400 Bad Request if any of the requested fields is invalid.
404 Not Found if the allocation is not found.
DELETE /v1/allocations/<uuid or name>: remove the allocation and release the node.
No request or response body. Response code is HTTP 204 No Content.
Error codes:
404 Not Found if the allocation is not found.
409 Conflict if the corresponding node is active or is in a state where updates are not allowed.
Note
This request will succeed only for real allocations. It will not be possible to unset an instance_uuid created without an allocation (i.e. by a direct PATCH to a node) using this API.
GET /v1/nodes/<node UUID or name>/allocation: get the allocation associated with the node.
Parameters:
- fields: list of fields to retrieve.
The response body is the Allocation object representation.
Error codes:
404 Not Found if
- the node cannot be found.
- there is no allocation for the node.
400 Bad Request if
- the node has an instance_uuid that does not correspond to any allocation.
- any of the requested fields is invalid.
Update GET /v1/nodes, GET /v1/nodes/detail and GET /v1/nodes/<node UUID or name>:
Expose the new allocation_uuid field (converted from the node’s allocation_id).
Update PATCH /v1/nodes/<node UUID or name>:
If instance_uuid is unset and the current value corresponds to an allocation:
- if the node is active or in a state that disallows updates, and maintenance mode is off, return HTTP 409 Conflict,
- otherwise delete the allocation.
If instance_uuid is set, do NOT create an allocation, keep the previous behavior.
Note
This is needed to avoid affecting the nova virt driver. This decision may be revisited in future API versions.
The allocation_uuid field is read-only; an attempt to change it directly will result in HTTP 400 (Bad Request).
Update DELETE /v1/nodes/<node UUID or name>:
If a node is deleted with an allocation (possible only in maintenance mode), the allocation is deleted as well.
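The PATCH rule for unsetting instance_uuid described above reduces to a small decision function. This is a sketch with hypothetical names; which provision states disallow updates is passed in as a flag rather than enumerated here.

```python
def on_instance_uuid_unset(provision_state, updates_allowed, maintenance,
                           has_allocation):
    """Decide what a PATCH that unsets instance_uuid should do."""
    if not has_allocation:
        # No allocation is involved: keep the previous behavior.
        return "unset"
    locked = provision_state == "active" or not updates_allowed
    if locked and not maintenance:
        return "conflict"  # HTTP 409 Conflict
    return "delete_allocation"
```

Maintenance mode acts as the escape hatch: an operator can still remove the allocation from an active node by putting it into maintenance first.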
Client (CLI) impact¶
“ironic” CLI¶
None.
“openstack baremetal” CLI¶
The matching commands will be created:
openstack baremetal allocation create --resource-class <class> \
    [--trait <trait>] [--trait <trait>] [--uuid <uuid>] [--name <name>]
openstack baremetal allocation list [--state <state>] [--fields <fields>] \
    [--resource-class <class>] [--node <UUID or name>]
openstack baremetal allocation get <uuid or name> [--fields <fields>]
openstack baremetal allocation delete <uuid or name>
The allocation_uuid field will be exposed on nodes.
RPC API impact¶
Two new RPC calls are introduced:
Creating an allocation
def create_allocation(self, context, allocation):
    """Create an allocation.

    Creates the provided allocation in the database, then starts a thread to
    process it.

    :param context: context
    :param allocation: allocation object.
    """
Deleting an allocation
def destroy_allocation(self, context, allocation):
    """Destroy an allocation.

    Removes the allocation from the database and releases the node.

    :param context: context
    :param allocation: allocation object.
    """
metalsmith impact¶
The metalsmith project implements a superset of the proposed feature on the
client side. After this API is introduced, metalsmith will switch the
reserve_node call to using it in the following way:
- If the request contains a resource_class and, optionally, traits and candidate nodes, the new API will be used.
- If the request contains anything not supported by the new API, metalsmith will continue client-side node filtering, and will create an allocation with a list of suitable nodes.
Driver API impact¶
None
Nova driver impact¶
None
In the future we may use the allocation API in the nova driver, but there are no plans for it now. Currently, going through the allocation API would result in an attempted double allocation in Placement if Placement is used as the allocation backend.
Ramdisk impact¶
None
Security impact¶
None
Other end user impact¶
None
Scalability impact¶
None
Performance Impact¶
A new periodic task will run on each conductor to periodically check for stalled reservations belonging to dead conductors. The default period will be 60 seconds. It will be possible to disable it, in which case the allocations may get stuck forever if their assigned conductor dies.
Other deployer impact¶
None
Developer impact¶
None
Implementation¶
Assignee(s)¶
- Primary assignee:
dtantsur
Work Items¶
Add new tables and the Allocation RPC object.
Add RPC for allocating/deallocating.
Add API for allocations creation and deletion, and API reference.
Update conductor starting procedure to check for unfinished allocations.
Add a periodic task to check for orphaned unfinished allocations.
Dependencies¶
None
Testing¶
Unit tests and Tempest API tests will be provided.
The standalone integration tests will be updated to use the new API.
We can add support for the new API to bifrost (e.g. via metalsmith), and test it in a bifrost CI job.
Upgrades and Backwards Compatibility¶
This change is fully backward compatible. Code using instance_uuid for
allocations is not affected.
Documentation Impact¶
API reference will be provided.