Conductor/node grouping affinity

https://storyboard.openstack.org/#!/story/2001795

This spec proposes adding a conductor_group property to nodes, which can be matched to one or more conductors configured with a matching conductor_group configuration option, to restrict control of those nodes to these conductors.

Problem description

Today, there is no way to control the conductor-to-node mapping. This is desirable for a number of reasons:

  • An operator may have an Ironic deployment that spans multiple sites. Without control of this mapping, images may be pulled over WAN links. This causes slower deployments and may be less secure.

  • Similarly, an operator may want to map nodes to conductors that are physically closer to the nodes in the same site, to reduce the number of network hops between the node and the conductor. A prime example of this would be to place a conductor in each rack, reducing the path to only go through the top-of-rack switch.

  • A deployer may have multiple networks for out-of-band control, that must be completely isolated. This feature would allow isolating a conductor to a single out-of-band network.

  • A deployer may have multiple physical networks that not all conductors are connected to. By configuring the mapping correctly, conductors can manage only the nodes which they can communicate with. This is described further in another RFE.[0]

Proposed change

We propose adding a conductor_group configuration option for the conductor, which is a single arbitrary string specifying some grouping characteristic of the conductor.

We also propose a conductor_group field in the node record, which will be used to map a node to a conductor. This matching will be done case-insensitive, to make things a bit easier for operators.

A blank conductor_group field or config is the default. A conductor without a group can only manage nodes without a group, and a node without a group can only be managed by a conductor without a group.

The hash ring will need to be modified to take grouping into account, as described below in the RPC API Impact section.

Alternatives

Another RFE[1] proposes a complex system of hard and soft affinity, affinity and anti-affinity, and scoring of placement to a conductor with multiple tags. This is quite complex, and I don’t believe we’ll get it done in the short term. Completing this more basic work doesn’t block this more complex work, and so we should take it one step at a time.

Data model impact

A conductor_group field will be added to the nodes table, as a VARCHAR(255). This will have a default of "", or the empty string. This string will be used in the hash ring calculation, so there’s no sense in defaulting to NULL.

A conductor_group field will also be added to the conductors table, also as a VARCHAR(255). This will also have a default of "", or the empty string. This will be used to build the hash ring to look up where nodes should land.

State Machine Impact

None.

REST API impact

The conductor_group field of the node will be added to the node object in the REST API, with a microversion as usual. It will be allowed in POST and PATCH requests. As with the database, it will be restricted to 255 characters. There must be a conductor in that group available, as the conductor services node creation and updates, and is selected via the hash ring.

It’s worth noting that we would like to expose the grouping of conductors via the REST API eventually. However, the best way to do this isn’t immediately clear, so we leave it outside the scope of this spec for now. Another RFE[3] proposes a service management API that may be a good fit.

Client (CLI) impact

“ironic” CLI

None, it’s deprecated.

“openstack baremetal” CLI

The conductor_group field for a node will be exposed in the client output, and added to the node create and node set commands.

RPC API impact

This will affect which conductor is the destination for RPC calls corresponding to a given node, however won’t have a direct effect on the RPC API itself.

The hash ring will change such that the internal keys for the hash ring will now be of the structure "$conductor_group:$drivername". A colon (:) is used as the separator between the two, to eliminate conflicts between conductor groups and drivers or hardware types. For example, an agent_ilo key with no separator could mean a node with no group and the agent_ilo driver, or it could mean a node with group agent_ using the ilo hardware type. To handle upgrades, hash ring keys will be built without the conductor group while the service is pinned to a version before this feature, and built with the conductor group when the service is unpinned or pinned to a version after this feature is implemented.

We handle upgrades by ignoring grouping for services which have a pin in the RPC version that is less than the release with this feature. Once everything is upgraded and unpinned, we begin using the grouping tags configured.

Operators should leave a sufficient number of conductors available without a grouping tag configured to run the cluster, until nodes can be configured with the grouping tag. Any nodes without a grouping tag will only be managed by conductors without a grouping tag.

Driver API impact

Hash ring generation and lookup will include the grouping tag, as specified above in the RPC API Impact section.

Nova driver impact

This change is transparent to Nova.

Ramdisk impact

None.

Security impact

No direct impact; however this provides another mechanism for securing a deployment by enabling logical infrastructure segregation.

Other end user impact

None.

Scalability impact

None.

Performance Impact

None.

Other deployer impact

Deployers that wish to use this feature will need to manage the process of labeling conductors and nodes to enable it, which may be a non-trivial task.

Developer impact

None.

Implementation

Assignee(s)

Primary assignee:

jroll

Other contributors:

dtantsur

Work Items

  • Add database fields.

  • Add conductor config and populate conductor DB field.

  • Change the hash ring calculation, and bump the RPC API so that we can pin during upgrades.

  • Add fields to the node and conductor objects.

  • Make the REST API changes.

  • Update the client library/CLI.

  • Document the feature.

Dependencies

None.

Testing

Unit tests should be sufficient, as that’s how we test our hash ring now. It’s difficult to test this with Tempest without exposing conductor grouping via the REST API.

Upgrades and Backwards Compatibility

This is described in the RPC API Impact section.

Documentation Impact

This should be documented in the install guide and admin guide.

References

[0] https://storyboard.openstack.org/#!/story/1734876

[1] https://storyboard.openstack.org/#!/story/1739426

[2] Notes from the Rocky PTG session:

https://etherpad.openstack.org/p/ironic-rocky-ptg-location-awareness

[3] https://storyboard.openstack.org/#!/story/1526759