Affinity policy violated with parallel requests

Problem

Parallel server create requests for affinity or anti-affinity land on the same host and servers go to the ACTIVE state even though the affinity or anti-affinity policy was violated.

Solution

There are two ways to avoid anti-/affinity policy violations among multiple server create requests.

Create multiple servers as a single request

Use the multi-create API with the min_count parameter set or the multi-create CLI with the --min option set to the desired number of servers.

This works because when the batch of requests is visible to nova-scheduler at the same time as a group, it will be able to choose compute hosts that satisfy the anti-/affinity constraint and will send them to the same hosts or different hosts accordingly.

Adjust Nova configuration settings

When requests are made separately and the scheduler cannot consider the batch of requests at the same time as a group, anti-/affinity races are handled by what is called the “late affinity check” in nova-compute. Once a server lands on a compute host, if the request involves a server group, nova-compute contacts the API database (via nova-conductor) to retrieve the server group and then it checks whether the affinity policy has been violated. If the policy has been violated, nova-compute initiates a reschedule of the server create request. Note that this means the deployment must have scheduler.max_attempts set greater than 1 (default is 3) to handle races.

An ideal configuration for multiple cells will minimize upcalls from the cells to the API database. This is how devstack, for example, is configured in the CI gate. The cell conductors do not set api_database.connection and nova-compute sets workarounds.disable_group_policy_check_upcall to True.

However, if a deployment needs to handle racing affinity requests, it needs to configure cell conductors to have access to the API database, for example:

[api_database]
connection = mysql+pymysql://root:a@127.0.0.1/nova_api?charset=utf8

The deployment also needs to configure nova-compute services not to disable the group policy check upcall by either not setting (use the default) workarounds.disable_group_policy_check_upcall or setting it to False, for example:

[workarounds]
disable_group_policy_check_upcall = False

With these settings, anti-/affinity policy should not be violated even when parallel server create requests are racing.

Future work is needed to add anti-/affinity support to the placement service in order to eliminate the need for the late affinity check in nova-compute.