2024.1 Series Release Notes

29.2.0-10

Bug Fixes

  • Fixes a regression for live migration on shared storage that was removing the backing disk and instance folder during the cleanup of a virtual machine after live migration. See bug 2080436 for details.

29.0.2

Bug Fixes

  • Nova now ensures that an instance cannot move between availability zones when the host of the instance is added to or removed from an aggregate that is part of another availability zone. Moving from or to the default availability zone is also rejected.

    This resolves bug 1907775, where the instance became stuck between availability zones after such a move.

29.0.1

Prelude

The OpenStack 2024.1 (Nova 29.0.0) release includes many new features and bug fixes. Please be sure to read the upgrade section, which describes the required actions to upgrade your cloud from 28.0.0 (2023.2) to 29.0.0 (2024.1). As a reminder, OpenStack 2024.1 is a Skip-Level-Upgrade Release (from now on called a SLURP release), meaning that you can do a rolling upgrade from 2023.1 and skip 2023.2.

There are a few major changes worth mentioning. This is not an exhaustive list:

  • The latest Compute API microversion supported for 2024.1 is v2.96.

  • The Ironic driver [ironic]/peer_list configuration option has been deprecated. The Ironic driver now more closely models other Nova drivers, where compute nodes do not move between compute service instances. If high availability of a single compute service is required, operators should use active/passive failover between two compute service agents configured to share the same compute service host value, [DEFAULT]/host. Ironic nova-compute services can now be configured to target a specific shard of ironic nodes by setting the [ironic]/shard configuration option, and a new nova-manage db ironic_compute_node_move command can help operators specify which shard their computes should manage when upgrading.

  • Instances using vGPUs can now be live-migrated if both compute nodes support libvirt-8.6.0 and QEMU-8.1.0, as the source mediated device will migrate the GPU memory to the target mediated device automatically. In order to do this, the [libvirt]/live_migration_downtime config option needs to be adjusted as described in the vGPU documentation.

  • As of the new 2.96 microversion, the server show and server list APIs now return a new parameter called pinned_availability_zone that indicates whether the instance is confined to a specific AZ. This field supplements the existing availability_zone field, which reports the availability zone of the host where the instance resides. The two values may differ if the instance is shelved or is not pinned to an AZ, which can help operators plan maintenance and better understand workload constraints.

  • Instances using virtio-net will see a performance increase of 10% to 20% if their image uses the new hw_virtio_packed_ring=true property or their flavor contains the hw:virtio_packed_ring=true extra spec, provided the libvirt version is >= 6.3 and QEMU is >= 4.2.
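
    Both knobs can be set with the standard client. A minimal sketch, assuming a flavor named m1.packed and an image named fedora-39 (both names are illustrative):

      $ openstack flavor set m1.packed --property hw:virtio_packed_ring=true
      $ openstack image set fedora-39 --property hw_virtio_packed_ring=true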

  • As a security mechanism, a new [consoleauth]/enforce_session_timeout configuration option provides the ability to automatically close a server console session when the token expires. This is disabled by default to preserve the existing behaviour for upgrades.

  • The libvirt driver now supports requesting a configurable memory address space for instances. This allows instances with large RAM requirements to be created by specifying either the hw:maxphysaddr_mode=emulate and hw:maxphysaddr_bits flavor extra specs or the hw_maxphysaddr_mode and hw_maxphysaddr_bits image properties. The ImagePropertiesFilter and ComputeCapabilitiesFilter filters are required to support this functionality.

  • The Hyper-V virt driver has been removed. It was deprecated in the Nova 27.2.0 (Antelope) release. This driver was untested and had no maintainers. In addition, it had a dependency on the OpenStack Winstacker project, which has also been retired.

  • A couple of other improvements target reducing the number of bugs: one automatically detects the maximum number of instances with memory encryption that can run concurrently, another allows specifying a specific IP address or hostname for incoming move operations (by setting [libvirt]/migration_inbound_addr), and yet another improves the stability of block device management by using libvirt device aliases.

New Features

  • Added new flavor extra specs and image properties to control the physical address bits of vCPUs in libvirt guests. These options are used to boot guests with large amounts of RAM.
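
    A minimal sketch of setting these extra specs on a flavor (the flavor name and the 42-bit value are illustrative assumptions, not defaults):

      $ openstack flavor set bigmem \
          --property hw:maxphysaddr_mode=emulate \
          --property hw:maxphysaddr_bits=42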

  • Instances using vGPUs can now be correctly live-migrated by the libvirt driver between compute nodes supporting the same mediated device types used by the instance. In order to do this, the compute hosts need to support at least libvirt-8.6.0, QEMU-8.1.0 and Linux kernel 5.18.0. Operators using multiple vGPU types per compute need to make sure they already use custom traits or custom resource classes for the GPU resource providers, and that the instance was created with a flavor using either a custom resource class or asking for a custom trait, so that the Placement API will provide the right target GPU using the same mdev type for the instance.
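
    As a sketch, a host exposing one mdev type through a custom resource class could be configured along these lines in nova.conf (the mdev type name, PCI address and resource class are illustrative assumptions for your own hardware):

      [devices]
      enabled_mdev_types = nvidia-610

      [mdev_nvidia-610]
      device_addresses = 0000:84:00.0
      mdev_class = CUSTOM_VGPU_L40

    A matching flavor would then request resources:CUSTOM_VGPU_L40=1 instead of the generic VGPU resource class.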

  • The new config option [libvirt]migration_inbound_addr is now used to determine the address for incoming move operations (cold migrate, resize, evacuate). It defaults to [DEFAULT]my_ip to keep the configuration backward compatible. However, it allows an explicit hostname or FQDN to be specified, or '%s', which is then resolved to the hostname of the compute host. Note that this config should only be changed from its default after every compute is upgraded.
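
    For example, to have every compute advertise its own hostname for incoming move operations (a sketch of the behaviour described above):

      [libvirt]
      migration_inbound_addr = %s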

  • This is a security-enhancing feature that automatically closes console sessions exceeding a defined timeout period. To enable this functionality, operators are required to set the [consoleauth]enforce_session_timeout boolean configuration option to True.

    The enforcement is implemented via a timer mechanism, initiating when users access the console and concluding upon the expiration of the set console token.

    This ensures the graceful closure of console sessions on the server side, aligning with security best practices.
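
    A minimal configuration sketch (the token lifetime line assumes the pre-existing [consoleauth]token_ttl option with its default of 600 seconds):

      [consoleauth]
      enforce_session_timeout = True
      token_ttl = 600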

  • Ironic nova-compute services can now target a specific shard of ironic nodes by setting the config [ironic]shard. This is particularly useful when using active/passive failover to choose which physical host runs your ironic nova-compute process, while ensuring [DEFAULT]host stays the same for each shard. You can use this alongside [ironic]conductor_group to further limit which ironic nodes are managed by each nova-compute service. Note that when you use [ironic]shard, the [ironic]peer_list is hardcoded to a single nova-compute service.

    There is a new nova-manage command, db ironic_compute_node_move, that can be used to move ironic nodes, and the associated instances, between nova-compute services. This is useful when migrating from the legacy hash-ring-based HA towards the new sharding approach.
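
    A configuration sketch for one shard (the shard, host and conductor group names are illustrative assumptions):

      [DEFAULT]
      host = ironic-shard-a

      [ironic]
      shard = shard-a
      conductor_group = rack1

    Nodes and their instances can then be moved to such a service with the nova-manage db ironic_compute_node_move command; see its --help output for the exact arguments.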

  • The libvirt driver is now capable of detecting the maximum number of guests with encrypted memory that can run concurrently on its compute host, using new fields available in the libvirt API since version 8.0.0.

  • The 2.96 microversion has been added. This microversion adds pinned_availability_zone in server show and server list --long responses.
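
    Retrieving the new field only requires negotiating the microversion from the client, for example:

      $ openstack --os-compute-api-version 2.96 server list --long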

Upgrade Notes

  • The Hyper-V virt driver has been removed. It was deprecated in the Nova 27.2.0 (Antelope) release. This driver was untested and had no maintainers. In addition, it had a dependency on the OpenStack Winstacker project, which has also been retired.

    The RDP console was only available for the Hyper-V driver; therefore, the RDP-console-related APIs below now return an HTTP 400 (BadRequest) error:

    • GET RDP console:

      • Server Action Get RDP Console: POST /servers/{server_id}/action (os-getRDPConsole Action)

      • RDP protocol support from remote console API: POST /servers/{server_id}/remote-consoles

    • GET RDP console connection information:

      • Show Console Connection Information: GET /os-console-auth-tokens/{console_token}

    The following config options, which only applied to the Hyper-V virt driver or the RDP console APIs, have also been removed:

    • [hyperv] dynamic_memory_ratio

    • [hyperv] enable_instance_metrics_collection

    • [hyperv] instances_path_share

    • [hyperv] limit_cpu_features

    • [hyperv] mounted_disk_query_retry_count

    • [hyperv] mounted_disk_query_retry_interval

    • [hyperv] power_state_check_timeframe

    • [hyperv] power_state_event_polling_interval

    • [hyperv] qemu_img_cmd

    • [hyperv] vswitch_name

    • [hyperv] wait_soft_reboot_seconds

    • [hyperv] config_drive_cdrom

    • [hyperv] config_drive_inject_password

    • [hyperv] volume_attach_retry_count

    • [hyperv] volume_attach_retry_interval

    • [hyperv] enable_remotefx

    • [hyperv] use_multipath_io

    • [hyperv] iscsi_initiator_list

    • [rdp] enabled

    • [rdp] html5_proxy_base_url

    The following extra specs, which only applied to the Hyper-V virt driver, have been removed:

    • os:resolution

    • os:monitors

    • os:vram

  • The deprecated [upgrade_levels] cert option has been removed.

  • The deprecated [api] use_forwarded_for option has been removed.

Bug Fixes

  • Some OS platforms don't provide cpufreq resources in sysfs by default, so they have no CPU scaling governors. The governor strategy for CPU power management has therefore been made optional.

  • Relaxed the config option checking of the cpu_power_management feature of the libvirt driver. The nova-compute service will now start with [libvirt]cpu_power_management=True and an empty [compute]cpu_dedicated_set configuration. Power management is still only applied to dedicated CPUs, so the above configuration merely makes it possible to enable cpu_power_management independently of configuring cpu_dedicated_set during deployment.
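
    In other words, a deployment tool may now ship this intermediate state (a sketch; cpu_dedicated_set is filled in by a later deployment step):

      [libvirt]
      cpu_power_management = True

      [compute]
      # cpu_dedicated_set is intentionally left unset at this stage; power
      # management only takes effect once dedicated CPUs are configured.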

  • With the change from ml2/ovs DHCP agents towards the OVN implementation in Neutron, there is no port with device_owner network:dhcp anymore. Instead, DHCP is provided by a network:distributed port. The fix falls back to the subnet's enable_dhcp attribute provided by the Neutron API if no port with the network:dhcp owner is found. See bug 2055245 for details.

  • The Ironic virt driver now uses the node cache and respects partition keys, such as the conductor group, for the list_instances and list_instance_uuids calls. This fix improves the performance of the periodic queries that use these driver methods and reduces the API and DB load on the backing Ironic service.

  • Bug 2009280 has been fixed by no longer enabling the evmcs enlightenment in the libvirt driver. evmcs only works on Intel CPUs, and domains with that enlightenment cannot be started on AMD hosts. There is a possible future feature to enable support for generating this enlightenment only when running on Intel hosts.

  • Previously, switchdev capabilities had to be configured manually by a user with admin privileges using the port's binding profile. This blocked regular users from managing ports with Open vSwitch hardware offloading, as providing write access to a port's binding profile to non-admin users introduces security risks. For example, a binding profile may contain a pci_slot definition, which denotes the host PCI address of the device attached to the VM. A malicious user could use this parameter to pass through any host device to a guest, so it is impossible to provide write access to a binding profile to regular users in many scenarios.

    This fix resolves the situation by translating VF capabilities reported by libvirt to Neutron port binding profiles. Other VF capabilities are translated as well for possible future use.