Stein Series Release Notes
19.3.2-19
Upgrade Notes
The default for [glance] num_retries has changed from 0 to 3. The option controls how many times to retry a Glance API call in response to an HTTP connection failure. When deploying Glance behind HAProxy it is possible for a response to arrive just after the HAProxy idle time. As a result, an exception will be raised when the connection is closed, resulting in a failed request. By increasing the default value, Nova can be more resilient to this scenario where HAProxy is misconfigured, by retrying the request.
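What the option controls is essentially the following retry pattern (an illustrative sketch, not Nova's glanceclient code; the URL and token are placeholders):

    import time

    import requests

    def get_with_retries(url, token, num_retries=3, delay=1.0):
        # Illustrative only: retry the call when the connection itself fails,
        # which is what happens when HAProxy closes an idle connection.
        for attempt in range(num_retries + 1):
            try:
                return requests.get(
                    url, headers={"X-Auth-Token": token}, timeout=30)
            except requests.exceptions.ConnectionError:
                if attempt == num_retries:
                    raise
                time.sleep(delay)   # brief pause before the next attempt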
Bug Fixes
Improved detection of anti-affinity policy violations when performing live and cold migrations. Most of the violations caused by race conditions between concurrent live or cold migrations should now be addressed by extra checks in the compute service. Upon detection, cold migration operations are automatically rescheduled. Live migrations have two checks: if the violation is detected by the first check, the migration is rescheduled; if it is only caught by the second check, the live migration fails cleanly and the instance state is reverted to its previous value.
Fixes bug 1892361 in which the pci stat pools are not updated when an existing device is enabled with SRIOV capability. A restart of the nova-compute service updates the pci device type from type-PCI to type-PF, but the pools still maintain the device type as type-PCI, and so the PF is considered for allocation to instances that request vnic_type=direct. With this fix, the pci device type updates are detected and the pci stat pools are updated properly.
19.3.2
Bug Fixes
Since Libvirt v.1.12.0 and the introduction of the libvirt issue, setting a cache mode whose write semantics are not O_DIRECT (i.e. "unsafe", "writeback" or "writethrough") causes a problem with the volume drivers (i.e. LibvirtISCSIVolumeDriver, LibvirtNFSVolumeDriver and so on) which designate native io explicitly.
When driver_cache (default is none) has been configured as neither "none" nor "directsync", the libvirt driver will ensure driver_io is set to "threads" to avoid an instance spawning failure.
Addressed an issue that prevented instances using the multiqueue feature from being created successfully when their vif_type is TAP.
19.3.0
Bug Fixes
This release contains a fix for a regression introduced in 15.0.0 (Ocata) where server create failing during scheduling would not result in an instance action record being created in the cell0 database. Now when creating a server fails during scheduling and is "buried" in cell0, a create action will be created with an event named conductor_schedule_and_build_instances.
A new [workarounds]/reserve_disk_resource_for_image_cache config option was added to fix bug 1878024, where the images in the compute image cache overallocate the local disk. If this new option is set, the libvirt driver will reserve DISK_GB resources in placement based on the actual disk usage of the image cache.
Previously, attempting to configure an instance with the e1000e or legacy VirtualE1000e VIF types on a host using the QEMU/KVM driver would result in an incorrect UnsupportedHardware exception. These interfaces are now correctly marked as supported.
19.2.0
Bug Fixes
The Compute service has never supported direct booting of an instance from an image that was created by the Block Storage service from an encrypted volume. Previously, this operation would result in an ACTIVE instance that was unusable. Beginning with this release, an attempt to boot from such an image will result in the Compute API returning a 400 (Bad Request) response.
A new config option [neutron]http_retries is added which defaults to 3. It controls how many times to retry a Neutron API call in response to an HTTP connection failure. An example scenario where it will help is when a deployment is using HAProxy and connections get closed after idle time. If an incoming request tries to re-use a connection that is simultaneously being torn down, an HTTP connection failure will occur and previously Nova would fail the entire request. With retries, Nova can be more resilient in this scenario and continue the request if a retry succeeds. Refer to https://launchpad.net/bugs/1866937 for more details.
19.1.0
Bug Fixes
The DELETE /os-services/{service_id} compute API will now return a 409 HTTPConflict response when trying to delete a nova-compute service which is involved in in-progress migrations. This is because doing so would not only orphan the compute node resource provider in the placement service on which those instances have resource allocations but can also break the ability to confirm/revert a pending resize properly. See https://bugs.launchpad.net/nova/+bug/1852610 for more details.
An instance can be rebuilt in-place with the original image or a new image. Instance resource usage cannot be altered during a rebuild. Previously Nova would have ignored the NUMA topology of the new image, continuing to use the NUMA topology of the existing instance until a move operation was performed. As Nova did not explicitly guard against inadvertent changes to resource requests contained in a new image, it was possible to rebuild with an image that would violate this requirement; see bug #1763766 for details. This resulted in an inconsistent state, as the instance that was running did not match the instance that was requested. Nova now explicitly checks whether a rebuild would alter the requested NUMA topology of an instance and rejects the rebuild if so.
With the changes introduced to address bug #1763766, Nova now guards against NUMA constraint changes on rebuild. As a result, the NUMATopologyFilter is no longer required to run on rebuild since we already know the topology will not change and therefore the existing resource claim is still valid. As such, it is now possible to do an in-place rebuild of an instance with a NUMA topology even if the image changes, provided the new image does not alter the topology, which addresses bug #1804502.
Other Notes
A --dry-run option has been added to the nova-manage placement heal_allocations CLI which allows running the command to get output without committing any changes to placement.
An --instance option has been added to the nova-manage placement heal_allocations CLI which allows running the command on a specific instance given its UUID.
19.0.3
Known Issues
Operators should be aware that nova-api has a dependency on eventlet for executing parallel queries across multiple cells and is monkey-patched accordingly. When nova-api is running under uWSGI or mod_wsgi, the wsgi app will pause after idle time. While the wsgi app is paused, rabbitmq heartbeats will not be sent, and related log messages can be seen in the nova-api logs once the wsgi app resumes when new requests arrive. These messages are not harmful. When the wsgi app resumes, oslo.messaging will reconnect to rabbitmq and requests will be served successfully.
There is one caveat, which is that the wsgi app configuration must be left as the default threads=1 or set explicitly to threads=1 to ensure that reconnection will work properly. When threads > 1, it is not guaranteed that oslo.messaging will reconnect to rabbitmq when the wsgi app resumes after pausing during idle time. Threads are used internally by oslo.messaging for heartbeats and more, and it may fail in a variety of ways if run under eventlet with an app that violates eventlet's threading guarantees. When oslo.messaging does not reconnect to rabbitmq after a wsgi app pause, RPC requests will fail with a MessagingTimeout error. So, it is necessary to have the wsgi app configured with threads=1 for reconnection to work properly.
If running with threads=1 is not an option in a particular environment, there are two other workarounds:
Use the eventlet wsgi server instead of uWSGI or mod_wsgi, or
Disable eventlet monkey-patching using the environment variable OS_NOVA_DISABLE_EVENTLET_PATCHING=yes.
Note that disabling eventlet monkey-patching will cause queries across multiple cells to be serialized instead of running in parallel and this may be undesirable in a large deployment with multiple cells, for performance reasons.
Please see the following related bugs for more details:
In Stein the Placement service is available either as part of Nova, or independently packaged from its own project. This is to allow easier migration from one to another. See the upgrade notes for more information.
When using the Placement packaged from Nova, some deployment strategies can lead to the service stalling with error messages similar to:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 460, in fire_timers
    timer()
  File "/usr/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 59, in __call__
    cb(*args, **kw)
  File "/usr/lib/python2.7/site-packages/eventlet/semaphore.py", line 147, in _do_acquire
    waiter.switch()
error: cannot switch to a different thread
The reasons this is happening are discussed in bug 1829062. There are three workarounds available:
In the environment of the web server running the placement service, set OS_NOVA_DISABLE_EVENTLET_PATCHING=yes so that eventlet does not conflict with thread handling in the web server.
Turn off threading in the web server. For example, if using mod_wsgi or uwsgi, set threads=1 in their respective configurations.
Switch to using the extracted placement, which does not suffer from this problem.
19.0.2
Security Issues
OSSA-2019-003: Nova Server Resource Faults Leak External Exception Details (CVE-2019-14433)
This release contains a security fix for bug 1837877 where users without the admin role can be exposed to sensitive error details in the server resource fault message.
There is a behavior change where non-nova exceptions will only record the exception class name in the fault message field which is exposed to all users, regardless of the admin role.
The fault details, which are only exposed to users with the admin role, will continue to include the traceback and also include the exception value which for non-nova exceptions is what used to be exposed in the fault message field. Meaning, the information that admins could see for server faults is still available, but the exception value may be in details rather than message now.
Bug Fixes
Bug 1811726 is fixed by deleting the resource provider (in placement) associated with each compute node record managed by a nova-compute service when that service is deleted via the DELETE /os-services/{service_id} API. This is particularly important for compute services managing ironic baremetal nodes.
Add support for noVNC >= v1.1.0 for VNC consoles. Prior to this fix, VNC console token validation always failed regardless of actual token validity with noVNC >= v1.1.0. See https://bugs.launchpad.net/nova/+bug/1822676 for more details.
19.0.1
Bug Fixes
The os-volume_attachments update API, commonly referred to as the swap volume API, will now return a 400 (BadRequest) error when attempting to swap from a multi-attached volume with more than one active read/write attachment, resolving bug #1775418.
Fixes a bug that caused Nova to fail on mounting Quobyte volumes whose volume URL contained multiple registries.
19.0.0
Prelude
The 19.0.0 release includes many new features and bug fixes. Please be sure to read the upgrade section which describes the required actions to upgrade your cloud from 18.0.0 (Rocky) to 19.0.0 (Stein).
There are a few major changes worth mentioning. This is not an exhaustive list:
The latest Compute API microversion supported for Stein is v2.72. Details on REST API microversions added since the 18.0.0 Rocky release can be found in the REST API Version History page.
It is now possible to run Nova with version 1.0.0 of the recently extracted placement service, hosted from its own repository. Note that install/upgrade of an extracted placement service is not yet fully implemented in all deployment tools. Operators should check with their particular deployment tool for support before proceeding. See the placement install and upgrade documentation for more details. In Stein, operators may choose to continue to run with the integrated placement service from the Nova repository, but should begin planning a migration to the extracted placement service by Train, as the removal of the integrated placement code from Nova is planned for the Train release.
Users can now specify a volume type when creating servers when using the 2.67 compute API microversion. See the block device mapping documentation for more details.
The 2.69 compute API microversion adds handling of server details in the presence of down or poor-performing cells in a multi-cell environment for the GET /servers, GET /servers/detail, GET /servers/{server_id} and GET /os-services REST APIs. See the handling down cells documentation for more details.
Users are now able to create servers with Neutron ports that have QoS minimum bandwidth rules when using the 2.72 compute API microversion. See the using ports with resource request documentation for more details.
Operators can now set overcommit allocation ratios using Nova configuration files or the placement API, by making use of the initial allocation ratio configuration options. See the initial allocation ratios documentation for more details.
Compute capabilities are now exposed as traits in the placement API. See the compute capabilities as traits documentation for more details.
The configuration option [compute]resource_provider_association_refresh can now be set to zero to disable refresh entirely. This should be useful for large-scale deployments.
The VMwareVCDriver now supports live migration. See the live migration configuration documentation for information on how to enable it.
Nova now supports nested resource providers in two cases:
QoS-enabled ports will have inventories and allocations created on nested resource providers from the start.
Libvirt compute nodes reporting VGPU inventory will have that VGPU inventory and corresponding allocations moved to a child resource provider on restart of the nova-compute service after upgrading to Stein.
In both cases this means when looking at resource providers, depending on the scenario, you can see more than one provider where there was initially just a root compute node provider per compute service.
New Features
The field instance_name has been added to the InstanceCreatePayload in the following versioned notifications:
instance.create.start
instance.create.end
instance.create.error
A new policy rule os_compute_api:servers:allow_all_filters has been added to control whether a user can use all filters when listing servers.
Microversion 2.67 adds the optional parameter volume_type to block_device_mapping_v2, which can be used to specify volume_type when creating a server. This would only apply to BDMs with source_type of blank, image and snapshot and destination_type of volume. The compute API will reject server create requests with a specified volume_type until all nova-compute services are upgraded to Stein.
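For illustration, a minimal request sketch using the requests library; the endpoint, token, UUIDs and the volume type name are placeholders, not values from this release note:

    import requests

    # Hypothetical endpoint, token and UUIDs; microversion 2.67 allows the
    # volume_type key inside block_device_mapping_v2.
    body = {
        "server": {
            "name": "bfv-server",
            "flavorRef": "FLAVOR_UUID",
            "networks": [{"uuid": "NETWORK_UUID"}],
            "block_device_mapping_v2": [{
                "boot_index": 0,
                "uuid": "IMAGE_UUID",
                "source_type": "image",
                "destination_type": "volume",
                "volume_size": 20,
                "volume_type": "fast-ssd",   # new in 2.67
            }],
        }
    }
    resp = requests.post(
        "http://nova.example.com/v2.1/servers",
        json=body,
        headers={"X-Auth-Token": "TOKEN",
                 "OpenStack-API-Version": "compute 2.67"},
    )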
Adds support for extending RBD attached volumes using the libvirt network volume driver.
From microversion 2.69 the responses of GET /servers, GET /servers/detail, GET /servers/{server_id} and GET /os-services may be missing some keys during down cell situations, because support was added for returning minimal constructs based on the available information from the API database for records in the down cells. See Handling Down Cells for more information on the missing keys.
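As a sketch (hypothetical endpoint and token), a client opting in to microversion 2.69 should treat the keys of each returned server as optional:

    import requests

    # Hypothetical endpoint and token; with microversion 2.69, servers in
    # down cells are returned as minimal constructs rather than failing.
    resp = requests.get(
        "http://nova.example.com/v2.1/servers/detail",
        headers={"X-Auth-Token": "TOKEN",
                 "OpenStack-API-Version": "compute 2.69"},
    )
    for server in resp.json()["servers"]:
        # Records from down cells may omit keys such as "flavor" or
        # "addresses", so read them defensively.
        print(server["id"], server.get("status", "UNKNOWN"))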
Introduced a new config option [compute]/max_concurrent_disk_ops to reduce disk contention by specifying the maximum number of concurrent disk-IO-intensive operations per compute service. This would include operations such as image download, image format conversion, snapshot extraction, etc. The default value is 0, which means that there is no limit.
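The throttling idea can be sketched with a semaphore (illustrative only, not Nova's implementation):

    import threading

    # Illustrative only: a value of 0 means unlimited, mirroring the
    # [compute]/max_concurrent_disk_ops semantics described above.
    max_concurrent_disk_ops = 2
    _disk_ops_semaphore = (threading.BoundedSemaphore(max_concurrent_disk_ops)
                           if max_concurrent_disk_ops else None)

    def run_disk_intensive(operation, *args, **kwargs):
        if _disk_ops_semaphore is None:
            return operation(*args, **kwargs)
        with _disk_ops_semaphore:  # wait for a free slot before heavy disk IO
            return operation(*args, **kwargs)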
Added support for the High Precision Event Timer (HPET) for x86 guests in the libvirt driver when the image property hypervisor_type=qemu is set. The timer can be enabled by setting a hw_time_hpet=True image property key/value pair. By default HPET remains turned off. When it is turned on, the HPET is activated in libvirt.
A new configuration option, [compute]/max_disk_devices_to_attach, which defaults to -1 (unlimited), has been added and can be used to configure the maximum number of disk devices allowed to attach to a single server, per compute host. Note that the number of disks supported by a server depends on the bus used. For example, the ide disk bus is limited to 4 attached devices.
Usually, disk bus is determined automatically from the device type or disk device, and the virtualization type. However, disk bus can also be specified via a block device mapping or an image property. See the disk_bus field in https://docs.openstack.org/nova/latest/user/block-device-mapping.html for more information about specifying disk bus in a block device mapping, and see https://docs.openstack.org/glance/latest/admin/useful-image-properties.html for more information about the hw_disk_bus image property.
The configured maximum is enforced during server create, rebuild, evacuate, unshelve, live migrate, and attach volume. When the maximum is exceeded during server create, rebuild, evacuate, unshelve, or live migrate, the server will go into ERROR state and the server fault message will indicate the failure reason. When the maximum is exceeded during a server attach volume API operation, the request will fail with a 403 HTTPForbidden error.
The configuration option [compute]resource_provider_association_refresh can now be set to zero to disable refresh entirely. This follows on from bug 1767309, allowing more aggressive reduction in the amount of traffic to the placement service.
The cache can be cleared manually at any time by sending SIGHUP to the compute process. This will cause the cache to be repopulated the next time the data is accessed.
Compute drivers now expose capabilities via traits in the Placement API. Capabilities must map to standard traits defined in the os-traits project; for now these are:
COMPUTE_NET_ATTACH_INTERFACE
COMPUTE_DEVICE_TAGGING
COMPUTE_NET_ATTACH_INTERFACE_WITH_TAG
COMPUTE_VOLUME_ATTACH_WITH_TAG
COMPUTE_VOLUME_EXTEND
COMPUTE_VOLUME_MULTI_ATTACH
COMPUTE_TRUSTED_CERTS
Any traits provided by the driver will be automatically added during startup or a periodic update of a compute node. Similarly any traits later retracted by the driver will be automatically removed.
However any traits which are removed by the admin from the compute node resource provider via the Placement API will not be reinstated until the compute service's provider cache is reset. This can be triggered via a SIGHUP.
Instance list operations across cells are now made more efficient by batching queries as a fraction of the total limit for a request. Before this, an instance list with a limit of 1000 records (the default) would generate queries to each cell with that limit, and potentially process/sort/merge $num_cells*$limit records, despite only returning $limit to the user. The strategy can now be controlled via [api]/instance_list_cells_batch_strategy and related options to either use fixed batch sizes, or a fractional value that scales with the number of configured cells.
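The arithmetic of the fractional approach can be sketched as follows (the function name, floor value and exact formula here are illustrative, not Nova's code):

    def per_cell_batch_size(limit, num_cells, minimum=100):
        # Illustrative only: query each cell for roughly limit/num_cells
        # records per batch instead of the full limit, with a floor so very
        # large cell counts do not degenerate into tiny queries.
        return max(limit // max(num_cells, 1), minimum)

    # e.g. a default listing limit of 1000 across 10 cells fetches batches
    # of ~100 records per cell rather than 1000 from every cell.
    print(per_cell_batch_size(1000, 10))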
In deployments with Ironic, adds the ability for compute services to manage a subset of Ironic nodes. If the [ironic]/partition_key configuration option is set, the compute service will only consider nodes with a matching conductor_group attribute for management. Setting the [ironic]/peer_list configuration option allows this subset of nodes to be distributed among the compute services specified to further reduce the failure domain. This feature is useful to co-locate nova-compute services with ironic-conductor services managing the same nodes, or to better control the failure domain of a given compute service.
A new configuration option [libvirt]/live_migration_timeout_action is added. This option has the choices abort (default) or force_complete, and it determines what action is taken against a VM after live_migration_completion_timeout expires. Currently nova just aborts the live migrate operation after the completion timeout expires, and by default we keep that behavior. force_complete will either pause the VM or trigger post-copy, depending on whether post-copy is enabled and available.
The [libvirt]/live_migration_completion_timeout is restricted to a minimum of 0 and will now raise a ValueError if the configured value is less than that minimum.
Note that if you configure Nova to have no timeout, post-copy will never be automatically triggered. None of this affects triggering post-copy via the force live-migration API, which continues to work in the same way.
The 2.70 compute API microversion exposes virtual device tags for volume attachments and virtual interfaces (ports). A tag parameter is added to the response body for the following APIs:
Volumes
GET /servers/{server_id}/os-volume_attachments (list)
GET /servers/{server_id}/os-volume_attachments/{volume_id} (show)
POST /servers/{server_id}/os-volume_attachments (attach)
Ports
GET /servers/{server_id}/os-interface (list)
GET /servers/{server_id}/os-interface/{port_id} (show)
POST /servers/{server_id}/os-interface (attach)
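For example (hypothetical endpoint, token and server UUID), the tag can be read from a volume attachment listing at microversion 2.70:

    import requests

    # Hypothetical endpoint, token and server UUID; "tag" appears in the
    # response body starting with microversion 2.70.
    resp = requests.get(
        "http://nova.example.com/v2.1/servers/SERVER_UUID/os-volume_attachments",
        headers={"X-Auth-Token": "TOKEN",
                 "OpenStack-API-Version": "compute 2.70"},
    )
    for attachment in resp.json()["volumeAttachments"]:
        print(attachment["volumeId"], attachment.get("tag"))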
Added the ability for users to use an Aggregate's metadata to override the global config options for weights, to achieve more fine-grained control over resource weights.
For example, the CPUWeigher weighs hosts based on the available vCPUs on the compute node, multiplied by the cpu weight multiplier. If a per-aggregate value (under the key "cpu_weight_multiplier") is found, that value is used as the cpu weight multiplier. Otherwise, it falls back to [filter_scheduler]/cpu_weight_multiplier. If more than one value is found for a host in aggregate metadata, the minimum value will be used.
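The fallback order can be sketched as follows (illustrative only, not the scheduler's code; the data layout is assumed):

    def cpu_weight_multiplier(host_aggregates, config_multiplier=1.0):
        # Illustrative only: prefer per-aggregate metadata, fall back to the
        # [filter_scheduler]/cpu_weight_multiplier value, and take the
        # minimum when several aggregates of the host define the key.
        values = [
            float(agg["metadata"]["cpu_weight_multiplier"])
            for agg in host_aggregates
            if "cpu_weight_multiplier" in agg.get("metadata", {})
        ]
        return min(values) if values else config_multiplier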
Microversion 1.30 of the placement API adds support for a POST /reshaper resource that provides for atomically migrating resource provider inventories and associated allocations when some of the inventory moves from one resource provider to another, such as when a class of inventory moves from a parent provider to a new child provider.
Note: This is a special operation that should only be used in rare cases of resource provider topology changing when inventory is in use. Only use this if you are really sure of what you are doing.
Added configuration option [api]/local_metadata_per_cell to allow users to run the Nova metadata API service per cell. Doing this could provide performance improvements and data isolation in a multi-cell deployment, but it has some caveats; see the Metadata api service in cells v2 layout documentation for more details.
Starting with the 2.71 microversion the server_groups parameter will be in the response body of the following APIs to list the server groups to which the server belongs:
GET /servers/{server_id}
PUT /servers/{server_id}
POST /servers/{server_id}/action (rebuild)
API microversion 2.72 adds support for creating servers with Neutron ports that have a resource request, e.g. Neutron ports with a QoS minimum bandwidth rule. Deleting servers with such ports, as well as detaching these types of ports, was already handled properly.
API limitations:
Creating servers with Neutron networks having a QoS minimum bandwidth rule is not supported.
Attaching Neutron ports and networks having a QoS minimum bandwidth rule is not supported.
Moving (resizing, migrating, live-migrating, evacuating, unshelving after shelve offload) servers with ports having resource request is not yet supported.
The libvirt driver now supports "QEMU-native TLS" transport for live migration. This provides encryption for all migration streams, namely: guest RAM, device state and, on a non-shared setup, disks that are transported over NBD (Network Block Device), also known as "block migration".
This can be configured via a new configuration attribute [libvirt]/live_migration_with_native_tls. Refer to its documentation in nova.conf for usage details. Note that this is the preferred way to secure all migration streams in an OpenStack network, instead of [libvirt]/live_migration_tunnelled.
Microversion 2.66 adds the optional filter parameter changes-before which can be used to get resources changed before or equal to the specified date and time.
Like the changes-since filter, the changes-before filter will also return deleted servers.
This parameter (changes-before) does not change any read-deleted behavior in the os-instance-actions or os-migrations APIs. The os-instance-actions API with the 2.21 microversion allows retrieving instance actions for a deleted server resource. The os-migrations API takes an optional instance_uuid filter parameter but does not support returning deleted migration records.
The changes-before request parameter can be passed to the servers, os-instance-action and os-migrations APIs:
GET /servers
GET /servers/detail
GET /servers/{server_id}/os-instance-actions
GET /os-migrations
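A request sketch (hypothetical endpoint and token) passing the new filter at microversion 2.66:

    import requests

    # Hypothetical endpoint and token; changes-before is available from
    # microversion 2.66 and, like changes-since, includes deleted servers.
    resp = requests.get(
        "http://nova.example.com/v2.1/servers/detail",
        params={"changes-before": "2019-04-10T12:00:00Z"},
        headers={"X-Auth-Token": "TOKEN",
                 "OpenStack-API-Version": "compute 2.66"},
    )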
The versioned notification interface of nova is now complete and in feature parity with the legacy interface. The emitted notifications are documented in the notification dev ref with full sample files. The deprecation of the legacy notification interface is under discussion and will be handled separately.
For the VMware vCenter driver, added support for the configured video ram hw_video_ram from the image, which will be checked against the maximum allowed video ram hw_video:ram_max_mb from the flavor. If the selected video ram from the image is less than or equal to the maximum allowed ram, videoRamSizeInKB will be set. If the selected ram is more than the maximum allowed one, then server creation will fail for the given image and flavor. If the maximum allowed video ram is not set in the flavor, we do not set videoRamSizeInKB in the VM.
The VMware compute driver now supports live migration. Each compute node must be managing a cluster in the same vCenter and ESX hosts must have vMotion enabled.
This release adds support for the direct and virtio-forwarder VNIC types to the vrouter VIF type. In order to use these VNIC types, support is required from the version of OpenContrail, Contrail or Tungsten Fabric that is installed, as well as the required hardware. At this time, the reference os-vif plugin is hosted on OpenContrail at https://github.com/Juniper/contrail-nova-vif-driver but is expected to transition to Tungsten Fabric in the future. Version 5.1 or later of the plugin is required to use these new VNIC types. Consult the Tungsten Fabric documentation for release notes, when available, about hardware support. For commercial support, consult the release notes from a downstream vendor.
Known Issues
Operators changing [compute]/max_disk_devices_to_attach on a compute service that is hosting servers should be aware that it could cause rebuilds to fail, if the maximum is decreased lower than the number of devices already attached to servers. For example, if server A has 26 devices attached and an operator changes [compute]/max_disk_devices_to_attach to 20, a request to rebuild server A will fail and go into ERROR state because 26 devices are already attached and exceed the new configured maximum of 20.
Operators setting [compute]/max_disk_devices_to_attach should also be aware that during a cold migration, the configured maximum is only enforced in-place and the destination is not checked before the move. This means if an operator has set a maximum of 26 on compute host A and a maximum of 20 on compute host B, a cold migration of a server with 26 attached devices from compute host A to compute host B will succeed. Then, once the server is on compute host B, a subsequent request to rebuild the server will fail and go into ERROR state because 26 devices are already attached and exceed the configured maximum of 20 on compute host B.
The configured maximum is not enforced on shelved offloaded servers, as they have no compute host.
Nova leaks resource allocations in placement during POST /servers/{server_id}/action (revertResize Action), POST /servers/{server_id}/action (confirmResize Action) and POST /servers/{server_id}/action (os-migrateLive Action) if the allocation held by the migration_uuid is modified in parallel with the lifecycle operation. Nova will log an ERROR and will put the server into ERROR state but will not delete the migration allocation. We assume that this can only happen if somebody outside of nova is actively changing the migration allocation in placement. Therefore it is not considered a bug.
Nova leaks bandwidth resources if a bound port that has QoS minimum bandwidth rules is deleted in Neutron before the port is logically detached from the server. To avoid any leaks, users should detach the port from the server using the Nova API first before deleting the port in Neutron. If the server is in a state such that the port cannot be detached using the Nova API, bandwidth resources will be freed when the server is deleted. Another alternative to clean up the leak is to remove the NET_BW_EGR_KILOBIT_PER_SEC and/or NET_BW_IGR_KILOBIT_PER_SEC allocations related to the deleted port for the server using the CLI. See related bug https://bugs.launchpad.net/nova/+bug/1820588 for more details.
Upgrade Notes
The default QEMU machine type for the ARMv7 architecture has been changed to virt (from the older vexpress-a15, which is a particular ARM development board). The virt board, which is explicitly designed to be used with virtual machines, is the recommended default for ARMv7. It is more flexible, supports PCI and 'virtio' devices, has decent RAM limits, and so forth. For pre-existing Nova guests on ARMv7 to acquire the virt machine type: (a) upgrade Nova with this fix; (b) explicitly start and stop the guests, after which they will pick up the 'virt' machine type.
The default value for the "cpu_allocation_ratio", "ram_allocation_ratio" and "disk_allocation_ratio" configuration options has been changed to None.
The initial_cpu_allocation_ratio, initial_ram_allocation_ratio and initial_disk_allocation_ratio configuration options have been added to the DEFAULT group:
initial_cpu_allocation_ratio with default value 16.0
initial_ram_allocation_ratio with default value 1.5
initial_disk_allocation_ratio with default value 1.0
These options help operators specify initial virtual CPU/ram/disk to physical CPU/ram/disk allocation ratios. These options are only used when initially creating the compute_nodes table record for a given nova-compute service.
Existing compute_nodes table records with 0.0 or None values for cpu_allocation_ratio, ram_allocation_ratio or disk_allocation_ratio will be migrated online when accessed or when the nova-manage db online_data_migrations command is run.
For more details, refer to the spec.
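The fallback described above can be sketched roughly as follows (illustrative only, not Nova's code; the function and argument names are hypothetical):

    def effective_allocation_ratio(configured, existing, initial):
        # Illustrative only: an explicitly configured ratio always wins; an
        # existing compute_nodes value (possibly adjusted via the placement
        # API) is otherwise kept; the initial_* value is only used when the
        # record is first created.
        if configured is not None:   # e.g. [DEFAULT]cpu_allocation_ratio = 4.0
            return configured
        if existing is not None:
            return existing
        return initial               # e.g. initial_cpu_allocation_ratio = 16.0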
Adds a use_cache parameter to the virt driver get_info method. Out of tree drivers should add support for this parameter.
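An out-of-tree driver could honour the new parameter along these lines (hypothetical driver and helper names, not Nova's code):

    class MyVirtDriver(object):
        """Hypothetical out-of-tree driver accepting the new use_cache kwarg."""

        def __init__(self):
            self._info_cache = {}

        def _query_hypervisor(self, instance):
            raise NotImplementedError("driver-specific lookup goes here")

        def get_info(self, instance, use_cache=True):
            # When use_cache is False the caller wants fresh data, so bypass
            # any locally cached state.
            if use_cache and instance.uuid in self._info_cache:
                return self._info_cache[instance.uuid]
            info = self._query_hypervisor(instance)
            self._info_cache[instance.uuid] = info
            return info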
The new configuration option, [compute]/max_disk_devices_to_attach, defaults to -1 (unlimited). Users of the libvirt driver should be advised that the default limit for non-ide disk buses has changed from 26 to unlimited, upon upgrade to Stein. The ide disk bus continues to be limited to 4 attached devices per server.
The default value for the policy rule os_compute_api:servers:create:zero_disk_flavor has changed from rule:admin_or_owner to rule:admin_api, which means that by default, users without the admin role will not be allowed to create servers using a flavor with disk=0 unless they are creating a volume-backed server. If you have these kinds of flavors, you may need to take action or temporarily override the policy rule. Refer to bug 1739646 for more details.
Live migration of instances with NUMA topologies is now disabled by default when using the libvirt driver. This includes live migration of instances with CPU pinning or hugepages. CPU pinning and huge page information for such instances is not currently re-calculated, as noted in bug #1289064. This means that if instances were already present on the destination host, the migrated instance could be placed on the same dedicated cores as these instances or use hugepages allocated for another instance. Alternately, if the host platforms were not homogeneous, the instance could be assigned to non-existent cores or be inadvertently split across host NUMA nodes.
The long term solution to these issues is to recalculate the XML on the destination node. When this work is completed, the restriction on live migration with NUMA topologies will be lifted.
For operators that are aware of the issues and are able to manually work around them, the [workarounds] enable_numa_live_migration option can be used to allow the broken behavior.
For more information, refer to bug #1289064.
The online data migration migrate_instances_add_request_spec, which was added in the 14.0.0 Newton release, has now been removed. Compatibility code in the controller services for old instances without a matching request_specs entry in the nova_api database is also gone. Ensure that the Request Spec Migration check in the nova-status upgrade check command is successful before upgrading to the 19.0.0 Stein release.
The nova-manage db online_data_migrations command will now fill missing virtual_interfaces records for instances created before the Newton release. This is related to a fix for https://launchpad.net/bugs/1751923 which makes the _heal_instance_info_cache periodic task in the nova-compute service regenerate an instance network info cache from the current neutron port list, and the VIFs from the database are needed to maintain the port order for the instance.
With added validations for flavor extra-specs and image properties, the APIs for server create, resize and rebuild will now return 400 exceptions where they did not before due to the extra-specs or properties not being properly formatted or being mutually incompatible.
For all three actions we will now check both the flavor and image to validate the CPU policy, CPU thread policy, CPU topology, memory topology, hugepages, serial ports, realtime CPU mask, NUMA topology details, CPU pinning, and a few other things.
The main advantage to this is to catch invalid configurations as early as possible so that we can return a useful error to the user rather than fail later on much further down the stack where the operator would have to get involved.
Ironic nodes are now only scheduled using the resource_class field set on the node. CPUs, RAM, and disks are not reported to the resource tracker. Ironic nodes must have the resource_class field set before upgrading. Flavors must also be configured to use resource classes instead of node properties. See the ironic flavor configuration guide for more information on doing this.
The libvirt compute driver will "reshape" VGPU inventories and allocations on start of the nova-compute service. This will result in moving VGPU inventory from the root compute node resource provider to a nested (child) resource provider in the tree and move any associated VGPU allocations with it. This will be a one-time operation on startup in Stein. There is no end-user visible impact for this; it is for internal resource tracking purposes. See the spec for more details.
Config option [libvirt]/live_migration_progress_timeout was deprecated in Ocata, and has now been removed.
The current logic in the libvirt driver to auto trigger post-copy based on progress information is removed, as it has proved impossible to detect when a live migration appears to be making little progress.
The default value for the [compute]/live_migration_wait_for_vif_plug configuration option has been changed to True. As noted in the help text for the option, some networking backends will not work with this set to True, although OVS and linuxbridge will.
The maximum_instance_delete_attempts configuration option is now restricted by a minimum value and raises a ValueError if the value is less than 1.
This release moves the vrouter VIF plug and unplug code to a separate package called contrail-nova-vif-driver. This package is a requirement on compute nodes when using Contrail, OpenContrail or Tungsten Fabric as a Neutron plugin. At this time, the reference plugin is hosted on OpenContrail at https://github.com/Juniper/contrail-nova-vif-driver but is expected to transition to Tungsten Fabric in the future. Release r5.1.alpha0 or later of the plugin is required, which will be included in Tungsten Fabric 5.1.
The nova-manage db online_data_migrations command now returns exit status 2 in the case where some migrations failed (raised exceptions) and no others were completed successfully from the last batch attempted. This should be considered a fatal condition that requires intervention. Exit status 1 will be returned in the case where the --max-count option was used and some migrations failed but others succeeded (updated at least one row), because more work may remain for the non-failing migrations, and their completion may be a dependency for the failing ones. The command should be reiterated while it returns exit status 1, and considered completed successfully only when it returns exit status 0.
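The exit-status handling above suggests a driver loop along these lines (a sketch, not an official tool; the batch size is arbitrary):

    import subprocess

    # Re-run the command while it reports partial progress (exit 1), stop on
    # success (exit 0) and treat exit 2 as fatal, as described above.
    while True:
        rc = subprocess.run(
            ["nova-manage", "db", "online_data_migrations",
             "--max-count", "1000"]).returncode
        if rc == 0:
            break
        if rc == 2:
            raise SystemExit("online_data_migrations needs manual intervention")
        # rc == 1: some migrations succeeded; more batches (or retries) remain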
The nova-consoleauth service is deprecated and should no longer be deployed. However, if there is a requirement to maintain support for existing console sessions through a live/rolling upgrade, operators should set [workarounds]enable_consoleauth = True in their configuration and continue running nova-consoleauth for the duration of the live/rolling upgrade. A new check has been added to the nova-status upgrade check CLI to help with this; it will emit a warning and provide additional instructions to set [workarounds]enable_consoleauth = True while performing a live/rolling upgrade.
Added a new unique choice to the [libvirt]/sysinfo_serial configuration which, if set, will result in the guest serial number being set to instance.uuid. This is now the default value of the [libvirt]/sysinfo_serial config option and is the recommended choice since it ensures the guest serial is the same even if the instance is migrated between hosts.
The caching_scheduler scheduler driver, which was deprecated in the 16.0.0 Pike release, has now been removed. Unlike the default filter_scheduler scheduler driver which creates resource allocations in the placement service during scheduling, the caching_scheduler driver did not interface with the placement service. As more and more functionality within nova relies on managing (sometimes complex) resource allocations in the placement service, compatibility with the caching_scheduler driver is difficult to maintain, and seldom tested. The original reasons behind the need for the CachingScheduler should now be resolved with the FilterScheduler and the placement service, notably:
resource claims (allocations) are made atomically during scheduling to alleviate the potential for racing to concurrently build servers on the same compute host which could lead to failures
because of the atomic allocation claims made during scheduling by the filter_scheduler driver, it is safe [1] to run multiple scheduler workers and scale horizontally
To migrate from the CachingScheduler to the FilterScheduler, operators can leverage the nova-manage placement heal_allocations command:
https://docs.openstack.org/nova/latest/cli/nova-manage.html#placement
Finally, it is still technically possible to load an out-of-tree scheduler driver using the nova.scheduler.driver entry-point. However, out-of-tree driver interfaces are not guaranteed to be stable:
https://docs.openstack.org/nova/latest/contributor/policies.html#out-of-tree-support
And as noted above, as more of the code base evolves to rely on resource allocations being tracked in the placement service (created during scheduling), out-of-tree scheduler driver support may be severely impacted.
If you rely on the caching_scheduler driver or your own out-of-tree driver which sets USES_ALLOCATION_CANDIDATES = False to bypass the placement service, please communicate with the nova development team in the openstack-dev mailing list and/or #openstack-nova freenode IRC channel to determine what prevents you from using the filter_scheduler driver.
The following deprecated Policy Rules have been removed:
Show & List server details
os_compute_api:os-config-drive
os_compute_api:os-extended-availability-zone
os_compute_api:os-extended-status
os_compute_api:os-extended-volumes
os_compute_api:os-keypairs
os_compute_api:os-server-usage
os_compute_api:os-security-groups (only from /servers APIs)
Create, Update, Show & List flavor details
os_compute_api:os-flavor-rxtx
os_compute_api:os-flavor-access (only from /flavors APIs)
Show & List image details
os_compute_api:image-size
These were deprecated in the 17.0.0 release as nova removed the concept of API extensions.
The os_compute_api:flavors policy deprecated in 16.0.0 has been removed.
The os_compute_api:os-flavor-manage policy has been removed because it has been deprecated since 16.0.0. Use the following policies instead:
os_compute_api:os-flavor-manage:create
os_compute_api:os-flavor-manage:delete
The os_compute_api:os-server-groups policy deprecated in 16.0.0 has been removed.
It is no longer possible to force server live migrations or evacuations to a specific destination host starting with API microversion 2.68. This is because it is not possible to support these requests for servers with complex resource allocations. It is still possible to request a destination host but it will be validated by the scheduler.
The following configuration options in the quota group have been removed because they have not been used since 17.0.0:
reservation_expire
until_refresh
max_age
The chance_scheduler scheduler driver was deprecated in Pike and has now been removed. You should enable the filter_scheduler driver instead. If chance_scheduler behavior is desired (i.e. speed is valued over correctness) then configuring the filter_scheduler with only the AllHostsFilter enabled and adjusting [filter_scheduler]/host_subset_size will provide similar performance.
The [filter_scheduler]/soft_affinity_weight_multiplier and [filter_scheduler]/soft_anti_affinity_weight_multiplier configuration options now have a hard minimum value of 0.0. Also, the deprecated alias to the [DEFAULT] group has been removed so the options must appear in the [filter_scheduler] group.
The [api]/hide_server_address_states configuration option and os_compute_api:os-hide-server-addresses policy rule were deprecated in the 17.0.0 Queens release. They have now been removed. If you never changed these values, the API behavior remains unchanged.
Deprecation Notes
The config_drive_format config option has been deprecated. This was necessary to work around an issue with libvirt that was later resolved in libvirt v1.2.17. For more information refer to bug #1246201.
The CoreFilter, DiskFilter and RamFilter are now deprecated. VCPU, DISK_GB and MEMORY_MB filtering is performed natively using the Placement service when using the filter_scheduler driver. Users of the caching_scheduler driver may still rely on these filters but the caching_scheduler driver is itself deprecated. Furthermore, enabling these filters may incorrectly filter out baremetal nodes which must be scheduled using custom resource classes.
The [workarounds] disable_libvirt_livesnapshot config option has been deprecated. This was necessary to work around an issue with libvirt v1.2.2, which we no longer support. For more information refer to bug #1334398.
The nova-console service is deprecated as it is XenAPI specific, does not function properly in a multi-cell environment, and has effectively been replaced by noVNC and the nova-novncproxy service. noVNC should therefore be configured instead.
The nova-xvpvncproxy service is deprecated as it is Xen specific and has effectively been replaced by noVNC and the nova-novncproxy service.
The following option in the DEFAULT group was only used to configure nova-network and, like nova-network itself, is now deprecated:
defer_iptables_apply
Bug Fixes
PUT /os-aggregates/{aggregate_id} and POST /os-aggregates/{aggregate_id}/action (for the set_metadata action) will now return HTTP 400 for availability zone renaming if the hosts of the aggregate have any instances.
Bug 1675791 has been fixed by granting image membership access to snapshot images when the owner of the server is not performing the snapshot/backup/shelve operation on the server. For example, if an admin shelves a user's server, the user needs access to the shelved snapshot image in order to unshelve the server.
Note that only the image owner may delete the image, so in the case of a shelved offloaded server, if the user unshelves or deletes the server, that operation will work but there will be a warning in the logs because the shelved snapshot image could not be deleted since the user does not own the image. Similarly, if an admin creates a snapshot of a server in another project, the admin owns the snapshot image and the non-admin project, while having shared image member access to see the image, cannot delete the snapshot.
The bug fix applies to both the nova-osapi_compute and nova-compute services, so older compute services will need to be patched.
Refer to the image API reference for details on image sharing:
https://developer.openstack.org/api-ref/image/v2/index.html#sharing
Fixes bug 1773342 where the Hyper-V driver always deleted unused images, ignoring the remove_unused_images config option. This change now allows deployers to disable the auto-removal of old images.
The long_rpc_timeout configuration option is now used for the RPC call to the scheduler to select a host. This is in order to avoid a timeout when scheduling multiple servers in a single request and/or when the scheduler needs to process a large number of hosts.
The [DEFAULT]/shutdown_timeout configuration option minimum value has been fixed to be 0 rather than 1 to align with the corresponding os_shutdown_timeout image property. See bug https://launchpad.net/bugs/1799707 for details.
When testing whether direct IO is possible on the backing storage for an instance, Nova now uses a block size of 4096 bytes instead of 512 bytes, avoiding issues when the underlying block device has sectors larger than 512 bytes. See bug https://launchpad.net/bugs/1801702 for details.
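The check is conceptually similar to the following sketch (illustrative only, not Nova's exact code; O_DIRECT is Linux-specific and the file name is arbitrary):

    import mmap
    import os

    def supports_direct_io(dirpath, block_size=4096):
        # Illustrative only: probe whether the backing store accepts O_DIRECT
        # writes using a page-aligned buffer of 4096 bytes rather than 512.
        testfile = os.path.join(dirpath, ".directio.test")
        fd = None
        try:
            fd = os.open(testfile, os.O_CREAT | os.O_WRONLY | os.O_DIRECT)
            buf = mmap.mmap(-1, block_size)     # mmap gives an aligned buffer
            buf.write(b"\0" * block_size)
            os.write(fd, buf)
            return True
        except OSError:
            return False
        finally:
            if fd is not None:
                os.close(fd)
            if os.path.exists(testfile):
                os.unlink(testfile)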
Fixes a race condition that could allow a newly created Ironic instance to be powered off after deployment, without letting the user power it back on.
The os-simple-tenant-usage pagination has been fixed. In some cases, nova usage-list would have returned incorrect results because of this. See bug https://launchpad.net/bugs/1796689 for details.
Note that the original fix for bug 1414559, committed early in Rocky, was automatic and always enabled. Because of bug 1786346 that fix has since been reverted and superseded by an opt-in mechanism which must be enabled. Setting [compute]/live_migration_wait_for_vif_plug=True will restore the behavior of waiting for neutron events during the live migration process.
A change has been introduced in the libvirt driver to correctly handle IPv6 addresses for live migration.
By using the writeback QEMU cache mode, Nova's disk image conversion (e.g. from raw to QCOW2 or vice versa) is now dramatically faster, without compromising data integrity. See bug 1818847.
[bug 1818295] Fixes the problem with endpoint lookup in the Ironic driver where only the public endpoint was possible, which breaks deployments where the controllers have no route to the public network per security requirements. Note that the python-ironicclient fix I610836e5038774621690aca88b2aee25670f0262 must also be present to resolve the bug.
Other Notes
The [workarounds]/ensure_libvirt_rbd_instance_dir_cleanup configuration option has been introduced. This can be used by operators to ensure that instance directories are always removed during cleanup within the Libvirt driver while using [libvirt]/images_type = rbd. This works around known issues such as bug 1414895 when cleaning up after an evacuation and bug 1761062 when reverting from an instance resize.
Operators should be aware that this workaround only applies when using the libvirt compute driver and rbd images_type as enabled by the following configuration options:
[DEFAULT]/compute_driver = libvirt
[libvirt]/images_type = rbd
Warning: Operators will need to ensure that the instance directory itself, specified by [DEFAULT]/instances_path, is not shared between computes before enabling this workaround, otherwise files associated with running instances may be removed.
The [cinder]/catalog_info default value is changed such that the service_name portion of the value is no longer set and is also no longer required. Since looking up the cinder endpoint in the service catalog should only need the endpoint type (volumev3 by default) and interface (publicURL by default), the service name is dropped and only provided during endpoint lookup if configured. See bug 1803627 for details.
In case of infrastructure failures like non-responsive cells, prior to change e3534d we raised an API 500 error. However, currently when listing instances or migrations, we skip that cell and display results from the up cells with the aim of increasing availability at the expense of accuracy. If the old behaviour is desired, a new flag called CONF.api.list_records_by_skipping_down_cells has been added which can be set to False to mimic the old behavior. Both of these potential behaviors will be unified in an upcoming microversion done through the blueprint handling-down-cell where minimal constructs would be returned for the down cell instances instead of raising 500s or skipping down cells.
The POST /servers/{server_id}/os-interface request will be rejected with HTTP 400 if the Neutron port referenced in the request body has a resource request, as Nova currently cannot support such an operation. For example, a Neutron port has a resource request if a QoS minimum bandwidth rule is attached to that port in Neutron.
The POST /servers/{server_id}/os-interface request and the POST /servers request will be rejected with HTTP 400 if the Neutron network referenced in the request body has a QoS minimum bandwidth rule attached, as Nova currently cannot support such operations.
CI testing of Cells v1 has been moved to the experimental queue, meaning changes proposed to nova will not be tested against a Cells v1 setup unless explicitly run through the experimental queue by leaving a review comment of "check experimental" on the patch. Cells v1 has been deprecated since the 16.0.0 Pike release and this is a further step in its eventual removal.