Yoga Series (19.0.0 - 20.1.x) Release Notes¶
20.1.3-10¶
Upgrade Notes¶
When upgrading Ironic to address the qemu-img image conversion security issues, the ironic-python-agent ramdisks will also need to be upgraded.
As a result of security fixes to address qemu-img image conversion security issues, a new configuration parameter has been added to Ironic, [conductor]permitted_image_formats, with a default value of “raw,qcow2,iso”. Raw and qcow2 format disk images are the image formats the Ironic community has consistently stated as supported and expected for use with Ironic. These formats also match the formats which the Ironic community tests. Operators who leverage other disk image formats may need to modify this setting further.
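As a rough illustration of how a permitted-format list of this kind behaves, the Python sketch below parses a comma-separated value and checks a detected format against it. The helper names are hypothetical and this is not Ironic's implementation; only the option name and its default value come from the note above.

```python
# Hypothetical helpers for illustration only; not Ironic's actual code.
# The default mirrors the documented [conductor]permitted_image_formats value.
DEFAULT_PERMITTED_FORMATS = "raw,qcow2,iso"


def parse_permitted_formats(value):
    """Split a comma-separated option value into a normalized tuple."""
    return tuple(
        item.strip().lower() for item in value.split(",") if item.strip()
    )


def is_format_permitted(detected_format, permitted_value=DEFAULT_PERMITTED_FORMATS):
    """Return True when the detected image format is in the permitted list."""
    permitted = parse_permitted_formats(permitted_value)
    return detected_format.strip().lower() in permitted


if __name__ == "__main__":
    for fmt in ("raw", "qcow2", "vmdk"):
        print(fmt, is_format_permitted(fmt))
```

In this sketch, an operator who needs an additional format would extend the configured string rather than change any code.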
Security Issues¶
Ironic now checks the supplied image format value against the detected format of the image file, and will prevent deployments should the values mismatch. If being used with Glance and a mismatch in metadata is identified, it will require images to be re-uploaded with a new image ID to represent corrected metadata. This is the result of CVE-2024-44082 tracked as bug 2071740.
Ironic always inspects the supplied user image content for safety prior to deployment of a node should the image pass through the conductor, even if the image is supplied in raw format. This is utilized to identify the format of the image and the overall safety of the image, such that source images with unknown or unsafe feature usage are explicitly rejected. This can be disabled by setting [conductor]disable_deep_image_inspection to True. This is the result of CVE-2024-44082 tracked as bug 2071740.
Ironic also inspects images which would normally be provided as a URL for direct download by the ironic-python-agent ramdisk. This is enabled by default and increases the overall network traffic and disk space utilization of the conductor. This level of inspection can be disabled by setting [conductor]conductor_always_validates_images to False. Doing so is not advisable as Zed release and earlier ironic-python-agent ramdisks will not be made available due to backport regression risk. This is the result of CVE-2024-44082 tracked as bug 2071740.
Ironic now explicitly enforces a list of permitted image types for deployment via the [conductor]permitted_image_formats setting, which defaults to “raw”, “qcow2”, and “iso”. While the project has classically always declared permissible images as “qcow2” and “raw”, it was previously possible to supply other image formats known to qemu-img, and the utility would attempt to convert the images. The “iso” support is required for “boot from ISO” ramdisk support.
Ironic now explicitly passes the source input format to executions of qemu-img. This limits the qemu disk image drivers which may evaluate an image, and prevents mismatched format attacks against qemu-img.
The ansible deploy interface example playbooks now supply an input format to execution of qemu-img. If you are using customized playbooks, please add “-f {{ ironic.image.disk_format }}” to your invocations of qemu-img. If you do not do so, qemu-img will automatically try to guess the format, which can lead to known security issues with the incorrect source format driver.
Operators who have implemented any custom deployment drivers or additional functionality like machine snapshot should review their downstream code to ensure they are properly invoking qemu-img. If there are any questions or concerns, please reach out to the Ironic project developers.
Operators are reminded that they should utilize cleaning in their environments. Disabling any security feature, such as cleaning or image inspection, is at your own risk. Should you have any issues with security related features, please don’t hesitate to open a bug with the project.
The [conductor]disable_deep_image_inspection setting is conveyed to the ironic-python-agent ramdisks automatically, and will prevent those operating ramdisks from performing deep inspection of images before they are written.
The [conductor]permitted_image_formats setting is conveyed to the ironic-python-agent ramdisks automatically. Should a need arise to explicitly permit an additional format, that should take place in the Ironic service configuration.
An issue in Ironic has been resolved where image checksums would not be checked prior to the conversion of an image to a raw format image from another image format.
With default settings, this normally would not take place. It required the image_download_source option to be set to local, which can be done at a node level for a single deployment, by default for that baremetal node in all cases, or via the [agent]image_download_source configuration option. By default, this setting is http.
This was in concert with the [DEFAULT]force_raw_images option when set to True, which caused Ironic to download and convert the file.
In a fully integrated context of Ironic’s use in a larger OpenStack deployment, where images are coming from the Glance image service, the previous pattern was not problematic. The overall issue was introduced as a result of the capability to supply, cache, and convert a disk image provided as a URL by an authenticated user.
Ironic will now validate the user supplied checksum prior to image conversion on the conductor. This can be disabled using the [conductor]disable_file_checksum configuration option.
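For operators reviewing custom deployment code against the notes above, the following Python sketch shows the general pattern of verifying a checksum first and only then building a qemu-img conversion command with an explicit source format. The function names and the choice of SHA-256 as a default are assumptions made for illustration; this is not Ironic's implementation, and the sketch only builds the command list rather than running it.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_convert_command(source, dest, source_format, expected_checksum,
                          algorithm="sha256"):
    """Verify the checksum, then return a qemu-img command as a list.

    The explicit "-f" argument tells qemu-img which source format to use
    instead of letting it guess. Hypothetical helper for illustration.
    """
    actual = file_checksum(source, algorithm)
    if actual != expected_checksum.strip().lower():
        raise ValueError(
            "checksum mismatch for %s: expected %s, got %s"
            % (source, expected_checksum, actual)
        )
    return ["qemu-img", "convert", "-f", source_format, "-O", "raw",
            source, dest]
```

The returned list can be handed to a process runner of your choice. Keeping verification and command construction in one function makes it harder to convert an image that was never checked.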
Bug Fixes¶
Fixes multiple issues in the handling of images as it relates to the execution of the qemu-img utility, which is used for image format conversion, where a malicious user could craft a disk image to potentially extract information from an ironic-conductor process’s operating environment.
Ironic now explicitly enforces a list of approved image formats as a [conductor]permitted_image_formats list, which mirrors the image formats the Ironic project has historically tested and expressed as known working. Testing is not based upon file extension, but upon content fingerprinting of the disk image files. This is tracked as CVE-2024-44082 via bug 2071740.
Fixes a security issue where Ironic would fail to checksum disk image files it downloads when Ironic had been requested to download and convert the image to a raw image format. This required the image_download_source to be explicitly set to local, which is not the default.
This fix can be disabled by setting [conductor]disable_file_checksum to True, however this option will be removed in new major Ironic releases.
As a result of this, parity has been introduced to align Ironic to Ironic-Python-Agent’s support for checksums used by standalone users of Ironic. This includes support for remote checksum files to be supplied by URL, in order to prevent breaking existing users which may have inadvertently been leveraging the prior code path. This support can be disabled by setting [conductor]disable_support_for_checksum_files to True.
Fixes issues with Lenovo hardware where the system firmware may display a blue “Boot Option Restoration” screen after the agent writes an image to the host in UEFI boot mode, requiring manual intervention before the deployed node boots. This issue is rooted in multiple changes being made to the underlying NVRAM configuration of the node. Lenovo engineers have suggested changing only the UEFI NVRAM and not performing any further changes via the BMC to configure the next boot. Ironic now does so on Lenovo hardware. More information and background on this issue can be found in bug 2053064.
20.1.3¶
Bug Fixes¶
Modify iRMC driver to use ironic.conf [deploy] default_boot_mode to determine default boot_mode.
20.1.2¶
Upgrade Notes¶
Adds sha256, sha384 and sha512 as supported SNMPv3 authentication protocols to iRMC driver.
Bug Fixes¶
Fixes Ironic integration with Cinder because of changes which resulted as part of the recent Security related fix in bug 2004555. The work in Ironic to track this fix was logged in bug 2019892. Ironic now sends a service token to Cinder, which allows for access restrictions added as part of the original CVE-2023-2088 fix to be appropriately bypassed. Ironic was not vulnerable, but the restrictions added as a result did impact Ironic’s usage. This is because Ironic volume attachments are not on a shared “compute node”, but instead mapped to the physical machines and Ironic handles the attachment life-cycle after initial attachment.
When aborting cleaning, the last_error field is no longer initially empty. It is now populated on the state transition to clean failed.
When cleaning or deployment fails, the last_error field is no longer temporarily set to None while the power off action is running.
Fixes an issue where if selinux is enabled and enforcing, and the published image is a hardlink, the source selinux context is preserved, causing access denied when retrieving the image using hardlink URL.
Fixes bug of iRMC driver in parse_driver_info where, if FIPS is enabled, SNMP version is always required to be version 3 even though iRMC driver’s xxx_interface doesn’t use SNMP actually.
Fixes 'NoneType' object is not iterable in conductor logs for redfish and idrac-redfish RAID clean and deploy steps. The message should no longer appear. For affected nodes, re-create the node or delete the raid_configs entry from the driver_internal_info field.
Fixes an issue in the online upgrade logic where database models for Node Traits and BIOS Settings resulted in an error when performing the online data migration. This was because these tables were originally created as extensions of the Nodes database table, and the schema of the database was different enough to result in an error if there was data to migrate in these tables upon upgrade, which would have occurred if an early BIOS Setting adopter had data in the database prior to upgrading to the Yoga release of Ironic.
The online upgrade parameter now substitutes an alternate primary key name when applicable.
Fixes SNMPv3 message authentication and encryption functionality of iRMC driver. The SNMPv3 authentication between iRMC driver and iRMC was only by the security name with no passwords and encryption. To increase security, the following parameters are now added to the node’s driver_info, and can be used for authentication:
irmc_snmp_user
irmc_snmp_auth_password
irmc_snmp_priv_password
irmc_snmp_auth_proto (Optional, defaults to sha)
irmc_snmp_priv_proto (Optional, defaults to aes)
irmc_snmp_user replaces irmc_snmp_security. irmc_snmp_security will be ignored if irmc_snmp_user is set. irmc_snmp_auth_proto and irmc_snmp_priv_proto can also be set through the following options in the [irmc] section of /etc/ironic/ironic.conf:
snmp_auth_proto
snmp_priv_proto
Fixes a race condition in PXE initialization where logic to retry what we suspect as potentially failed PXE boot operations was not consulting if an agent token had been established, which is the very first step in agent initialization.
Fixes an issue where an agent token was being orphaned if a baremetal node timed out during cleaning operations, which in some cases left the node unable to establish a new token with Ironic in the future. We now always wipe the token in this case.
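To make the iRMC SNMPv3 driver_info parameters from the notes above more concrete, the following Python sketch assembles them into a dictionary and shows the documented precedence of irmc_snmp_user over irmc_snmp_security. The helper names and the sample values are hypothetical; only the parameter names and the sha and aes defaults come from the release note, and other driver_info fields a real node needs are deliberately left out.

```python
def build_snmpv3_driver_info(user, auth_password, priv_password,
                             auth_proto="sha", priv_proto="aes"):
    """Return the SNMPv3-related driver_info entries named in the note.

    Hypothetical helper; the defaults mirror the documented ones.
    """
    return {
        "irmc_snmp_user": user,
        "irmc_snmp_auth_password": auth_password,
        "irmc_snmp_priv_password": priv_password,
        "irmc_snmp_auth_proto": auth_proto,
        "irmc_snmp_priv_proto": priv_proto,
    }


def effective_snmp_identity(driver_info):
    """Pick the security name, preferring irmc_snmp_user when it is set.

    This models the note that irmc_snmp_security is ignored if
    irmc_snmp_user is present.
    """
    return driver_info.get("irmc_snmp_user") or driver_info.get(
        "irmc_snmp_security"
    )


if __name__ == "__main__":
    info = build_snmpv3_driver_info("snmp-operator", "auth-secret", "priv-secret")
    info["irmc_snmp_security"] = "legacy-name"  # ignored in this sketch
    print(effective_snmp_identity(info))
```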
Other Notes¶
Updates the minimum version of python-scciclient library to 0.12.2.
20.1.1¶
Known Issues¶
When using jsonschema 4.0.0 or newer, make sure to include a proper $schema field in your custom network data or RAID schemas.
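One way to apply this to an existing custom schema is sketched below in Python. The helper name and the sample RAID-style schema are hypothetical, and Draft-07 is used because these release notes name it as the schema version Ironic currently provides; choose whichever draft your schema was actually written against.

```python
import json

# Identifier for JSON Schema Draft-07.
DRAFT_07 = "http://json-schema.org/draft-07/schema#"


def with_schema_version(schema, version=DRAFT_07):
    """Return a copy of the schema that declares a $schema field.

    An existing $schema value is left untouched. Hypothetical helper.
    """
    if "$schema" in schema:
        return dict(schema)
    updated = {"$schema": version}
    updated.update(schema)
    return updated


if __name__ == "__main__":
    # Hypothetical minimal custom schema, for illustration only.
    custom_schema = {
        "type": "object",
        "properties": {"logical_disks": {"type": "array"}},
        "required": ["logical_disks"],
    }
    print(json.dumps(with_schema_version(custom_schema), indent=2))
```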
Security Issues¶
Modifies the irmc hardware type to include a capability to control enforcement of HTTPS certificate verification. By default this is enforced. The python-scciclient version must be one of >=0.8.2,<0.9.0; >=0.9.4,<0.10.0; >=0.10.1,<0.11.0; >=0.11.3,<0.12.0; or >=0.12.0,<0.13.0. Otherwise, certificate verification will not occur.
Bug Fixes¶
Fixes detecting of allowable values for a BIOS settings enumeration in the redfish BIOS interface when only ValueDisplayName is provided.
The combined ironic executable now starts the API only after the built-in conductor starts. This avoids error 500 on requests while the conductor is starting.
Fixes an issue where a conductor would attempt local takeover. In case of heartbeat failure due to resource starvation, the current conductor was detected as offline when querying the database. In this scenario the conductor would forcibly remove reservations of its own and initiate takeover. The current conductor is now excluded from the list of offline conductors, so that local takeover does not occur for this case. A warning is logged to highlight the potential resource starvation issue. See bug: 2010016.
Fixes rebooting into the agent after changing BIOS settings in fast-track mode with the redfish-virtual-media boot interface. Previously, the ISO would not be configured.
Fixes OSError: [Errno 36] File name too long when building a virtual media ISO from a long kernel, ramdisk or ESP URL.
Fixes redfish and idrac-redfish RAID create_configuration, apply_configuration, delete_configuration clean and deploy steps to update node’s raid_config field at the end of the steps.
Fixes the redfish-virtual-media boot interface to allow its use with iDRAC firmware from 6.00.00.00 (released June 2022), which fixes the virtual media boot issue that previously prevented iDRAC from working with redfish-virtual-media. Consider upgrading the iDRAC firmware if you have not done so already; otherwise you will still get an error when trying to use redfish-virtual-media with iDRAC.
Adds the driver_info/irmc_verify_ca option to specify a certificate file. The default value of driver_info/irmc_verify_ca is True.
Fixes a bug when configuring RAID caused by not converting the port value to int type when the node is managed by the irmc hardware type.
Fixes API error messages with jsonschema>=4.8. A possible root cause is now detected for generic schema errors.
Fixes compatibility with jsonschema package version 4.0.0 or newer by providing a proper schema version (Draft-07 currently).
When the ramdisk deploy interface is used and automated cleaning is disabled, the pxe, ipxe and redfish-virtual-media boot interfaces no longer require a deploy kernel/ramdisk to be provided.
Fixes an issue where the Redfish session cache would continue using an old session when a password for a Redfish BMC was changed. Now the old session will not be found in this case, and a new session will be created with the latest credential information available.
Resolved clear_job_queue and reset_idrac verify step failures which occur when the functionality is not supported by the iDRAC. When this condition is detected, the code in the step handles the exception and logs a warning and completes successfully in case of verification steps but fails in case of cleaning steps.
Other Notes¶
The known issue when using iDRAC with Swift to stage firmware update files in the management interface firmware_update clean step of the redfish or idrac hardware type has been fixed in iDRAC firmware 6.00.00.00. Upgrade when possible, or use an HTTP service to stage firmware files for iDRAC.
20.1.0¶
Prelude¶
The Ironic community is pleased to announce the release of Ironic 20.1.
During the Yoga cycle, we had forty-three contributors. They are responsible for more than 35,000 lines of code and more than twenty new features that will improve the experience of our end-users! Please reach out to our community if you have any questions or feedback!
New Features¶
For the redfish and idrac-redfish management interface firmware_update clean step, adds Swift, HTTP service and file system support to serve files, and support for Ironic’s HTTP and Swift services to stage files. Also adds a mandatory checksum parameter for file checksum verification.
Adds support for idrac-wsman RAID, BIOS and management clean steps to be run without IPA when disabling ramdisk during cleaning.
Supports listening on a Unix socket instead of a normal TCP socket. This is useful with an HTTP server such as nginx in proxy mode.
Known Issues¶
When using iDRAC with Swift to stage firmware update files in the management interface firmware_update clean step of the redfish or idrac hardware type, the cleaning fails with the error “An internal error occurred. Unable to complete the specified operation.” in the iDRAC job. Until this is fixed, use an HTTP service to stage firmware files for iDRAC.
Upgrade Notes¶
For the redfish and idrac-redfish management interface firmware_update clean step, the checksum parameter is now mandatory. Update existing clean steps to include it, otherwise the clean step will fail with the error “‘checksum’ is a required property”.
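Before upgrading, it may help to scan stored clean step definitions for firmware images that lack a checksum. The Python sketch below does this under a stated assumption about layout: each step is a dictionary whose "args" contain a "firmware_images" list of dictionaries with "url" and "checksum" keys. That layout, the helper name, the sample URLs and the step name in the sample data are illustrative assumptions; check them against your own clean step definitions.

```python
def images_missing_checksum(clean_steps):
    """Return the URLs of firmware images that have no checksum set.

    Assumes each step looks like
    {"args": {"firmware_images": [{"url": ..., "checksum": ...}]}}.
    Steps without a firmware_images argument are skipped.
    """
    missing = []
    for step in clean_steps:
        args = step.get("args") or {}
        for image in args.get("firmware_images", []):
            if not image.get("checksum"):
                missing.append(image.get("url"))
    return missing


if __name__ == "__main__":
    # Hypothetical sample data; the step name follows the wording of the note.
    steps = [
        {
            "interface": "management",
            "step": "firmware_update",
            "args": {
                "firmware_images": [
                    {"url": "http://example.com/fw1.bin",
                     "checksum": "<checksum of fw1.bin>"},
                    {"url": "http://example.com/fw2.bin"},
                ]
            },
        }
    ]
    print(images_missing_checksum(steps))
```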
Deprecation Notes¶
Booting final instances via network (as opposed to via a local bootloader) is now deprecated, except for the cases of booting from volume or the ramdisk deploy interface.
Network boot for whole disk images only works reliably for legacy (BIOS) boot. In the case of partition images, there is no way to update the kernel, which makes this approach insecure.
Users of partition images must ensure that the images contain either the grub-install binary, enough EFI artifacts to boot the operating system, or a legacy boot partition.
Bug Fixes¶
The anaconda deploy interface was treating the config drive as a dict, whereas it could be a dict or in ISO 9660 format, gzipped and base64-encoded. This has been fixed.
The anaconda deploy interface was adding the commands that deal with the config drive to the end of the kickstart config file. This means they are handled after an Ironic API request is sent (to the conductor) to indicate that the node has been provisioned and is ready to be rebooted, which creates a possible race condition between these commands completing and the node being powered off. A sync is now added, as the last step before the API request is sent, to ensure that all modifications have been written to disk.
Extra newlines ('\n') were incorrectly added to the user data content. This broke the content-type decoding and cloud-init was unable to process it. The extra newlines have been removed.
Fixes the logic for the anaconda deploy interface. If the ironic node’s instance_info doesn’t have both ‘stage2’ and ‘ks_template’ specified, we weren’t using the instance_info at all. This has been fixed to use the instance_info if it was specified. Otherwise, ‘stage2’ is taken from the image’s properties (assumed that it is set there). ‘ks_template’ value is from the image properties if specified there (since it is optional); else we use the config setting ‘[anaconda] default_ks_template’.
For the anaconda deploy interface, the ‘stage2’ directory was incorrectly being created using the full path of the stage2 file; this has been fixed.
The anaconda deploy interface expects the node’s instance_info to be populated with the ‘image_url’; this is now populated (via PXEAnacondaDeploy’s prepare() method).
For the anaconda deploy interface, when the deploy was finished and the baremetal node was being rebooted, the node’s provision state was incorrectly being set to ‘active’. The provisioning state-machine mechanism now handles that.
For the anaconda deploy interface, the code that was doing the validation of the kickstart file was incorrect and resulted in errors; this has been addressed.
For the anaconda deploy interface, the ‘%traceback’ section in the packaged ‘ks.cfg.template’ file is deprecated and fails validation, so it has been removed.
The anaconda deploy interface was saving internal information in the node’s instance_info, in the user-facing stage2 and ks_template fields. This broke rebuilds using a different image with different stage2 or template specified in the image properties. This has been fixed by saving the information in the node’s driver_internal_info instead.
Fixes the redfish hardware type RAID device creation and deletion when creating or deleting more than 1 logical disk on RAID controllers that require rebooting and do not allow more than 1 running task per RAID controller. Before this fix, the second logical disk would fail to be created or deleted. With this change it is now possible to use the redfish raid interface on iDRAC systems.
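The lookup order described in the anaconda notes above for ‘stage2’ and ‘ks_template’ can be sketched in Python as follows. This is a simplified model rather than Ironic's code: the function name is hypothetical, plain dictionaries stand in for the node's instance_info and the image properties, and the default template value is passed in by the caller to represent the ‘[anaconda] default_ks_template’ setting.

```python
def resolve_anaconda_options(instance_info, image_properties,
                             default_ks_template):
    """Resolve stage2 and ks_template using the order from the notes.

    instance_info wins when a value is specified there. Otherwise the
    image properties are used, and ks_template finally falls back to the
    configured default. Hypothetical helper for illustration.
    """
    stage2 = instance_info.get("stage2") or image_properties.get("stage2")
    ks_template = (
        instance_info.get("ks_template")
        or image_properties.get("ks_template")
        or default_ks_template
    )
    return {"stage2": stage2, "ks_template": ks_template}


if __name__ == "__main__":
    # Hypothetical sample values.
    resolved = resolve_anaconda_options(
        instance_info={"stage2": "http://example.com/stage2.img"},
        image_properties={"stage2": "image-stage2",
                          "ks_template": "image-ks.cfg"},
        default_ks_template="default-ks.cfg",
    )
    print(resolved)
```

In line with the last anaconda fix above, resolved values of this kind belong in driver_internal_info rather than being written back into the user-facing instance_info fields.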