Xena Series (18.0.0 - 18.2.x) Release Notes

18.3.0-17

Upgrade Notes

  • When upgrading Ironic to address the qemu-img image conversion security issues, the ironic-python-agent ramdisks will also need to be upgraded.

  • As a result of security fixes to address qemu-img image conversion security issues, a new configuration parameter has been added to Ironic, [conductor]permitted_image_formats, with a default value of “raw,qcow2,iso”. Raw and qcow2 are the disk image formats the Ironic community has consistently stated as supported and expected for use with Ironic. These formats also match the formats which the Ironic community tests. Operators who leverage other disk image formats may need to modify this setting further.
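
As a rough illustration of how an allow-list such as [conductor]permitted_image_formats behaves, the Python sketch below parses a comma-separated value and rejects any detected format outside it. The function names are hypothetical and this is not Ironic's implementation; only the default value is taken from the note above.

```python
# Hypothetical sketch of an image format allow-list check.
# Only the default value "raw,qcow2,iso" comes from the release note.
DEFAULT_PERMITTED_FORMATS = "raw,qcow2,iso"


def parse_permitted_formats(value):
    """Split a comma-separated option value into a set of format names."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def is_format_permitted(detected_format, permitted=DEFAULT_PERMITTED_FORMATS):
    """Return True if the detected image format is on the allow-list."""
    return detected_format.lower() in parse_permitted_formats(permitted)
```

With the default value, a detected format of qcow2 passes and vmdk is rejected until an operator adds it to the list.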

Security Issues

  • Ironic now checks the supplied image format value against the detected format of the image file, and will prevent deployments should the values mismatch. If being used with Glance and a mismatch in metadata is identified, it will require images to be re-uploaded with a new image ID to represent corrected metadata. This is the result of CVE-2024-44082 tracked as bug 2071740.

  • Ironic always inspects the supplied user image content for safety prior to deployment of a node should the image pass through the conductor, even if the image is supplied in raw format. This is utilized to identify the format of the image and the overall safety of the image, such that source images with unknown or unsafe feature usage are explicitly rejected. This can be disabled by setting [conductor]disable_deep_image_inspection to True. This is the result of CVE-2024-44082 tracked as bug 2071740.

  • Ironic also inspects images which would normally be provided as a URL for direct download by the ironic-python-agent ramdisk. This is enabled by default and increases the overall network traffic and disk space utilization of the conductor. This level of inspection can be disabled by setting [conductor]conductor_always_validates_images to False. Doing so is not advisable as Zed release and earlier ironic-python-agent ramdisks will not be made available due to backport regression risk. This is the result of CVE-2024-44082 tracked as bug 2071740.

  • Ironic now explicitly enforces a list of permitted image types for deployment via the [conductor]permitted_image_formats setting, which defaults to “raw”, “qcow2”, and “iso”. While the project has classically always declared permissible images as “qcow2” and “raw”, it was previously possible to supply other image formats known to qemu-img, and the utility would attempt to convert the images. The “iso” support is required for “boot from ISO” ramdisk support.

  • Ironic now explicitly passes the source input format to executions of qemu-img to limit the permitted qemu disk image drivers which may evaluate an image to prevent any mismatched format attacks against qemu-img.

  • The ansible deploy interface example playbooks now supply an input format to executions of qemu-img. If you are using customized playbooks, please add “-f {{ ironic.image.disk_format }}” to your invocations of qemu-img. If you do not do so, qemu-img will automatically try to guess the format, which can lead to known security issues with the incorrect source format driver.

  • Operators who have implemented any custom deployment drivers or additional functionality like machine snapshot, should review their downstream code to ensure they are properly invoking qemu-img. If there are any questions or concerns, please reach out to the Ironic project developers.

  • Operators are reminded that they should utilize cleaning in their environments. Disabling any security features such as cleaning or image inspection is at your own risk. Should you have any issues with security-related features, please don’t hesitate to open a bug with the project.

  • The [conductor]disable_deep_image_inspection setting is conveyed to the ironic-python-agent ramdisks automatically, and will prevent those operating ramdisks from performing deep inspection of images before they are written.

  • The [conductor]permitted_image_formats setting is conveyed to the ironic-python-agent ramdisks automatically. Should a need arise to explicitly permit an additional format, that should take place in the Ironic service configuration.

  • An issue in Ironic has been resolved where image checksums would not be checked prior to the conversion of an image to a raw format image from another image format.

    With default settings, this would not normally take place. However, it could occur when the image_download_source option was set to local, whether at the node level for a single deployment, as the default for that baremetal node in all cases, or via the [agent]image_download_source configuration option. By default, this setting is http.

    This occurred in concert with the [DEFAULT]force_raw_images option being set to True, which caused Ironic to download and convert the file.

    In a fully integrated context of Ironic’s use in a larger OpenStack deployment, where images are coming from the Glance image service, the previous pattern was not problematic. The overall issue was introduced as a result of the capability to supply, cache, and convert a disk image provided as a URL by an authenticated user.

    Ironic will now validate the user supplied checksum prior to image conversion on the conductor. This can be disabled using the [conductor]disable_file_checksum configuration option.
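
The ordering described above (verify the user-supplied checksum first, convert only afterwards) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Ironic's code: the function names, the exception class, and the convert callable are all hypothetical, and sha256 is chosen only as an example algorithm.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when the computed checksum differs from the supplied one."""


def verify_checksum(path, expected, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash the file at `path` and compare it with the expected hex digest."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as image:
        # Read in chunks so large disk images are not loaded into memory.
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected.lower():
        raise ChecksumMismatch(f"checksum mismatch for {path}")


def convert_if_valid(path, expected, convert):
    """Verify first; only then hand the file to the conversion callable."""
    verify_checksum(path, expected)
    return convert(path)
```

Because the conversion callable is only reached after verification succeeds, a tampered download never gets as far as the conversion tool in this sketch.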

Bug Fixes

  • Fixes multiple issues in the handling of images as it relates to the execution of the qemu-img utility, which is used for image format conversion, where a malicious user could craft a disk image to potentially extract information from an ironic-conductor process’s operating environment.

    Ironic now explicitly enforces a list of approved image formats as a [conductor]permitted_image_formats list, which mirrors the image formats the Ironic project has historically tested and expressed as known working. Testing is not based upon file extension, but upon content fingerprinting of the disk image files. This is tracked as CVE-2024-44082 via bug 2071740.

  • Fixes a security issue where Ironic would fail to checksum disk image files it downloads when Ironic had been requested to download and convert the image to a raw image format. This required the image_download_source to be explicitly set to local, which is not the default.

    This fix can be disabled by setting [conductor]disable_file_checksum to True, however this option will be removed in new major Ironic releases.

    As a result of this, parity has been introduced to align Ironic to Ironic-Python-Agent’s support for checksums used by standalone users of Ironic. This includes support for remote checksum files to be supplied by URL, in order to prevent breaking existing users which may have inadvertently been leveraging the prior code path. This support can be disabled by setting [conductor]disable_support_for_checksum_files to True.

  • Fixes Ironic integration with Cinder because of changes which resulted as part of the recent Security related fix in bug 2004555. The work in Ironic to track this fix was logged in bug 2019892. Ironic now sends a service token to Cinder, which allows for access restrictions added as part of the original CVE-2023-2088 fix to be appropriately bypassed. Ironic was not vulnerable, but the restrictions added as a result did impact Ironic’s usage. This is because Ironic volume attachments are not on a shared “compute node”, but instead mapped to the physical machines and Ironic handles the attachment life-cycle after initial attachment.

  • Fixes a bug in the iRMC driver’s parse_driver_info where, if FIPS is enabled, SNMP version 3 was always required even though the iRMC driver’s xxx_interface does not actually use SNMP.

  • Fixes an issue in the online upgrade logic where database models for Node Traits and BIOS Settings resulted in an error when performing the online data migration. This was because these tables were originally created as extensions of the Nodes database table, and the schema of the database was just different enough to result in an error if there was data to migrate in these tables upon upgrade, which would have occurred if an early BIOS Setting adopter had data in the database prior to upgrading to the Yoga release of Ironic.

    The online upgrade parameter now substitutes an alternate primary key name when applicable.

  • Fixes an issue where a System Scoped user could not trigger a node into a manageable state with cleaning enabled, as the Neutron client would attempt to utilize their user’s token to create the Neutron port for the cleaning operation, as designed. This is because with requests made in the system scope, there is no associated project and the request fails.

    Ironic now checks if the request has been made with a system scope, and if so it utilizes the internal credential configuration to communicate with Neutron.

  • Modifies the iRMC driver to use the [deploy]default_boot_mode option in ironic.conf to determine the default boot_mode.

  • Fixes issues with Lenovo hardware where the system firmware may display a blue “Boot Option Restoration” screen after the agent writes an image to the host in UEFI boot mode, requiring manual intervention before the deployed node boots. This issue is rooted in multiple changes being made to the underlying NVRAM configuration of the node. Lenovo engineers have suggested changing only the UEFI NVRAM and not performing any further changes via the BMC to configure the next boot. Ironic now does so on Lenovo hardware. More information and background on this issue can be found in bug 2053064.

18.3.0

Upgrade Notes

  • Adds sha256, sha384 and sha512 as supported SNMPv3 authentication protocols to iRMC driver.

Bug Fixes

  • Fixes an issue where, if selinux is enabled and enforcing and the published image is a hardlink, the source selinux context is preserved, causing an access denied error when retrieving the image using the hardlink URL.

  • Fixes the SNMPv3 message authentication and encryption functionality of the iRMC driver. Previously, SNMPv3 authentication between the iRMC driver and iRMC relied only on the security name, with no passwords or encryption. To increase security, the following parameters are now added to the node’s driver_info, and can be used for authentication:

    • irmc_snmp_user

    • irmc_snmp_auth_password

    • irmc_snmp_priv_password

    • irmc_snmp_auth_proto (Optional, defaults to sha)

    • irmc_snmp_priv_proto (Optional, defaults to aes)

    irmc_snmp_user replaces irmc_snmp_security. irmc_snmp_security will be ignored if irmc_snmp_user is set. irmc_snmp_auth_proto and irmc_snmp_priv_proto can also be set through the following options in the [irmc] section of /etc/ironic/ironic.conf:

    • snmp_auth_proto

    • snmp_priv_proto

  • Fixes a race condition in PXE initialization where logic to retry what we suspect as potentially failed PXE boot operations was not consulting if an agent token had been established, which is the very first step in agent initialization.

Other Notes

  • Updates the minimum version of python-scciclient library to 0.11.3.

18.2.2

Known Issues

  • When using jsonschema 4.0.0 or newer, make sure to include a proper $schema field in your custom network data or RAID schemas.
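
For illustration, a custom schema that declares its draft might look like the Python sketch below. The schema body and the helper name are hypothetical; Draft-07 is used here only because another note for this release states that Ironic's own schemas currently use it.

```python
# Hypothetical custom RAID schema; only the "$schema" line is the point here.
CUSTOM_RAID_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "logical_disks": {"type": "array"},
    },
    "required": ["logical_disks"],
}


def has_schema_field(schema):
    """Return True if the schema declares which draft it is written against."""
    value = schema.get("$schema")
    return isinstance(value, str) and bool(value)
```

A quick check such as has_schema_field() can be run over custom schemas before upgrading jsonschema to find any that omit the field.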

Security Issues

  • Modifies the irmc hardware type to include a capability to control enforcement of HTTPS certificate verification. By default this is enforced. The python-scciclient version must be one of >=0.8.2,<0.9.0, >=0.9.4,<0.10.0, >=0.10.1,<0.11.0 or >=0.11.3,<0.12.0, or certificate verification will not occur.
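
The version ranges above are easy to misread, so the following Python sketch checks a plain X.Y.Z version string against them. It is an illustrative helper with hypothetical names, not part of Ironic or python-scciclient, and it assumes simple numeric versions without pre-release suffixes.

```python
# Ranges from the note: (inclusive minimum, exclusive maximum).
SUPPORTED_RANGES = [
    ((0, 8, 2), (0, 9, 0)),
    ((0, 9, 4), (0, 10, 0)),
    ((0, 10, 1), (0, 11, 0)),
    ((0, 11, 3), (0, 12, 0)),
]


def parse_version(text):
    """Turn a plain 'X.Y.Z' string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def supports_cert_verification(version_text):
    """Return True if the version falls inside one of the listed ranges."""
    version = parse_version(version_text)
    return any(low <= version < high for low, high in SUPPORTED_RANGES)
```

For example, 0.11.3 falls inside the last range, while 0.11.2 and 0.9.0 fall in the gaps between ranges.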

Bug Fixes

  • Fixes detecting of allowable values for a BIOS settings enumeration in the redfish BIOS interface when only ValueDisplayName is provided.

  • The anaconda deploy interface was treating the config drive as a dict, whereas it could be a dict or in iso9660 format, gzipped and base64-encoded. This has been fixed.

  • The anaconda deploy interface was adding commands that deal with the config drive to the end of the kickstart config file. As a result, they were handled after an Ironic API request was sent (to the conductor) to indicate that the node had been provisioned and was ready to be rebooted, creating a possible race condition between these commands completing and the node being powered off. A sync is now added to ensure that all modifications have been written to disk before the API request is sent as the last step.

  • Extra newlines (‘\n’) were incorrectly added to the user data content. This broke the content-type decoding and cloud-init was unable to process them. The extra newlines have been removed.

  • Fixes the logic for the anaconda deploy interface. If the ironic node’s instance_info doesn’t have both ‘stage2’ and ‘ks_template’ specified, we weren’t using the instance_info at all. This has been fixed to use the instance_info if it was specified. Otherwise, ‘stage2’ is taken from the image’s properties (assumed that it is set there). ‘ks_template’ value is from the image properties if specified there (since it is optional); else we use the config setting ‘[anaconda] default_ks_template’.

  • For the anaconda deploy interface, the ‘stage2’ directory was incorrectly being created using the full path of the stage2 file; this has been fixed.

  • The anaconda deploy interface expects the node’s instance_info to be populated with the ‘image_url’; this is now populated (via PXEAnacondaDeploy’s prepare() method).

  • For the anaconda deploy interface, when the deploy was finished and the baremetal node was being rebooted, the node’s provision state was incorrectly being set to ‘active’; the provisioning state-machine mechanism now handles that.

  • For the anaconda deploy interface, the code that was doing the validation of the kickstart file was incorrect and resulted in errors; this has been addressed.

  • For the anaconda deploy interface, the ‘%traceback’ section in the packaged ‘ks.cfg.template’ file is deprecated and fails validation, so it has been removed.

  • The anaconda deploy interface was saving internal information in the node’s instance_info, in the user-facing ‘stage2’ and ‘ks_template’ fields. This broke rebuilds using a different image with different stage2 or template specified in the image properties. This has been fixed by saving the information in the node’s driver_internal_info instead.

  • Fixes pagination for the following collections:

    /v1/allocations
    /v1/chassis
    /v1/conductors
    /v1/deploy_templates
    /v1/nodes/{node}/history
    

    The next link now contains a valid URL.

  • Fixes rebooting into the agent after changing BIOS settings in fast-track mode with the redfish-virtual-media boot interface. Previously, the ISO would not be configured.

  • Fixes redfish and idrac-redfish RAID create_configuration, apply_configuration, delete_configuration clean and deploy steps to update node’s raid_config field at the end of the steps.

  • Fixes the determination of a failed RAID configuration task in the redfish hardware type. Prior to this fix the tasks that have failed were reported as successful.

  • Fixes the redfish hardware type RAID device creation and deletion when creating or deleting more than 1 logical disk on RAID controllers that require rebooting and do not allow more than 1 running task per RAID controller. Before this fix, the second logical disk would fail to be created or deleted. With this change it is now possible to use the redfish raid interface on iDRAC systems.

  • Fixes the redfish-virtual-media boot interface to allow its use with iDRAC firmware from 6.00.00.00 (released June 2022), which fixes the virtual media boot issue that previously prevented iDRAC from working with redfish-virtual-media. Consider upgrading the iDRAC firmware if not done already; otherwise, an error will still occur when trying to use redfish-virtual-media with iDRAC.

  • Fixes the initrd kernel parameter when booting ramdisk directly from Swift/RadosGW using iPXE. Previously it was always deploy_ramdisk, even when the actual file name is different.

  • Adds the driver_info/irmc_verify_ca option to specify a certificate file. The default value of driver_info/irmc_verify_ca is True.

  • Fixes compatibility with jsonschema package version 4.0.0 or newer by providing a proper schema version (Draft-07 currently).

  • The image cache now respects the Cache-Control: no-store header for HTTP(s) images.

  • File images are no longer cached in the image cache to avoid unnecessary consumption of the disk space.
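
The last two caching changes above can be pictured with the Python sketch below, which decides whether an image should be placed in a local cache. It is a simplified, hypothetical helper (the name should_cache and its arguments are assumptions), not the logic Ironic's image cache actually uses.

```python
from urllib.parse import urlparse


def should_cache(image_href, response_headers=None):
    """Decide whether an image belongs in a local cache (illustrative only)."""
    scheme = urlparse(image_href).scheme.lower()
    if scheme == "file":
        # Already on local disk; a cached copy would only consume space.
        return False
    if scheme in ("http", "https"):
        headers = {k.lower(): v for k, v in (response_headers or {}).items()}
        directives = [d.strip().lower()
                      for d in headers.get("cache-control", "").split(",")]
        # Honour an explicit request from the server not to store the body.
        if "no-store" in directives:
            return False
    return True
```

In this sketch a file:// reference is never cached, and an HTTP(S) image is skipped only when the server sends a Cache-Control header containing no-store.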

18.2.1

Bug Fixes

  • No longer validates boot interface parameters when adopting a node that uses local boot.

  • Fixes installation and unit testing of ironic when using the sushy library by setting an appropriate upper constraint. This version of Ironic is not compatible with Sushy 4.0.0.

  • Fixes a bug in the anaconda deploy interface where the ‘ks_options’ key was not found when rendering the default kickstart template.

  • Fixes an issue where the PXEAnacondaDeploy interface’s deploy() method did not return states.DEPLOYWAIT, so the instance went straight to ‘active’ instead of ‘wait call-back’.

  • Fixes an issue where the anaconda deploy interface mistakenly expected ‘squashfs_id’ instead of ‘stage2_id’ property on the image.

  • Fixes the heartbeat mechanism in the default kickstart template ks.cfg.template as the heartbeat API only accepts ‘POST’ and expects a mandatory ‘callback_url’ parameter.

  • Fixes handling of tarball images in anaconda deploy interface. Allows user specified file extensions to be appended to the disk image symlink. Users can now set the file extensions by setting the ‘disk_file_extension’ property on the OS image. This enables users to deploy tarballs with anaconda deploy interface.

  • Fixes an issue where automated cleaning was not supported when the anaconda deploy interface was used.

  • Fixed an issue where duplicate extra DHCP options were passed in the port update request to the Networking service. The duplicate DHCP options caused an error in the Networking service and node provisioning would fail. See bug: 2009774.

  • Fixes the idrac-wsman management interface set_boot_device method, which would fail deployment when there were existing jobs present, with the error “Failed to change power state to ‘’power on’’ by ‘’rebooting’’. Error: DRAC operation failed. Reason: Unfinished config jobs found: <list of existing jobs>. Make sure they are completed before retrying.”. Now there can be non-BIOS jobs present during deployment. This will still fail for cases when there are BIOS jobs present. In such cases, consider moving to idrac-redfish, which does not have this limitation when setting the boot device.

  • Fixed an issue where provisioning/cleaning would fail on IPv6 routed provider networks. See bug: 2009773.

  • Fixes validation of the input argument firmware_images of the redfish hardware type clean step update_firmware. The argument is now validated at the beginning of the clean step. Prior to this fix, issues were determined at the time of executing the firmware update or not at all (for example, when mistyping the optional field ‘wait’).

  • Fixes redfish hardware type update_firmware cleaning step to work with Sushy version 4.0.0 or greater.

  • Fixes an issue where clients would get a 404 due to the node pagination breaking at max_limit due to an uninitialised resource_url.

  • Fixes an issue where clients would get a 404 due to the port and portgroups pagination breaking at max_limit due to an uninitialised resource_url.

  • Fixes a “File name too long” error in the image caching code when a URL contains a long query string.

  • Inspection no longer fails when one of the NICs reports a NIC address that is not a valid MAC (e.g. a WWN).

  • Fixed a bug where cleaning was resumed repeatedly because the value of fgi_status was not updated correctly when obtaining the RAID configuration status of a node managed by the irmc hardware type.

  • When configuring RAID on iRMC machines through Ironic, polling was not set up when the RAID was created. Polling is now set up after creating the RAID, so that Ironic waits for the RAID configuration to complete before proceeding to the next step instead of checking IPA.

  • Fixes connection caching issues with Redfish BMCs where AccessErrors were previously not disqualifying the cached connection from being re-used. Ironic will now explicitly open a new connection instead of using the previous connection in the cache. Under normal circumstances, the sushy redfish library would detect and refresh sessions, however a prior case exists where it may not detect a failure and contain cached session credential data which is ultimately invalid, blocking future access to the BMC via Redfish until the cache entry expired or the ironic-conductor service was restarted. For more information please see story 2009719.

  • Removes the ?filename=file.iso suffix from the virtual media image URL when the image is a regular file, due to incompatibility with SuperMicro X12 machines which do not accept special characters such as = or ? in the URL. Historically, this suffix was added to improve compatibility with those BMCs which require an .iso suffix in the URL while using swift as the image store. The old behaviour will remain for swift backed images.

18.2.0

Prelude

The Ironic team hereby announces the release of Ironic 18.2.

During the Xena development cycle, thirty-eight contributors collaborated with each other and with our adjacent communities to support the needs of our end users in all the many forms they take. Over 48,000 lines of code were modified, and twenty-two new features made it into Ironic along with a number of bug fixes. We sincerely hope you enjoy!

New Features

  • Adds support for fields selector in driver api. See story 1674775.

    • GET /v1/drivers?fields=...

    • GET /v1/drivers/{driver_name}?fields=...

  • Adds API version 1.78, which provides the capability to retrieve node history events that may have been recorded in the process of managing the node. These may aid in troubleshooting or in identifying a problem area with a specific node or supplied configuration.

  • Adds a capability to allow bootloaders to be copied into the configured network boot path. To opt in, set the [pxe]loader_file_paths option to a comma-separated list of destination filename and source file path pairs, each written as destination:source.

    [pxe]
    loader_file_paths = bootx64.efi:/path/to/shimx64.efi,grubx64.efi:/path/to/grubx64.efi
    
  • Manual clean step clear_ca_certificates is added to remove the CA certificates from iLO.

  • Adds endpoints to change boot mode and secure boot state of node.

    • PUT /v1/nodes/{node_ident}/states/boot_mode

    • PUT /v1/nodes/{node_ident}/states/secure_boot

    The API will respond with 202 (Accepted) on validating the request and accepting to process it. Changes occur asynchronously in a background task. The user can then poll the states endpoint /v1/nodes/{node_ident}/states for observing current status of the requested change.

  • Allows limiting the number of parallel downloads for cached images (instance and TFTP images currently).

  • Adds support to specify HttpHeaders when creating a subscription via redfish vendor passthru.
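
To make the shape of the [pxe]loader_file_paths value shown earlier in this list concrete, the Python sketch below splits it into a destination-to-source mapping. The parser is a hypothetical illustration of the destination:source pair format, not the code Ironic uses to read the option.

```python
def parse_loader_file_paths(value):
    """Parse 'dest:source,dest:source' into a {destination: source} dict."""
    mapping = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first colon only, so the source path may contain colons.
        destination, _, source = pair.partition(":")
        mapping[destination.strip()] = source.strip()
    return mapping


example = "bootx64.efi:/path/to/shimx64.efi,grubx64.efi:/path/to/grubx64.efi"
loader_files = parse_loader_file_paths(example)
```

Here loader_files maps each destination filename in the network boot path to the source file that would be copied there.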

Upgrade Notes

  • The parallel_image_downloads option is now set to True by default. Use the new image_download_concurrency option to tune the behavior; the default concurrency is 20.

  • In-band cleaning has been fixed for ramdisk and anaconda deploy interfaces. If you rely on actual clean steps not running, you need to disable cleaning instead for the relevant nodes:

    baremetal node set <node> --no-automated-clean
    

Deprecation Notes

  • Ironic previously announced that the default for [deploy]default_boot_mode would be changing “in a future release”. This was announced during the Stein development cycle. Ironic will change this default to uefi during the Yoga development cycle.

  • The parallel_image_downloads option is deprecated in favour of the new image_download_concurrency option that allows more precise tuning.
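
The difference between an on/off switch and a concurrency limit can be sketched in Python as below: downloads still run in parallel, but no more than a fixed number are in flight at once. The function and argument names are hypothetical and the fetch callable is supplied by the caller; only the default of 20 comes from the notes above, and this is not how Ironic implements the option.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def download_all(hrefs, fetch, concurrency=20):
    """Run fetch(href) for every href with at most `concurrency` in flight."""
    slots = threading.BoundedSemaphore(concurrency)

    def guarded(href):
        # Each worker waits for a free slot before starting its download.
        with slots:
            return fetch(href)

    with ThreadPoolExecutor(max_workers=max(len(hrefs), 1)) as pool:
        # pool.map preserves the order of the input hrefs in its results.
        return list(pool.map(guarded, hrefs))
```

Lowering the concurrency value in this sketch trades total download time for reduced peak network and disk load, which is the kind of tuning a simple boolean cannot express.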

Bug Fixes

  • Fixes a regression in the ramdisk deploy where custom kernel parameters were not used during inspection and cleaning.

  • Resolves an issue where [conductor]clean_step_priority_override values were applied too late, after disabled steps had already been filtered out. With this change, priority overrides are applied prior to filtering out disabled steps, so that this configuration option can be used to enable or disable steps (in particular clean steps) in addition to changing the priorities they are run with.

  • The validation for create_subscription now uses the default values from Redfish for Context and Protocol to avoid None. The fields returned by create_subscription and get_subscription are now filtered by the common fields between vendors. Deleting a subscription that doesn’t exist will return 404 instead of 500.

  • Fixes an issue in db schema version testing where objects with an initial version, e.g. “1.0”, are allowed to not already have their DB tables pre-exist when performing the pre-upgrade compatibility check for the database. This allows the upgrade to proceed and update the database schema without an explicit known list having to be maintained in Ironic.

  • Handles excessively long errors when the status upgrade check is executed, and simply indicates now if a table is missing, suggesting to update the database schema before proceeding.

  • Fixes an issue in the idrac-redfish clean/deploy step import_configuration where partially successful jobs were treated as fully successful. Such jobs, completed with errors, are now treated as failures.

  • Fixes the idrac-redfish clean/deploy step import_configuration to handle completed import configuration tasks that are deleted by iDRAC before Ironic has checked the task’s status. Prior to iDRAC firmware version 5.00.00.00, completed tasks are deleted after 1 minute in iDRAC Redfish. That is not always sufficient to check their status in the periodic check, which runs every minute by default. Before this fix, the node got stuck in the wait state forever. This is fixed by failing the step with an error advising to decrease the periodic check interval or upgrade the iDRAC firmware if not done already.

  • Fixes idrac-redfish RAID interface delete_configuration clean/deploy step for controllers having foreign physical disks. Now foreign configuration is cleared after deleting virtual disks.

  • Fixes idrac-redfish RAID interface in create_configuration clean step and apply_configuration deploy step when there are drives in non-RAID mode. With this fix, non-RAID drives are converted to RAID mode before creating virtual disks.

  • Fixes idrac-wsman BIOS and RAID interface steps to correctly check status of iDRAC job that completed with errors. Now these jobs are treated as failures. Before this fix node stayed in wait state as it was only checking for “Completed” or “Failed” job status, but not “Completed with Errors”.

  • Fixes the idrac-wsman power interface to wait for the hardware to reach the target state before returning. For systems where the soft power off at the end of deployment (to boot into the instance) failed and a forced hard power off was used, this left the node successfully deployed but powered off, without any errors. This broke other workflows expecting the node to be on and booted into the OS at the end of deployment. Additional information can be found in story 2009204.

  • When an http(s):// image is used, the cached copy of the image will always be updated if the HTTP server does not provide the last modification date and time. Previously the cached image would be considered up-to-date, which could cause invalid behavior if the image is generated on the fly or was modified while being served.

  • Fixes the pattern of execution for periodic tasks such that the majority of drivers now evaluate if work needs to be performed in advance of creating a node task. Depending on the individual driver query pattern, this prevents excess database queries from being triggered with every task execution.

  • Fixes in-band cleaning for the ramdisk and anaconda deploy interfaces. Previously no in-band steps were fetched from the ramdisk.

  • Retries ssl.SSLError when connecting to the agent.
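
The ordering fix for [conductor]clean_step_priority_override described earlier in this list can be sketched in Python as follows. This is a simplified model, not Ironic's code: it assumes steps are plain dicts, that a priority of 0 means a step is disabled, and that overrides are keyed as interface.step; the function name is hypothetical.

```python
def apply_overrides_then_filter(steps, overrides):
    """Apply priority overrides first, then drop steps with priority 0.

    `steps` is a list of dicts with 'interface', 'step' and 'priority' keys;
    `overrides` maps 'interface.step' names to integer priorities.
    """
    updated = []
    for step in steps:
        key = f"{step['interface']}.{step['step']}"
        priority = overrides.get(key, step["priority"])
        updated.append({**step, "priority": priority})
    # Filtering happens last, so an override can both enable and disable steps.
    enabled = [s for s in updated if s["priority"] > 0]
    return sorted(enabled, key=lambda s: s["priority"], reverse=True)
```

If the filter ran before the overrides, a step that is disabled by default could never be enabled through configuration, which is the behavior the fix removes.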

Other Notes

  • Removes a NEW_MODELS internal list from the dbsync utility which helped the tool navigate new models, however it was never used. Instead the tool now utilizes the database version and appropriate base version to make the appropriate decision in pre-upgrade checks.

  • The cleaning code has been moved from AgentDeployMixin to AgentBaseMixin. Most third-party deploy interfaces will need to include both anyway.