2023.2 Series Release Notes

17.5.0-21

New Features

Upgrade Notes

  • Support for failing execution early if fact collection fails on any of the hosts by setting kolla_ansible_setup_any_errors_fatal to true has been removed. This is due to Ansible’s any_errors_fatal parameter not being templated, resulting in the value always being interpreted as true, even though the default value of kolla_ansible_setup_any_errors_fatal is false.

    Equivalent behaviour is possible by setting the maximum failure percentage to 0. This may be done specifically for fact gathering using gather_facts_max_fail_percentage or globally using kolla_max_fail_percentage.

Bug Fixes

  • Fixes an issue with ironic dnsmasq failing to start in deployments using podman because it requires the NET_RAW capability. See LP#2055282.

  • Fixes keystone service configuration for haproxy when using federation. LP#2058656

  • Fixes the MariaDB recovery issue when kolla-ansible is running from a docker container. LP#2073370

  • Fixes an issue during fact gathering when using the --limit argument where a host that fails to gather facts could cause another host to fail during delegated fact gathering.

  • Add skip_kpartx yes to multipath.conf defaults section to prevent kpartx scanning multipath devices and unlock multipathd del map operation of os-brick for volume detaching oprtaions. LP#2078973 <https://launchpad.net/bugs/2078973>`__

  • Fixes 2067036. Added octavia_interface_wait_timeout to control octavia-interface.service timeout to be able wait openvswitch agent sync has been finished and octavia-lb-net is reachable from the host. Also set restart policy for this unit to on-failure LP#2067036

  • Fixes Octavia service upgrade issue where it can fail when Octavia persistence database user is missing. LP#2065591

  • Fixes unreliable health checks for neutron_ovn_agent and neutron_ovn_metadata_agent bug. Changed to check OVS DB connection instead of OVN southbound DB connection. LP#2084128

  • Fixes an issue, when using podman, with named volumes that use a mode specifier. See LP#2054834 for more details.

  • Fixes parsing of JSON output of inner modules called by kolla-toolbox when data was returned on standard error. LP#2080544

17.5.0

New Features

  • Modifies public API firewalld rules to be applied immediately to a running firewalld service. This requires firewalld to be running, but avoids reloading firewalld, which is disruptive due to the way in which firewalld builds its firewall chains.

Bug Fixes

  • Fixes an deploy opensearch with enable TLS on the internal VIP.

  • Fixes handling of openvswitch on manila-share nodes. LP#1993285

  • Fixes the Python requests library issue when using custom CA by adding the REQUESTS_CA environment variable to the kolla-toolbox container. See LP#1967132

  • Fixes configuration of CloudKitty when internal TLS is enabled. LP#1998831

  • Fixes the detection of the Nova Compute Ironic service when a custom host option is set in the service config file. See LP#2056571

  • Removes the default /tmp/ mountpoint from the horizon container. This change is made to harden the container and prevent potential security issues. For more information, see the Bug Report: LP#2068126.

  • Fixes an issue where OVN northbound or southbound database deployment could fail when a new leader is elected. LP#2059124

17.4.0

Upgrade Notes

  • MariaDB backup now uses the same image as the running MariaDB server. The following variables relating to MariaDB backups are no longer used and have been removed:

    • mariabackup_image

    • mariabackup_tag

    • mariabackup_image_full

Deprecation Notes

  • Support for deploying Masakari is no longer deprecated. The Masakari CI scenarios are now working again, and commitment has been made to improve the health of the project.

Bug Fixes

  • Add conditionals for IPv6 sysctl settings that have IPV6 disabled in kernel. Changing sysctl settings related to IPv6 on those systems lead to errors. LP#1906306

  • Fixes nova-cell not updating the cell0 database address when VIP changes. LP#1915302

  • Fixes trove module imports. Path to the modules needed by trove-api changed in source trove package so the configuration was updated. LP#1937120

  • Incorrect condition in Podman part prevented the retrieval of facts of all the containers when no names were provided. LP#2058492

  • Modifies the MariaDB procedure to use the same container image as the running MariaDB server container. This should prevent compatibility issues that may cause the backup to fail.

  • Fixes a bug in kolla_podman_worker, where missing commas in list of strings create implicit concatenation of items that should be separate. LP#2067278

  • Fixed ‘cinder-backup’ service when Swift with TLS enabled. LP#2051986

  • Fixes the dimensions comparison when we set values like 1g in the container dimensions configuration, making the docker container getting restarted even with no changes, as we are comparing 1g with 1073741824, which is displayed in the docker inspect while 1g is in the configuration.

  • Fixes keystone port in skyline-console pointing to wrong endpoint port. LP#2069855

  • Fixes 2065168. Fix kolla systemd unit template to prevent restart all kolla services with docker.service restart. LP#[2065168]

  • Fixes a bug in kolla-ansible where the keystone service role was not being created during an upgrade. This was due to the service-ks-register role not being imported in the upgrade.yml file. The service-ks-register role is now imported in the upgrade.yml file. See bug: https://bugs.launchpad.net/kolla-ansible/+bug/2056761

  • Fixed an issue where the MariaDB Cluster recovery process would fail if the sequence number was not found in the logs. The recovery process now checks the complete log file for the sequence number and recovers the cluster. See LP#1821173 for details.

  • Updates the default Grafana OpenSearch datasource configuration to use values for OpenSearch that work out of the box. Replaces the Elasticsearch values that were previously being used. The new configuration can be applied by deleting your datasource and reconfiguring Grafana through kolla ansible. In order to prevent dashboards from breaking when the datasource is deleted, one should use datasource variables in Grafana. See bug 2039500.

  • All stable RabbitMQ feature flags are now enabled during deployments, reconfigures, and upgrades. As such, the variable rabbitmq_feature_flags is no longer required. This is a partial fix to RabbitMQ SLURP support. LP#2049512

  • Fixes an issue where the Keystone admin endpoint would be recreated when upgrading Keystone. The endpoint is now explicitly removed during the upgrade process.

  • Fixes skyline’s old format of stop task. It used docker_container which would cause problems with podman deployments.

17.3.0

Upgrade Notes

  • If credentials are updated in passwords.yml kolla-ansible is now able to update these credentials in the keystone database and in the on disk config files.

    The changes to passwords.yml are applied once kolla-ansible -i INVENTORY reconfigure has been run.

    If you want to revert to the old behavior - credentials not automatically updating during reconfigure if they changed in passwords.yml - you can specify this by setting update_keystone_service_user_passwords: false in your globals.yml.

    Notice that passwords are only changed if you change them in passwords.yml. This mechanism is not a complete solution for automatic credential rollover. No passwords are changed if you do not change them inside passwords.yml.

Bug Fixes

  • Fixes configuration of nova-compute and nova-compute-ironic, that will enable exposing vendordata over configdrive. LP#2049607

  • Fixes mariadb role deployment when using Ansible check mode. LP#2052501

  • Updated configuration of service user tokens for all Nova and Cinder services to stop using admin role for service_token and use service role.

    See LP#[2004555] and LP#[2049762] for more details.

  • Changes to service user passwords in passwords.yml will now be applied when reconfiguring services.

    This behaviour can reverted by setting update_keystone_service_user_passwords: false.

    Fixes LP#2045990

17.2.0

Bug Fixes

  • Fixes enabled usage audit notifications when they are not needed. See LP##2049503.

  • Fixes an idempotency issue in the OpenSearch upgrade tasks where subsequent runs of kolla-ansible upgrade would leave shard allocation disabled. LP#2049512

17.1.0

New Features

  • Set a log retention policy for OpenSearch via Index State Management (ISM). Documentation.

Upgrade Notes

  • Added log retention in OpenSearch, previously handled by Elasticsearch Curator. By default the soft and hard retention periods are 30 and 60 days respectively. If you are upgrading from Elasticsearch, and have previously configured elasticsearch_curator_soft_retention_period_days or elasticsearch_curator_hard_retention_period_days, those variables will be used instead of the defaults. You should migrate your configuration to use the new variable names before the Caracal release.

Bug Fixes

  • Fixes non-persistent Neutron agent state data. LP2009884

  • Fixes long service restarts while using systemd LP#2048130.

  • Fixes an issue with high CPU usage of the cAdvisor container by setting the per-container housekeeping interval to the same value as the Prometheus scrape interval. LP#2048223

  • Fixes Nova operations using the scp command, such as cold migration or resize, on Debian Bookworm. LP#2048700

  • Fixes Docker health check for the sahara_engine container. LP#2046268

  • Added log retention in OpenSearch, previously handled by Elasticsearch Curator, now using Index State Management (ISM) OpenSearch bundled plugin. LP#2047037.

17.0.0

New Features

  • Adds http/2 support to HAProxy frontends.

  • Add Lets Encrypt TLS certificate service integration into Openstack deployment. Enables trusted TLS certificate generation option for secure communcation with OpenStack HAProxy instances using letsencrypt_email, kolla_internal_fqdn and/or kolla_external_fqdn is required. One container runs an Apache ACME client webserver and one runs Lego for certificate retrieval and renewal. The Lego container starts a cron job which attempts to renew certificates every 12 hours.

  • Added capability to specify custom kernel modules for Neutron: neutron_modules_default: Lists default modules. neutron_modules_extra: For custom modules and parameters.

  • A custom event_pipeline.yaml file for the Ceilometer notification service is now processed with merge_yaml. This allows Jinja2 to be used. Furthermore, it is possible to have a global event_pipeline.yaml and host-specific event_pipeline.yaml files.

  • Supports Debian Bookworm (12) as host distribution.

  • Adds support for exposing Prometheus server on the external interface. This is disabled by default and can be enabled by setting enable_prometheus_server_external to true. Basic auth is used to protect the endpoint.

  • Adds prometheus_external_fqdn and prometheus_internal_fqdn to customise prometheus FQDNs.

  • Implements support for Podman deployment as an alternative to Docker. To perform deployment using Podman, set the variable kolla_container_engine to value podman inside of the globals.yml file.

  • Adds the ability for the instance label on prometheus metrics to be replaced with the inventory hostname as opposed to using the ip address as the metric label. The ip address is still used as the target address which means that there is no issue of the hostname being unresolvable.

    More information on how to use this feature can be found in the reference documentation for logging and monitoring.

  • HAProxy supports setting nbthread via variable haproxy_threads. Threads are recommended instead of processes since HAProxy 1.8. They cannot be used both at the same time.

  • Adds single service external frontend feature to haproxy. Details are in the haproxy guide section of the documentation.

  • HORIZON_IMAGES_UPLOAD_MODE is now set to 'direct' by default. This improves image uploads from clients, because these no longer use Horizon’s webserver as a staging area - the image upload goes directly to Glance API.

  • Updates apache grok pattern to match the size of response in bytes, time taken to serve the request and user agent.

  • With the parameter ironic_agent_files_directory it is possible to provide the directory for the ironic-agent.kernel and ironic-agent.initramfs files. By default the parameter is set to the value of node_custom_config. This corresponds to the existing behaviour.

  • Adds keystone_federation_oidc_additional_options that allows to pass additional OIDC options.

  • Adds support for copying in {{ node_custom_config }}/magnum/kubeconfig to Magnum containers for magnum-cluster-api driver.

  • You can now enable the usage of quorum queues in RabbitMQ for all services by setting the variable om_enable_rabbitmq_quorum_queues to true. Notice that you can’t use quorum queues and high availability at the same time. This is caught by a precheck. This feature is enabled by default to improve reliability of the messaging queues.

  • The etcd tooling has been updated to handle adding and removing nodes. Previously this was an undocumented manual process and required creating service containers. Operators can refer to the etcd admin guide for more details.

  • Added a neutron check for ML2/OVS and ML2/OVN presence at the start of deploy phase. It will fail if neutron_plugin_agent is set to ovn and use of ML2/OVS container detected. In case where neutron_plugin_agent is set to openvswitch the check will fail when it detects ML2/OVN container or any of the OVN specific volumes.

  • Glance, cinder, manila services now support configuration of multiple ceph cluster backends. For nova and gnocchi there is the possibility to configure different ceph clusters - for gnocchi this is possible at the service level while for nova at the host level. See the external ceph guide docs. on how to set multiple ceph backends for more details.

  • The flag --check-expiry has been added to the octavia-certificates command. kolla-ansible octavia-certificates --check-expiry <days> will check if the Octavia certificates are set to expire within a given number of days.

  • The Octavia amphora provider driver improves control plane resiliency. Should a control plane host go down during a load balancer provisioning operation, an alternate controller can resume the in-process provisioning and complete the request. This solves the issue with resources stuck in PENDING_* states by writing info about task states in persistent backend and monitoring job claims via jobboard. The jobboard feature is now enabled by default. It requires the Redis service to be enabled as a dependency. Use the enable_octavia_jobboard variable to override if needed.

  • With the parameter rabbitmq_datadir_volume it is possible to use a directory as volume for the rabbitmq service. By default, a volume named rabbitmq is used (the previous default).

  • Adds new restart_policy called oneshot that does not create systemd units and is used for bootstrap tasks.

  • Added support for Cinder-Backup with S3 backend.

  • Added support for Glance with S3 backend

  • In the configuration template of the Senlin service the cafile parameter is now set by default in the authentication section. This way the use of self-signed certificates on the internal Keystone endpoint is also usable in the Senlin service.

  • Adds support for ansible-core only installation (use kolla-ansible install-deps to install required collections).

Upgrade Notes

  • Minimum supported Ansible version is now 7 (ansible-core 2.14) and maximum supported is 8 (ansible-core 2.15).

  • Default keystone user role has been changed from deprecated role _member_ to member role.

  • Now ironic_tftp service does not bind on 0.0.0.0, by default it uses ip address of the api_interface. To revert to the old behaviour, please set ironic_tftp_interface_address: 0.0.0.0 in globals.yml.

  • Remnants of Monasca, Storm, Kafka and Zookeeper have been removed (including kolla-ansible monasca_cleanup command).

  • etcd has been upgraded to version 3.4 in this release. Operators are highly encouraged to read the upgrade notes for impacts on etcd clients. Upgrades are only supported from etcd v3.3: Skip version upgrades are not supported. Please ensure that adequate backups are taken before running the upgrade to guard against dataloss.

  • etcd version 3.4 drops support for the v3alpha endpoint. Internal kolla-ansible endpoints have been updated, but operators are strongly encouraged to audit any customizations or external users of etcd.

  • Prometheus now uses basic auth. The password is under the key prometheus_password in the Kolla passwords file. The username is admin. The default set of users can be changed using the variable: prometheus_basic_auth_users.

  • Configure Nova libvirt.num_pcie_ports to 16 by default. Nova currently sets ‘num_pcie_ports’ to “0” (defaults to libvirt’s “1”), which is not sufficient for hotplug use with ‘q35’ machine type.

  • Configuring HAProxy nbproc setting via haproxy_processes and haproxy_process_cpu_map variables has been dropped since threads are the recommended way to scale CPU performance since 1.8. This covers haproxy, glance-tls-proxy and neutron-tls-proxy. Please use haproxy_threads and haproxy_thread_cpu_map instead (or glance_tls_proxy_threads and glance_tls_proxy_thread_cpu_map for Glance TLS proxy and neutron_tls_proxy_threads and neutron_tls_proxy_thread_cpu_map for Neutron TLS proxy).

  • HORIZON_IMAGES_UPLOAD_MODE is now set to 'direct' by default. In order to retain the previous default ('legacy') - please set HORIZON_IMAGES_UPLOAD_MODE: 'legacy' in your custom_local_settings file.

  • Quorum queues in RabbitMQ (controlled by om_enable_rabbitmq_quorum_queues variable) is enabled by default from now on. Support for non-HA RabbitMQ queues is dropped. Either quorum queues that are enabled by default, or classic mirrored queues are required now. Migration procedure from non-HA to HA

  • Removes the restriction on the maximum supported version of 2.14.2 for ansible-core. Any 2.14 series release is now supported.

  • The default value for ceph_cinder_keyring has been changed from: “ceph.client.cinder.keyring” to: “client.{{ ceph_cinder_user }}.keyring”

    the default value for ceph_cinder_backup_keyring has been changed from: “ceph.client.cinder-backup.keyring” to: “client.{{ ceph_cinder_backup_user }}.keyring”

    the default value for ceph_glance_keyring has been changed from: “ceph.client.glance.keyring” to: “client.{{ ceph_glance_user }}.keyring”

    the default value for ceph_manila_keyring has been changed from: “ceph.client.manila.keyring” to: “client.{{ ceph_manila_user }}.keyring”

    and the default value for ceph_gnocchi_keyring has been changed from: “ceph.client.gnocchi.keyring” to: “client.{{ ceph_gnocchi_user }}.keyring”

    User who did override default values for the above variables have to change them according to the new pattern.

  • The Octavia amphora provider by default is now deployed with the jobboard feature enabled. This requires the Redis service to be enabled as a dependency, please update your configuration accordingly if needed. For futher information see Amphorav2 docs

  • Enabled ovn_emit_need_to_frag setting by default. It is useful when external network’s MTU is lower then internal geneve networks. Host kernel needs to be in version >= 5.2 for this option to work. All Kolla supported host operating systems have higher kernel version.

  • Since kolla-ansible now also supports Podman, ansible module kolla_docker has been renamed to kolla_container.

  • restart_policy: no will now create systemd units, but with Restart property set to no.

  • Changes default value of nova libvirt driver setting skip_cpu_compare_on_dest to true. With the libvirt driver, during live migration, skip comparing guest CPU with the destination host. When using QEMU >= 2.9 and libvirt >= 4.4.0, libvirt will do the correct thing with respect to checking CPU compatibility on the destination host during live migration.

  • Zun is currently incompatible with Debian Bookworm. This is because Zun currently has a hard dependency on a deprecated Docker feature. Operators upgrading from Bullseye are strongly encouraged to disable Zun first. While workarounds may be possible, none are currently tested in CI.

  • Support for Zun for this release has been provisionally dropped. This is due to a number of base dependencies that require updating. The Zun images remain buildable, and the roles have not been removed, but a precheck has been added to prevent breaking current deployments.

    Operators are strongly encouraged to hold off upgrading if Zun is a requirement. Please also consult the deprecation notes.

Deprecation Notes

  • Deprecates support for deploying Masakari. Support for deploying Masakari will be removed from Kolla Ansible in the Caracal Release.

  • Zun is currently provisionally deprecated but not removed. If Zun regains compatibility within the next release cycle, backports to this version of Kolla and Kolla-Ansible will be considered to provide a smooth upgrade path.

Security Issues

  • The kolla-genpwd, kolla-mergepwd, kolla-readpwd and kolla-writepwd commands now creates or updates passwords.yml with correct permissions. Also they display warning message about incorrect permissions.

  • Restrict the access to the http Openstack services exposed /server-status by default through the HAProxy on the public endpoint. Fixes issue for Ubuntu/Debian installations. RockyLinux/CentOS not affected. LP#1996913

Bug Fixes

  • Fixes issues with OVN NB/SB DB deployment, where first node needs to be rebootstrapped. LP#1875223

  • Fix MariaDB backup if enable_proxysql is enable

  • Fixes 504 timeout when scraping openstack exporter. Ensures that HAProxy server timeout is the same as the scrape timeout for the openstack exporter backend. LP#2006051

  • Fix improper use of --file parameter with designate-manage pool update command. LP#2012292 <https://bugs.launchpad.net/kolla-ansible/+bug/2012292>

  • Set correct permissions for opensearch-dashboard data location LP#2020152 https://bugs.launchpad.net/kolla-ansible/+bug/2020152

  • Fix issue with octavia security group rules creation when using IPv6 configuration for octavia management network. See LP#2023502 for more details.

  • Fixes glance-api failed to start privsep daemon when cinder_backend_ceph is set to true. See LP#2024541 for more details.

  • Fixes 2024554. Adds host and mariadb_port to the wsrep sync status check. This is so none standard ports can be used for mariadb deployments. LP#2024554

  • enable_keystone_federation and keystone_enable_federation_openid have not been explicitly handled as bool in various templates in the keystone role so far. LP#2036390

  • Starting with ansible-core 2.13, list concatenation format is changed which resulted in inability to override horizon policy files. See LP#2045660 for more details.

  • Fixes an issue when Kolla is setting the producer tasks to None, and this disables all designate producer tasks. LP#1879557

  • Fixes CloudKitty failing to query Prometheus now that basic authentication is required.

  • Fixes ironic_tftp which binds to all ip addresses on the system. Added ironic_tftp_interface, ironic_tftp_address_family and ironic_tftp_interface_address parameters to set the address for the ironic_tftp service. LP#2024664

  • Fixes the incorrect endpoint URLs and service type information for the Cyborg service in the Keystone. LP#2020080

  • Fixes an issue when using enable_prometheus_server_external in conjunction with haproxy_single_external_frontend.

  • Fixes an issue where a Docker health check wasn’t configured for the OpenSearch Dashboards container. See bug 2028362.

  • Fixes an issue where Prometheus would fail to scrape the OpenStack exporter when using internal TLS with an FQDN. LP#2008208

  • Fixes prometheus grafana datasource using incorrect basic auth credentials.

  • Fixes an issue where ‘q35’ libvirt machine type VM could not hotplug more than one PCIe device at a time.

  • Fixes an issue with prometheus scraping itself now that basic auth has been enabled.

  • Fixes an issue where Fluentd was parsing Horizon WSGI application logs incorrectly. Horizon error logs are now written to horizon-error.log instead of horizon.log. See LP#1898174

  • Fixes an issue where keepalived track script fails on single controller environment and keepalived VIP goes into BACKUP state. keepalived_track_script_enabled variable has been introduced (default: true), which can be used to disable track scripts in keepalived configuration. LP#2025219

  • The etcd tooling has been updated to better serialize restarts when applying configuration or updates. Previously minor outages might have occurred since all services were restarted in the same task.

  • Fixes an issue were an OVS-DPDK task had a different name to how it was being notified.

  • Fixes an issue where Prometheus scraping of Etcd metrics would fail if Etcd TLS is enabled. LP#2036950

  • Fixes an issue where it wasn’t possible to customise Nova service config at the individual service level, which is required in some use cases.

  • Added ability to define address for a separate tgtd network interface.

Other Notes

  • Refactors the MariaDB and RabbitMQ restart procedures to be compatible with Ansible 2.14.3+. See Ansible issue 80848 for details.