2023.1 Series Release Notes¶

2023.1-eom-11¶

Bug Fixes¶

Fixes an issue with Grafana datasource updates by removing the hardcoded version number. This ensures proper datasource configuration updates. LP#2096664

2023.1-eom¶

New Features¶

Adds the ability to provide the NTP (time source) server for multiple DHCP ranges in the Ironic Inspector DHCP server.

kolla-ansible now validates the Prometheus configuration files when called via kolla-ansible -i $inventory validate-config. This validation is done by running the promtool check config command. See the documentation for the kolla-ansible validate-config command for details.

Upgrade Notes¶

Support for failing execution early if fact collection fails on any of the hosts by setting kolla_ansible_setup_any_errors_fatal to true has been removed. This is due to Ansible’s any_errors_fatal parameter not being templated, resulting in the value always being interpreted as true, even though the default value of kolla_ansible_setup_any_errors_fatal is false.

Equivalent behaviour is possible by setting the maximum failure percentage to 0. This may be done specifically for fact gathering using gather_facts_max_fail_percentage or globally using kolla_max_fail_percentage.

Bug Fixes¶

Fixes nova-cell not updating the cell0 database address when VIP changes. LP#1915302

Fixes keystone service configuration for haproxy when using federation. LP#2058656

Fixes the MariaDB recovery issue when kolla-ansible is running from a docker container. LP#2073370

Fixes an issue where the libvirt secrets volume was busy while secrets were being changed. LP#2073678

Fixes ProxySQL being unable to bind due to incorrectly formatted IPv6 addresses in the mysql_ifaces configuration. LP#2081106

Fixes an issue during fact gathering when using the --limit argument where a host that fails to gather facts could cause another host to fail during delegated fact gathering.

Adds skip_kpartx yes to the defaults section of multipath.conf. This prevents kpartx from scanning multipath devices and unblocks the multipathd del map operation that os-brick uses for volume detach operations. LP#2078973

Adds octavia_interface_wait_timeout to control the octavia-interface.service timeout, so that the unit can wait until the openvswitch agent sync has finished and octavia-lb-net is reachable from the host. The restart policy for this unit is also set to on-failure. LP#2067036

Fixes unreliable health checks for neutron_ovn_agent and neutron_ovn_metadata_agent. The checks now test the OVS DB connection instead of the OVN southbound DB connection. LP#2084128

Fixes parsing of JSON output of inner modules called by kolla-toolbox when data was returned on standard error. LP#2080544

Adds a check to stop deploying/upgrading the RabbitMQ containers if it will result in downgrading the version of RabbitMQ running.
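The downgrade check described above comes down to comparing the running and target versions numerically rather than as strings. The following is a minimal Python sketch of that idea, not the kolla-ansible implementation. The function names are hypothetical, and version strings are assumed to consist only of dot-separated integers.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '3.11.28' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def is_downgrade(running: str, target: str) -> bool:
    """Return True if moving from `running` to `target` would lower the version."""
    return parse_version(target) < parse_version(running)


# A numeric comparison treats 3.10 as newer than 3.9, which a plain
# string comparison would get wrong.
print(is_downgrade(running="3.12.6", target="3.11.28"))
print(is_downgrade(running="3.9.1", target="3.10.0"))
```

A deployment tool could abort before touching any container whenever such a check returns true.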

Fixes a bug where the RabbitMQ version check would fail to pull the new image due to lack of auth. LP#2086171

Fixes a bug where the IP address comparison was not done properly for the variable kolla_same_external_internal_vip. The comparison now uses the ipaddr filter. For details see LP#2076889.
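To illustrate why addresses are better compared by value than as strings, here is a small Python sketch using the standard ipaddress module. It is an analogy only: kolla-ansible uses the Ansible ipaddr filter, the helper name same_vip is hypothetical, and the exact failure mode behind LP#2076889 may differ from the case shown.

```python
import ipaddress


def same_vip(internal: str, external: str) -> bool:
    """Compare two VIP addresses by value rather than by spelling."""
    return ipaddress.ip_address(internal) == ipaddress.ip_address(external)


compressed = "fd00::10"
expanded = "fd00:0:0:0:0:0:0:10"

# The two strings differ, although they denote the same IPv6 address.
print(compressed == expanded)
# Parsing both sides first makes the comparison independent of spelling.
print(same_vip(compressed, expanded))
```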

16.7.0¶

New Features¶

Modifies public API firewalld rules to be applied immediately to a running firewalld service. This requires firewalld to be running, but avoids reloading firewalld, which is disruptive due to the way in which firewalld builds its firewall chains.

Added a command to upgrade to a target version of RabbitMQ. This is required before a SLURP upgrade. See the docs for more details: https://docs.openstack.org/kolla-ansible/latest/reference/message-queues/rabbitmq.html#slurp

Bug Fixes¶

Fixes deployment of OpenSearch with TLS enabled on the internal VIP.

Fixes handling of openvswitch on manila-share nodes. LP#1993285

Fixes behaviour of Change Password screen in Horizon until bug #2073639 is resolved. LP#2073159

Fixes the Python requests library issue when using custom CA by adding the REQUESTS_CA environment variable to the kolla-toolbox container. See LP#1967132

Fixes configuration of CloudKitty when internal TLS is enabled. LP#1998831

Fixes the comparison of container dimensions when values such as 1g are set in the container dimensions configuration. Previously the Docker container was restarted even when nothing had changed, because the configured value 1g was compared with 1073741824, the value reported by docker inspect.
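The mismatch above disappears once both values are normalised to bytes before they are compared. The following Python sketch shows the idea under stated assumptions: the helper names are hypothetical, only single-letter binary suffixes (b, k, m, g, t) are handled, and this is not the code used by kolla-ansible.

```python
# Multipliers for single-letter size suffixes (binary units).
UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def to_bytes(value) -> int:
    """Normalise a size such as '1g' or 1073741824 to an integer byte count."""
    if isinstance(value, int):
        return value
    text = str(value).strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def dimensions_differ(configured, inspected) -> bool:
    """Return True only if the two sizes really describe different limits."""
    return to_bytes(configured) != to_bytes(inspected)


# '1g' in the configuration and 1073741824 from the inspect output are equal
# after normalisation, so no restart would be needed.
print(dimensions_differ("1g", 1073741824))
```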

Fixes the detection of the Nova Compute Ironic service when a custom host option is set in the service config file. See LP#2056571

Removes the default /tmp/ mountpoint from the horizon container. This change is made to harden the container and prevent potential security issues. For more information, see the Bug Report: LP#2068126.

Fixes an issue where OVN northbound or southbound database deployment could fail when a new leader is elected. LP#2059124

16.6.0¶

Upgrade Notes¶

MariaDB backup now uses the same image as the running MariaDB server. The following variables relating to MariaDB backups are no longer used and have been removed:
- mariabackup_image
- mariabackup_tag
- mariabackup_image_full

Bug Fixes¶

Adds conditionals for IPv6 sysctl settings on systems that have IPv6 disabled in the kernel. Changing sysctl settings related to IPv6 on those systems leads to errors. LP#1906306

Fixes trove module imports. Path to the modules needed by trove-api changed in source trove package so the configuration was updated. LP#1937120

Fixes pulling of ovs-dpdk images. LP#2041864

Fixes the configuration of nova-compute and nova-compute-ironic to enable exposing vendordata over configdrive. LP#2049607

Modifies the MariaDB backup procedure to use the same container image as the running MariaDB server container. This should prevent compatibility issues that may cause the backup to fail.

Fixes a bug where Nova and Cinder would not register the required keystone service role if keystone tags are skipped. LP#2049762

Fixes the cinder-backup service when Swift is used with TLS enabled. LP#2051986

Fixes an idempotency issue in the OpenSearch upgrade tasks where subsequent runs of kolla-ansible upgrade would leave shard allocation disabled. LP#2049512

Fixes the kolla systemd unit template to prevent all kolla services from being restarted when docker.service is restarted. LP#2065168

Fixed an issue where the MariaDB Cluster recovery process would fail if the sequence number was not found in the logs. The recovery process now checks the complete log file for the sequence number and recovers the cluster. See LP#1821173 for details.

A precheck has been added to catch when om_enable_rabbitmq_quorum_queues is set to True, but quorum queues have not been configured on all appropriate queues. A manual migration is required, see here for details: https://docs.openstack.org/kolla-ansible/latest/reference/message-queues/rabbitmq.html#high-availability LP#2045887

All stable RabbitMQ feature flags are now enabled during deployments, reconfigures, and upgrades. As such, the variable rabbitmq_feature_flags is no longer required. This is a partial fix to RabbitMQ SLURP support. LP#2049512

Fixes an issue where the Keystone admin endpoint would be recreated when upgrading Keystone. The endpoint is now explicitly removed during the upgrade process.

Fixes skyline’s old format of stop task. It used docker_container which would cause problems with podman deployments.

16.5.0¶

Upgrade Notes¶

If credentials are updated in passwords.yml, kolla-ansible is now able to update these credentials in the Keystone database and in the on-disk config files.

The changes to passwords.yml are applied once kolla-ansible -i INVENTORY reconfigure has been run.

If you want to revert to the old behavior - credentials not automatically updating during reconfigure if they changed in passwords.yml - you can specify this by setting update_keystone_service_user_passwords: false in your globals.yml.

Notice that passwords are only changed if you change them in passwords.yml. This mechanism is not a complete solution for automatic credential rollover. No passwords are changed if you do not change them inside passwords.yml.

Bug Fixes¶

Fixes mariadb role deployment when using Ansible check mode. LP#2052501

Updated configuration of service user tokens for all Nova and Cinder services to stop using admin role for service_token and use service role.

See LP#2004555 and LP#2049762 for more details.

Adds the Keystone service role. Keystone has created the service role during bootstrap since Bobcat. The service role is needed for SLURP upgrades from Antelope to work. This role is also needed in Antelope and Zed for Cinder for proper service token support. LP#2049762

Changes to service user passwords in passwords.yml will now be applied when reconfiguring services.

This behaviour can be reverted by setting update_keystone_service_user_passwords: false.

Fixes LP#2045990

16.4.0¶

Bug Fixes¶

Fixes usage audit notifications being enabled when they are not needed. See LP#2049503.

Fixes a bug where Octavia configuration couldn’t be generated for Antelope release while running Zed release kolla toolbox. LP#2049364

16.3.0¶

New Features¶

The new command kolla-ansible rabbitmq-reset-state has been added. It force-resets the state of RabbitMQ. This is primarily designed to be used when enabling HA queues, see docs: https://docs.openstack.org/kolla-ansible/latest/reference/message-queues/rabbitmq.html#high-availability

Updates the Apache grok pattern to match the size of the response in bytes, the time taken to serve the request, and the user agent.

You can now enable the usage of quorum queues in RabbitMQ for all services by setting the variable om_enable_rabbitmq_quorum_queues to true. Notice that you can’t use quorum queues and high availability at the same time. This is caught by a precheck.

Sets a log retention policy for OpenSearch via Index State Management (ISM). See the documentation for details.

Adds new restart_policy called oneshot that does not create systemd units and is used for bootstrap tasks.

Upgrade Notes¶

Added log retention in OpenSearch, previously handled by Elasticsearch Curator. By default the soft and hard retention periods are 30 and 60 days respectively. If you are upgrading from Elasticsearch, and have previously configured elasticsearch_curator_soft_retention_period_days or elasticsearch_curator_hard_retention_period_days, those variables will be used instead of the defaults. You should migrate your configuration to use the new variable names before the Caracal release.

restart_policy: no will now create systemd units, but with Restart property set to no.

Bug Fixes¶

Fixes MariaDB backup when enable_proxysql is enabled.

Fixes 504 timeout when scraping openstack exporter. Ensures that HAProxy server timeout is the same as the scrape timeout for the openstack exporter backend. LP#2006051

Fixes non-persistent Neutron agent state data. LP#2009884

Fixes an issue with octavia security group rules creation when using IPv6 configuration for the octavia management network. See LP#2023502 for more details.

Fixes glance-api failing to start the privsep daemon when cinder_backend_ceph is set to true. See LP#2024541 for more details.

Adds host and mariadb_port to the wsrep sync status check, so that non-standard ports can be used for MariaDB deployments. LP#2024554

Fixes overriding of Horizon policy files. Starting with ansible-core 2.13, the list concatenation format changed, which made it impossible to override Horizon policy files. See LP#2045660 for more details.

Fixes long service restarts while using systemd LP#2048130.

Fixes an issue with high CPU usage of the cAdvisor container by setting the per-container housekeeping interval to the same value as the Prometheus scrape interval. LP#2048223

Fixes glance image import LP#2048525.

Fixes an issue where Prometheus would fail to scrape the OpenStack exporter when using internal TLS with an FQDN. LP#2008208

Fixes Docker health check for the sahara_engine container. LP#2046268

Fixes an issue where Fluentd was parsing Horizon WSGI application logs incorrectly. Horizon error logs are now written to horizon-error.log instead of horizon.log. See LP#1898174

Added log retention in OpenSearch, previously handled by Elasticsearch Curator, now using Index State Management (ISM) OpenSearch bundled plugin. LP#2047037.

Fixes an issue where Prometheus scraping of Etcd metrics would fail if Etcd TLS is enabled. LP#2036950

16.2.0¶

New Features¶

Added the capability to specify custom kernel modules for Neutron:
- neutron_modules_default: lists default modules.
- neutron_modules_extra: for custom modules and parameters.

Supports Debian Bookworm (12) as host distribution.

Added a Neutron check for ML2/OVS and ML2/OVN presence at the start of the deploy phase. It will fail if neutron_plugin_agent is set to ovn and an ML2/OVS container is detected. If neutron_plugin_agent is set to openvswitch, the check will fail when it detects an ML2/OVN container or any of the OVN-specific volumes.

In the configuration template of the Senlin service the cafile parameter is now set by default in the authentication section. This allows self-signed certificates on the internal Keystone endpoint to be used with the Senlin service.

Upgrade Notes¶

The default Keystone user role has been changed from the deprecated _member_ role to the member role.

The ironic_tftp service no longer binds to 0.0.0.0. By default it uses the IP address of the api_interface. To revert to the old behaviour, set ironic_tftp_interface_address: 0.0.0.0 in globals.yml.

Configure Nova libvirt.num_pcie_ports to 16 by default. Nova currently sets ‘num_pcie_ports’ to “0” (defaults to libvirt’s “1”), which is not sufficient for hotplug use with ‘q35’ machine type.

Changes default value of nova libvirt driver setting skip_cpu_compare_on_dest to true. With the libvirt driver, during live migration, skip comparing guest CPU with the destination host. When using QEMU >= 2.9 and libvirt >= 4.4.0, libvirt will do the correct thing with respect to checking CPU compatibility on the destination host during live migration.

Security Issues¶

Restricts access by default to the /server-status endpoint that HTTP OpenStack services expose through HAProxy on the public endpoint. Fixes the issue for Ubuntu/Debian installations. RockyLinux/CentOS are not affected. LP#1996913

Bug Fixes¶

Fixes issues with OVN NB/SB DB deployment, where first node needs to be rebootstrapped. LP#1875223

Fixes handling of enable_keystone_federation and keystone_enable_federation_openid, which were not explicitly treated as booleans in various templates in the keystone role. LP#2036390

Fixes an issue where Kolla set the producer tasks to None, which disabled all Designate producer tasks. LP#1879557

Fixes ironic_tftp binding to all IP addresses on the system. Added ironic_tftp_interface, ironic_tftp_address_family and ironic_tftp_interface_address parameters to set the address for the ironic_tftp service. LP#2024664

Fixes an issue where a Docker health check wasn’t configured for the OpenSearch Dashboards container. See bug 2028362.

Fixes an issue where ‘q35’ libvirt machine type VM could not hotplug more than one PCIe device at a time.

Fixes an issue where the keepalived track script fails in a single controller environment and the keepalived VIP goes into BACKUP state. The keepalived_track_script_enabled variable has been introduced (default: true), which can be used to disable track scripts in the keepalived configuration. LP#2025219

Fixes an issue where an OVS-DPDK task had a different name from the one used to notify it.

16.1.0¶

Upgrade Notes¶

Removes the restriction on the maximum supported version of 2.14.2 for ansible-core. Any 2.14 series release is now supported.

Security Issues¶

The kolla-genpwd, kolla-mergepwd, kolla-readpwd and kolla-writepwd commands now create or update passwords.yml with correct permissions. They also display a warning message about incorrect permissions.

Bug Fixes¶

Sets the correct permissions for the opensearch-dashboard data location. LP#2020152 https://bugs.launchpad.net/kolla-ansible/+bug/2020152

Fixes the incorrect endpoint URLs and service type information for the Cyborg service in Keystone. LP#2020080

Other Notes¶

Refactors the MariaDB and RabbitMQ restart procedures to be compatible with Ansible 2.14.3+. See Ansible issue 80848 for details.

16.0.0¶

New Features¶

Adds the flag om_enable_rabbitmq_high_availability. Setting this to true will enable both durable queues and classic mirrored queues in RabbitMQ. Note that classic queue mirroring and transient (aka non-durable) queues are deprecated and subject to removal in RabbitMQ version 4.0 (date of release unknown). Changes the pattern used in classic mirroring to exclude some queue types. This pattern is ^(?!(amq\\.)|(.*_fanout_)|(reply_)).*.
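The effect of the mirroring pattern quoted above can be checked with a short Python sketch. It assumes that the doubled backslash in the note is escaping and that the effective regular expression contains a single backslash before the dot. Python's re module stands in for RabbitMQ's own regex engine here, and the queue names are made up for illustration.

```python
import re

# Effective pattern, assuming the doubled backslash in the note is escaping.
MIRROR_PATTERN = re.compile(r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*")


def is_mirrored(queue_name: str) -> bool:
    """Return True if the policy pattern would apply to this queue name."""
    return MIRROR_PATTERN.match(queue_name) is not None


# Hypothetical queue names: the first two would be mirrored, the rest excluded.
for name in ["conductor", "notifications.info",
             "amq.gen-abc", "scheduler_fanout_1234", "reply_5678"]:
    print(name, is_mirrored(name))
```

The negative lookahead excludes names that start with amq. or reply_, and names that contain _fanout_ anywhere, because the middle alternative begins with .* and can therefore match at any position.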

Since CVE-2022-29404 is fixed the default value for the LimitRequestBody directive in the Apache HTTP Server has been changed from 0 (unlimited) to 1073741824 (1 GiB). This limits the size of images (for example) uploaded in Horizon. Now this limit can be configured via horizon_httpd_limitrequestbody. LP#2012588

Adds the Skyline Ansible role.

Adds support for container state control through systemd in kolla_docker. Every container logs only to journald and has its own unit file in /etc/systemd/system named kolla-<container name>-container.service. Systemd control is implemented in new file ansible/module_utils/kolla_systemd_worker.py.

etcd is now exposed internally via HAProxy on etcd_client_port.

Adds the command kolla-ansible validate-config. This runs oslo-config-validator against the configuration files present in the deployed OpenStack services. By default, results are saved to /var/log/kolla/config-validate.

With the parameter mariadb_datadir_volume it is possible to use a directory as volume for the mariadb service. By default, a volume named mariadb is used (the previous default).

Adds support for deploying neutron-ovn-agent. The agent is disabled by default and can be enabled using neutron_enable_ovn_agent. This new agent will run on a compute node using OVN as network backend, similar to other ML2 mechanism drivers such as ML2/OVS or ML2/SRIOV. This new agent will perform those actions that the ovn-controller service cannot execute. More details are in the RFE: https://bugs.launchpad.net/neutron/+bug/1998608

With the new neutron_ovn_availability_zones parameter it is possible to define network availability zones for OVN. Further details can be found in the Neutron OVN documentation: https://docs.openstack.org/neutron/latest/admin/ovn/availability_zones.html#how-to-configure-it

Masakari coordination backend can now be configured via masakari_coordination_backend variable. Coordination is optional and can now be set to either redis or etcd.

Adds the ovn-monitor-all variable, a boolean value that tells whether ovn-controller should unconditionally monitor all records in OVS databases. Setting ovn-monitor-all to 'true' will remove some CPU load from the OVN SouthBound DB but will result in more updates coming to ovn-controller. This might be helpful in large deployments with many compute hosts.

Added two new flags to alter behaviour in RabbitMQ:
- rabbitmq_message_ttl_ms, which lets you set a TTL on messages.
- rabbitmq_queue_expiry_ms, which lets you set an expiry time on queues.

See https://www.rabbitmq.com/ttl.html for more information on both.

Adds the ability to configure RabbitMQ via rabbitmq_extra_config, which can be overridden in globals.yml.

The config option rabbitmq_ha_replica_count is added, to allow for changing the replication factor of mirrored queues in RabbitMQ. While the flag is unset, the queues are mirrored across all nodes using "ha-mode":"all". Note that this only has an effect if the flag om_enable_rabbitmq_high_availability is set to True, as otherwise queues are not mirrored.

The config option rabbitmq_ha_promote_on_shutdown has been added, which allows changing the RabbitMQ definition ha-promote-on-shutdown. By default ha-promote-on-shutdown is “when-synced”. We recommend changing this to be “always”. This basically means we don’t mind losing some messages, instead we give priority to rabbitmq availability. This is most relevant when restarting rabbitmq, such as when upgrading. Note that setting the value of this flag, even to the default value of “when-synced”, will cause RabbitMQ to be restarted on the next deploy. For more details please see: https://www.rabbitmq.com/ha.html#cluster-shutdown

When restarting a RabbitMQ container, the node is now first put into maintenance mode. This will make the node shutdown less disruptive. For details on what maintenance mode does, see: https://www.rabbitmq.com/upgrade.html#maintenance-mode

Switch trove-api to WSGI running under Apache.

Added configuration options to enable backend TLS encryption from HAProxy to the Trove service.

Services using etcd3gw via tooz now use etcd via haproxy. This removes a single point of failure, where we hardcoded the first etcd host for backend_url.

Upgrade Notes¶

Minimum supported Ansible version is now 6 (ansible-core 2.13) and maximum supported is 7 (ansible-core 2.14). Due to a regression in ansible-core, it must not be greater than 2.14.2.

skydive service deployment support has been dropped, following removal of Kolla skydive images.

RabbitMQ replica count has changed from n to (n//2+1) where n is the number of RabbitMQ nodes. That is, for a 3 node cluster, we request exactly 2 replicas, for a 1 node cluster, we request 1 replica, and for a 5 node cluster, we request 3 replicas. This only has an effect if om_enable_rabbitmq_high_availability is set to True, otherwise queues are not replicated. The number of mirrored queues is not changed automatically, and instead requires the queues to be recreated (for example, by restarting RabbitMQ). This follows the good practice advice here: https://www.rabbitmq.com/ha.html#replication-factor A major motivation is to reduce the load on RabbitMQ in larger deployments. It is hoped that the improved performance will also help RabbitMQ recover more quickly from cluster issues. Note that the contents of the RabbitMQ definitions.json are now changed, meaning RabbitMQ containers will be restarted on next deploy/upgrade.

The RabbitMQ variable rabbitmq_ha_promote_on_shutdown now defaults to “always”. This only has an effect if om_enable_rabbitmq_high_availability is set to True. When ha-promote-on-shutdown is set to always, queue mirrors are promoted on shutdown even if they aren’t fully synced. This means that we value availability over the risk of losing some messages. Note that the contents of the RabbitMQ definitions.json are now changed, meaning RabbitMQ containers will be restarted on next deploy/upgrade.
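The replica count formula above is a simple majority of the cluster. A minimal Python sketch, with a hypothetical function name, reproduces the three examples from the note:

```python
def replica_count(nodes: int) -> int:
    """Majority of the cluster: n // 2 + 1 replicas for n RabbitMQ nodes."""
    return nodes // 2 + 1


# Cluster sizes mentioned in the note: 1, 3 and 5 nodes.
for n in (1, 3, 5):
    print(n, replica_count(n))
```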

In RabbitMQ, messages now have a TTL of 10 minutes and inactive queues will expire after 1 hour. These queue arguments can be changed dynamically at runtime [1], but it should be noted that applying a TTL to queues which already have messages will discard the messages when specific events occur. See [2] for more details. Note that the contents of the RabbitMQ definitions.json are now changed, meaning RabbitMQ containers will be restarted on next deploy/upgrade. [1] https://www.rabbitmq.com/queues.html#optional-arguments [2] https://www.rabbitmq.com/ttl.html#per-message-ttl-caveats

Changes the RabbitMQ upgrade procedure from a full stop of the cluster to a rolling upgrade, which has been supported since RabbitMQ 3.8.

OpenStack services (except Ironic and Keystone) stopped supporting the system scope in their API policy. Kolla, which started using the system scope token during the OpenStack Xena release, needs to revert to the project scope token to perform API operations on those services. The Ironic and Keystone operations are still performed using the system scope token.

Default tags of neutron_tls_proxy and glance_tls_proxy have been changed to haproxy_tag, as both services are using haproxy container image. Any custom tag overrides for those services should be altered before upgrade.

Deprecation Notes¶

Deprecates support for deploying Sahara. Support for deploying Sahara will be removed from Kolla Ansible in the Bobcat Release.

Deprecates support for deploying Vitrage. Support for deploying Vitrage will be removed from Kolla Ansible in the Bobcat Release.

Bug Fixes¶

The precheck for RabbitMQ failed incorrectly when kolla_externally_managed_cert was set to true. LP#1999081

Fixes the kolla_docker module, which did not take the common_options parameter into account, so the module’s default values were always used. LP#2003079

Fixes a Keystone task that connected via SSH instead of running locally. LP#2004224

Fixes an issue where the SASL account was created before the config file was ready. LP#2015589

The flags --db-nb-pid and --db-sb-pid have been corrected to be --db-nb-pidfile and --db-sb-pidfile respectively. See here for reference: https://github.com/ovn-org/ovn/blob/6c6a7ad1c64a21923dc9b5bea7069fd88bcdd6a8/utilities/ovn-ctl#L1045 LP#2018436

Configuration of service user tokens for all Nova and Cinder services is now done automatically, to ensure security of block-storage volume data.

See LP#2004555 for more details.

The value of [oslo_messaging_rabbit] heartbeat_in_pthread is explicitly set to either true for wsgi applications, or false otherwise.

Fixes deployment when using Ansible check mode. LP#2002661

Sets the etcd internal hostname and cacert for deployments with internal TLS enabled. This allows services to work with etcd when coordination is enabled for internal TLS deployments. Without this fix, the coordination backend fails to connect to etcd and the service itself crashes.

Fix issue with octavia config generation when using octavia_auto_configure and the genconfig command. Note that access to the OpenStack API is necessary for Octavia auto configuration to work, even when generating config. See LP#1987299 for more details.

Fixes OVN deployment order - as recommended in OVN docs. LP#1979329

When upgrading Nova to a new release, we use the tool nova-status upgrade check to make sure that there are no nova-compute that are older than N-1 releases. This was performed using the current nova-api container, so computes which will be too old after the upgrade were not caught. Now the upgraded nova-api container image is used, so older computes are identified correctly. LP#1957080

Fixes an issue where some prechecks would fail or not run when running in check mode. LP#2002657

When upgrading or deploying RabbitMQ, the policy ha-all is cleared if om_enable_rabbitmq_high_availability is set to false.

In HA mode, parallel restart of neutron-l3-agent containers will cause a network outage. Adding routers increases the recovery time. This release makes restarts serial and adds a user-configurable delay to ensure each agent is returned to operation before the next one is restarted.

The default value is 0. A nonzero starting value would only result in outages if the failover time was greater than the delay, which would be more difficult to diagnose than consistent behaviour.

Prevent haproxy-config role from attempting to configure firewalld during a kolla-ansible genconfig. LP#2002522
