2023.2 Series Release Notes
17.5.0-21
New Features
Added a command to upgrade to a target version of RabbitMQ. This is required before a SLURP upgrade. See the docs for more details: https://docs.openstack.org/kolla-ansible/latest/reference/message-queues/rabbitmq.html#slurp
Upgrade Notes
Support for failing execution early if fact collection fails on any of the hosts by setting kolla_ansible_setup_any_errors_fatal to true has been removed. This is due to Ansible’s any_errors_fatal parameter not being templated, resulting in the value always being interpreted as true, even though the default value of kolla_ansible_setup_any_errors_fatal is false.
Equivalent behaviour is possible by setting the maximum failure percentage to 0. This may be done specifically for fact gathering using gather_facts_max_fail_percentage or globally using kolla_max_fail_percentage.
Bug Fixes
Fixes an issue with ironic dnsmasq failing to start in deployments using podman because it requires the NET_RAW capability. See LP#2055282.
Fixes keystone service configuration for haproxy when using federation. LP#2058656
Fixes the MariaDB recovery issue when kolla-ansible is running from a docker container. LP#2073370
Fixes an issue during fact gathering when using the --limit argument where a host that fails to gather facts could cause another host to fail during delegated fact gathering.
Adds skip_kpartx yes to the multipath.conf defaults section to prevent kpartx scanning multipath devices and unlock the multipathd del map operation of os-brick for volume detaching operations. LP#2078973 <https://launchpad.net/bugs/2078973>
Added octavia_interface_wait_timeout to control the octavia-interface.service timeout, so that the service can wait until the openvswitch agent sync has finished and octavia-lb-net is reachable from the host. Also sets the restart policy for this unit to on-failure. LP#2067036
Fixes Octavia service upgrade issue where it can fail when Octavia persistence database user is missing. LP#2065591
Fixes unreliable health checks for neutron_ovn_agent and neutron_ovn_metadata_agent. The checks now test the OVS DB connection instead of the OVN southbound DB connection. LP#2084128
Fixes an issue, when using podman, with named volumes that use a mode specifier. See LP#2054834 for more details.
Fixes parsing of JSON output of inner modules called by kolla-toolbox when data was returned on standard error. LP#2080544
17.5.0
New Features
Modifies public API firewalld rules to be applied immediately to a running firewalld service. This requires firewalld to be running, but avoids reloading firewalld, which is disruptive due to the way in which firewalld builds its firewall chains.
Bug Fixes
Fixes an issue deploying OpenSearch with TLS enabled on the internal VIP.
Fixes handling of openvswitch on manila-share nodes. LP#1993285
Fixes behaviour of Change Password screen in Horizon until bug #2073639 is resolved. LP#2073159
Fixes the Python requests library issue when using custom CA by adding the REQUESTS_CA environment variable to the kolla-toolbox container. See LP#1967132
Fixes configuration of CloudKitty when internal TLS is enabled. LP#1998831
Fixes the detection of the Nova Compute Ironic service when a custom host option is set in the service config file. See LP#2056571
Removes the default /tmp/ mountpoint from the horizon container. This change is made to harden the container and prevent potential security issues. For more information, see the Bug Report: LP#2068126.
Fixes an issue where OVN northbound or southbound database deployment could fail when a new leader is elected. LP#2059124
17.4.0
Upgrade Notes
MariaDB backup now uses the same image as the running MariaDB server. The following variables relating to MariaDB backups are no longer used and have been removed:
mariabackup_image
mariabackup_tag
mariabackup_image_full
Deprecation Notes
Support for deploying Masakari is no longer deprecated. The Masakari CI scenarios are now working again, and commitment has been made to improve the health of the project.
Bug Fixes
Adds conditionals for IPv6 sysctl settings on systems that have IPv6 disabled in the kernel. Changing sysctl settings related to IPv6 on those systems leads to errors. LP#1906306
Fixes nova-cell not updating the cell0 database address when VIP changes. LP#1915302
Fixes trove module imports. Path to the modules needed by trove-api changed in source trove package so the configuration was updated. LP#1937120
Fixes ovs-dpdk image pulls. LP#2041864
Fixes an incorrect condition in the Podman code path that prevented the retrieval of facts for all containers when no names were provided. LP#2058492
Modifies the MariaDB backup procedure to use the same container image as the running MariaDB server container. This should prevent compatibility issues that may cause the backup to fail.
Fixes a bug in kolla_podman_worker, where missing commas in a list of strings caused implicit concatenation of items that should be separate. LP#2067278
Fixes the cinder-backup service when Swift is used with TLS enabled. LP#2051986
Fixes the container dimensions comparison for values such as 1g. The configured value 1g was compared as-is with 1073741824, the value reported by docker inspect, so the Docker container was restarted even when nothing had changed.
Fixes keystone port in skyline-console pointing to wrong endpoint port. LP#2069855
Fixes the kolla systemd unit template to prevent all kolla services from being restarted when docker.service is restarted. LP#2065168
Fixes a bug in kolla-ansible where the keystone service role was not being created during an upgrade. This was due to the service-ks-register role not being imported in the upgrade.yml file. The service-ks-register role is now imported in the upgrade.yml file. See bug: https://bugs.launchpad.net/kolla-ansible/+bug/2056761
Fixed an issue where the MariaDB Cluster recovery process would fail if the sequence number was not found in the logs. The recovery process now checks the complete log file for the sequence number and recovers the cluster. See LP#1821173 for details.
Fix the Octavia jobboard boolean value. See https://bugs.launchpad.net/kolla-ansible/+bug/2058046 for details.
Updates the default Grafana OpenSearch datasource configuration to use values for OpenSearch that work out of the box. Replaces the Elasticsearch values that were previously being used. The new configuration can be applied by deleting your datasource and reconfiguring Grafana through kolla ansible. In order to prevent dashboards from breaking when the datasource is deleted, one should use datasource variables in Grafana. See bug 2039500.
All stable RabbitMQ feature flags are now enabled during deployments, reconfigures, and upgrades. As such, the variable rabbitmq_feature_flags is no longer required. This is a partial fix to RabbitMQ SLURP support. LP#2049512
Fixes an issue where the Keystone admin endpoint would be recreated when upgrading Keystone. The endpoint is now explicitly removed during the upgrade process.
Fixes skyline’s old format of stop task. It used docker_container which would cause problems with podman deployments.
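Two of the fixes above, the missing-comma bug in kolla_podman_worker and the container dimensions comparison, are easier to see in code. The following is a minimal Python sketch of both bug classes, not the kolla-ansible implementation: the list contents and the helper names to_bytes and dimensions_differ are hypothetical, and the unit table assumes binary multipliers for the suffixes b, k, m and g.

```python
import re

# 1) Implicit string concatenation: adjacent string literals are joined,
#    so a missing comma silently merges two list items into one.
broken = [
    "alpha",
    "beta"    # <- missing comma: "beta" and "gamma" become "betagamma"
    "gamma",
]
fixed = ["alpha", "beta", "gamma"]

# 2) Dimension comparison: normalise both sides to bytes before comparing,
#    so that "1g" in the configuration equals 1073741824 from the engine.
_UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def to_bytes(value):
    """Convert a size such as "1g", "512m" or 1073741824 to a byte count."""
    if isinstance(value, int):
        return value
    match = re.fullmatch(r"(\d+)([bkmg]?)", str(value).strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit or "b"]


def dimensions_differ(configured, inspected):
    """Return True only if the two sizes differ once both are in bytes."""
    return to_bytes(configured) != to_bytes(inspected)
```

With this normalisation, dimensions_differ("1g", 1073741824) is False, so a container whose limit is unchanged would not be flagged for a restart, while len(broken) is 2 rather than 3 because of the merged item.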
17.3.0
Upgrade Notes
If credentials are updated in passwords.yml, kolla-ansible is now able to update these credentials in the keystone database and in the on-disk config files.
The changes to passwords.yml are applied once kolla-ansible -i INVENTORY reconfigure has been run.
If you want to revert to the old behavior - credentials not automatically updating during reconfigure if they changed in passwords.yml - you can specify this by setting update_keystone_service_user_passwords: false in your globals.yml.
Notice that passwords are only changed if you change them in passwords.yml. This mechanism is not a complete solution for automatic credential rollover. No passwords are changed if you do not change them inside passwords.yml.
Bug Fixes
Fixes configuration of nova-compute and nova-compute-ironic to enable exposing vendordata over configdrive. LP#2049607
Fixes mariadb role deployment when using Ansible check mode. LP#2052501
Updated configuration of service user tokens for all Nova and Cinder services to stop using admin role for service_token and use service role. See LP#2004555 and LP#2049762 for more details.
Changes to service user passwords in passwords.yml will now be applied when reconfiguring services. This behaviour can be reverted by setting update_keystone_service_user_passwords: false. Fixes LP#2045990
17.2.0
Bug Fixes
Fixes usage audit notifications being enabled when they are not needed. See LP#2049503.
Fixes an idempotency issue in the OpenSearch upgrade tasks where subsequent runs of kolla-ansible upgrade would leave shard allocation disabled. LP#2049512
17.1.0
New Features
Sets a log retention policy for OpenSearch via Index State Management (ISM). See the documentation for details.
Upgrade Notes
Added log retention in OpenSearch, previously handled by Elasticsearch Curator. By default the soft and hard retention periods are 30 and 60 days respectively. If you are upgrading from Elasticsearch, and have previously configured elasticsearch_curator_soft_retention_period_days or elasticsearch_curator_hard_retention_period_days, those variables will be used instead of the defaults. You should migrate your configuration to use the new variable names before the Caracal release.
Bug Fixes
Fixes non-persistent Neutron agent state data. LP#2009884
Fixes long service restarts while using systemd. LP#2048130
Fixes an issue with high CPU usage of the cAdvisor container by setting the per-container housekeeping interval to the same value as the Prometheus scrape interval. LP#2048223
Fixes glance image import. LP#2048525
Fixes Nova operations using the scp command, such as cold migration or resize, on Debian Bookworm. LP#2048700
Fixes Docker health check for the sahara_engine container. LP#2046268
Added log retention in OpenSearch, previously handled by Elasticsearch Curator, now using Index State Management (ISM) OpenSearch bundled plugin. LP#2047037.
A precheck has been added to catch when om_enable_rabbitmq_quorum_queues is set to True, but quorum queues have not been configured on all appropriate queues. A manual migration is required, see here for details: https://docs.openstack.org/kolla-ansible/latest/reference/message-queues/rabbitmq.html#high-availability LP#2045887
17.0.0
New Features
Adds HTTP/2 support to HAProxy frontends.
Adds support for deploying the ironic-prometheus-exporter, ‘a Tool to expose hardware sensor data in the Prometheus format through an HTTP endpoint’. See https://opendev.org/openstack/ironic-prometheus-exporter for more details about the exporter.
Adds Let’s Encrypt TLS certificate service integration into OpenStack deployment. This enables trusted TLS certificate generation for secure communication with OpenStack HAProxy instances. Setting letsencrypt_email, kolla_internal_fqdn and/or kolla_external_fqdn is required. One container runs an Apache ACME client webserver and one runs Lego for certificate retrieval and renewal. The Lego container starts a cron job which attempts to renew certificates every 12 hours.
Added capability to specify custom kernel modules for Neutron. neutron_modules_default lists the default modules; neutron_modules_extra is for custom modules and parameters.
A custom event_pipeline.yaml file for the Ceilometer notification service is now processed with merge_yaml. This allows Jinja2 to be used. Furthermore, it is possible to have a global event_pipeline.yaml and host-specific event_pipeline.yaml files.
The new command kolla-ansible rabbitmq-reset-state has been added. It force-resets the state of RabbitMQ. This is primarily designed to be used when enabling HA queues, see docs: https://docs.openstack.org/kolla-ansible/latest/reference/message-queues/rabbitmq.html#high-availability
Supports Debian Bookworm (12) as host distribution.
Adds support for exposing Prometheus server on the external interface. This is disabled by default and can be enabled by setting enable_prometheus_server_external to true. Basic auth is used to protect the endpoint.
Adds prometheus_external_fqdn and prometheus_internal_fqdn to customise Prometheus FQDNs.
Implements support for Podman deployment as an alternative to Docker. To perform deployment using Podman, set the variable kolla_container_engine to podman inside the globals.yml file.
Adds the ability for the instance label on prometheus metrics to be replaced with the inventory hostname as opposed to using the ip address as the metric label. The ip address is still used as the target address which means that there is no issue of the hostname being unresolvable.
More information on how to use this feature can be found in the reference documentation for logging and monitoring.
HAProxy supports setting nbthread via variable haproxy_threads. Threads are recommended instead of processes since HAProxy 1.8. They cannot be used both at the same time.
Adds single service external frontend feature to haproxy. Details are in the haproxy guide section of the documentation.
HORIZON_IMAGES_UPLOAD_MODE is now set to 'direct' by default. This improves image uploads from clients, because these no longer use Horizon’s webserver as a staging area - the image upload goes directly to Glance API.
Updates apache grok pattern to match the size of response in bytes, time taken to serve the request and user agent.
With the parameter ironic_agent_files_directory it is possible to provide the directory for the ironic-agent.kernel and ironic-agent.initramfs files. By default the parameter is set to the value of node_custom_config. This corresponds to the existing behaviour.
Adds keystone_federation_oidc_additional_options, which allows additional OIDC options to be passed.
Adds support for copying in {{ node_custom_config }}/magnum/kubeconfig to Magnum containers for the magnum-cluster-api driver.
You can now enable the usage of quorum queues in RabbitMQ for all services by setting the variable om_enable_rabbitmq_quorum_queues to true. Notice that you can’t use quorum queues and high availability at the same time. This is caught by a precheck. This feature is enabled by default to improve reliability of the messaging queues.
The etcd tooling has been updated to handle adding and removing nodes. Previously this was an undocumented manual process and required creating service containers. Operators can refer to the etcd admin guide for more details.
Added a neutron check for ML2/OVS and ML2/OVN presence at the start of the deploy phase. It will fail if neutron_plugin_agent is set to ovn and use of an ML2/OVS container is detected. In the case where neutron_plugin_agent is set to openvswitch, the check will fail when it detects an ML2/OVN container or any of the OVN-specific volumes.
Glance, Cinder and Manila services now support configuration of multiple ceph cluster backends. For nova and gnocchi there is the possibility to configure different ceph clusters - for gnocchi this is possible at the service level while for nova at the host level. See the external ceph guide in the docs for more details on how to set multiple ceph backends.
The flag --check-expiry has been added to the octavia-certificates command. kolla-ansible octavia-certificates --check-expiry <days> will check if the Octavia certificates are set to expire within a given number of days.
The Octavia amphora provider driver improves control plane resiliency. Should a control plane host go down during a load balancer provisioning operation, an alternate controller can resume the in-process provisioning and complete the request. This solves the issue with resources stuck in PENDING_* states by writing info about task states in persistent backend and monitoring job claims via jobboard. The jobboard feature is now enabled by default. It requires the Redis service to be enabled as a dependency. Use the enable_octavia_jobboard variable to override if needed.
With the parameter rabbitmq_datadir_volume it is possible to use a directory as volume for the rabbitmq service. By default, a volume named rabbitmq is used (the previous default).
Adds new restart_policy called oneshot that does not create systemd units and is used for bootstrap tasks.
Added support for Cinder-Backup with S3 backend.
Added support for Glance with S3 backend.
In the configuration template of the Senlin service the cafile parameter is now set by default in the authentication section. This way the use of self-signed certificates on the internal Keystone endpoint is also usable in the Senlin service.
Adds support for ansible-core only installation (use kolla-ansible install-deps to install required collections).
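The --check-expiry flag described above comes down to comparing a certificate's expiry time against a cutoff a given number of days ahead. The following is a minimal Python sketch of that comparison only. It assumes the certificate's notAfter time has already been parsed into a timezone-aware datetime; the function name expires_within is hypothetical, and this is not how kolla-ansible implements the check.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expires_within(not_after: datetime, days: int,
                   now: Optional[datetime] = None) -> bool:
    """Return True if not_after falls on or before now + days."""
    if now is None:
        # Default to the current time in UTC; callers can pin `now` in tests.
        now = datetime.now(timezone.utc)
    return not_after <= now + timedelta(days=days)
```

Passing now explicitly keeps the check deterministic, which is convenient when the result feeds an alert or a CI gate.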
Upgrade Notes
Minimum supported Ansible version is now 7 (ansible-core 2.14) and maximum supported is 8 (ansible-core 2.15).
Default keystone user role has been changed from the deprecated role _member_ to the member role.
The ironic_tftp service no longer binds to 0.0.0.0; by default it uses the IP address of the api_interface. To revert to the old behaviour, please set ironic_tftp_interface_address: 0.0.0.0 in globals.yml.
Remnants of Monasca, Storm, Kafka and Zookeeper have been removed (including the kolla-ansible monasca_cleanup command).
etcd has been upgraded to version 3.4 in this release. Operators are highly encouraged to read the upgrade notes for impacts on etcd clients. Upgrades are only supported from etcd v3.3; skip-version upgrades are not supported. Please ensure that adequate backups are taken before running the upgrade to guard against data loss.
etcd version 3.4 drops support for the v3alpha endpoint. Internal kolla-ansible endpoints have been updated, but operators are strongly encouraged to audit any customizations or external users of etcd.
Prometheus now uses basic auth. The password is under the key prometheus_password in the Kolla passwords file. The username is admin. The default set of users can be changed using the variable prometheus_basic_auth_users.
Configure Nova libvirt.num_pcie_ports to 16 by default. Nova currently sets ‘num_pcie_ports’ to “0” (defaults to libvirt’s “1”), which is not sufficient for hotplug use with ‘q35’ machine type.
Configuring the HAProxy nbproc setting via the haproxy_processes and haproxy_process_cpu_map variables has been dropped, since threads have been the recommended way to scale CPU performance since HAProxy 1.8. This covers haproxy, glance-tls-proxy and neutron-tls-proxy. Please use haproxy_threads and haproxy_thread_cpu_map instead (or glance_tls_proxy_threads and glance_tls_proxy_thread_cpu_map for Glance TLS proxy and neutron_tls_proxy_threads and neutron_tls_proxy_thread_cpu_map for Neutron TLS proxy).
HORIZON_IMAGES_UPLOAD_MODE is now set to 'direct' by default. In order to retain the previous default ('legacy') - please set HORIZON_IMAGES_UPLOAD_MODE: 'legacy' in your custom_local_settings file.
Quorum queues in RabbitMQ (controlled by the om_enable_rabbitmq_quorum_queues variable) are enabled by default from now on. Support for non-HA RabbitMQ queues is dropped. Either quorum queues, which are enabled by default, or classic mirrored queues are required now. See the documentation for the migration procedure from non-HA to HA.
Removes the restriction on the maximum supported version of 2.14.2 for ansible-core. Any 2.14 series release is now supported.
The default value for ceph_cinder_keyring has been changed from “ceph.client.cinder.keyring” to “client.{{ ceph_cinder_user }}.keyring”.
The default value for ceph_cinder_backup_keyring has been changed from “ceph.client.cinder-backup.keyring” to “client.{{ ceph_cinder_backup_user }}.keyring”.
The default value for ceph_glance_keyring has been changed from “ceph.client.glance.keyring” to “client.{{ ceph_glance_user }}.keyring”.
The default value for ceph_manila_keyring has been changed from “ceph.client.manila.keyring” to “client.{{ ceph_manila_user }}.keyring”.
The default value for ceph_gnocchi_keyring has been changed from “ceph.client.gnocchi.keyring” to “client.{{ ceph_gnocchi_user }}.keyring”.
Users who overrode the default values for the above variables have to change them according to the new pattern.
The Octavia amphora provider by default is now deployed with the jobboard feature enabled. This requires the Redis service to be enabled as a dependency, please update your configuration accordingly if needed. For further information see the Amphorav2 docs.
Enabled the ovn_emit_need_to_frag setting by default. It is useful when the external network’s MTU is lower than that of the internal geneve networks. The host kernel needs to be version 5.2 or later for this option to work. All Kolla-supported host operating systems have a higher kernel version.
Since kolla-ansible now also supports Podman, ansible module kolla_docker has been renamed to kolla_container.
restart_policy: no will now create systemd units, but with the Restart property set to no.
Changes the default value of the nova libvirt driver setting skip_cpu_compare_on_dest to true. With the libvirt driver, during live migration, skip comparing guest CPU with the destination host. When using QEMU >= 2.9 and libvirt >= 4.4.0, libvirt will do the correct thing with respect to checking CPU compatibility on the destination host during live migration.
Zun is currently incompatible with Debian Bookworm. This is because Zun currently has a hard dependency on a deprecated Docker feature. Operators upgrading from Bullseye are strongly encouraged to disable Zun first. While workarounds may be possible, none are currently tested in CI.
Support for Zun for this release has been provisionally dropped. This is due to a number of base dependencies that require updating. The Zun images remain buildable, and the roles have not been removed, but a precheck has been added to prevent breaking current deployments.
Operators are strongly encouraged to hold off upgrading if Zun is a requirement. Please also consult the deprecation notes.
Deprecation Notes
Deprecates support for deploying Masakari. Support for deploying Masakari will be removed from Kolla Ansible in the Caracal Release.
Zun is currently provisionally deprecated but not removed. If Zun regains compatibility within the next release cycle, backports to this version of Kolla and Kolla-Ansible will be considered to provide a smooth upgrade path.
Security Issues
The kolla-genpwd, kolla-mergepwd, kolla-readpwd and kolla-writepwd commands now create or update passwords.yml with correct permissions. They also display a warning message about incorrect permissions.
Restricts access by default to the /server-status endpoint exposed by OpenStack HTTP services through HAProxy on the public endpoint. Fixes the issue for Ubuntu/Debian installations. RockyLinux/CentOS are not affected. LP#1996913
Bug Fixes
Fixes issues with OVN NB/SB DB deployment, where the first node needs to be rebootstrapped. LP#1875223
Fixes MariaDB backup when enable_proxysql is enabled.
Fixes 504 timeout when scraping openstack exporter. Ensures that HAProxy server timeout is the same as the scrape timeout for the openstack exporter backend. LP#2006051
Fixes improper use of the --file parameter with the designate-manage pool update command. LP#2012292 <https://bugs.launchpad.net/kolla-ansible/+bug/2012292>
Sets correct permissions for the opensearch-dashboard data location. LP#2020152 https://bugs.launchpad.net/kolla-ansible/+bug/2020152
Fixes an issue with octavia security group rules creation when using IPv6 configuration for the octavia management network. See LP#2023502 for more details.
Fixes glance-api failing to start the privsep daemon when cinder_backend_ceph is set to true. See LP#2024541 for more details.
Adds host and mariadb_port to the wsrep sync status check, so that non-standard ports can be used for MariaDB deployments. LP#2024554
Fixes enable_keystone_federation and keystone_enable_federation_openid not being explicitly handled as bool in various templates in the keystone role. LP#2036390
Starting with ansible-core 2.13, list concatenation format is changed which resulted in inability to override horizon policy files. See LP#2045660 for more details.
Fixes an issue when Kolla is setting the producer tasks to None, and this disables all designate producer tasks. LP#1879557
Fixes CloudKitty failing to query Prometheus now that basic authentication is required.
Fixes ironic_tftp binding to all IP addresses on the system. Added ironic_tftp_interface, ironic_tftp_address_family and ironic_tftp_interface_address parameters to set the address for the ironic_tftp service. LP#2024664
Fixes the incorrect endpoint URLs and service type information for the Cyborg service in Keystone. LP#2020080
Fixes an issue when using enable_prometheus_server_external in conjunction with haproxy_single_external_frontend.
Fixes an issue where a Docker health check wasn’t configured for the OpenSearch Dashboards container. See bug 2028362.
Fixes an issue where Prometheus would fail to scrape the OpenStack exporter when using internal TLS with an FQDN. LP#2008208
Fixes prometheus grafana datasource using incorrect basic auth credentials.
Fixes an issue where ‘q35’ libvirt machine type VM could not hotplug more than one PCIe device at a time.
Fixes an issue with prometheus scraping itself now that basic auth has been enabled.
Fixes an issue where Fluentd was parsing Horizon WSGI application logs incorrectly. Horizon error logs are now written to horizon-error.log instead of horizon.log. See LP#1898174
Fixes an issue where the keepalived track script fails on a single controller environment and the keepalived VIP goes into BACKUP state. The keepalived_track_script_enabled variable has been introduced (default: true), which can be used to disable track scripts in keepalived configuration. LP#2025219
The etcd tooling has been updated to better serialize restarts when applying configuration or updates. Previously minor outages might have occurred since all services were restarted in the same task.
Fixes an issue where an OVS-DPDK task had a different name from the one being notified.
Fixes an issue where Prometheus scraping of Etcd metrics would fail if Etcd TLS is enabled. LP#2036950
Fixes an issue where it wasn’t possible to customise Nova service config at the individual service level, which is required in some use cases.
Added ability to define address for a separate tgtd network interface.
Other Notes
Refactors the MariaDB and RabbitMQ restart procedures to be compatible with Ansible 2.14.3+. See Ansible issue 80848 for details.