Wallaby Series Release Notes
12.8.0-33
New Features
Adds the flag om_enable_rabbitmq_high_availability. Setting this to true will enable both durable queues and classic mirrored queues in RabbitMQ. Note that classic queue mirroring and transient (aka non-durable) queues are deprecated and subject to removal in RabbitMQ version 4.0 (date of release unknown). Changes the pattern used in classic mirroring to exclude some queue types. The pattern is now: ^(?!(amq\\.)|(.*_fanout_)|(reply_)).*
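As a rough illustration, the flag could be enabled as below. This is a minimal sketch that assumes operator overrides live in /etc/kolla/globals.yml; only the variable name is taken from the note above.

```yaml
# /etc/kolla/globals.yml (assumed location of operator overrides)
# Enables durable queues and classic mirrored queues in RabbitMQ.
om_enable_rabbitmq_high_availability: true
```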
Upgrade Notes
image_upload_use_cinder_backend = True is no longer set on Cinder’s default Ceph RBD backend; the common upstream default is now used (currently False). See also LP#1991516
Bug Fixes
image_upload_use_cinder_backend = True is no longer set on Cinder’s default Ceph RBD backend. Related ERRORs and WARNINGs in Cinder and Glance logs are prevented. LP#1991516
Fixes the baremetal role to avoid the error “apparmor_parser apparmor_parser --version failed” by installing the apparmor package on Debian-like systems. LP#2004583
Configuration of service user tokens for all Nova and Cinder services is now done automatically, to ensure security of block-storage volume data. See LP#2004555 for more details.
The value of [oslo_messaging_rabbit] heartbeat_in_pthread is explicitly set to either true for wsgi applications, or false otherwise.
Adds configuration necessary for application credential access rules to properly function. LP#1965111
Fixes the incorrect endpoint URLs and service type information for the Cyborg service in Keystone. LP#2020080
Fixes Keystone OIDC failing to validate JWT because of a missing key on the Azure auth-oidc endpoint. Adds a new variable containing the JWKS URI that delivers the missing keys. LP#1990375
Fixes an issue with Octavia config generation when using octavia_auto_configure and the genconfig command. Note that access to the OpenStack API is necessary for Octavia auto configuration to work, even when generating config. See LP#1987299 for more details.
Removes the dhcp-sequential-ip configuration option from ironic_dnsmasq to avoid a race condition offering the same IP address to multiple hosts being inspected at the same time.
Fixes an issue with ironic-inspector using the wrong option to configure the interface used to communicate with the Ironic API. LP#1995246
Fixes an issue with ironic-neutron-agent using the wrong option to configure the interface used to communicate with the Ironic API. LP#1990675
Fixes an issue where some prechecks would fail or not run when running in check mode. LP#2002657
When upgrading RabbitMQ, the ha-all policy was previously cleared if rabbitmq_remove_ha_all_policy was set to true. Now, om_enable_rabbitmq_high_availability must also be set to false.
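The combination described in the last note can be sketched as follows. This assumes both variables are set in globals.yml; the values are the ones the note names.

```yaml
# Both conditions must now hold for the ha-all policy to be cleared on upgrade.
rabbitmq_remove_ha_all_policy: true
om_enable_rabbitmq_high_availability: false
```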
12.8.0
Bug Fixes
Fixes an issue with the AlertManager external Web URL being unconfigurable. A new variable, prometheus_alertmanager_external_url, has been introduced that users can use to set web.external-url to a public URL.
After an extended disruption to the Fluentd-ElasticSearch central logging pipeline, the buffer of unsent log data can grow large enough that transferring it takes longer than the default Fluentd request timeout (5 seconds). The default request timeout is raised to 60s and made configurable using the new parameter fluentd_elasticsearch_request_timeout. LP#1983031
Fixes Ironic API healthchecks when backend TLS encryption is enabled. LP#1990819
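The two variables introduced in this release might be overridden as in the sketch below. The URL is a hypothetical placeholder and the timeout value is illustrative; the duration format is assumed to follow the 60s default quoted above.

```yaml
# Hypothetical public URL; replace with the address users reach Alertmanager on.
prometheus_alertmanager_external_url: "https://alertmanager.example.com"
# Illustrative value above the new 60s default, for very large buffers.
fluentd_elasticsearch_request_timeout: "120s"
```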
12.7.0
Security Issues
Kolla Ansible used to run Ironic’s tftpd as an (unprivileged) root user. Now, it will explicitly use the nobody user.
Bug Fixes
Sets multipathd user_friendly_names to “no” so that os-brick can resize volumes online. Adds the ability to override the multipathd config. LP#1982777
Fixes bug #1987982, which caused the database log_bin_trust_function_creators variable not to be set back to “OFF” after a Keystone upgrade.
Fixes an issue with Gnocchi when gnocchi-statsd is disabled. LP#1926914
Fixes an issue where ping might not be installed on some systems, causing HAProxy prechecks to fail.
If ironic_enabled_notification_topics is set to true, ironic_notification_level is set to info in order to ensure that Ironic actually sends out notifications. See bug 1969826 for details.
12.6.0
New Features
Adds variables to configure whether monitoring services should be exposed externally:
enable_grafana_external
enable_kibana_external
enable_prometheus_alertmanager_external
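A sketch of how these switches might be combined in globals.yml. The values are illustrative and assume the variables are booleans; the defaults are not stated in this note.

```yaml
# Illustrative: expose only Alertmanager externally.
enable_grafana_external: false
enable_kibana_external: false
enable_prometheus_alertmanager_external: true
```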
Bug Fixes
Fixes an issue where Ironic Inspector could be configured without authentication in a multi-region environment in a region without a local Keystone service.
12.5.0
New Features
Adds support for configuring the OpenStack Compute API microversion used by the OpenStack exporter for Prometheus using the prometheus_openstack_exporter_compute_api_version variable. The default value is 2.1 to keep metrics unchanged when using recent exporter releases.
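For example, the microversion could be pinned explicitly as below. This is a sketch; it shows the stated default, quoted so that YAML keeps it as a string rather than a float.

```yaml
# Pins the Compute API microversion used by the OpenStack exporter (stated default).
prometheus_openstack_exporter_compute_api_version: "2.1"
```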
Bug Fixes
Fixes the issue of exponential growth of /run/openvswitch mounts when kolla-toolbox container is restarted. LP#1979295
Fixes an issue with recovering multi-node MariaDB Galera cluster.
Increases prometheus_openstack_exporter_timeout to 45 seconds to reduce the odds of scrape failures on deployments with a large number of OpenStack resources. LP#1976629
12.4.0
New Features
Adds a tls_connect module to the Prometheus blackbox exporter. This can be used to test connectivity of TLS servers.
Adds new switches to control deployment of the Masakari monitors. The deployment of each type of monitor can be controlled individually via enable_masakari_instancemonitor and enable_masakari_hostmonitor. By default, both are set to true when the deployment of Masakari is enabled via enable_masakari.
Implements container healthchecks for ironic-neutron-agent service. See blueprint
Adds support for libvirt SASL authentication. It is enabled by default. LP#1964013
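The Masakari monitor switches described above could be used as in this sketch, which deploys the instance monitor but not the host monitor. The chosen values are illustrative.

```yaml
enable_masakari: true
# Both monitors default to true when enable_masakari is set; override individually.
enable_masakari_instancemonitor: true
enable_masakari_hostmonitor: false
```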
Known Issues
Existing fluentd log rotation failed to delete old haproxy, swift, glance-tls-proxy and neutron-tls-proxy logs. These will not be deleted by the new logrotate config and will have to be removed manually.
Upgrade Notes
RabbitMQ’s Prometheus plugin is no longer enabled by default if Prometheus is not deployed. If an external Prometheus is used, you need to turn on rabbitmq_enable_prometheus_plugin to get the old behaviour.
An HTTP server is now always deployed for Ironic conductor, while previously it was only deployed when iPXE is enabled.
In the Wallaby release, Ironic changed the default deploy driver from iSCSI to direct. In the Xena release, Ironic removed the iSCSI driver. The recommended deploy driver is direct, which uses HTTP to transfer the disk image. This requires an HTTP server, and the simplest option is to use the one previously deployed when enable_ironic_ipxe is set to true.
The addition of libvirt SASL authentication requires a new password in passwords.yml, libvirt_sasl_password. This may be generated using the existing kolla-genpwd and kolla-mergepwd tooling.
The addition of libvirt SASL authentication requires both the nova_libvirt and nova_compute containers to be updated simultaneously, using new images with the necessary Cyrus SASL dependencies, as well as configuration containing the SASL credentials.
Updates the default value of node_custom_config to {{ node_config }}/config when specified using --configdir.
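For deployments that scrape RabbitMQ from an external Prometheus, the old behaviour mentioned in the first upgrade note could be restored roughly as follows. This assumes the variable is a boolean set in globals.yml.

```yaml
# Re-enable RabbitMQ's Prometheus plugin when Prometheus is not deployed by Kolla Ansible.
rabbitmq_enable_prometheus_plugin: true
```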
Security Issues
Explicitly removes the net.ipv4.ip_forward sysctl from /etc/sysctl.conf on hosts with Neutron L3 Agent. In the absence of another source for this sysctl, it should revert to the default of 0 after the next reboot. This is a follow up to a previous change which stopped setting the sysctl, but leaves existing systems with the original value of 1 set. A deployer looking to more aggressively change the value may set neutron_l3_agent_host_ipv4_ip_forward to 0 using a Yoga release of Kolla Ansible. This option will be removed in future. Any deployments still relying on the previous value may set neutron_l3_agent_host_ipv4_ip_forward to 1. LP#1945453
Fixes an issue where the default configuration of libvirt did not use authentication for the API exposed over TCP on the internal API network. This allowed anyone with access to the internal API network read-write access to libvirt. While the internal API network is typically trusted, other services on this network generally at least require authentication.
SASL authentication is now enabled for libvirt by default. Kolla Ansible supports libvirt TLS since the Train release, and this is recommended to provide a higher level of security. LP#1964013
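The ip_forward override described in the first security note could look like the sketch below, assuming it is set in globals.yml. Only one of the two values would be used.

```yaml
# Keep forwarding enabled for deployments that still rely on the previous value.
neutron_l3_agent_host_ipv4_ip_forward: 1
# Alternatively, per the note, 0 may be set when using a Yoga release of Kolla Ansible:
# neutron_l3_agent_host_ipv4_ip_forward: 0
```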
Bug Fixes
Fixes an issue with an OIDC authentication flow requiring unnecessary action from the user. Redirecting to the target IdP page now happens automatically. LP#930055
Removes the custom value of max_allowed_secret_in_bytes in barbican.conf. The default maximum size in Barbican was doubled to avoid issues with some certificates. LP#1957795
Fixes the deployment failure of outward_rabbitmq caused by port conflicts by customizing RabbitMQ’s prometheus.tcp.port. LP#1885106
Uses the Volume V3 API in the OpenStack exporter. Volume V2 API has been removed since OpenStack Wallaby. LP#1938194
Fixes the copy job for the Grafana custom home dashboard file. The copy job needs to run privileged, otherwise a permission denied error occurs. LP#1947710
Fixes Octavia’s “Connection refused” errors by adding ovn_sb_connection to octavia.conf. LP#195011
Ironic API and Ironic Inspector API use separate policy files. Ironic role was updated to be able to handle both policies separately. LP#1952948
Continue to run all actions if one action failed in Elasticsearch curator. LP#1954720
Fixes missing logrotate configuration for Placement. LP#1954723
Fixes Nova resize failing when migration_interface is customised. LP#1956976
Fixes being unable to connect to the Zun console when kolla_enable_tls_external is true. Access to the console of any Zun container failed when kolla_enable_tls_external was true. This fix sets the protocol for wsproxy base_url in zun.conf according to the value of kolla_enable_tls_external. LP#1957117
Fixes Glance with Cinder iSCSI backend failing due to lack of lock_path setting. LP#1959663
Fixes logrotate config missing for openvswitch and prometheus services. LP#1961795
Fixes an issue with Ironic’s PXE components not getting updated on upgrade. LP#1963752
Fixes configuration of the Prometheus HTTP API URL when using the Prometheus collector in CloudKitty. LP#1961615
Fixes Apache’s WSGI configuration for the aodh service in Debuntu binary flavours. LP#1953059
Fixes the baremetal role to avoid the error “Unable to remove "libvirtd"”. Now the symlink /etc/apparmor.d/disable/usr.sbin.libvirtd is created by the role. LP#1960302
Existing fluentd log rotation failed to delete old haproxy, swift, glance-tls-proxy and neutron-tls-proxy logs. Standardise rotation and deletion of logs using logrotate.
Fixes an issue with setting up OIDC based Keystone federation against an IdP that has a different response type than id_token. This can now be set using a new variable, keystone_federation_oidc_response_type. LP#1959781
Adds back the option to configure the RabbitMQ clustering interface via Kolla. LP#1900160 (https://bugs.launchpad.net/kolla-ansible/+bug/1900160)
Fixes an issue seen when using Jinja2 3.1.0.
Fixes an issue with Masakari instance monitor when libvirt SASL is enabled. libvirt SASL was enabled by default in a recent change to Kolla Ansible. LP#1965754
Fixes the configuration option setting the type of endpoint used by Neutron to send requests to Placement. LP#1960503
Fixes a configuration issue with Node Exporter causing all file system metrics of a host to be identical. LP#1961438
Fixes an issue where a failure of any Nova compute service to register itself would cause only the host querying the nova API to fail. Now, only hosts that fail to register will fail the Kolla Ansible run. Alternatively, to fail all hosts in a cell when any compute service fails to register, set nova_compute_registration_fatal to true. LP#1940119
The Prometheus OpenStack exporters are now behind HAProxy, providing a unique time series in the Prometheus database. This also ensures that only one exporter queries the OpenStack APIs at any given time interval. With the previous behaviour each OpenStack exporter was scraped at the same time. This caused each exporter to query the OpenStack APIs simultaneously, introducing unnecessary load and duplicate time series in the Prometheus database due to the instance label being unique for each exporter. LP#1972818
Fixes an issue where RabbitMQ was configured to mirror classic transient queues for all services. According to the RabbitMQ documentation this is not a supported configuration, and contributed to numerous bug reports. In order to avoid making unexpected changes to the RabbitMQ cluster, it is necessary to set rabbitmq_remove_ha_all_policy to yes in order to apply this fix. This variable will be removed in the Yoga release. LP#1954925
Fixes an issue with Cinder upgrade where Cinder services would remain pinned to the previous release’s RPC & object versions. LP#1954932
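Two of the variables introduced by the fixes above could be set as in this sketch. The response type value "code" is only an example of a standard OIDC response type and is an assumption here, not something the note specifies.

```yaml
# Fail all hosts in a cell when any compute service fails to register.
nova_compute_registration_fatal: true
# Example response type for an IdP that does not use id_token (assumed value).
keystone_federation_oidc_response_type: "code"
```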
12.3.0
New Features
Adds a new variable, disable_firewall, which defaults to true. If set to false, then the host firewall will not be disabled during kolla-ansible bootstrap-servers.
Implements container healthchecks for keystone-fernet container. See blueprint
Implements container healthchecks for memcached services. See blueprint
Implements container healthchecks for nova-spicehtml5proxy service. See blueprint
Adds two new arguments to the kolla-ansible command, --check and --diff. They are passed through directly to ansible-playbook.
Adds the manila_cephfs_filesystem_name variable to support multi-fs Ceph Pacific+ deployments.
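To keep the host firewall untouched during bootstrap, the disable_firewall variable described above could be overridden as in this sketch, assuming it is set in globals.yml.

```yaml
# Leave the host firewall enabled during `kolla-ansible bootstrap-servers`.
disable_firewall: false
```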
Upgrade Notes
To fix LP#1941940, nova_libvirt_dimensions now by default combines with nova_libvirt_default_dimensions. Please consider this when customising that variable.
Security Issues
Stops Kolla Ansible from enabling net.ipv4.ip_forward in the default network namespace. It was enabled on hosts with Neutron L3 Agent (thus in most common setups with OVS and/or Linux Bridge, but not OVN) and allowed, unless users had extra iptables rules to avoid that, any traffic to be accepted for forwarding (as long as it was routable and passed other checks). Users of existing setups are advised to re-evaluate whether they need this sysctl enabled and disable it if not necessary. Kolla Ansible will simply no longer try to set this sysctl at all. Neutron L3 Agent handles forwarding enablement per managed namespace. LP#1945453
Adds mitigation for the Apache Log4j2 Remote Code Execution (RCE) Vulnerability in Elasticsearch - CVE-2021-44228.
Bug Fixes
Fixes a broken kolla-toolbox container when RabbitMQ is disabled and IPv6 is used. LP#1939883
Fixes inability to attach devices (e.g., volumes via iSCSI/FC) to instances on Debian Bullseye. LP#1941940
Fixes mariadb-clustercheck so that it does not run when there is no HAProxy. LP#1944114
No longer creates directories for haproxy and swift logs where they are not needed. LP#1945070
Fixes an issue with multinode MariaDB deployments which could fail the playbook execution on WSREP check due to the new behaviour of Galera 4. LP#1947485.
Fixes an issue on Debian with single node MariaDB deployments with HAProxy disabled. See bug 1947534 for details.
Fixes the generation of wsrep_cluster_address in galera.cnf when --limit is used while deploying MariaDB nodes. LP#1947589
Fixes an error in the placement role which prevented deployment of the placement service when a custom policy file is used. LP#1948835
Fixes missing current Ansible version in the error message. LP#1948979
Fixes the Octavia role not setting the amphora network’s gateway_ip. LP#1949260
Only runs the configure ovn in ovsdb task on ovn-controller hosts. The task fails on hosts (such as controller nodes) without a tunnel interface. LP#1953367
Fixes an issue where the Nova API logs were written to files ending with -wsgi.log which affected the processing of these logs in the Fluentd pipeline. LP#1950185
On slower nodes, the initial Grafana startup could experience a timeout failure when the migrations for setting up the database took longer than expected. This has been fixed by increasing the default timeout. The timeout settings can be changed via the new parameters grafana_start_first_node_delay and grafana_start_first_node_retries for the grafana role. LP#1769962
Removes “fix_cephfs_owner.yaml” which related to pre-wallaby Manila’s use of subfolders. Post-wallaby Manila now uses cephfs volumes instead, as such this file is no longer required. LP#1938285 LP#1935784
Removes use of “cephfs_enable_snapshots” in Manila config as this option was removed from Manila in the Wallaby release.
12.2.0
New Features
Adds config parameter haproxy_nova_spicehtml5_proxy_tunnel_timeout to configure the Tunnel TimeOut directive for spicehtml5proxy haproxy service.
Adds two new variables, service_images_pull_retries and service_images_pull_delay, which control the behaviour of image pulling tasks. These are useful if your registry is not 100% reliable (usually due to load). The defaults have been set to 3 retries and 5 seconds delay to ensure a better default experience (these are actually Ansible defaults when task retries are enabled).
Implements container healthchecks for rabbitmq services. See blueprint
Adds support for configuring the filter and gather_subset arguments for the setup module via kolla_ansible_setup_filter and kolla_ansible_setup_gather_subset respectively. These can be used to reduce the number of facts, which can have a significant effect on performance of Ansible.
Adds a new variable, ironic_enable_keystone_integration. It adds Keystone connection information to ironic.conf when connecting to an existing Keystone (one that is not installed at the same time).
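The image pull variables described above might be tuned for a heavily loaded registry as in this sketch. The values are illustrative; the stated defaults are 3 retries and a 5 second delay.

```yaml
# Illustrative values for an unreliable registry (defaults: 3 retries, 5 seconds).
service_images_pull_retries: 5
service_images_pull_delay: 10
```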
Upgrade Notes
Updates all references to Ansible facts within Kolla Ansible from using individual fact variables to using the items in the ansible_facts dictionary. This allows users to disable fact variable injection in their Ansible configuration, which may provide some performance improvement. Check for facts referenced in local configuration files, and update to use ansible_facts before disabling fact variable injection.
Modifies the default value of ceph_nova_user from nova to the value of ceph_cinder_user, in line with the default for ceph_nova_keyring. Users who have overridden ceph_nova_keyring to use separate keyrings for Nova and Cinder should also override ceph_nova_user to match the Nova keyring. LP#1934145
Modifies the default value of rabbitmq_server_additional_erl_args from an empty string to “+S 2:2 +sbwt none +sbwtdcpu none +sbwtdio none”.
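Operators who prefer the previous defaults for the two variables above could restore them roughly as follows. This sketch assumes ceph_nova_keyring has already been overridden to point at a Nova-specific keyring, as the note describes.

```yaml
# Previous default; only appropriate when a separate Nova keyring is in use.
ceph_nova_user: "nova"
# Previous default (empty string) for the RabbitMQ Erlang VM arguments.
rabbitmq_server_additional_erl_args: ""
```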
Critical Issues
Fixes a critical bug which caused Nova instances (VMs) using libvirtd (the default/usual choice) to get killed on libvirtd (nova_libvirt) container stop (and thus any restart - either manual or done by running Kolla Ansible). It was affecting Wallaby+ on CentOS, Ubuntu and Debian Buster (not Bullseye). If your deployment is also affected, please read the referenced Launchpad bug report, comment #22, for how to fix it without risking data loss. In short: fixing requires redeploying and this will trigger the bug so one has to first migrate important VMs away and only then redeploy empty compute nodes. LP#1941706
Bug Fixes
Fixes monasca-thresh to correctly submit the topology to Storm. The previous container ran the topology in local mode (within the container), and didn’t use the Storm cloud. The new container handles submitting the topology to Storm and also handles killing and replacing the topology when its configuration has changed. As a result, the monasca-thresh container is only used for submission, and exits after that’s completed. The logs for the topology will now be available in the storm worker-artifact logs. LP#1808805
Fixes an issue where configuration in containers could become stale. This prevented containers with updated configuration from being restarted, e.g., if the kolla-ansible genconfig and kolla-ansible deploy-containers commands were used together. LP#1848775
Fixes elasticsearch fluentd output being enabled when elasticsearch is not enabled. LP#1927880
Fixes an issue with timesync checks on deployment host. See bug 1933347 for details.
Fixes horizon’s healthcheck when SSL is turned on. LP#1933846
Fixes an issue seen when customising the Docker Yum repository URL on CentOS, where the docker_yum_gpgkey variable is not used consistently. LP#1934913
Fixes an issue where the SPICE console froze after a while. See LP#1938549.
Fixes Masakari in multi-region deployments to query Nova API in its own region. LP#1939291
Fixes nova’s healthchecks when upgrading from previous version. LP#1939679
Fixes an issue with Cyborg deployment. LP#1937911
Fixes HAProxy prechecks when kolla_externally_managed_cert is used.
Fixes an issue with config.json for neutron-server when a VMware plugin agent is used.
Stops Fluentd warning message when posting to Elasticsearch 7 bulk API.
Fixes an issue with Neutron linuxbridge ML2 agent when neutron_external_interface includes multiple interfaces. LP#1863935
Fixes an issue with Manila configuration which was missing a [glance] section, preventing some drivers from operating.
Fixes an issue with default Nova configuration for Ceph where the RBD user is set to nova, but only a cinder keyring is copied. The default value of ceph_nova_user is changed to the value of ceph_cinder_user, in line with the default for ceph_nova_keyring. LP#1934145
Fixes an issue where RabbitMQ consumes a large amount of CPU, particularly on multi-core systems. The default RabbitMQ tuning assumes that RabbitMQ is running on a dedicated host, which is the opposite of a typical Kolla Ansible container setup. For more details on tuning RabbitMQ in your environment, please see: https://www.rabbitmq.com/runtime.html#busy-waiting https://www.rabbitmq.com/runtime.html#scheduling
Other Notes
Optimised image pulling to avoid looping over disabled services.
12.1.0
New Features
Adds a new haproxy configuration variable, haproxy_host_ipv4_tcp_retries2, which allows users to modify this kernel option. This option sets the maximum number of times a TCP packet is retransmitted in established state before giving up. The default kernel value is 15, which corresponds to a duration of approximately 13 to 30 minutes, depending on the retransmission timeout. This variable can be used to mitigate an issue with stuck connections in case of VIP failover, see bug 1917068 for details.
Adds the ability to override the automatic detection of fluentd_version and fluentd_binary. These can now be defined as extra variables. This removes the dependency of having docker configured for config generation.
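A sketch of lowering the retransmission limit through the new haproxy variable described above. The value 6 is purely illustrative and is not a recommendation taken from this note; lower values make established connections give up sooner.

```yaml
# Illustrative value; the kernel default is 15 (roughly 13 to 30 minutes).
haproxy_host_ipv4_tcp_retries2: 6
```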
Bug Fixes
Fixes missing region_name in keystone_auth sections. See bug 1933025 for details.
Fixes default Masakari host monitor config to work with other config that Kolla Ansible sets. This sets disable_ipmi_check due to restrict_to_remotes being set. It prevents the TypeError that happened when host monitor had to take action. This does not affect any functionality so far as Kolla Ansible does not manage IPMI credentials in Pacemaker. LP#1933209
Fixes an issue where kolla-ansible exits with a zero exit code when executed with a bogus command name. LP#1929397
Fixes the container health check for the ironic_ipxe container on Debian and Ubuntu systems. LP#1937037
Fixes an issue with Magnum when TLS is enabled. LP#781062
Other Notes
Following Cinder upstream, support for using ZFSSA with Cinder has been removed. ZFSSA was unsupported in Train and later removed in Ussuri.
12.0.0
New Features
Adds HAcluster Ansible role. This role contains High Availability clustering solution composed of Corosync, Pacemaker and Pacemaker Remote.
HAcluster is added as a helper role for Masakari which requires it for its host monitoring, allowing to provide HA to instances on a failed compute host.
Adds configuration parameter kolla_httpd_timeout to configure the TimeOut directive for services that use Apache HTTP server to handle HTTP requests. The default value is 60 seconds which matches the original default, but you may wish to increase this.
Adds support for explicitly creating individually customisable topics in Kafka for Monasca.
Adds support for the OpenID Connect authentication protocol in Keystone and enables both ID and access token authentication flows.
Adds support for Masakari host monitor.
Adds new option prometheus_openstack_exporter_timeout to override the default scrape_timeout for the openstack exporter job.
Adds deployment of Prometheus version 2.x. This version is enabled by default and replaces a forward-incompatible version 1.x. A variable prometheus_use_v1 can be set to yes to preserve version 1.x deployment with its data. Otherwise, Prometheus will start with a new volume, ignoring all previously collected metrics.
Adds support for GlusterFS NFS Manila backend.
Adds support for Elasticsearch storage backend with Cloudkitty: This feature allows for storage of Cloudkitty rating documents directly within an Elasticsearch cluster.
If you already have an Elasticsearch cluster running for logging a new cloudkitty specific index will be used. This allows you to use Kibana, Grafana or another interface to browse your rating data and create an appropriate dashboard or build an appropriate billing service over it.
Adds support for Prometheus as a fetcher/collector for Cloudkitty: This feature allows for use of Prometheus metrics as your source of rating. Using Prometheus allows for rating almost any OpenStack object directly from the Kolla provided exporters (openstack_exporter) or your own custom exporters.
Adds octavia-driver-agent to Octavia deployments to allow for additional providers, e.g. ovn-octavia-provider. It is automatically deployed when Octavia is enabled and neutron_plugin_agent is set to ovn. It can also be enabled by setting enable_octavia_driver_agent to yes.
Adds support for CentOS Stream 8 as a host Operating System and base container image. This is the only distribution of CentOS supported from the Wallaby release. The Victoria release will support both CentOS Linux 8 and CentOS Stream 8 hosts and images, and provides a route for migration.
Adds support for using a tmpfs mount for the image conversion directory of the cinder_volume container. This is disabled by default, but may be enabled by setting cinder_enable_conversion_tmpfs to true.
Adds support to import custom Grafana dashboards. The dashboard JSON files should be placed into {{ node_custom_config }}/grafana/dashboards/.
Supports Debian Bullseye (11) as host distribution.
Adds support in kolla_docker module to set CgroupnsMode for Docker containers (via cgroupns_mode module param). Requires Docker 20.10. Note that pre-20.10 all containers behave as if they were run with mode host.
Adds a new flag, docker_disable_default_network, which defaults to no. Docker is using 172.17.0.0/16 by default for bridge networking on docker0, and this might cause routing problems for operator networks. Setting this flag to yes will disable Docker’s bridge networking. This feature will be enabled by default from the Wallaby 12.0.0 release.
Adds support for configuring Docker Engine http/https proxy.
Adds support to the kolla_docker module for creating tmpfs mounts for containers.
Adds kolla_externally_managed_cert option to disable copy of certificates from the operator host to Kolla Ansible managed hosts.
Implemented container healthchecks for the following services: aodh, barbican, blazar, cinder, cloudkitty, cyborg, designate, elasticsearch, gnocchi, haproxy, ironic, kibana, magnum, manila, octavia, redis, sahara, senlin, skydive, tacker, trove, vitrage, watcher. See blueprint
The MariaDB role now allows the creation of multiple clusters. This provides a benefit to operators as they are able to install and maintain several clusters at once using kolla-ansible. This is useful when deploying database clusters for cells or database clusters for services that have large demands on the database.
Switches octavia-api to WSGI running under Apache.
Added configuration options to enable backend TLS encryption from HAProxy to the Octavia service. When used in conjunction with enabling TLS for service API endpoints, network communication will be encrypted end to end, from client through HAProxy to the Octavia service.
It is now possible to use Neutron DHCP agent together with OVN networking. A new variable is added to control this feature: neutron_ovn_dhcp_agent, defaulting to no.
OVN deployment will now configure external_ids:ovn-chassis-mac-mappings to make DVR work on VLAN tenant networks.
Adds support for collecting Prometheus metrics from RabbitMQ. This is enabled by default when Prometheus and RabbitMQ are enabled, and may be disabled by setting enable_prometheus_rabbitmq_exporter to false.
Elasticsearch can be optionally registered as an internal service in the Keystone Catalogue. This is off by default.
Due to the removal of the Monasca Grafana fork, the Monasca datasource is now configured in vanilla Grafana.
Support sending control plane logs directly to Elasticsearch when Monasca is enabled.
Support has been added to optionally disable the Monasca alerting pipeline. This can be helpful to reduce resource consumption on Monasca service hosts if the alerting pipeline is not in use.
Adds a means of overriding the haproxy config of individual services. Custom template files can be placed under {{ node_custom_config }}/haproxy-config/ to be rendered with the same variables as the generic template. Template file names must match the service to override, e.g. nova-novncproxy.cfg
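Several of the 12.0.0 features above are controlled by single variables. The sketch below collects a few of them as they might appear in globals.yml; every value is illustrative, and kolla_httpd_timeout is assumed to be a number of seconds.

```yaml
# Raise the Apache TimeOut directive above the 60 second default (illustrative).
kolla_httpd_timeout: 120
# Keep the Prometheus 1.x deployment and its data.
prometheus_use_v1: "yes"
# Use a tmpfs mount for the cinder_volume image conversion directory.
cinder_enable_conversion_tmpfs: true
# Run the Neutron DHCP agent together with OVN networking.
neutron_ovn_dhcp_agent: "yes"
# Opt out of collecting Prometheus metrics from RabbitMQ.
enable_prometheus_rabbitmq_exporter: false
```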
Upgrade Notes
The new Prometheus version ignores previously stored metrics. If you want to keep using 1.x with the old data, set the prometheus_use_v1 variable to yes. The old data is not removed, please read the docs for details. Please also make sure you adapt changes in command line options if they were ever customized in your environment because Prometheus 2.x has different syntax (--option with double dashes instead of -option).
Bumps minimum required Docker version to 18.09 and minimum required Docker Python SDK version to 3.4.1. These two are checked in prechecks.
CentOS Linux 8 is no longer supported as a host Operating System or base container image. CentOS users should migrate to CentOS Stream 8. The Victoria release will support both CentOS Linux 8 and CentOS Stream 8 hosts and images, and provides a route for migration.
Combines trove-taskmanager.conf and trove-conductor.conf into trove.conf. You should move all customized options in /etc/kolla/config/trove/trove-taskmanager.conf or /etc/kolla/config/trove/trove-conductor.conf to /etc/kolla/config/trove/trove.conf. trove-taskmanager.conf and trove-conductor.conf are no longer used in Wallaby.
Due to deprecation, chrony is no longer enabled by default. To enable it, set enable_chrony to true.
If disabled, the container and configuration may be removed by running kolla-ansible chrony-cleanup.
The kolla-ansible prechecks command will fail if Chrony is disabled and the container is running. It will also fail if Chrony is disabled and no host NTP daemon is detected. This check may be disabled by setting prechecks_enable_host_ntp_checks to false if using an NTP daemon other than chrony, ntpd or systemd-timesyncd.
The Monasca Log Metrics service has been deprecated and is now disabled by default. If you wish to enable it, you can set monasca_enable_log_metrics_service to True in globals.yml.
Docker iptables manipulation and bridge networking are now disabled by default. This avoids problems that may be caused by Docker settings the default policy of the
FORWARD
chain in thefilter
table toDROP
. To revert to the previous behaviour, setdocker_disable_default_iptables_rules
tono
. This sets the default ofdocker_disable_default_network
.
Adds a new flag, docker_disable_ip_forward, which defaults to docker_disable_default_iptables_rules and is used to disable Docker's ip-forward option, which makes Docker set the net.ipv4.ip_forward sysctl to 1. By default, docker_disable_default_iptables_rules is true, in which case Docker's ip-forward option is disabled.
For existing hosts, this configuration change is applied when configuring Docker via kolla-ansible bootstrap-servers. Docker changes the sysctl in a non-persistent manner, so it will revert to the default of 0 after a reboot, if not configured elsewhere. This should not cause a problem, since Kolla Ansible applies the sysctl where necessary. Operators may wish to perform a proactive reboot, or apply the default through other means.
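Since the new flag only inherits its default from docker_disable_default_iptables_rules, the two can be set independently. The following is a hedged sketch of a deployment that keeps Docker's iptables manipulation disabled but still lets Docker manage ip-forward. Whether this combination is advisable depends on the environment and is not stated in the note.

```yaml
# globals.yml
# Keep the Wallaby default: Docker does not manage iptables rules.
docker_disable_default_iptables_rules: true

# Override the inherited default so that Docker's ip-forward option
# stays enabled and Docker sets net.ipv4.ip_forward to 1 itself.
docker_disable_ip_forward: false
```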
Removes support for the Cinder v2 API. This API was planned for removal from Cinder in the Wallaby release, but the removal has been postponed until Xena. Cinder v2 API endpoints are removed from the Keystone endpoint catalogue on upgrade. API v2 support removed in Wallaby.
The Karbor project is no longer maintained and has been retired as of the Wallaby cycle. Its support and roles are also removed as of the Wallaby cycle.
Customizing Neutron Linux bridge and Open vSwitch Agents config via ml2_conf.ini is removed. The config has been split out for these agents into linuxbridge_agent.ini and openvswitch_agent.ini respectively. The old behaviour was deprecated in Ussuri.
Service containers and configuration for the Monasca Grafana service will be removed automatically. It is up to the operator to remove the related HAProxy configuration, the Monasca Grafana database, and associated Docker volumes.
Monasca Log Transformer has been merged with Monasca Log Persister to improve performance and reduce resource consumption. Any custom Monasca Log Transformer configuration should be either merged into Monasca Log Persister configuration, or moved outside of the Monasca pipeline, for example, to Fluentd. Any custom Monasca Log Metrics config will also need to be updated to read from the raw logs pipeline, rather than the transformed logs pipeline. The transformed logs pipeline will be removed from Kafka automatically, as will any log transformer containers. There will be a short interruption to logging services whilst the pipeline is updated. During this time it’s likely that a small window of logs will be lost from the transformed logs Kafka queue. If this is a problem, the Monasca API should be stopped on all nodes prior to upgrading Monasca. This will allow the transformed logs topic to drain into Elasticsearch before the pipeline is reconfigured. Services such as Fluentd, which post logs to the Monasca API, should buffer logs whilst this happens up to the maximum configured buffer. Note that there may be other services forwarding logs, and these will need to be inspected independently. The Log Transformer volumes will remain on the monitoring nodes and can be manually removed as described in the documentation.
The Qinling project is no longer maintained and has been retired as of the Wallaby cycle. Its support and roles are also removed as of the Wallaby cycle.
The Searchlight project is no longer maintained and has been retired as of the Wallaby cycle. Its support and roles are also removed as of the Wallaby cycle.
Update service configuration for the ELK 7 OSS release. A rolling upgrade from ELK 6 is supported. Please see the official upgrade notes for more detail.
Deprecation Notes¶
Support for deploying chrony is deprecated and will be removed in the Xena cycle. The container is no longer enabled by default. To enable it, set enable_chrony to true.
Support for configuration of NTP daemon (via
enable_host_ntp
) is deprecated and will be removed in the next Kolla Ansible release (Xena). Please use other means of configuring NTP.
The Monasca Fork of Grafana is deprecated due to lack of maintenance and will be removed in the Xena release. Instead, support will be provided to allow Monasca users to migrate to the vanilla Grafana service with the Monasca datasource.
The Monasca Log Metrics service has been deprecated and will be removed in the Xena release.
Deprecates support for Prometheus v1.x. Support for this image will be removed from Kolla Ansible in the Xena release cycle.
Support for deploying tempest and rally is deprecated and will be removed in the Xena cycle. The reason is that these are not services of an OpenStack cloud but its clients.
The wsrep-notify.sh script, used for disabling the haproxy user when a node is not ready to accept connections in deployments without mariadb-clustercheck, is deprecated and will be removed in the Xena release. It has been unreliable, and the recommended approach (enabled by default) is to deploy mariadb-clustercheck.
Security Issues¶
The Monasca Grafana service is effectively unmaintained and should not be exposed externally, or in situations where the risk of monitoring data leakage between tenants would be undesired.
Bug Fixes¶
Fixes an issue where it was not easily possible to set the Apache HTTP timeout directive, where the default of 60s would cause problems in slow running services. See LP#1917648.
Fixes a trivial issue where some Monasca containers could momentarily restart when initially racing to create topics in Kafka.
Fixes an issue with
kolla-ansible bootstrap-servers
if Zun is enabled where Zun-specific configuration for Docker was applied to all nodes. LP#1914378
Fixes an issue with Swift deployments when S3 Token Middleware is enabled. LP#1862765
OVN will no longer schedule SNAT routers on compute nodes when
neutron_ovn_distributed_fip
is enabled. LP#1901960
RabbitMQ services are now restarted serially to avoid a split brain. LP#1904702
Fixes LP#1906796 by adding the notice and note log levels to the Monasca Log Metrics drop configuration.
Fixes Swift’s stop action. It will no longer try to start
swift-object-updater
container again. LP#1906944
Fixes an issue with the
kolla-ansible prechecks
command with Docker 20.10. LP#1907436
Fixes an issue with kolla-ansible mariadb_recovery when the mariadb container does not exist on one or more hosts. LP#1907658
Fixes the Northbound and Southbound database socket paths in OVN.
Fixes a chronyd crash loop when the server is rebooted (Debian). LP#1915528
Fixes an issue preventing prechecks from succeeding when a “non-native” NTP daemon was used, such as ntpd as opposed to systemd-timesyncd on a Debian/Ubuntu system or to chronyd on a CentOS/RHEL system. LP#1922721
Fixes an issue when Docker was configured after startup on Debian/Ubuntu, which resulted in iptables rules being created before they were disabled. LP#1923203
Fixes an issue with Octavia SSH key copying if the user disabled Octavia auto configuration. LP#1927727
Fixes an issue where Docker Python SDK 5.0.0 was failing due to a missing six dependency. A constraint was introduced to install a version lower than 5.x. LP#1928915
Fixes more-than-2-node RabbitMQ upgrade failing randomly. LP#1930293.
Fixes Swift deploy when TLS enabled. Added the missing handler and corrected the container name. LP#1931097
Fixes
iscsid
failing in current CentOS 8 based images due to pid file being needlessly set. LP#1933033
Fixes host bootstrap on Debian not removing the conflicting packages. It now behaves in accordance with the docs. LP#1933122
Fixes a potential issue with Alertmanager in non-HA deployments. In this scenario, the peer gossip protocol is now disabled and Alertmanager won’t try to form a cluster with other, non-existent instances. LP#1926463
Adds a new flag, docker_disable_ip_forward, which defaults to docker_disable_default_iptables_rules and is used to disable Docker's ip-forward option, which makes Docker set the net.ipv4.ip_forward sysctl to 1. This is to protect from creating all-forwarding hosts. LP#1931615
Fixes an issue when generating /etc/hosts during kolla-ansible bootstrap-servers when one or more hosts has an api_interface with dashes (-) in its name. LP#1927357
Fixes some configuration issues around Barbican logging. LP#1891343
Fixes some configuration issues around Cinder logging. LP#1916752
Fixes an issue where the Cyborg API did not listen on the API interface, by changing host to host_ip in cyborg.conf. See the Cyborg documentation.
Fixes the incorrect configuration of the ovs-dpdk service, which broke Kolla Ansible deployments. For more details, please see bug 1908850.
Fixes an issue with executing kolla-ansible when installed via pip install --user. LP#1915527
Fixes an issue where the Libvirt AppArmor profile is disabled and the bootstrap-servers process tries to remove it. See bug 1909874 for details.
Fixes the container image used by mariabackup. It was using the
mariadb
image, which was deprecated in Victoria and removed in Wallaby. LP#1928129
Fixes an issue where masakari.conf was generated for the masakari-instancemonitor service but not used.
Fixes an issue where masakari-monitors.conf was generated for the masakari-api and masakari-engine services but not used.
Uses a consistent variable name for container dimensions for masakari-instancemonitor: masakari_instancemonitor_dimensions. The old name of masakari_monitors_dimensions is still supported.
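A hedged sketch of using the new variable name in globals.yml follows. The mem_limit key and its value are assumptions chosen for illustration and are not taken from this note; consult the Kolla Ansible resource constraints documentation for the keys that are actually supported.

```yaml
# globals.yml
# New, consistent variable name for the masakari-instancemonitor
# container dimensions.
masakari_instancemonitor_dimensions:
  # Hypothetical example limit; key name and value are assumptions.
  mem_limit: "512m"
```

Deployments that still set masakari_monitors_dimensions continue to work, since the old name remains supported.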
Fixes an issue with Octavia deployment when using a custom service auth project. If
octavia_service_auth_project
is set to a project that does not exist, Octavia deployment would fail. The project is now created. LP#1922100
Fixes LP#1892376 by updating deprecated syntax in the Monasca Elasticsearch template.
Removes whitespace around equal signs in zookeeper.cfg, which was preventing the zkCleanup.sh script from running correctly.