Current Series Release Notes

20.0.0-178

Prelude

The Kolla Ansible 21.0.0 (Flamingo) release focuses on tightening operations for the control plane, database layer, and observability stack while following upstream service retirements. Highlights include:

  • Database services now use ProxySQL by default, MariaDB adopts the upstream healthcheck.sh script, TLS is enabled for all MariaDB connections through ProxySQL, and Valkey replaces Redis. The legacy HAProxy/clustercheck path and its containers have been removed.

  • Logging and monitoring were overhauled: Fluentd moved into its own role and sends logs directly to OpenSearch nodes, Prometheus node-exporters run from a dedicated role, and OpenSearch Dashboards connects to the data nodes without an intermediate HAProxy hop.

  • Control-plane services gained multiple lifecycle improvements. Neutron now mirrors the upstream layout with new maintenance/RPC workers, wrapper containers manage the OVN metadata-agent HAProxy processes, nova-metadata runs in its own container, Horizon uses port 8080 when fronted by HAProxy, and the default uWSGI provider now covers more services.

  • Tooling and reliability improvements: the supported Ansible version range is now 11–12, host bootstrap tasks moved into ansible-collection-kolla, CA bundle trust paths were aligned on Enterprise Linux hosts, mod_auth_openidc gained templated error pages, and several HA fixes landed (Let’s Encrypt ACME cleanup, Horizon memcached resilience, RabbitMQ single-node upgrades, and ProxySQL routing improvements).

  • With ironic-inspector retired upstream, Kolla Ansible now provides the ironic-pxe-filter service to cover bare-metal PXE filtering and removes other unused integrations such as Venus and VMware drivers.

New Features

  • Adds options for the Keystone mod_auth_openidc integration: tune the authentication state timeout via OIDCStateTimeout, and provide a custom error page at {{ node_custom_config }}/keystone/federation/modoidc-error-page.html.

  • ProxySQL is now enabled automatically whenever MariaDB is enabled, and the MariaDB container health check now uses the upstream healthcheck.sh script instead of clustercheck.

  • Fluentd now sends logs directly to OpenSearch node IPs instead of going through a load balancer, which removes the load-balancer overhead caused by high log volumes. The OpenSearch load balancer remains in place, as it is still used by OpenSearch Dashboards. Fluentd handles node availability itself: it distributes logs round-robin across the available nodes, so logs are still delivered even if individual OpenSearch nodes become unavailable.

  • Adds the optional ironic-pxe-filter service controlled by enable_ironic_pxe_filter. This brings parity with the standalone inspector. Upstream currently classifies the PXE filter as experimental.

  • Implements neutron_agents_wrappers for the neutron-ovn-metadata-agent. This allows the HAProxy processes that forward metadata requests in ML2/OVN deployments to run in separate containers.

  • The OVN container images (ovn-nb-db, ovn-northd and ovn-sb-db) now set default environment variables that make it easier for operators to run ovn-nbctl and ovn-sbctl commands.

  • Adds a Valkey role with Sentinel so that deployments use Valkey instead of Redis without changing coordination endpoints.

  • Improves the performance of the Prometheus deployment by moving the prometheus_node_exporter and prometheus_cadvisor services into a new prometheus-node-exporters role.

  • TLS support for MariaDB connections has been enabled for all services when using ProxySQL.

  • The bootstrap-servers command now always uses the system Python interpreter via Ansible's auto_silent interpreter autodetection.

    The octavia-certificates command now uses the same Python interpreter as the kolla-ansible command itself.

  • HTTP chunked input is now enabled by default for all uWSGI services.

  • Adds support for running the following services under uWSGI instead of Apache with mod_wsgi. uWSGI is enabled by default; to switch a service back to Apache, set <service>_wsgi_provider to apache (the default is uwsgi):

    Service    Variable
    --------   ----------------------
    Aodh       aodh_wsgi_provider
    Gnocchi    gnocchi_wsgi_provider
    Heat       heat_wsgi_provider
    Horizon    horizon_wsgi_provider
    Ironic     ironic_wsgi_provider
    Keystone   keystone_wsgi_provider
    Masakari   masakari_wsgi_provider
    Octavia    octavia_wsgi_provider
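
The switches described in these features are set in /etc/kolla/globals.yml. The fragment below is a minimal sketch, assuming an operator who already runs Ironic, wants to try the experimental PXE filter, and wants to keep Keystone on Apache with mod_wsgi. Only the variable names come from these notes; the chosen values are illustrative.

```yaml
# /etc/kolla/globals.yml (illustrative values only)

# Opt in to the experimental ironic-pxe-filter service.
# Assumes Ironic itself is already enabled in this deployment.
enable_ironic_pxe_filter: "yes"

# Keep Keystone on Apache+mod_wsgi instead of the uWSGI default.
# Any service listed in the table can be reverted the same way
# through its <service>_wsgi_provider variable.
keystone_wsgi_provider: "apache"
```

Services without an override keep the uwsgi default, so only the services you want to revert need an entry.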

Upgrade Notes

  • Minimum supported Ansible version is now 11 (ansible-core 2.18) and maximum supported is 12 (ansible-core 2.19).

  • Changes the default trusted CA store path for HAProxy and RabbitMQ on EL systems from ca-bundle.trust.crt to ca-bundle.crt.

  • The cron tasks now live in their own Ansible role instead of being shipped inside common.

  • Deployments now ship a default template at ansible/roles/keystone/templates/modoidc-error-page.html.j2 to handle federated authentication errors. Operators can override the full template or just adjust the redirect delay via keystone_federation_oidc_error_page_retry_login_delay_milliseconds. The default redirect delay is 5 seconds.

  • The HAProxy + clustercheck backend for MariaDB is no longer supported. Running kolla-ansible upgrade now deploys ProxySQL and removes the old clustercheck containers automatically.

  • Support for deploying ironic-inspector has been dropped following the service’s retirement upstream. The remaining variables and artifacts were renamed for consistency: ironic_inspector_kernel_cmdline_extras becomes ironic_kernel_cmdline_extras, ironic_inspector_pxe_filter becomes ironic_pxe_filter, and inspector.ipxe becomes ipa.ipxe.

  • bifrost also removed its legacy inspector integration, so the bifrost_enable_ironic_inspector option has been deleted.

  • The neutron_legacy_iptables option and its handling have been dropped.

  • Support for deploying Venus container images has been dropped.

  • VMware drivers across Nova, Cinder, and Neutron are no longer deployed. Upstream projects removed the integration and the third-party libraries are unmaintained.

  • fluentd now has its own Ansible role instead of being deployed from the common role.

  • The default Horizon port has changed from 80/443 to 8080 when HAProxy is used. The old default is retained for development environments that set enable_haproxy to no.

  • Neutron agent wrappers are now enabled by default. The wrapper containers restart DHCP, L3, and related agents without having to respawn the main service containers, which reduces dataplane disruptions during upgrades and restarts. Operators who need the previous behaviour can set neutron_agents_wrappers to "no" in /etc/kolla/globals.yml.

  • Neutron now runs its API workers under uWSGI and moves auxiliary processes into dedicated containers, matching the upstream deployment model. TLS is terminated directly on uWSGI, so the neutron-tls-proxy service was removed. New containers introduced with this change include:

    • neutron-ovn-maintenance-worker

    • neutron-rpc-server

    • neutron-periodic-workers

  • OpenSearch Dashboards now connects directly to OpenSearch nodes rather than through an HAProxy endpoint. This should have no user-facing impact.

  • Support for the Linux Bridge mechanism driver has been removed, as the driver has already been removed from Neutron.

  • Redis has been replaced with Valkey. Before running kolla-ansible upgrade, set enable_redis: "no" and enable_valkey: "yes" in globals.yml. The upgrade playbooks automatically migrate Redis data into Valkey using temporary ports and then switch back to the default ports.

  • Deployments that use a file-based external certificate and Let’s Encrypt for the internal certificate (separate VIPs) now default to managing the external certificate with Let’s Encrypt. To keep a file-based external certificate, set letsencrypt_external_cert_server: "".
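
Several of these upgrade notes translate directly into /etc/kolla/globals.yml settings. The fragment below is a minimal sketch, assuming a deployment that currently runs Redis. Only the Valkey lines are required; the other settings are needed only to keep the previous behaviour, and the delay value simply restates the documented 5-second default in milliseconds.

```yaml
# /etc/kolla/globals.yml changes to make before running "kolla-ansible upgrade"

# Required: switch from Redis to Valkey (the playbooks migrate the data).
enable_redis: "no"
enable_valkey: "yes"

# Optional: keep the previous Neutron agent behaviour (no wrapper containers).
neutron_agents_wrappers: "no"

# Optional: keep a file-based external certificate when Let's Encrypt
# manages only the internal certificate (separate VIPs).
letsencrypt_external_cert_server: ""

# Optional: redirect delay on the Keystone OIDC error page, in milliseconds.
# 5000 ms corresponds to the documented 5-second default.
keystone_federation_oidc_error_page_retry_login_delay_milliseconds: 5000

# Renamed variables: if you set ironic_inspector_kernel_cmdline_extras or
# ironic_inspector_pxe_filter, rename them to ironic_kernel_cmdline_extras
# and ironic_pxe_filter (values are not shown here).
```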

Security Issues

  • Denies access to /server-status through the single frontend. LP#2121626

Bug Fixes

  • Fixes a bug that blocked the supported RabbitMQ upgrade from version 3.13 to 4.1. LP#2118452

  • Fixes handler invocation failure in the ovs-dpdk role. LP#2088197

  • Fixes haproxy configurations that kept rendering the acme_client_back backend and the path_reg ^/.well-known/acme-challenge/.+ ACL even when Let’s Encrypt support was disabled. LP#2097452

  • Fixes an issue where Horizon returned HTTP 500 errors when one of the Memcached nodes was unavailable by setting ignore_exc to True in the cache backend. LP#2106557

  • Fixes an issue where the kolla-toolbox configuration generated an unnecessary comma when using an external RabbitMQ, which prevented the container from starting. LP#2111267

  • Fixes an issue where vendordata.json, if defined, was not being copied to the nova-metadata directory. LP#2111328

  • Single-node RabbitMQ upgrades no longer fail on the unsupported drain command; the playbooks now call stop_app in that scenario. LP#2111916

  • Improves ProxySQL routing by setting default_hostgroup for every MariaDB user and by adding user-based rules alongside the schema-based rules. Statements that run before a schema is selected (for example SET AUTOCOMMIT or ROLLBACK) now land in a valid hostgroup instead of failing against NULL backends. LP#2112339

  • Fixes certificate script rendering in the Let’s Encrypt role. LP#2115230

  • Fixes configuration of backend TLS when network nodes are separate from controllers. LP#2117084

  • Handlers that restart the nova_libvirt and ovn_sb_db_relay containers have been removed; restarts of these services are now controlled by the service-check-containers role. LP#2123946

  • Fixes an issue where etcd backend TLS certificates were not templated because kolla_copy_backend_tls_files evaluated to false when etcd_enable_tls_backend was undefined.

  • Fixes deployment of Cyborg in dev mode. LP#2030849

  • Removes a reference to EXTRA_OPTS from the documentation.

  • Fixes an issue where uploading an image through the Horizon user interface could be blocked by CORS.

  • Fixes a bug where the Cinder endpoint used by Nova was not overridden because an invalid option was used. LP#2115064

  • Fixes a bug where Keystone failed to start when the OIDCXForwardedHeaders option was set to an empty string in wsgi-keystone.conf. LP#2119344

  • Fixes the RabbitMQ version check, which was always skipped. LP#2102662

  • Fixes a bug where Kolla Ansible could fail to deploy a service because it tried to copy backend TLS certificates into containers even when neither the host nor the container was part of backend TLS and no certificates existed to copy. LP#2105505

  • Fixes the Fluentd configuration template to avoid generating unnecessary empty lines when optional parameters are not set.

  • Prevents accidental libvirt downgrades in the nova_libvirt container image during deploy and upgrade. A new nova_libvirt version check resolves the target image digest once, on the first compute host, and runs only on hypervisors where the running container's digest differs from the target.

  • Moves tasks that modify host configuration from the kolla-ansible common role to ansible-collection-kolla, because they only need to run once when the host is bootstrapped and are not closely related to the common services.

  • Adds a missing override for octavia_notification_topics so that operators can add their own notification topics for Octavia. By default, Octavia sends notifications to Ceilometer when Ceilometer is enabled.

  • Allows operators to run kolla-ansible post-deploy without escalating privileges on the deploy node when node_config is writable by that user.

  • Restores the default Let’s Encrypt ACME server for external certificates, so that setting enable_letsencrypt works out of the box again without explicitly setting letsencrypt_external_cert_server. The default is https://acme-v02.api.letsencrypt.org/directory.
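
Two of these fixes concern globals.yml overrides. The fragment below is a minimal sketch. With the restored default, enabling Let's Encrypt no longer requires letsencrypt_external_cert_server, so that line is shown commented out. The octavia_notification_topics structure is an assumption modelled on the name/enabled lists used by other Kolla Ansible *_notification_topics variables, and the topic name my_custom_topic is hypothetical; check the Octavia role defaults before copying it.

```yaml
# /etc/kolla/globals.yml (illustrative; other required Let's Encrypt
# settings are omitted)

enable_letsencrypt: "yes"
# No longer required, this is the restored default:
# letsencrypt_external_cert_server: "https://acme-v02.api.letsencrypt.org/directory"

# ASSUMPTION: a list of name/enabled entries, as in other
# *_notification_topics variables. Overriding replaces the default list,
# so the Ceilometer entry is repeated to keep those notifications.
octavia_notification_topics:
  - name: notifications
    enabled: "{{ enable_ceilometer | bool }}"
  - name: my_custom_topic  # hypothetical topic name
    enabled: true
```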