Upgrade OVN to 22.03 on Focal

Important

This page has been identified as being affected by the breaking changes introduced between versions 2.9.x and 3.x of the Juju client. Read support note Breaking changes between Juju 2.9.x and 3.x before continuing.

Charmed OpenStack supports OVN version 22.03 starting with OpenStack Ussuri. Clouds running on Focal nodes that are not using this version are strongly recommended to upgrade to it in order to benefit from important bug fixes and software enhancements.

In particular, the procedure described on this page aims to prevent OVN data plane downtime during the upgrade to 22.03. This is an upstream OVN issue that can cause network disruption to all cloud VMs.

Important

Read this entire document before making any changes to your cloud.

It is recommended that the upgrade be tested in a staged environment prior to applying the steps to a production cloud.

Prerequisites

Ensure the following prerequisites are satisfied before making any changes.

Juju version

Juju must be running the latest stable version of its major and minor release (e.g. 2.9.x). This pertains to all three Juju components: client, controller model, and workload model. Juju upgrade documentation is available, but quick guidance is also given below.

First ensure that the client context is the cloud’s controller and model (check with command juju whoami). The essential commands are then:

sudo snap refresh juju
juju upgrade-controller
juju upgrade-model

Channel charms

OVN must be managed by channel charms (charms ovn-central and ovn-chassis). If it is not, perform the migration away from legacy charms by applying special procedure All charms: migration to channels to those two charms.

Caution

As the above migration document states, when performing the migration to channel charms, ensure that the currently running OVN version does not change.

Updated neutron-api-plugin-ovn charm

The neutron-api-plugin-ovn charm must support the ovn-source configuration option. This works for the xena, wallaby, victoria, and ussuri tracks.

To verify whether the option is available, query for its value:

juju config neutron-api-plugin-ovn ovn-source

Upgrade the charm if the option is not available:

juju refresh neutron-api-plugin-ovn

Procedure

Set the election timer

To minimise timeout issues during the upgrade, set the OVS database server election timer to its maximum value:

juju config ovn-central ovsdb-server-election-timer=30

For background information, see section Raft leader election timeout of this document.

Allow the ovn-central application to settle - use the juju status ovn-central command.

Ensure OVN package requirements

Ensure that select packages are up to date on the cloud’s OVN units.

Perform the package upgrades on all OVN units by running commands across the ovn-chassis and ovn-central applications:

juju exec -a ovn-chassis 'apt update && apt -y install \
   --only-upgrade openvswitch-common ovn-common'
juju exec -a ovn-central 'apt update && apt -y install \
   --only-upgrade openvswitch-common ovn-common'

Note

Some clouds may be running ovn-dedicated-chassis as opposed to ovn-chassis.

Fail-safe mode on OVN < 22.03

To prevent an OVN data plane outage during the upgrade to 22.03 the ovn-controller daemon must be placed into fail-safe mode. This section corresponds to upstream’s documented fail-safe method.

First stop the ovn-northd daemon:

juju exec -a ovn-central 'systemctl stop ovn-northd'

Secondly, identify the Southbound database leader unit (see the Querying OVN page for guidance).

Finally, manually set the northd version to an arbitrary string. The ovn-controller processes will detect this change and adapt to be able to understand the data that the upgraded northd daemon will subsequently insert into the database (use the Southbound leader unit found above):

juju exec -u <sb-db-leader-unit> 'ovn-sbctl set sb-global .  options:northd_internal_version="<string>"'

An example invocation of the above if the Southbound leader unit is ovn-central/2:

juju exec -u ovn-central/2 'ovn-sbctl set sb-global . options:northd_internal_version="safe"'

The above command contains the string ‘safe’. Any string will suffice provided that it is different from the current OVN version.

Perform the upgrade

To ensure a smooth migration, guidance is provided below that includes verification steps.

ovn-central

Prior to upgrading the ovn-central application, change its software sources to ‘distro’ and change the charm’s channel to ‘22.03/stable’:

juju refresh ovn-central --channel 22.03/stable \
   --config <(printf "ovn-central:\n source: \"distro\"")

Now upgrade the application by selecting the UCA pocket for OVN 22.03 on Focal:

juju config ovn-central ovn-source=cloud:focal-ovn-22.03

As before, allow the ovn-central application to settle - use the juju status ovn-central command.

Verify: database migration

Ensure that the upgraded Northbound and Southbound database schemas match what’s expected (the target version). An example set of commands are provided below.

The Northbound database’s target version and actual version, respectively:

juju exec -a ovn-central 'ovsdb-tool schema-version /usr/share/ovn/ovn-nb.ovsschema'

Stdout: |
6.1.0
UnitId: ovn-central/0
Stdout: |
6.1.0
UnitId: ovn-central/1
Stdout: |
6.1.0
UnitId: ovn-central/2

juju exec -a ovn-central 'ovsdb-client get-schema-version unix:/var/run/ovn/ovnnb_db.sock OVN_Northbound'

Stdout: |
6.1.0
UnitId: ovn-central/0
Stdout: |
6.1.0
UnitId: ovn-central/1
Stdout: |
6.1.0
UnitId: ovn-central/2

The Southbound database’s target version and actual version, respectively:

juju exec -a ovn-central 'ovsdb-tool schema-version /usr/share/ovn/ovn-sb.ovsschema'

Stdout: |
20.21.0
UnitId: ovn-central/0
Stdout: |
20.21.0
UnitId: ovn-central/2
Stdout: |
20.21.0
UnitId: ovn-central/1

juju exec -a ovn-central 'ovsdb-client get-schema-version unix:/var/run/ovn/ovnsb_db.sock OVN_Southbound'

Stdout: |
20.21.0
UnitId: ovn-central/0
Stdout: |
20.21.0
UnitId: ovn-central/1
Stdout: |
20.21.0
UnitId: ovn-central/2

If versions do not match it might be that the database migration did not succeed (see log files under /var/log/ovn on the ovn-central units).

Verify: cluster status

Check the status of the Northbound and Southbound database clusters. It is expected that one unit has Role: leader and the others have Role: follower. An example set of commands are provided below.

The Northbound database cluster:

juju exec -a ovn-central 'ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound' | egrep "Server ID|Role|Leader"

Server ID: 2a92 (2a9226b6-7a57-411a-94ee-092aa6a19e40)
Role: follower
Leader: bc3a
Server ID: adb2 (adb28a73-4e21-492c-81d0-f51adc6665a4)
Role: follower
Leader: bc3a
Server ID: bc3a (bc3a26b1-14c0-4133-b2c3-d8f64e4b722d)
Role: leader
Leader: self

The Southbound database cluster:

juju exec -a ovn-central 'ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound' | egrep "Server ID|Role|Leader"

Server ID: 8849 (8849b07b-cc32-47cf-8800-ed89fbc7db94)
Role: follower
Leader: fa7e
Server ID: 50b7 (50b7f34e-b295-4329-8d29-47039f697365)
Role: follower
Leader: fa7e
Server ID: fa7e (fa7e81bb-90e9-4c87-8ce4-cedcd54c6150)
Role: leader
Leader: self

ovn-chassis

To upgrade the ovn-chassis application, change the charm’s channel to ‘22.03/stable’ and then select the UCA pocket for OVN 22.03 on Focal:

juju refresh ovn-chassis --channel 22.03/stable
juju config ovn-chassis ovn-source=cloud:focal-ovn-22.03

Once ovn-chassis units settle in the active/idle state after the config change, restart OVN Metadata agents with:

juju exec -a ovn-chassis 'systemctl restart neutron-ovn-metadata-agent'

Note

Restart of Neutron OVN metadata agents is especially important when upgrading from OVN versions lower than 20.09. These versions used table Chassis in SB database for chassis registration whereas newer versions use Chassis_Private. Without the service restart, metadata agents will not re-register in the new database table and Neutron will not be able to detect these agents.

Neutron

To upgrade OVN packages used by neutron, configure the neutron-api-plugin-ovn charm to use the overlay repository that contains the ‘22.03’ release of OVN:

juju config neutron-api-plugin-ovn ovn-source="cloud:focal-ovn-22.03"
Verify: network agents

Ensure that all network agents are “alive” and “up”:

openstack network agent list

Sample output:

+---------------+----------------------+---------------+---------------+-------+-------+-------------------------------+
| ID            | Agent Type           | Host          | Avail... Zone | Alive | State | Binary                        |
+---------------+----------------------+---------------+---------------+-------+-------+-------------------------------+
| xxxx-xxxx-... | OVN Controller agent | xxxx-xxxx-... |               | :-)   | UP    | ovn-controller                |
| xxxx-xxxx-... | OVN Metadata agent   | xxxx-xxxx-... |               | :-)   | UP    | networking-ovn-metadata-agent |
| xxxx-xxxx-... | OVN Controller agent | xxxx-xxxx-... |               | :-)   | UP    | ovn-controller                |
| xxxx-xxxx-... | OVN Metadata agent   | xxxx-xxxx-... |               | :-)   | UP    | networking-ovn-metadata-agent |
+---------------+----------------------+---------------+---------------+-------+-------+-------------------------------+

Other resources

Raft leader election timeout

The Raft leader election timeout is a crucial factor in the upgrade. It is governed by the ovn-central charm’s ovsdb-server-election-timer configuration option, whose default value is ‘4’ (seconds).

The amount of wall clock time a database (Northbound or Southbound) cluster leader consumes during the upgrade process cannot exceed the election timer. If this occurs, the database unit attempting the upgrade (schema conversion) will be evicted from the cluster, thereby preventing its results from being stored. This scenario will lead to an endless retry loop.

Conversion happens on startup of the DB services after package upgrades. To prevent the aforementioned retry loop, the startup scripts have a 30 second hardcoded timeout. Therefore:

  1. the maximum effective value for the ovsdb-server-election-timer option is ‘30’

  2. an alternative upgrade path would be needed if the conversion cannot succeed within that maximum

There is no template answer for what the value of the option should be. External factors (e.g. server performance characteristics, load, database size, number of records) all have a role to play.

See the upstream mailing list thread for a discussion on the topic. Issue LP #2013344 raises concerns about the option’s default value being too small.