Appendix F1: Series upgrade
Overview
This document provides the foundational knowledge an administrator needs to perform a series upgrade across a Charmed OpenStack cloud. A series upgrade means upgrading the operating system of every cloud node to an entirely new version.
Note
A series upgrade, a charm upgrade, and an OpenStack upgrade are all conceptually different and involve separate operations.
Once this document has been studied, the administrator will be ready to move on to the Series upgrade OpenStack guide, which describes the process in more detail.
Concerning the cloud being operated upon, the following is assumed:
It is being upgraded from one LTS series to another (e.g. xenial to bionic, bionic to focal, etc.).
Its nodes are backed by MAAS.
Its services are highly available.
It is being upgraded with minimal downtime.
Warning
Upgrading a single production machine from one LTS to another is a serious task. Doing so for every cloud node is harder still. Attempting to do this with minimal cloud downtime is an order of magnitude more complex.
Such an undertaking should be executed by persons who are intimately familiar with Juju and the currently deployed charms (and their related applications). It should first be tested on a non-production cloud that closely resembles the production environment.
The Juju upgrade-series command
The Juju upgrade-series command is the cornerstone of the entire procedure. This command manages an operating system upgrade of a targeted machine and operates on every application unit hosted on that machine. The command works in conjunction with either the prepare or the complete sub-command.
The basic process is to inform the units on a machine that a series upgrade is about to commence, to perform the upgrade, and then inform the units that the upgrade has finished. In most cases with the OpenStack charms, units will first be paused and be left with a workload status of “blocked” and a message of “Ready for do-release-upgrade and reboot.”
For example, to inform units on machine ‘0’ that an upgrade (to series ‘bionic’) is about to occur:
juju upgrade-series 0 prepare bionic
The prepare sub-command causes all the charms (including subordinates) on the machine to run their pre-series-upgrade hook.
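The "blocked" state described above can be checked before starting the OS upgrade. The following is a minimal sketch, not part of Juju: units_not_ready is a hypothetical helper that filters the tabular output of juju status, and it assumes unit rows of the form Unit, Workload, Agent, Machine, address, optional ports, message.

```shell
# Hypothetical helper (not a Juju command): reads `juju status` output on
# stdin and prints the units on the given machine that are NOT yet "blocked"
# with the expected message.
# Assumption: principal unit rows carry the machine ID in the fourth field.
# Subordinate rows have no Machine column and are therefore not matched.
units_not_ready() {
    awk -v m="$1" '
        $1 ~ /^[a-z][a-z0-9-]*\/[0-9]+\*?$/ && $4 == m {
            if ($2 != "blocked" || $0 !~ /Ready for do-release-upgrade and reboot/)
                print $1
        }'
}
```

It could be used as juju status | units_not_ready 0, where empty output would suggest that every principal unit on machine 0 has reached the expected state. Charms that do not implement the series upgrade hooks may never report this message, so the result should be read as a hint only.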
The administrator must then perform the traditional steps involved in upgrading the OS on the targeted machine (in this example, machine ‘0’). For example, update/upgrade packages with apt update && apt full-upgrade; invoke the do-release-upgrade command; and reboot the machine once complete.
The complete sub-command causes all the charms (including subordinates) on the machine to run their post-series-upgrade hook. In most cases with the OpenStack charms, configuration files will be re-written, units will be resumed automatically (if paused), and be left with a workload status of “active” and a message of “Unit is ready”:
juju upgrade-series 0 complete
The series upgrade of the machine and its charms is now done. In the juju status output the machine’s entry under the Series column will have changed from ‘xenial’ to ‘bionic’.
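The change in the Series column can also be confirmed from a script. The sketch below assumes the machine table layout shown later in this guide (Machine, State, DNS, Inst id, Series, AZ, Message), where the series is the fifth whitespace-separated field of a data row. The name machine_series is a hypothetical helper, and the field position will be wrong if a column such as DNS is empty.

```shell
# Hypothetical helper: reads `juju machines` tabular output on stdin and
# prints the Series value for one machine ID.
# Assumption: the series is the fifth field of the machine's row.
machine_series() {
    awk -v m="$1" '$1 == m { print $5; exit }'
}
```

For example, juju machines | machine_series 0 would be expected to print the new series once the upgrade of machine 0 has completed.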
Note
Charms are not obliged to support the two series upgrade hooks, but the hooks do make for a more intelligent and less error-prone series upgrade.
Containers (and their charms) hosted on the target machine remain unaffected by this command. However, during the required post-upgrade reboot of the host all containerised services will naturally be unavailable.
See the Juju documentation to learn more about the series upgrade feature.
Pre-upgrade requirements
This is a list of requirements that apply to any cloud. They must be met before making any changes.
All the cloud nodes should be using the same series, be in good working order, and be updated with the latest stable software packages (APT upgrades).
The cloud should be running the latest OpenStack release supported by the current series (e.g. Mitaka for trusty, Queens for xenial, etc.). See Ubuntu OpenStack release cycle and OpenStack upgrades.
The cloud should be fully operational and error-free.
All currently deployed charms should be upgraded to the latest stable charm revision. See Charm upgrades.
The Juju model comprising the cloud should be error-free (e.g. there should be no charm hook errors).
Automatic package updates should be disabled on the nodes to avoid potential conflicts with the manual (or scripted) APT steps.
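The requirement that the model be free of hook errors lends itself to a quick scripted check. This is a rough sketch only: units_in_error is a hypothetical helper, and it assumes that a failed hook shows up as "error" in the Workload or Agent column of the tabular juju status output.

```shell
# Hypothetical helper: reads `juju status` output on stdin, prints every
# unit whose workload or agent status is "error", and returns non-zero
# when at least one such unit is found.
units_in_error() {
    awk '
        $1 ~ /^[a-z][a-z0-9-]*\/[0-9]+\*?$/ && ($2 == "error" || $3 == "error") {
            print $1
            found = 1
        }
        END { exit found ? 1 : 0 }'
}
```

A possible use is juju status | units_in_error || echo "resolve the errors before upgrading". It does not replace a manual review of the model, since a unit can be unhealthy without being in an error state.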
Specific series upgrade procedures
Charms belonging to the OpenStack Charms project are designed to accommodate the next LTS target series wherever possible. However, a new series may occasionally introduce unavoidable challenges for a deployed charm. For instance, it could be that a charm is replaced by an entirely new charm on the new series. This can happen due to development policy concerning the charms themselves (e.g. the ceph charm is replaced by the ceph-mon and ceph-osd charms) or due to reasons independent of the charms (e.g. the workload software is no longer supported on the new operating system). Any core OpenStack charms affected in this way will be documented below.
Workload specific preparations
These are preparations that are specific to the current cloud deployment. Completing them in advance is an integral part of the upgrade.
Charm upgradability
Verify the documented series upgrade processes for all currently deployed charms. Some charms, especially third-party charms, may either not have implemented series upgrade yet or simply may not work with the target series. Pay particular attention to SDN (software defined networking) and storage charms as these play a crucial role in cloud operations.
Workload maintenance
Any workload-specific pre and post series upgrade maintenance tasks should be readied in advance. For example, if a node’s workload requires a database then a pre-upgrade backup plan should be drawn up. Similarly, if a workload requires settings to be adjusted post-upgrade then those changes should be prepared ahead of time. Pay particular attention to stateful services due to their importance in cloud operations. Examples include evacuating a compute node, switching an HA router to another node, and storage rebalancing.
Pre-upgrade tasks are performed before issuing the prepare sub-command, and post-upgrade tasks are performed immediately before issuing the complete sub-command.
Workflow: sequential vs. concurrent
In terms of the workflow there are two approaches:
Sequential - upgrading one machine at a time
Concurrent - upgrading a group of machines simultaneously
Normally, it is best to upgrade sequentially as this ensures data reliability and availability (we’ve assumed an HA cloud). This approach also minimises adverse effects on the deployment if something goes wrong.
However, for even moderately sized clouds, an intervention based purely on a sequential approach can take a very long time to complete. This is where the concurrent method becomes attractive.
In general, a concurrent approach is a viable option for API applications but is not an option for stateful applications. During the course of the cloud-wide series upgrade a hybrid strategy is a reasonable choice.
To be clear, the above pertains to upgrading the series on machines associated with a single application. It is also possible, however, to apply similar thinking across multiple applications.
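The difference between the two workflows can be sketched in shell. Everything here is illustrative: upgrade_one is a placeholder that only reports which machine it would handle, standing in for the real per-machine procedure (prepare, OS upgrade, reboot, complete), and the function names are assumptions rather than part of any tool.

```shell
# Placeholder for the real per-machine procedure.
upgrade_one() {
    echo "upgrading machine $1"
}

# Sequential: each machine finishes before the next one starts, and the
# run stops at the first failure.
upgrade_sequential() {
    for m in "$@"; do
        upgrade_one "$m" || return 1
    done
}

# Concurrent: start every machine in the background, then wait for all of
# them and report failure if any one of them failed.
upgrade_concurrent() {
    pids=""
    for m in "$@"; do
        upgrade_one "$m" &
        pids="$pids $!"
    done
    rc=0
    for p in $pids; do
        wait "$p" || rc=1
    done
    return "$rc"
}
```

A hybrid run for an API application might then look like upgrade_sequential 0 && upgrade_concurrent 1 2, although whether that split is safe depends entirely on the application in question.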
Application leadership
Application leadership plays an important role in determining the order in which machines (and their applications) will have their series upgraded. The guiding principle is that an application’s unit leader is acted upon by a series upgrade before its non-leaders are (the leader is typically used to coordinate aspects with other services over relations).
Note
Juju will not transfer the leadership of an application (and any subordinate) to another unit while the application is undergoing a series upgrade. This allows a charm to make assumptions that will lead to a more reliable outcome.
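The leader-first ordering can be derived from the juju status output, in which the leader is marked with an asterisk in the Unit column. The following sketch uses a hypothetical helper, machines_leader_first, and assumes that the machine ID is the fourth field of a unit row. That holds for principal units only, since subordinate rows have no Machine column.

```shell
# Hypothetical helper: reads `juju status` output on stdin and prints the
# machine IDs hosting the given application, leader's machine first.
machines_leader_first() {
    awk -v app="$1" '
        {
            unit = $1
            leader = sub(/\*$/, "", unit)   # 1 if the asterisk was present
            n = split(unit, parts, "/")
            if (n != 2 || parts[1] != app) next
            if (leader) first = $4
            else rest = rest $4 "\n"
        }
        END {
            if (first != "") print first
            printf "%s", rest
        }'
}
```

For example, juju status | machines_leader_first ubuntu1 would list the leader's machine on the first line, followed by the machines of the non-leader units in the order they appear.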
Assuming that a cloud is intended to eventually undergo a series upgrade, this guideline will generally influence the cloud’s topology. Containerisation is an effective response to this.
Important
Applications should be co-located on the same machine only if leadership plays a negligible role. Applications deployed with the compute and storage charms fall into this category.
Generic series upgrade
This section contains a generic overview of a series upgrade for three machines, each hosting a unit of the ubuntu application. The initial and target series are xenial and bionic, respectively.
This scenario is represented by the following juju status command output:
Model    Controller       Cloud/Region    Version  SLA          Timestamp
upgrade  maas-controller  mymaas/default  2.7.6    unsupported  18:33:49Z

App      Version  Status  Scale  Charm   Store       Rev  OS      Notes
ubuntu1  16.04    active      3  ubuntu  jujucharms   15  ubuntu

Unit        Workload  Agent  Machine  Public address  Ports  Message
ubuntu1/0*  active    idle   0        10.0.0.241             ready
ubuntu1/1   active    idle   1        10.0.0.242             ready
ubuntu1/2   active    idle   2        10.0.0.243             ready

Machine  State    DNS         Inst id  Series  AZ     Message
0        started  10.0.0.241  node2    xenial  zone3  Deployed
1        started  10.0.0.242  node3    xenial  zone4  Deployed
2        started  10.0.0.243  node1    xenial  zone5  Deployed
First ensure that any new applications will (by default) use the new series, in this case bionic. This is done by configuring at the model level:
juju model-config default-series=bionic
Now do the same at the application level. This will affect any new units of the existing application, in this case ‘ubuntu1’:
juju set-series ubuntu1 bionic
Perform the actual series upgrade. We begin with the machine that houses the application unit leader, machine 0 (see the asterisk in the Unit column). Note that juju run is preferred over juju ssh but the latter should be used for sessions requiring user interaction:
1  # Perform any workload maintenance pre-upgrade steps here
2  juju upgrade-series 0 prepare bionic
3  juju run --machine=0 -- sudo apt update
4  juju ssh 0 sudo apt full-upgrade
5  juju ssh 0 sudo do-release-upgrade
6  # Perform any workload maintenance post-upgrade steps here
7  # Reboot the machine (if not already done)
8  juju upgrade-series 0 complete
In this generic example there are no workload maintenance steps to perform. If there were post-upgrade steps, the prompt to reboot the machine at the end of do-release-upgrade should be answered in the negative and the reboot initiated manually on line 7 (i.e. sudo reboot).
It is possible to invoke the complete sub-command before the upgraded machine is ready to process it. Juju will block until the unit is ready after being restarted.
In lines 4 and 5 the upgrade proceeds in the usual interactive fashion. If a non-interactive mode is preferred, those two lines can be replaced with:
juju run --machine=0 --timeout=30m -- sudo DEBIAN_FRONTEND=noninteractive apt-get --assume-yes \
-o "Dpkg::Options::=--force-confdef" \
-o "Dpkg::Options::=--force-confold" dist-upgrade
juju run --machine=0 --timeout=30m -- sudo DEBIAN_FRONTEND=noninteractive \
do-release-upgrade -f DistUpgradeViewNonInteractive
The apt-get command is preferred over apt in non-interactive mode (or when scripting).
Caution
Performing a series upgrade non-interactively can be risky so the decision to do so should be made only after careful deliberation.
Machines 1 and 2 should now be upgraded in the same way (in no particular order).
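Since the same sequence is repeated for each remaining machine, it can help to generate the commands per machine and review them before running anything. The sketch below only prints the interactive sequence shown earlier and does not execute it. The name print_upgrade_plan is a hypothetical helper, not a Juju feature.

```shell
# Hypothetical helper: prints (does not run) the per-machine command
# sequence for a given machine ID and target series.
print_upgrade_plan() {
    machine="$1"
    series="$2"
    printf '%s\n' \
        "juju upgrade-series $machine prepare $series" \
        "juju run --machine=$machine -- sudo apt update" \
        "juju ssh $machine sudo apt full-upgrade" \
        "juju ssh $machine sudo do-release-upgrade" \
        "# Reboot machine $machine (if not already done)" \
        "juju upgrade-series $machine complete"
}

# Print the plan for the two remaining machines in this example.
for m in 1 2; do
    print_upgrade_plan "$m" bionic
    echo
done
```

Printing rather than executing keeps the operator in control of each step, which matters here because the OS upgrade is interactive and any workload maintenance has to be slotted in by hand.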
Note
It has been reported that a trusty to xenial series upgrade may require an additional step to ensure a purely non-interactive mode. A file under /etc/apt/apt.conf.d with a single line as its contents needs to be added to the target machine pre-upgrade and be removed post-upgrade. It can be created (here on machine 0) in this way:
juju run --machine=0 -- "echo 'DPkg::options { \"--force-confdef\"; \"--force-confnew\"; }' | sudo tee /etc/apt/apt.conf.d/local"
Next steps
When you are ready to perform a series upgrade across your cloud, proceed to the Series upgrade OpenStack appendix.