Orchestrated Deployment Host Software Deployment

Software deployment orchestration automates the process of upversioning the StarlingX software to a new major release or new patch release (In-Service or Reboot Required (RR)). It automates the execution of all software deploy steps across all the hosts in a cluster, based on the configured policies.

Note

Software deployment orchestration also covers orchestrated upversioning to a new patched major release. All comments in this section that are specific to a major release also apply to a patched major release.

Software deployment orchestration supports all standalone configurations: AIO-SX, AIO-DX, and standard configuration.

Note

Orchestrating the software deployment of a DC system is different from orchestrating the software deployment of standalone StarlingX configurations.

Software deployment orchestration automatically iterates through all the hosts and deploys the new software load on each host: first the controller hosts, then the storage hosts, and lastly the worker hosts. During software deployment on a worker host (or a duplex AIO controller), pods or VMs are automatically moved to alternate worker hosts. After the new software has been deployed on all hosts, orchestration activates the software deployment and, if requested, completes and deletes it.

Note

Software deployment orchestration completes and deletes the new software deployment only when the --delete option is specified when the strategy is created. For a major release, a software deployment that has been deleted can no longer be rolled back.

To perform a software deployment orchestration, first create an upgrade orchestration strategy for the automated software deployment procedure. The strategy provides the policies for the software deployment orchestration through the following parameters:

  • The host types to be software deployed.

  • Whether to deploy the software to hosts serially or in parallel.

    • The maximum number of hosts to deploy in parallel.

  • Maintenance action (stop-start or migrate) for hosted OpenStack VMs on a host that is about to have its software updated.

  • Alarm restrictions, that is, options to specify how the orchestration behaves when alarms occur.

Based on these parameters and the state of the hosts, software deployment orchestration creates a number of stages for the overall software deployment strategy. Each stage generally deploys software on a subset of the hosts on the system. For a reboot required (RR) software release, each stage consists of moving pods or VMs, locking hosts, deploying software on hosts, and unlocking hosts for that subset. After creating the software deployment orchestration strategy, you can either apply the entire strategy automatically or apply individual stages to control and monitor their progress manually.
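
The way the apply-type policies translate into an ordered list of stages can be illustrated with a small Python sketch. It is a simplified model, not the orchestrator's actual algorithm: the function name plan_stages and the hosts mapping are hypothetical, and the real strategy also accounts for host state, hosted guests, and storage redundancy groups, and adds its own start and completion stages.

```python
from typing import Dict, List


def plan_stages(hosts: Dict[str, str],
                worker_apply_type: str = "serial",
                max_parallel_worker_hosts: int = 2) -> List[List[str]]:
    """Group hosts into ordered stages (illustrative model only).

    hosts maps a hostname to its personality: "controller", "storage",
    or "worker". Controllers come first, then storage hosts, then workers.
    """
    if worker_apply_type not in ("serial", "parallel", "ignore"):
        raise ValueError(f"unknown worker apply type: {worker_apply_type!r}")
    if not 2 <= max_parallel_worker_hosts <= 10:
        raise ValueError("max_parallel_worker_hosts must be between 2 and 10")

    stages: List[List[str]] = []

    # Assumption: controller and storage hosts are handled one per stage.
    for personality in ("controller", "storage"):
        for name in sorted(n for n, p in hosts.items() if p == personality):
            stages.append([name])

    if worker_apply_type == "ignore":
        return stages

    # Workers are batched only when the parallel policy is selected.
    workers = sorted(n for n, p in hosts.items() if p == "worker")
    batch = max_parallel_worker_hosts if worker_apply_type == "parallel" else 1
    for start in range(0, len(workers), batch):
        stages.append(workers[start:start + batch])
    return stages
```

With two controllers, one storage host, and three workers, the parallel policy with a batch size of 2 yields the controllers and storage host in single-host stages, followed by one stage of two workers and one stage with the remaining worker.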

Prerequisites

  • No other orchestration strategy exists. Firmware-upgrade, kubernetes-version-upgrade, system-config-update-strategy, and kube-rootca-update are other types of orchestration. A software deployment cannot be orchestrated while another orchestration is in progress.

  • You have administrator role privileges.

  • The system is clear of alarms except the software deployment in progress alarm.

  • All the hosts are unlocked, enabled, and available.

  • For duplex systems, the system should be fully redundant: two controller nodes must be available and, for systems with a Ceph backend, at least one complete storage replication group must be available.

  • Sufficient free capacity or unused worker resources must be available across the cluster. A rough calculation is:

    Required spare capacity (%) = (<Number-of-hosts-to-upgrade-in-parallel> / <total-number-of-hosts>) * 100

  • For a major release deployment, the license for the new release has been installed using system license-install <license-for-new-major-release>.

  • The software release to be deployed has been uploaded.

    • For a major release:

      ~(keystone_admin)]$ software upload [ --local ] <new-release>.iso <new-release>.sig
      <new-release-id> is now uploaded
      +-------------------------------+-------------------+
      | Uploaded File                 | Release           |
      +-------------------------------+-------------------+
      | <new-release>.iso             | <new-release-id>  |
      +-------------------------------+-------------------+
      

      This command may take 5 to 10 minutes, depending on the hardware.

      The --local option can be used when running this command in an SSH session on the active controller to optimize performance. With this option, the system reads files directly from the local disk rather than transferring them over the REST APIs backing the CLI.

    • For a patch release:

      ~(keystone_admin)]$ software upload <new-release>.patch
      <new-release-id> is now uploaded
      +-------------------------------+-------------------+
      | Uploaded File                 | Release           |
      +-------------------------------+-------------------+
      | <new-release>.patch           | <new-release-id>  |
      +-------------------------------+-------------------+
      
    • Ensure that the new software release was successfully uploaded.

      ~(keystone_admin)]$ software list
      +--------------------------+-------+-----------+
      | Release                  | RR    |   State   |
      +--------------------------+-------+-----------+
      | starlingx-10.0.0         | True  | deployed  |
      | <new-release-id>         | True  | available |
      +--------------------------+-------+-----------+
      
  • For a major release deployment, the platform issuer (system-local-ca) must be configured beforehand with an RSA certificate/private key. If system-local-ca was configured with a different type of certificate/private key, use the Update system-local-ca or Migrate Platform Certificates to use Cert Manager procedure to reconfigure it with an RSA certificate/private key.
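
The rough spare-capacity calculation from the prerequisites above can be written as a short Python helper. This is a minimal sketch of the stated rule of thumb only; the function name is hypothetical, and the result is an estimate, not a guarantee that workloads will fit on the remaining hosts.

```python
def required_spare_capacity_percent(hosts_in_parallel: int,
                                    total_hosts: int) -> float:
    """Return the rough spare capacity (%) needed across the cluster.

    Implements: (hosts upgraded in parallel / total hosts) * 100.
    """
    if total_hosts <= 0:
        raise ValueError("total_hosts must be positive")
    if not 0 < hosts_in_parallel <= total_hosts:
        raise ValueError("hosts_in_parallel must be between 1 and total_hosts")
    # Multiply before dividing to keep the arithmetic exact for whole percents.
    return 100 * hosts_in_parallel / total_hosts
```

For example, deploying to 2 hosts in parallel on a 10-host system calls for roughly 20% spare capacity, while a serial deployment on a 4-host system calls for roughly 25%.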

Procedure

  1. Create a software deployment orchestration strategy for a specified software release with desired policies.

    ~(keystone_admin)]$ sw-manager sw-deploy-strategy create [--controller-apply-type {serial,ignore}]
                                                             [--storage-apply-type {serial,parallel,ignore}]
                                                             [--worker-apply-type {serial,parallel,ignore}]
                                                             [--max-parallel-worker-hosts {2,3,4,5,6,7,8,9,10}]
                                                             [--instance-action {stop-start,migrate}]
                                                             [--alarm-restrictions {strict,relaxed}]
                                                             [--delete]
                                                             <software-release-id>
    
    strategy-uuid:                          5435e049-7002-4403-acfb-7886f6da14af
    release-id:                             <software-release-id>
    controller-apply-type:                  serial
    storage-apply-type:                     serial
    worker-apply-type:                      serial
    default-instance-action:                stop-start
    alarm-restrictions:                     strict
    current-phase:                          build
    current-phase-completion:               0%
    state:                                  building
    inprogress:                             true
    

    where,

    <software-release-id>

    Specifies the specific software release to deploy. This can be a patch release or a major release.

    [--controller-apply-type {serial,ignore}]

    (Optional) Specifies whether software is deployed to controller hosts serially or whether controller hosts are ignored. The default is serial. Use ignore only when re-creating and applying a strategy after an abort or failure.

    [--storage-apply-type {serial,parallel,ignore}]

    (Optional) Specifies whether software is deployed to storage hosts serially or in parallel, or whether storage hosts are ignored. The default is serial. A parallel deployment to storage hosts deploys software to one storage host from each storage redundancy group at a time. Use ignore only when re-creating and applying a strategy after an abort or failure.

    Note

    If parallel is specified for --storage-apply-type, it is automatically replaced with serial.

    [--worker-apply-type {serial,parallel,ignore}]

    (Optional) Specifies whether software is deployed to worker hosts serially or in parallel, or whether worker hosts are ignored. The default is serial. The number of worker hosts deployed in parallel is set by --max-parallel-worker-hosts. Use ignore only when re-creating and applying a strategy after an abort or failure.

    [--max-parallel-worker-hosts {2,3,4,5,6,7,8,9,10}]

    Specifies the maximum number of worker hosts that have software deployed in parallel, from 2 to 10. The default is 2.

    [--instance-action {stop-start,migrate}]

    Applies only to hosted OpenStack VM guests. It specifies the action performed on the OpenStack VMs hosted on a worker host (or AIO controller) before the new software is deployed to the host. The default is stop-start.

    • stop-start

      Before deploying the software release to the host, all the hosted OpenStack VMs are stopped or shut down.

      After deploying the software release to the host, all the hosted OpenStack VMs are restarted.

    • migrate

      Before deploying the software release to the host, all the hosted OpenStack VMs are migrated to another host that is capable of hosting them and is not part of the current stage.

      • Hosts whose software is already updated are preferred over the hosts whose software has not been updated yet.

      • Live migration is attempted first. If live migration is not possible for the OpenStack VM, cold migration is performed.

    [--alarm-restrictions {strict,relaxed}]

    Specifies how orchestration handles existing alarms, based on their management-affecting status, which takes into account the alarm type as well as the alarm's current severity. The default is strict. If set to relaxed, orchestration is allowed to proceed as long as no management-affecting alarms are present.

    Unless the alarm checks are explicitly relaxed, management actions fail if any alarms are present in the system, except for a small list of basic alarms raised by the orchestration actions themselves (for example, an upgrade operation in progress alarm does not impede upgrade orchestration). Use the CLI command fm alarm-list --mgmt_affecting to view the alarms that are management affecting.

    • strict maintains the alarm restrictions.

    • relaxed relaxes the usual alarm restrictions and allows the action to proceed if no alarms are present in the system with a severity equal to or greater than their management-affecting severity. That is, orchestration uses the -f (force) option on the precheck or start of the deployment.

    [--delete]

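    (Optional) Specifies that orchestration completes and deletes the software deployment after the new software has been deployed. If this option is omitted, the software deployment is not completed or deleted by orchestration.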
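
When the create command is generated by a script, it can help to validate the chosen policies against the documented values before running it. The following Python sketch assembles the argument list using only the options shown in the synopsis above. The function name build_create_command and the _CHOICES table are hypothetical helpers, and the sketch does not run the command or model how sw-manager itself validates its arguments.

```python
from typing import List, Optional

# Allowed values, copied from the command synopsis above.
_CHOICES = {
    "--controller-apply-type": ("serial", "ignore"),
    "--storage-apply-type": ("serial", "parallel", "ignore"),
    "--worker-apply-type": ("serial", "parallel", "ignore"),
    "--instance-action": ("stop-start", "migrate"),
    "--alarm-restrictions": ("strict", "relaxed"),
}


def build_create_command(release_id: str, *,
                         controller_apply_type: Optional[str] = None,
                         storage_apply_type: Optional[str] = None,
                         worker_apply_type: Optional[str] = None,
                         max_parallel_worker_hosts: Optional[int] = None,
                         instance_action: Optional[str] = None,
                         alarm_restrictions: Optional[str] = None,
                         delete: bool = False) -> List[str]:
    """Return the argv list for 'sw-manager sw-deploy-strategy create'."""
    if not release_id:
        raise ValueError("a software release id is required")

    argv = ["sw-manager", "sw-deploy-strategy", "create"]
    chosen = {
        "--controller-apply-type": controller_apply_type,
        "--storage-apply-type": storage_apply_type,
        "--worker-apply-type": worker_apply_type,
        "--instance-action": instance_action,
        "--alarm-restrictions": alarm_restrictions,
    }
    for flag, value in chosen.items():
        if value is None:
            continue  # Omitted options fall back to the CLI defaults.
        if value not in _CHOICES[flag]:
            raise ValueError(f"{flag} must be one of {_CHOICES[flag]}, got {value!r}")
        argv += [flag, value]

    if max_parallel_worker_hosts is not None:
        if max_parallel_worker_hosts not in range(2, 11):
            raise ValueError("--max-parallel-worker-hosts must be between 2 and 10")
        argv += ["--max-parallel-worker-hosts", str(max_parallel_worker_hosts)]

    if delete:
        argv.append("--delete")
    argv.append(release_id)
    return argv
```

The returned list can be passed to subprocess.run or joined with shlex.join for display. Rejecting an invalid combination locally, such as parallel for the controller apply type, avoids a failed create on the system.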

  2. Wait for the build phase of the software deployment orchestration strategy to reach 100% completion and for its state to be ready-to-apply.

    ~(keystone_admin)]$ sw-manager sw-deploy-strategy show
    Strategy Software Deploy Strategy:
      strategy-uuid:                          6282f049-bb9e-46f0-9ca8-97bf626884e0
      release-id:                             <software-release-id>
      controller-apply-type:                  serial
      storage-apply-type:                     serial
      worker-apply-type:                      serial
      default-instance-action:                stop-start
      alarm-restrictions:                     strict
      current-phase:                          build
      current-phase-completion:               100%
      state:                                  ready-to-apply
      build-result:                           success
      build-reason:
    

    Note

    If the build phase fails (the show command reports build-result: failed), determine the issue from the build error reason (build-reason: <Error information> in the show command output) and from /var/log/nfv-vim*.log on the active controller. Then address the issues, delete the strategy, and retry the create.

  3. (Optional) Display the details (phases and steps) of the built strategy using the --details option.

    The software deploy strategy consists of one or more stages, each of which covers one or more hosts that have the new software deployed at the same time.

    Each stage will be split into steps (for example, query-alarms, lock-hosts, upgrade-hosts).

    The new software is deployed on the controller hosts first, followed by the storage hosts, and then the worker hosts.

    The new software is deployed on the worker hosts with no hosted guests (Kubernetes pods or OpenStack VMs) before the worker hosts with hosted guests.

    Before the new software is deployed to a worker host (AIO-Controller), its hosted Kubernetes pods are relocated if another worker host capable of hosting them is available.

    Before the new software is deployed to a worker host (AIO-Controller), its hosted OpenStack VMs are managed according to the requested --instance-action.

    The final step in each stage is one of the following:

    system-stabilize

    This waits for a period of time (up to several minutes) and ensures that the system is free of alarms.

    This ensures that we do not continue to deploy the new software to more hosts if the software deployment has caused an issue resulting in an alarm.

    wait-data-sync

    This waits for a period of time (up to many hours) and ensures that data synchronization has completed after the upgrade of a controller or storage node.

    ~(keystone_admin)]$ sw-manager sw-deploy-strategy show --details
    
    Strategy Software Deploy Strategy:
     strategy-uuid:                          6282f049-bb9e-46f0-9ca8-97bf626884e0
     release-id:                             <software-release-id>
     controller-apply-type:                  serial
     storage-apply-type:                     serial
     worker-apply-type:                      serial
     default-instance-action:                stop-start
     alarm-restrictions:                     strict
     current-phase:                          build
     current-phase-completion:               100%
     state:                                  ready-to-apply
    
     build-phase:
       ...
       stages:
         ...
         steps:
           ...
    
     apply-phase:
       ...
       stages:
         ...
         steps:
           ...
    
  4. Apply and monitor the software deployment orchestration.

    You can either apply the entire strategy automatically or apply the individual stages to control and monitor their progress manually.

    1. Apply the entire strategy automatically and monitor its progress:

      ~(keystone_admin)]$ sw-manager sw-deploy-strategy apply
      Strategy Software Deploy Strategy:
        strategy-uuid:                          52873771-fc1a-48cd-b322-ab921d34d01c
        release-id:                             <software-release-id>
        controller-apply-type:                  serial
        storage-apply-type:                     serial
        worker-apply-type:                      serial
        default-instance-action:                stop-start
        alarm-restrictions:                     strict
        current-phase:                          apply
        current-phase-completion:               0%
        state:                                  applying
        inprogress:                             true
      

      Show high-level status of apply.

      ~(keystone_admin)]$ sw-manager sw-deploy-strategy show
      Strategy Software Deploy Strategy:
        strategy-uuid:                          35b48793-66f8-46be-8972-cc22117a93ff
        release-id:                             <software-release-id>
        controller-apply-type:                  serial
        storage-apply-type:                     serial
        worker-apply-type:                      serial
        default-instance-action:                stop-start
        alarm-restrictions:                     strict
        current-phase:                          apply
        current-phase-completion:               7%
        state:                                  applying
        inprogress:                             true
      

      Show details of active stage or step of apply.

      ~(keystone_admin)]$ sw-manager sw-deploy-strategy show --active
      Strategy Software Deploy Strategy:
        strategy-uuid:                          52873771-fc1a-48cd-b322-ab921d34d01c
        release-id:                             <software-release-id>
        controller-apply-type:                  serial
        storage-apply-type:                     serial
        worker-apply-type:                      serial
        default-instance-action:                stop-start
        alarm-restrictions:                     strict
        current-phase:                          apply
        current-phase-completion:               7%
        state:                                  applying
        apply-phase:
          total-stages:                         3
          current-stage:                        0
          stop-at-stage:                        3
          timeout:                              12019 seconds
          completion-percentage:                7%
          start-date-time:                      2024-06-11 12:19:51
          inprogress:                           true
          stages:
              stage-id:                         0
              stage-name:                       sw-upgrade-start
              total-steps:                      3
              current-step:                     1
              timeout:                          1321 seconds
              start-date-time:                  2024-06-11 12:19:51
              inprogress:                       true
              steps:
                  step-id:                      1
                  step-name:                    start-upgrade
                  timeout:                      1200 seconds
                  start-date-time:              2024-06-11 12:19:51
                  result:                       wait
                  reason:
      
    2. Apply individual stages.

      ~(keystone_admin)]$ sw-manager sw-deploy-strategy apply --stage-id <STAGE-ID>
      Strategy Software Deploy Strategy:
        strategy-uuid:                          a0277e08-93cc-4964-ba39-ebab367a547c
        release-id:                             <software-release-id>
        controller-apply-type:                  serial
        storage-apply-type:                     serial
        worker-apply-type:                      serial
        default-instance-action:                stop-start
        alarm-restrictions:                     strict
        current-phase:                          apply
        current-phase-completion:               0%
        state:                                  applying
        inprogress:                             true
      
      ~(keystone_admin)]$ sw-manager sw-deploy-strategy show
      Strategy Software Deploy Strategy:
        strategy-uuid:                          a0277e08-93cc-4964-ba39-ebab367a547c
        release-id:                             <software-release-id>
        controller-apply-type:                  serial
        storage-apply-type:                     serial
        worker-apply-type:                      serial
        default-instance-action:                stop-start
        alarm-restrictions:                     strict
        current-phase:                          apply
        current-phase-completion:               7%
        state:                                  applying
        inprogress:                             true
      
      ~(keystone_admin)]$ sw-manager sw-deploy-strategy show --active
      Strategy Software Deploy Strategy:
        strategy-uuid:                          a0277e08-93cc-4964-ba39-ebab367a547c
        release-id:                             <software-release-id>
        controller-apply-type:                  serial
        storage-apply-type:                     serial
        worker-apply-type:                      serial
        default-instance-action:                stop-start
        alarm-restrictions:                     strict
        current-phase:                          apply
        current-phase-completion:               7%
        state:                                  applying
        apply-phase:
          total-stages:                         3
          current-stage:                        0
          stop-at-stage:                        1
          timeout:                              1322 seconds
          completion-percentage:                7%
          start-date-time:                      2024-06-11 14:40:23
          inprogress:                           true
          stages:
              stage-id:                         0
              stage-name:                       sw-upgrade-start
              total-steps:                      3
              current-step:                     1
              timeout:                          1321 seconds
              start-date-time:                  2024-06-11 14:40:23
              inprogress:                       true
              steps:
                  step-id:                      1
                  step-name:                    start-upgrade
                  timeout:                      1200 seconds
                  start-date-time:              2024-06-11 14:40:23
                  result:                       wait
                  reason:
      
  5. While a software deployment orchestration strategy is being applied, it can be aborted.

    The current step is allowed to complete and, if necessary, an abort phase is created and applied, which attempts to unlock any hosts that were locked.

    ~(keystone_admin)]$ sw-manager sw-deploy-strategy abort
    Strategy Software Deploy Strategy:
      strategy-uuid:                          63f48dfc-f833-479b-b597-d11f9219baf5
      release-id:                             <software-release-id>
      controller-apply-type:                  serial
      storage-apply-type:                     serial
      worker-apply-type:                      serial
      default-instance-action:                stop-start
      alarm-restrictions:                     strict
      current-phase:                          apply
      current-phase-completion:               7%
      state:                                  aborting
      inprogress:                             true
    

    Wait for the abort to complete.

    ~(keystone_admin)]$ sw-manager sw-deploy-strategy show
    Strategy Software Deploy Strategy:
      strategy-uuid:                          63f48dfc-f833-479b-b597-d11f9219baf5
      release-id:                             <software-release-id>
      controller-apply-type:                  serial
      storage-apply-type:                     serial
      worker-apply-type:                      serial
      default-instance-action:                stop-start
      alarm-restrictions:                     strict
      current-phase:                          abort
      current-phase-completion:               100%
      state:                                  aborted
      apply-result:                           failed
      apply-reason:
      abort-result:                           success
      abort-reason:
    

    Note

    To view detailed errors, run the following commands:

    ~(keystone_admin)]$ sw-manager sw-deploy-strategy show --error-details
    
    ~(keystone_admin)]$ sw-manager sw-deploy-strategy show
    Strategy Software Deploy Strategy:
      strategy-uuid:                          <>
      release-id:                             <software-release-id>
      controller-apply-type:                  serial
      storage-apply-type:                     serial
      worker-apply-type:                      serial
      default-instance-action:                stop-start
      alarm-restrictions:                     strict
      current-phase:                          abort
      current-phase-completion:               100%
      state:                                  aborted
      apply-result:                           failed
      apply-error-response:
      abort-result:                           success
      abort-reason:
      abort-error-response:
    

    Note

    After a software deployment strategy has been applied (or aborted), it must be deleted before another software deployment strategy can be created.

  6. Otherwise, wait for all the steps of all stages of the software deployment orchestration strategy to complete.

    ~(keystone_admin)]$ sw-manager sw-deploy-strategy show
    Strategy Software Deploy Strategy:
      strategy-uuid:                          6282f049-bb9e-46f0-9ca8-97bf626884e0
      release-id:                             <software-release-id>
      controller-apply-type:                  serial
      storage-apply-type:                     serial
      worker-apply-type:                      serial
      default-instance-action:                stop-start
      alarm-restrictions:                     strict
      current-phase:                          applied
      current-phase-completion:               100%
      state:                                  applied
      apply-result:                           success
      apply-reason:
    

    If applying a software deployment strategy fails, you must address the issue that caused the failure, then delete and re-create the strategy before attempting to apply it again.

    For additional details, run the sw-manager sw-deploy-strategy show --error-details command.

  7. Delete the completed software deployment strategy.

    ~(keystone_admin)]$ sw-manager sw-deploy-strategy delete
    Strategy deleted
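
The show output used throughout this procedure is a list of "key: value" lines, so a monitoring script can parse it and decide what to do next. The Python sketch below is one possible approach, not part of the product: the function names parse_strategy and next_action and the returned action labels are hypothetical, and only the field names and state values that appear in the sample outputs above are assumed.

```python
from typing import Dict


def parse_strategy(text: str) -> Dict[str, str]:
    """Parse 'sw-manager sw-deploy-strategy show' output into a dict.

    Only the first occurrence of each key is kept, so the top-level
    fields win over nested stage and step fields of the same name.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key and key not in fields:
            fields[key] = value.strip()
    return fields


def next_action(fields: Dict[str, str]) -> str:
    """Map the strategy state to a suggested operator action."""
    state = fields.get("state", "")
    if fields.get("build-result") == "failed":
        return "fix-and-recreate"   # see build-reason, then delete and retry
    if state in ("building", "applying", "aborting"):
        return "wait"               # an operation is still in progress
    if state == "ready-to-apply":
        return "apply"
    if state in ("applied", "aborted"):
        return "delete"             # required before creating a new strategy
    return "inspect"                # unknown state: check --error-details
```

A polling loop could call these two functions on the captured command output at a fixed interval and stop as soon as the result is no longer "wait".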
    

Postrequisites

After a successful software deployment orchestration:

  • The Kubernetes Version Upgrade procedure can be executed, if desired, to upversion to a new Kubernetes version available in the new software release.

  • Validate that the system and hosted applications are healthy.

  • In the case of a major release software deployment:

    • If you do not need to roll back the major release software deployment, then delete the software deployment that was used by the software deployment orchestration.

      ~(keystone_admin)]$ software deploy delete
      Deployment has been deleted
      
      ~(keystone_admin)]$ software deploy show
      No deploy in progress
      
    • Remove the old major release to reclaim disk space.

      ~(keystone_admin)]$ software list
      +--------------------------+-------+-------------+
      | Release                  | RR    |   State     |
      +--------------------------+-------+-------------+
      | starlingx-10.0.0         | True  | unavailable |
      | <new-major-release-id>   | True  | deployed    |
      +--------------------------+-------+-------------+
      
      ~(keystone_admin)]$ software delete starlingx-10.0.0
      starlingx-10.0.0 has been deleted.
      
      ~(keystone_admin)]$ software list
      +--------------------------+-------+-------------+
      | Release                  | RR    |   State     |
      +--------------------------+-------+-------------+
      | <new-major-release-id>   | True  | deployed    |
      +--------------------------+-------+-------------+
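
The software list tables shown in this section can also be checked from a script, for example to confirm that a release is in the expected state before creating a strategy or before deleting the old release. The following Python sketch parses the three-column table layout shown above. The function name parse_software_list is hypothetical, and the sketch assumes the Release, RR, and State columns appear in exactly that order.

```python
from typing import Dict


def parse_software_list(table: str) -> Dict[str, dict]:
    """Parse the ASCII table printed by 'software list'.

    Returns a mapping of release id to its reboot-required flag and state.
    """
    releases: Dict[str, dict] = {}
    for line in table.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines, prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "Release":
            continue  # skip the header row and unexpected layouts
        name, reboot_required, state = cells
        releases[name] = {
            "reboot_required": reboot_required == "True",
            "state": state,
        }
    return releases
```

For the table above, looking up the new major release id in the returned mapping gives a state of deployed, and the old release no longer appears once it has been deleted.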