Enroll a Factory Installed Non Distributed Standalone System as a Subcloud¶

The subcloud enrollment feature converts a factory pre-installed system to a subcloud of a DC. For factory pre-installation, standalone systems must be able to be installed locally in the factory, and later deployed and configured on-site as a DC subcloud without re-installing the system.

A factory pre-installed system can remain staged up to 1 year before enrollment. This is a limitation related to certificate recovery, where recovery is possible but it requires manual steps. It is recommended to avoid a staging period longer than 1 year. However, future versions will support a longer staging period.

Prerequisites

The following requirements must be met for factory installation of a system:

The standalone system must be BMC configured (support Redfish protocol).
The standalone system must be installed with prestaged ISO (with archived container images).
The prestaged ISO must be installed with one of the cloud-init boot options. default-boot >= 2 can be specified when generating a prestaged ISO, otherwise a cloud-init boot option will have to be manually selected during installation. Default boot has the following menu options:
- 0 - Prestage Serial Console
- 1 - Prestage Graphical Console (default)
- 2 - Prestage cloud-init All-in-one Serial Console
- 3 - Prestage cloud-init All-in-one Graphical Console
- 4 - Prestage cloud-init Controller Serial Console
- 5 - Prestage cloud-init Controller Graphical Console
To create a prestaged ISO image, see Subcloud Deployment with Local Installation.

Factory Installation of a System¶

Note

If you need to enable and configure UEFI Secure Boot for the server(s) being factory installed, ensure that you use the StarlingX UEFI Secure Boot certificate /CERTS/TiBoot.crt, from the ISO of the release being factory installed.

Factory Installation Automation¶

The automation services are delivered and loaded using a generated seed ISO. The seed ISO is applied by cloud-init service and enabled during prestaged ISO installation. The seed ISO contains the platform automation services as well as cloud-config for cloud-init to set up, and trigger automation services. The automation services are a set of systemd services, that provide streamlined staged execution.

Prepare Seed Configuration¶

Retrieve Base Seed Configuration¶

Download and extract nocloud-factory-install.tar that contains seed ISO contents. It consists of the platform automation systemd services contained in /nocloud-factory-install/factory-install subdir, the base cloud-init configuration contained in meta-data, network-config and user-data in top level dir, and the base host configuration contained in /nocloud-factory-install/config subdir.

nocloud-factory-install/
├──config
│  ├─ localhost.yml
│  └─ open-vcsr-example.yaml # example vcsr config, replace it with actual config or delete it
├──factory-install
│ ├── scripts
│ │  ├── 10-init-setup
│ │  ├── 20-hardware-check
│ │  └── 90-init-final
│ ├── setup
│ │  ├── 10-system-setup
│ │  └── 20-vcsr-setup # sample excutable to setup vcsr, harmless but delete it if no vcsr expected
│ ├── systemd
│ │  ├── factory-install-bootstrap.path
│ │  ├── factory-install-bootstrap.service
│ │  ├── factory-install-config.path
│ │  ├── factory-install-config.service
│ │  ├── factory-install-setup.path
│ │  ├── factory-install-setup.service
│ │  ├── factory-install.target
│ │  ├── factory-install-tests.path
│ │  ├── factory-install-tests.service
│ │  └── utils
│ │      ├──20-cloud-init.preset
│ │      ├──20-factory-install.preset
| |      ├──send-factory-sel-event
│ │      └──disable-factory-install
│ └── tests
│    └──10-system-health
├──seed-config
│   ├── 99-seediso.rules
│   ├── cloud-init-seed.service
│   ├── cloud.cfg
│   └── run-cloud-init-from-seed.sh
├──meta-data
├──network-config
└──user-data

Prepare cloud-init Configuration¶

Before performing the initial configuration in factory, the following requirements must be met:

Only controller-0 is provisioned by the factory installation process. The other nodes, such as controller-1, compute-0, storage-0 must be added after the enrollment is complete.
Management network update is ignored during the enrollment. On other system types, admin network should be configured during factory installation if needed.
The subcloud platform networks should be configured with the expected IP family (IPv4 or IPv6) because the IP family of a subcloud cannot be updated.
System local CA (system_local_ca_cert, system_local_ca_key, and system_root_ca_cert) needs to be installed on the factory installed subclouds in localhost.yaml to enable the SSL communication via OAM connection during enrollment. The System Controller performing the subcloud enrollment needs to have a trusted CA that can validate the server certificates used for the factory installed systems. For more details, see Platform Issuer (system-local-ca). Ensure that the CA certificate used is long lasting and will still be valid at the time of enrollment.
Kubernetes Root CA certs need to be specified during the factory installation process in localhost.yaml, otherwise, the kube-rootca endpoint will be out of sync and a kube-rootca-strategy is needed to make it in sync. Ensure that the CA certificate used is long lasting and will still be valid at the time of enrollment.
Additional applications should not be installed on the factory installed system before completing the enrollment process.
Other configurations that do not allow reconfiguration operations (example: create a new controllerfs or hostfs, storage backend, ceph, etc) should be configured in the factory before the subcloud enrollment.

The nocloud-factory-install from the tarball consists of the following files:

user-data

This file is used to customize the instance when it is first booted. It does the initial setup.

user-data

     #cloud-config

     chpasswd:
       list:
          # Changes the sysadmin password -   the hash below specifies St8rlingX*1234
          - sysadmin:$5$HElSMXRZZ8wlTiEe$I0hValcFqxLRKm3pFdXrpGZlxnmzQt6i9lhIR9FWAf8
     expire:False

     runcmd:
       - [ /bin/bash, -c, "echo$(date):Initiating factory-install"
       - mkdir -p /opt/nocloud
       - mount LABEL=CIDATA /opt/nocloud
       - run-parts --verbose --exit-on-error /opt/nocloud/factory-install/scripts
       -eject/opt/nocloud

The default base configuration file:

Sets the password to St8rlingX*1234 (specified under chpasswd section). The password can be set either in plain-text or hash. The mkpasswd Linux command can be used to generate a new password hash.

For example:
```
mkpasswd -m sha-512
```
It will prompt for the new password to be hashed.
Runs commands specified under runcmd to setup and start automation services.

Users may also choose to extend user-data to perform another configuration for the initial setup of the node. See official cloud-init documentation https://cloudinit.readthedocs.io/en/latest/reference/examples.html for working with configuration files.

meta-data

The meta-data file provides instance specific information, host name, and instance ID.

meta-data
  instance-id:iid-local01

This file should not be modified.

User-data is applied once per instance, hence the instance ID must be changed for subsequent runs if you need to re-apply seed (reinsert seed ISO). However, this is not recommended as it may lead to a bad state, specially if the factory-install services have already started from a previous run.

network-data

Various file parameters can be found on official cloud-init documentation https://cloudinit.readthedocs.io/en/latest/reference/network-config-format-v1.html.

‘’This network configuration format lets users customize their instance’s networking interfaces by assigning subnet configuration, virtual device creation (bonds, bridges, VLANs) routes, and DNS configuration.’’

This configuration is a placeholder and it should be updated based on the factory node networking requirements. This can be used to assign the OAM IP address, enable the network interface, and add the route to SSH and monitor progress during factory-install.

Example:

The following network-config shows IPv4 address configuration:

network-data
   version:1
   config:
     - type: physical
       name: enp2s1
       subnets:
         - type: static
           address: 10.10.10.2
           netmask: 255.255.255.0
           gateway:10.10.10.

The following network-config shows IPv6 VLAN configuration:

network-data
    version:1
    config:
      - type: vlan
        name: vlan401
        vlan_link: enp179s0f0
        vlan_id: 401
        subnets:
          - type: static
            address: 2620:10a:a001:d41::208/64
            gateway: 2620:10a:a001:d41::1

Prepare Host Configuration¶

The standalone host configuration files are the configuration files used by the actual platform-automation during the bootstrap and config (deployment) stage, such as localhost.yml, deployment-config.yaml, dm-playbook-overrides.yaml.

The host configuration files must be specified under nocloud-factory-install/config dir. These files will be copied to /home/sysadmin/ on the host during setup.

Note

Only localhost.yml is provided as part of the base configuration. It is a sample placeholder that must be updated.

``localhost.yml``

localhost.yml provides values to be used during bootstrap process. Values in this file can be specified the same as values used during a normal bootstrap process.

Example:

localhost.yaml
system_type:
system_mode:
name:

# DNS servers need to be the same IP family (v4 or v6), need to add anIPv6
# address if installed as IPv6 default values are IPv4
dns_servers:

# Need to assignIPv6addresses, values can be generic, default values are IPv4
# OAM networks can be reconfigured during the enrollment process
external_oam_subnet:
external_oam_gateway_address:
external_oam_floating_address:
external_oam_node_0_address:
external_oam_node_1_address:

# Admin network is required as we cannot reconfigure themanagement network,
# admin networks can be reconfigured during the enrollment process
admin_subnet:
admin_start_address:
admin_end_address:
admin_gateway_address:

# Need to assign IPv6 addresses, values can be generic, default values are IPv4
# management networks can only be reconfigured on Simplex without
# admin network configured. Only the primary stack management
# network supports reconfiguration.
management_subnet:
management_multicast_subnet:
management_start_address:
management_end_address:
#management_gateway cannot be configured with admin network

# Need to assign IPv6 addresses, values can be generic, default values are IPv4
cluster_host_subnet:
cluster_pod_subnet:
cluster_service_subnet:

# The password for factory install stage, need to be aligned with user-data
# The admin password will not be updated during the enrollment. However, it
# will be synchronized with the System Controller after managing the subcloud.
admin_password:
# password for factory install stage, need to be align with the admin_password
ansible_become_pass:
# optional, need to install the same cert with the System Controller, otherwise
# the k8s-rootca endpoint will be out-of-sync after enrollment, but can use
# k8s-rootca-update ochestration to sync it
k8s_root_ca_cert:
k8s_root_ca_key:
# system SSL CA certs are required, and need to align with the System Controllers
system_root_ca_cert:
system_local_ca_cert:
system_local_ca_key

(Optional) Prepare Custom Setup, Checks, and Tests Script¶

The platform-automation framework is designed to run a set of scripts at various stages. Users may provide their own scripts, checks, and tests. Sample placeholders are provided as part of the base configuration for:

Pre-bootstrap (checks during initial setup)
Setup stage (checks during post deployment-unlock)
Test stage (final checks)

Note

The framework executes scripts using run-parts. That is, it will run every script that is found in the specific stage directory. Additional script/test files can be added and file names can be changed.

Users may choose to add new scripts in the parent directory or modify the existing sample placeholders.

Pre-bootstrap, initial setup, and checks

The nocloud-factory-install/factory-install/scripts/: scripts are executed at the beginning before the bootstrap stage.

Note

The files 10-init-setup and 90-init-final in the directory must remain in place to correctly set up and start the platform automation. These files may be modified for custom behavior, however, modifying these two files may result in unexpected behavior.

A placeholder sample script is provided in the directory nocloud-factory-install/factory-install/scripts/20-hardware-check executed before proceeding with platform-automation.

nocloud-factory-install/factory-install/scripts/20-hardware-check

    # cloud-init script to Perform hardware and    firmware checks
    #
    # SAMPLE ONLY    -   REPLACE WITH REAL HARDWARE CHECKS
    #
    echo "Hardware Check -   Start"

    BOARD_VENDOR=$(cat /sys/devices/virtual/dmi/id/board_vendor)
    BOARD_NAME=$(cat /sys/devices/virtual/dmi/id/board_name)
    PRODUCT_NAME=$(cat /sys/devices/virtual/dmi/id/product_name)
    BIOS_VERSION=$(cat /sys/devices/virtual/dmi/id/bios_version)

    echo "BOARD_VENDOR=${BOARD_VENDOR}"
    echo "BOARD_NAME=${BOARD_NAME}"
    echo "PRODUCT_NAME=${PRODUCT_NAME}"
    echo "BIOS_VERISON=${BIOS_VERISON}"

    echo "Hardware Check -   Complete"

    exit 0

Final checks

The scripts in nocloud-factory-install/factory-install/tests/ are run as part of the testing stage which is the last stage. For instance, in the placeholder 10-system-health, fm alarm checks are done.

nocloud-factory-install/factory-install/tests/10-system-health
# Factory install system health checks triggered during the   tests stage
#
# SAMPLE ONLY    -   REPLACE WITH REAL SYSTEM HEALTH CHECKS
#

echo "System Health Checks -   Start"
log_failure () {
    echo "FAIL: $1"
    exit ${2}
}

# check for   service impacting alarms(only recommended in simplex system, multi-nodes
system have alarms due    to  loss of  standby controller)
source /etc/platform/openrc
fm --timeout 10  alarm-list --nowrap|grep -e    "major\|minor\|warning\|critical"
if [ $?  == 0 ];  then
    #  Log   the   health check failure and   exit    0   to   allow factory-install to   finish up.
    #  Modify to exit 1 if factory-install should fail the test stage and halt.
    log_failure "service impacting alarms present" 0
fi

echo "System Health Checks -   Complete"

exit 0

Generate Seed Image¶

The seed ISO can be generated after the seed (cloud-init and host) configurations are prepared. The seed ISO is essentially the content of nocloud-factory-install directory.

Use the Linux genisoimage command line tool to generate the seed ISO:

genisoimage -o <seed-output-dir>/seed.iso -volid 'CIDATA' -untranslated-filenames -joliet -rock -iso-level 2 <path to extracted nocloud-factory-install dir>

Insert Seed Image¶

Procedure

Note

Insert the seed image and power on the system using virtual media, such as the BMC GUI or by utilizing the Redfish API.

Host the ISO image. Place the seed ISO in a location/file server accessible by BMC.
Power off the host.

Insert virtual media seed ISO image using Redfish API.

Example:

Redfish insert media example usage
curl -k -X POST    "https://redfish-server-ip/redfish/v1/Managers/1/VirtualMedia/Cd/Actions/VirtualMedia.InsertMedia
    -H  "Content-Type: application/json" \
    -u  "your-username:your-password" \
    -d  '  {
          "Image": "http://<seed.iso>",
          "Inserted": true
    }'

Power on the system.

The boot order does not need to be changed. The system should boot from the installation. The inserted seed image is used as a data source.

Factory-install Services¶

You must access the node to monitor progress, either using serial console with a tool like IPMI or SSH if an IP address has been assigned (with seed network-config).

Monitor the progress and output of various stages by checking /var/log/factory-install.log:. Automation will go through various stages in the order: bootstrap, config, setup, and tests. Overall, it will run the bootstrap ansible playbook and then the DM playbook, unlock host, and wait for the system to be reconciled.
The following factory-install services can be managed and checked using systemctl.
- factory-install-bootstrap.service
- factory-install-config.service
- factory-install-setup.service
- factory-install-tests.service
Example:
```
sysadmin@controller-0:~$ systemctl status factory-install-<stage>.service
```

Retry and start automation from the failed stage:

sudo systemctl restart factory-install-<stage>.service --no-block

--no-block must be specified to return the command line.

For example, the factory-install failed stage can be restarted as follows:

sysadmin@controller-0:~$ systemctl status factory-install-tests.service
  factory-install-tests.service - Factory Installation Execute System Tests
     Loaded: loaded (/etc/systemd/system/factory-install-tests.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2024-10-23 17:28:00 UTC; 4h 31min ago
TriggeredBy: ● factory-install-tests.path
   Main PID: 1725 (code=exited, status=1/FAILURE)
        CPU: 1.617s

sysadmin@controller-0:~$ sudo systemctl restart factory-install-tests.service --no-block

The factory-install state flag can be found in the following:
- /var/lib/factory-install/stage/*
  - A flag set at the start of a given stage indicates the stage trigger.
  - The final flag in this directory indicates factory-install completion.
- /var/lib/factory-install/state/*
  
  A flag set at the successful completion of a given stage indicates the stage exit.
- /var/lib/factory-install/complete
  
  This flag indicates that factory-install has successfully been completed (equivalent to the stage final flag).
The following flags would indicate that the bootstrap, config, and setup stages have successfully completed and the current stage is tests:
```
sysadmin@controller-0:~$ ls /var/lib/factory-install/state/
bootstrap  config  setup
```
```
sysadmin@controller-0:~$ ls /var/lib/factory-install/stage
bootstrap  config  setup  tests
```
The general script output from the factory-install framework is directed to /var/log/cloud-init-output.log by default. That is, logs prior to starting the services can be found in /var/log/cloud-init-output.log. Failure of one of the scripts or checks may lead to error.

Enroll the Factory Installed System as a Subcloud of a Distributed Cloud¶

Prepare the Subcloud Values for Enrollment¶

The enrollment process requires bootstrap values, install values, and deployment configurations.

Install values

Install values are required to access the BMC controller and to provide necessary values to be configured to enable communication between the system controller and the subcloud via OAM network.

bootstrap_address: the bootstrap address to be opened for ansible-playbook
                   this address should be the updated OAM floating address
bootstrap_interface: the interface to assign with the bootstrap address
bmc_address: bmc address
bmc_username: bmc username
bootstrap_vlan: vlanID for the bootstrap interface

Deployment configurations

To prepare the deployment configurations for enrollment, this YAML file should be similar to the original deployment.

For multi-node systems, configure additional hosts of the system that were not specified during factory installation.
As a subcloud, static routes from the hosts’ admin/management gateway to the System Controller’s management subnet should be added to establish the communication between the System Controllers and the subcloud hosts.
Hosts should be administratively unlocked in this configuration.

Perform Subcloud Enrollment¶

Prerequisites

The software ISO and signature files need to be uploaded on the System Controller before the subcloud enrollment.
Power on the factory-installed server and wait for controller-0 to be enabled and controller-0 to be free from the 250.XXX and 260.XXX alarms.
Wait for cert-manager to renew certificates marked as Automatic [Managed by Cert-Manager]. Verify with sudo show-certs.sh before continuing.

Perform subcloud enrollment using the following command:

~(keystone_admin)]$ dcmanager subcloud add --enroll --bootstrap-address <> --bootstrap-values <> --deployment-config <> --install-values <> --sysadmin-password <> --bmc-password <>

Note

The --enroll option is used to enroll the factory installed system as a subcloud. --bootstrap-address is the OAM floating address of the subcloud, and --bootstrap-values, --deployment-config, and --install-values are the values prepared in the previous section. When the enrollment is initiated, the subcloud will be reconfigured using the provided values without requiring a restart. This behavior is enabled by predefined rules that allow it to be reconfigured once the factory install is completed.

If the subcloud enrollment fails, retry the enrollment process using the following command. If there are 250.001 config out-of-date alarms that need to be cleared, lock and unlock the host to clear the alarms before retrying.

~(keystone_admin)]$ dcmanager subcloud deploy enroll <subcloud-name>

If any value needs to be updated, append the bootstrap values, install values, BMC password, sysadmin password, and deployment configurations as the optional arguments.

~(keystone_admin)]$ dcmanager subcloud deploy enroll <subcloud-name> --boostrap-values <>  --install-values <>   --deploy-config <>   --sysadmin-password<>  --bmc-password<>

After the subcloud reaches the enroll-complete status, further operation (config) can be triggered using the following command to reach the final deploy status complete.

~(keystone_admin)]$ dcmanager subcloud deploy config <subcloud-name> --deployment-config <>  --sysadmin-password