StarlingX Kubernetes 12.0

The sections below provide a detailed list of new features, enhancements, and updates, and links to the associated user guides (if applicable).

ISO image

The pre-built ISO (Debian) for StarlingX 12.0 is located at the StarlingX mirror repo:

https://mirror.starlingx.windriver.com/mirror/starlingx/release/12.0.0/debian/bullseye/amd64/monolithic/outputs/iso/

Source Code for StarlingX 12.0

The source code for StarlingX 12.0 is available on the r/stx.12.0 branch in the StarlingX repositories.

Deployment

To deploy StarlingX 12.0, see Consuming StarlingX.

For detailed installation instructions, see StarlingX 12.0 Installation Guides.

New Features / Enhancements / Limitations

This release introduces several new features designed to improve usability and efficiency, along with targeted enhancements to existing functionality. Performance, stability, and overall user experience have been enhanced. Some features may have known limitations or constraints in this release; these are documented to help set expectations and guide usage.

Kernel updates

Kernel version 6.12.57 is now supported in StarlingX Release 12.0.

Kubernetes Versions

Kubernetes versions supported in StarlingX Release 12.0 range from 1.32.2 to 1.34.1. The default version for fresh installs is Kubernetes 1.34.1.

Platform Applications

All StarlingX containerized application components have been updated to their latest versions and are compatible with the Kubernetes releases supported in StarlingX Release 12.0.

For more information on all StarlingX Application version updates, see Platform Applications.

Bare-metal to Rook Migration

StarlingX does not support migration from a bare-metal Ceph deployment to Rook Ceph; existing StarlingX users must reinstall and deploy Rook Ceph.

Centralized IAM (Identity and Access Management) with OIDC compatible MFA

StarlingX adds support for an OIDC backend through the DEX OIDC Identity Provider (IdP) Proxy. For example, a DEX OIDC backend could be configured for a remote Keycloak IdP (an OIDC-compliant IdP with native MFA support). StarlingX will continue to support an LDAP backend through the DEX IdP Proxy for:

  • the StarlingX default local LDAP-based authentication, and

  • customer environments that rely on LDAP solutions such as Windows Active Directory (WAD).

kubectl Support for oidc-login Plugin for Improved Usability

StarlingX adds support for the oidc-login plugin for kubectl. It provides an improved user experience when using OIDC authentication with Kubernetes CLI (kubectl).

With kubectl configured to use oidc-login, kubectl commands automatically initiate OIDC authentication when required. If there is no token, or an expired token, in the kubectl cache, the plugin opens a browser for OIDC login (with MFA if applicable). On successful login, the kubectl cache is updated automatically with the resulting token, and kubectl proceeds with the command.
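
As a sketch, a kubeconfig user entry for the plugin could look like the following. The issuer URL, port, and client ID below are placeholders, not verified StarlingX values; they must match this deployment's oidc-auth-apps configuration.

```yaml
# Hypothetical kubeconfig fragment; substitute deployment-specific values.
users:
- name: oidc-user
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: kubectl
      args:
      - oidc-login
      - get-token
      - --oidc-issuer-url=https://<oam-floating-ip>:<dex-port>/dex
      - --oidc-client-id=<client-id>
```

With this entry in place, any kubectl command run as oidc-user triggers the browser-based login flow described above when no valid cached token exists.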

See: Configure Kubeconfig for OIDC Login.

oidc-auth-apps (Local DEX OIDC IdP) Now Started by Default on Install and Upgrade

The oidc-auth-apps application (Local DEX OIDC IdP) is now configured and enabled by default during installation and upgrades. The deployed configuration uses the StarlingX Local LDAP service as the IdP backend.

See:

Kubernetes OIDC Authentication now Configured by Default to use ‘oidc-auth-apps’ on Install

Kubernetes is configured by default to support OIDC Authentication using the ‘oidc-auth-apps’ OIDC DEX Proxy Identity Provider.

See: Kubernetes Authentication & Authorization.

StarlingX Unified Identity Management Authentication with OIDC

The system, software, fm, sw-manager, and dcmanager APIs/CLIs optionally support OIDC authentication in StarlingX Release 12.0. Keystone authentication remains the default.

Users can now enable OIDC authentication using a command line option.

See:

PTP Partial Timing Support (PTS) as Cloud Platform Timing Source

StarlingX Release 12.0 introduces Partial Timing Support (PTS) to the existing Precision Time Protocol (PTP) framework, complementing the current support for Full Timing Support (FTS) and local GNSS sources.

To deliver a robust and reliable PTS implementation, StarlingX enables the system to leverage frequency synchronization (SyncE) when available to further improve timing stability.

Additionally, to meet PTS accuracy requirements, the solution now utilizes hardware timestamp offload when supported by the NIC. This reduces packet delay variation introduced by the software network stack, improving timing precision.
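
For illustration, a ptp4l configuration for the ITU-T G.8275.2 telecom profile (the profile typically associated with partial timing support) might carry values like the following. Treat this as a sketch to be adapted, not a validated StarlingX instance:

```
[global]
# G.8275.2-style values for PTP over IPv4 unicast (illustrative)
dataset_comparison            G.8275.x
G.8275.defaultDS.localPriority 128
domainNumber                  44
network_transport             UDPv4
hybrid_e2e                    1
inhibit_multicast_service     1
unicast_listen                1
```

On StarlingX, such parameters would normally be applied through the system ptp-instance-parameter-add command rather than by editing a configuration file directly.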

See:

New Node Label sriovdp-rdma

This update adds support for an additional node label sriovdp-rdma=enabled|disabled, allowing the SR-IOV Device Plugin to conditionally set the isRdma parameter for Mellanox / InfiniBand devices.
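
For context, the isRdma parameter lands in the SR-IOV Device Plugin configuration. A hypothetical fragment of the resulting plugin config (the resource name and vendor selector are illustrative, not StarlingX-generated values) could look like:

```json
{
  "resourceList": [
    {
      "resourceName": "rdma_netdev",
      "selectors": {
        "vendors": ["15b3"],
        "isRdma": true
      }
    }
  ]
}
```

With sriovdp-rdma=enabled on a node, the plugin would advertise the matching devices with RDMA capability; with disabled, isRdma would be left unset.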

See:

RINLINE Driver Life Cycle Management Solution

The app-kernel-module-management application enables dynamic loading of out-of-tree kernel modules at runtime, with modules distributed independently of the platform. Kernel modules can be optionally built on demand into container images or provided as pre-built container images. These images are used to deploy and load the required modules onto target Kubernetes nodes. Custom Kubernetes resources are used to define which kernel modules to load, from which container images, and the specific nodes on which the modules should be deployed.
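
As a sketch, a Module custom resource for the upstream kernel-module-management operator (on which this application is based) looks like the following. The module name, image reference, and node selector are hypothetical:

```yaml
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: example-oot-module        # hypothetical out-of-tree module
  namespace: default
spec:
  moduleLoader:
    container:
      modprobe:
        moduleName: example_oot_module
      kernelMappings:
      # Use a pre-built image whose tag tracks the node's kernel version.
      - regexp: '^.*$'
        containerImage: registry.local:9001/example/oot-module:${KERNEL_FULL_VERSION}
  selector:
    node-role.kubernetes.io/worker: ""
```

The kernelMappings list is what ties a module image to the kernel versions of the target nodes, so modules can be distributed independently of the platform.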

See: Kernel Module Management Application

Granite Rapids-D vCSR Solution with BMC Always Accessible (factory pre-installed vCSR Software)

Custom cloud-init Configuration of Subcloud Enrollment

StarlingX now supports deploying a Virtual Cell Site Router (vCSR) on AIO-SX subcloud sites where no onsite routing hardware is present. All host network traffic now traverses the vCSR, which is deployed as a containerized application during site installation.

Initial Subcloud installation relies on BMC access, with configuration performed through the subcloud enrollment workflow using a cloud-init no-cloud ISO. Enrollment now supports custom setup scripts and configuration files to automate platform and vCSR initialization and enable site connectivity.

To accommodate sites without direct network access, servers must be factory-installed with all required software. The factory install process has been extended to include vCSR setup using customizable install scripts.

The System Controller uses IPMI system event logs to monitor subcloud status over its BMC.

See:

Subcloud Auto-Restore Support for vCSR Deployments

StarlingX also supports auto-restore capabilities for disaster-recovery scenarios on subclouds using a virtual Cell Site Router (vCSR). While initial deployment is handled through the customized subcloud enrollment cloud-init process, additional mechanisms are required to recover the subcloud or the host, or to recover when the vCSR pod becomes unavailable.

  • Adds auto-restore functionality to support disaster-recovery scenarios for subclouds using a virtual Cell Site Router (vCSR).

  • Complements the initial vCSR setup performed via the customized subcloud enrollment cloud-init process.

  • Supports three recovery scenarios:

    • Subcloud enrollment failure requiring factory-default restore.

    • Platform or vCSR service failure requiring local backup restore.

    • Hardware failure requiring server replacement and remote backup restore.

  • Introduces a local-only autonomous restore workflow, required because connectivity cannot be restored until the vCSR is operational.

  • The restore process is fully self-orchestrated on the host; no remote Ansible playbooks or remote APIs are used.

  • The only exception is the BMC Redfish API, which is used to initiate installs or trigger recovery actions. The workflow includes:

    • Local system reinstall from prestaged software.

    • Automated restore of the backup archive to recover Subcloud and vCSR state.

    • Use of locally stored container images, either prestaged or restored from a local registry backup.

  • Ensures subcloud recovery is possible even at remote sites with no direct network access during outages.

See:

Cloud Platform Reboot Process Optimization

StarlingX Release 12.0 introduces enhancements to reduce StarlingX host reboot durations.

Note

The scale of these improvements depends on user configuration and hardware type.

Testing was performed in a lab environment on an AIO-SX Granite Rapids-D system, and the following results were achieved.

  • Server Reboot: 5-6 minutes

  • Pod with PV storage: recovers in 6-12 minutes (including the reboot time)

Subcloud Platform / Kernel Stall Watchdog

New configurability is now available to allow customization of selected sysctl tunable parameters. This feature enables operators to adjust key kernel behaviors to better align with deployment requirements. The following parameters are now configurable:

  • kernel.hung_task_timeout_secs=600

  • kernel.hung_task_panic=1

Additionally, support for custom kdump data collection has been added through new pre- and post-kdump hook capabilities. These hooks allow operators to collect additional diagnostic data before and after kdump execution to improve debugging and post-mortem analysis.

When a kdump crash dump is triggered, you can gather additional data beyond the vmcore by using custom hook scripts.

StarlingX now supports an extension to kexec-tools / kdump-tools that incorporates customizable pre- and post-hook mechanisms. This is supported on StarlingX Standalone and Distributed Cloud deployments.
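
A minimal pre-hook could follow this shape. The hook directory, file names, and environment variable here are illustrative assumptions, not the StarlingX-defined hook interface:

```shell
#!/bin/sh
# Hypothetical pre-kdump hook sketch: collect extra diagnostics before the
# crash kernel captures the vmcore. Paths and names are illustrative only.
OUT="${KDUMP_HOOK_DIR:-${TMPDIR:-/tmp}/kdump-hooks}"
mkdir -p "$OUT"
# Record when the hook ran.
date > "$OUT/pre-hook-timestamp"
# Capture recent kernel messages; tolerate failure if dmesg is restricted.
dmesg 2>/dev/null | tail -n 100 > "$OUT/dmesg-tail" || true
echo "pre-kdump hook wrote diagnostics to $OUT"
```

A post-hook would be the mirror image, gathering state after kdump execution; both should be fast and avoid touching the failing subsystem.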

See:

NetApp Trident with Fibre Channel (FC) and Internet SCSI (iSCSI) Protocols

StarlingX introduces support for NetApp Trident using FC and iSCSI external storage backends for both platform and CNF applications.

NetApp backend supports NetApp ONTAP NAS (NFS) and NetApp ONTAP SAN (iSCSI and Fibre Channel) configurations.

The solution is hardware-agnostic: any NetApp-certified FC or iSCSI storage system is expected to work, and all StarlingX-certified server models are supported with no server-specific dependencies. This applies to all configurations, including Standalone and Distributed Cloud.

See: Configure an External NetApp Deployment as the Storage Backend.

Multipath Configuration Enhancements

The multipath configuration has been enhanced to support full customization through the /etc/multipath.conf file. Administrators can now directly modify all key parameters including path selection policies, failover behavior, and device-specific rules without being constrained by predefined defaults.

In addition, the previously enforced default device blacklist has been removed. This change broadens hardware compatibility and allows multipath settings to be customized to better align with the needs of each deployment environment.
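
An illustrative /etc/multipath.conf fragment follows; the vendor/product values and policies are examples to adapt to the deployment, not recommended defaults:

```
defaults {
    path_selector        "service-time 0"
    path_grouping_policy multibus
    failback             immediate
    no_path_retry        queue
}
devices {
    device {
        vendor               "NETAPP"
        product              "LUN.*"
        path_grouping_policy group_by_prio
        prio                 alua
    }
}
```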

Storage Configuration Management Enhancement

This release introduces greater flexibility for storage configuration by removing the managed StarlingX built-in Puppet template. With this change, customers and Field Support Engineering can implement custom storage configurations tailored to their specific business requirements.

This enhancement applies to all StarlingX supported iSCSI and Fibre Channel storage backends and is supported across both standalone and Distributed Cloud deployments.

Note

This update does not impact or degrade StarlingX performance.

ACPI (Advanced Configuration and Power Interface) Driver

The system idle driver has been switched from acpi_idle to intel_idle. The intel_idle driver is the preferred idle driver for Intel platforms, as it provides more efficient C-state management by leveraging Intel-specific knowledge of the processor’s power states, resulting in improved power efficiency and lower latency transitions compared to the generic acpi_idle driver.
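
To confirm which idle driver is active, the standard sysfs node can be read. This sketch falls back to "unknown" where cpuidle is not exposed (for example, inside containers or some VMs):

```shell
# Read the kernel's selected cpuidle driver; the sysfs path is standard,
# but the node may be absent, so fall back to "unknown" rather than failing.
driver=$(cat /sys/devices/system/cpu/cpuidle/current_driver 2>/dev/null || echo unknown)
[ -n "$driver" ] || driver=unknown
echo "idle driver: $driver"
```

On an Intel platform running this release, the expected value is intel_idle.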

See: Configurable Power Manager

Crane CLI Tool

Crane is a command-line tool for working with remote container images and registries. It offers functionality similar to Docker while providing additional advanced capabilities.

See Crane Recipes for more information.

See Crane Container Images for more information.

AMD CPU Siena Based Server Support for StarlingX Subcloud

StarlingX now supports servers based on AMD 4th Generation EPYC processors, giving users an alternative CPU option alongside the existing Intel-based platforms.

StarlingX on AMD delivers the same functionality, workflows, and CLI / GUI experience for all operations not tied to CPU-specific behavior.

Compatibility and Platform Support

All capabilities available on Intel CPU platforms are expected to run equivalently on AMD EPYC-based servers. Both Standalone and Distributed Cloud StarlingX configurations currently supported on Intel hardware are now supported on AMD CPU platforms.

See: System Hardware Requirements

Known Limitations and Procedural Changes for StarlingX 12.0

Kubernetes Memory Manager Policies

The interaction between the Memory Manager policy kube-memory-mgr-policy=static and the Topology Manager policy "restricted" can result in pods failing to be scheduled or started even when there is sufficient memory. This occurs due to the restrictive design of the NUMA-aware Memory Manager, which prevents the same NUMA node from being used for both single- and multi-NUMA allocations.

Procedural Changes: It is important for users to understand the implications of these memory management policies and configure their systems accordingly to avoid unexpected failures.

For detailed configuration options and examples, refer to the Kubernetes documentation at https://kubernetes.io/docs/tasks/administer-cluster/memory-manager/.

Alarm 900.024 Raised When Uploading N-1 Patch Release to the System Controller

When uploading an N-1 patch release to the System Controller, alarm 900.024 (Obsolete Patch) will be triggered.

This behavior is specific to the System Controller and occurs only when uploading an N-1 patch.

Procedural Changes: This warning can be safely ignored.

Kubevirt Limitations

The following limitations apply to Kubevirt in StarlingX Release 12.0:

  • Limitation: Kubernetes does not provide CPU Manager detection.

    Procedural Changes: Add cpumanager to Kubevirt:

    apiVersion: kubevirt.io/v1
    kind: KubeVirt
    metadata:
      name: kubevirt
      namespace: kubevirt
    spec:
      configuration:
        developerConfiguration:
          featureGates:
            - LiveMigration
            - Macvtap
            - Snapshot
            - CPUManager
    

    Check the label using the following command; the output should include cpumanager=true:

    ~(keystone_admin)]$ kubectl describe node | grep cpumanager
    
  • Limitation: Huge pages do not show up in cat /proc/meminfo inside a guest VM, although the resources are consumed on the host. For example, if a VM uses 4GB of huge pages, the host shows the same 4GB of huge pages in use. The huge page memory is exposed as normal memory to the VM.

    Procedural Changes: You need to configure Huge pages inside the guest OS.

    See the Installation Guides at https://docs.windriver.com/ for more details.

  • Limitation: Virtual machines using Persistent Volume Claim (PVC) must have a shared ReadWriteMany (RWX) access mode to be live migrated.

    Procedural Changes: Ensure PVC is created with RWX.

    $ virtctl image-upload --pvc-name=cirros-vm-disk-test-2 --pvc-size=500Mi --storage-class=cephfs --access-mode=ReadWriteMany --image-path=/home/sysadmin/Kubevirt-GA-testing/latest-manifest/kubevirt-GA-testing/cirros-0.5.1-x86_64-disk.img --uploadproxy-url=https://10.111.54.246 --insecure
    

    Note

    • Live migration is not allowed with a pod network binding of bridge interface type.

    • Live migration requires ports 49152 and 49153 to be available in the virt-launcher pod. If these ports are explicitly specified in the masquerade interface, live migration will not function.

  • For live migration with an SR-IOV interface:

    • Specify networkData: in cloud-init, so that the VM does not lose its IP configuration when it moves to another node.

    • Specify the nameserver and internal FQDNs needed to reach the cluster metadata server; otherwise cloud-init will not work.

    • Fix the MAC address; otherwise the MAC address changes when the VM moves to another node, causing a problem establishing the link.

    Example:

    cloudInitNoCloud:
      networkData: |
        ethernets:
          sriov-net1:
            addresses:
            - 128.224.248.152/23
            gateway: 128.224.248.1
            match:
              macAddress: "02:00:00:00:00:01"
            nameservers:
              addresses:
              - 10.96.0.10
              search:
              - default.svc.cluster.local
              - svc.cluster.local
              - cluster.local
            set-name: sriov-link-enabled
        version: 2
    
  • Limitation: Snapshot CRDs and controllers are not present by default and need to be installed on StarlingX.

    Procedural Changes: To install snapshot CRDs and controllers on Kubernetes, see:

    Additionally, create VolumeSnapshotClass for Cephfs and RBD:

    cat <<EOF>cephfs-storageclass.yaml
    ---
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: csi-cephfsplugin-snapclass
    driver: cephfs.csi.ceph.com
    parameters:
      clusterID: 60ee9439-6204-4b11-9b02-3f2c2f0a4344
      csi.storage.k8s.io/snapshotter-secret-name: ceph-pool-kube-cephfs-data
      csi.storage.k8s.io/snapshotter-secret-namespace: default
    deletionPolicy: Delete
    EOF

    cat <<EOF>rbd-storageclass.yaml
    ---
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: csi-rbdplugin-snapclass
    driver: rbd.csi.ceph.com
    parameters:
      clusterID: 60ee9439-6204-4b11-9b02-3f2c2f0a4344
      csi.storage.k8s.io/snapshotter-secret-name: ceph-pool-kube-rbd
      csi.storage.k8s.io/snapshotter-secret-namespace: default
    deletionPolicy: Delete
    EOF
    
    Note

    Get the cluster ID from the output of kubectl describe sc for the cephfs and rbd storage classes.
    
  • Limitation: Live migration is not possible when using configmap as a filesystem. Currently, virtual machine instances (VMIs) cannot be live migrated as virtiofs does not support live migration.

    Procedural Changes: N/A.

  • Limitation: Live migration is not possible when a VM is using secret exposed as a filesystem. Currently, virtual machine instances cannot be live migrated since virtiofs does not support live migration.

    Procedural Changes: N/A.

  • Limitation: Live migration will not work when a VM is using ServiceAccount exposed as a file system. Currently, VMIs cannot be live migrated since virtiofs does not support live migration.

    Procedural Changes: N/A.

Upper Case Characters in Host Names Cause Issues with Kubernetes Labelling

Upper case characters in host names cause issues with Kubernetes labelling.

Procedural Changes: Host names should be in lower case.

Kubernetes Taint on Controllers for Standard Systems

In Standard systems, a Kubernetes taint is applied to controller nodes in order to prevent application pods from being scheduled on those nodes; since controllers in Standard systems are intended ONLY for platform services. If application pods MUST run on controllers, a Kubernetes toleration of the taint can be specified in the application’s pod specifications.

Procedural Changes: Customer applications that need to run on controllers in Standard systems must be enabled/configured with a Kubernetes toleration to ensure the applications continue working after an upgrade to future StarlingX releases. It is suggested to add the Kubernetes toleration to your application before upgrading.

You can specify toleration for a pod through the pod specification (PodSpec). For example:

spec:
  ....
  template:
    ....
    spec:
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"
        effect: "NoSchedule"

See: Taints and Tolerations.

Application Fails After Host Lock/Unlock

In some situations, an application may fail to apply after a host lock/unlock due to previously evicted pods.

Procedural Changes: Use the kubectl delete command to delete the evicted pods and reapply the application.

Application Apply Failure if Host Reset

If a host is reset while an application apply is in progress, the apply will likely fail.

Procedural Changes: Once the host recovers and the system is stable, a re-apply may be required.

Platform CPU Usage Alarms

Alarms may occur indicating platform CPU usage is greater than 90% if a large number of pods are configured with liveness probes that run every second.

Procedural Changes: To mitigate either reduce the frequency for the liveness probes or increase the number of platform cores.
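
For example, raising a probe's period from one second to ten reduces that pod's probe load on the platform cores by roughly an order of magnitude. The endpoint and port below are hypothetical:

```yaml
livenessProbe:
  httpGet:
    path: /healthz   # hypothetical endpoint
    port: 8080
  periodSeconds: 10  # instead of 1
  timeoutSeconds: 1
  failureThreshold: 3
```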

Pods Using isolcpus

The isolcpus feature currently does not support allocation of thread siblings for CPU requests (i.e., a physical thread and its HT sibling).

Procedural Changes: For optimal results, if hyperthreading is enabled, allocate isolcpus in multiples of two to ensure that both SMT siblings are allocated to the same container.

Deleting Image Tags in registry.local may Delete Tags Under the Same Name

When deleting image tags in the registry.local docker registry, be aware that deleting <image-name:tag-name> deletes all tags under the specified <image-name> that have the same digest as the specified <image-name:tag-name>. For more information, see Delete Image Tags in the Docker Registry.

Procedural Changes: NA

K8s Upgrade Abort Failure and Pod Creation Issues on Service Restart

During a Kubernetes upgrade, the kubelet upgrade step updates the sandbox image in /etc/containerd/config.toml to match the pause image version used by the upgraded Kubernetes control plane. If the Kubernetes upgrade process is aborted after this step and later re-initiated, the pause image version is incorrectly modified again after the kubelet upgrade, resulting in an invalid image tag, which causes subsequent abort failures. Systems in this state may also fail to create new pods or restart existing pods after a service restart.

Initial k8s upgrade (1.32 -> 1.34):

registry.local:9001/registry.k8s.io/pause:3.10 -> registry.local:9001/registry.k8s.io/pause:3.10.1

K8s Upgrade aborted and retried:

registry.local:9001/registry.k8s.io/pause:3.10.1 -> registry.local:9001/registry.k8s.io/pause:3.10.1.1 (invalid tag)

Procedural Changes: Update the invalid image tag (i.e. change 3.10.1.1 to 3.10.1) in the /etc/containerd/config.toml file and retry the abort operation.
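
The relevant fragment of /etc/containerd/config.toml looks like the following; the section name follows the standard containerd v1.x CRI plugin layout, so verify it against the file on your system:

```toml
[plugins."io.containerd.grpc.v1.cri"]
  # Restore the valid pause image tag (3.10.1, not 3.10.1.1).
  sandbox_image = "registry.local:9001/registry.k8s.io/pause:3.10.1"
```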

After a Kubernetes upgrade is successfully re-initiated, new pods may fail to be created, or existing pods may fail to restart following a service restart, particularly a containerd service restart. Resolve this issue using the following steps:

  1. Correct the invalid image tag in /etc/containerd/config.toml.

  2. Restart the containerd service.

  3. Recreate the affected pods if required.

Note

To prevent this issue, validate and correct the sandbox image tag before performing the abort operation during a re-tried upgrade.

KubeVirt VMs are Not Scheduled After Backup and Restore with wipe_ceph_osds=true

When a backup and restore operation is executed with wipe_ceph_osds=true, KubeVirt VMs are not scheduled.

Procedural Changes: After a backup and restore operation with wipe_ceph_osds=true, the VM image needs to be uploaded to the DataVolume to reinitialize the VM.

Stop VMs Before Platform Rollback

StarlingX rollback will fail if KubeVirt VMs are not stopped before executing the activate-rollback operation.

Procedural Changes: If the KubeVirt application is installed, stop running VMs before initiating the platform rollback. Running VMs during a rollback will cause live migration and VM management failures after the rollback completes. For details, see Stop VMs Before Platform Rollback.

Subcloud Restore to N-1 Release with Additional Patches

Before restoring a subcloud to the latest or a specific patch level of the N-1 release, you must first upload the corresponding pre-patched ISO for that patch level.

Procedural Changes: N/A.

Subcloud install or restore to the previous release

If the System Controller is on StarlingX Release 12.0, subclouds can be deployed or restored to either StarlingX Release 11.0 or StarlingX Release 12.0.

The following operations have limited support for subclouds of the previous release:

  • Subcloud error reporting

The following operations are not supported for subclouds of the previous release:

  • Orchestrated subcloud kubernetes upgrade

Procedural Changes: N/A.

See: Subclouds Previous Major Release Management.

Subcloud Upgrade with Kubernetes Versions

Before upgrading the platform, update Kubernetes to the highest version supported by your current platform release, as the new platform version requires this specific Kubernetes version. Orchestrated Kubernetes upgrades are not supported for N-1 subclouds. For example, before upgrading to StarlingX Release 12.0, verify that the System Controller and all subclouds are running Kubernetes v1.32.2 (the highest version supported by StarlingX Release 11.0).

Procedural Changes: N/A.

Enhanced Parallel Operations for Distributed Cloud

  • No parallel operation should be performed while the System Controller is being patched.

  • Only one type of parallel operation can be performed at a time. For example, subcloud prestaging or upgrade orchestration should be postponed while batch subcloud deployment is still in progress.

Examples of parallel operation:

  • any type of dcmanager orchestration (prestage, sw-deploy, kube-upgrade, kube-rootca-update)

  • concurrent dcmanager subcloud add

  • dcmanager subcloud-backup / subcloud-backup restore with the --group option

Procedural Changes: N/A.

Subcloud Prestage Post Restore

Subcloud backups do not include prestaged software and container images for new release deployments. If you restore a subcloud from backup, you must prestage the subcloud again before deploying the new release.

See:

Procedural Changes: N/A.

IPsec Certificate Renewal Post Duplex/Standard Subcloud Rehoming

After rehoming an AIO-DX or Standard subcloud, the IPsec certificate must be renewed on all subcloud nodes and certain services must be restarted. This is required to ensure successful software updates and upgrades. Contact Wind River Customer Support at https://www.windriver.com/services#support for the Ansible playbook that automates these tasks across all applicable subclouds.

Procedural Changes: N/A.

Unable to create Kubernetes Upgrade Strategy for Subclouds using Horizon GUI

When creating a Kubernetes Upgrade Strategy for a subcloud using the Horizon GUI, it fails and displays the following error:

kube upgrade pre-check: Invalid kube version(s), left: (v1.24.4), right:
(1.24.4)

Procedural Changes: Use the following steps to create the strategy:

Procedure

  1. Create a strategy for subcloud Kubernetes upgrade using the dcmanager kube-upgrade-strategy create --to-version <version> command.

  2. Apply the strategy using the Horizon GUI or the CLI using the command dcmanager kube-upgrade-strategy apply.

See: Apply a Kubernetes Upgrade Strategy using Horizon.

k8s-coredump only supports lowercase annotation

Creating a K8s pod core dump fails when the starlingx.io/core_pattern parameter is set with upper case characters in the pod manifest. As a result, the pod cannot find the target directory and fails to create the coredump file.

Procedural Changes: The starlingx.io/core_pattern parameter only accepts lower case characters for the path and file name where the core dump is saved.

See: Kubernetes Pod Core Dump Handler.

Huge Page Limitation on Postgres

The Debian PostgreSQL version supports huge pages and, by default, uses one huge page if available on the system, decreasing the number of available huge pages by one.

Procedural Changes: Disable the huge page setting in /etc/postgresql/postgresql.conf with huge_pages = off. The postgres service then needs to be restarted using the Service Manager command sudo sm-restart service postgres.
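
For reference, the procedural change amounts to this configuration fragment, followed by a service restart:

```
# /etc/postgresql/postgresql.conf (fragment)
huge_pages = off
```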

Warning

This procedural change is not persistent; if the host is rebooted, the change must be applied again. This will be fixed in a future release.

Quartzville Tools

The celo64e and nvmupdate64e commands are not supported in StarlingX Release 9.0 due to a known issue in the Quartzville tools that crashes the host.

Procedural Changes: Reboot the host using the boot screen menu.

Connectivity Issues on E825 (Granite Rapids D Integrated NIC) when Enabling SyncE TX Signal

Enabling the SyncE TX clock on E825 (tx_clk=synce in the PTP clock instance) results in a single, expected link flap while the NIC physical interface adjusts to drive the signal.

When platform interfaces are also configured on the E825 NIC, the link interruption may result in system alarms, and in AIO-DX or Standard deployments, may trigger a SWACT if the configuration is applied during runtime. The following alarms are raised:

  • 400.005 / 401.005: Communication failure detected with peer

  • 100.106 - 100.111: port / interface failed

    Note

    The 100.xxx alarm ID depends on the affected interface.

Warning

If the E825 link is established over a 1G link, connectivity may be completely disrupted.

Procedural Changes: Avoid configuring tx_clk=synce on E825 ports. If the SyncE TX signal is required, it is recommended to enable it only on specific PTP ports, avoiding use on all ports of each NIC.

Add / delete operations on pods result in errors

Under some circumstances, add / delete operations on pods result in the error "error getting ClusterInformation: connection is unauthorized: Unauthorized" and in pods staying in the ContainerCreating/Terminating state. This error may also prevent users from locking a host.

Procedural Changes: If this error occurs, run the kubectl describe pod -n <namespace> <pod name> command. The following message is displayed:

error getting ClusterInformation: connection is unauthorized: Unauthorized

Limitation: There is also a known issue with the Calico CNI that may occur on rare occasions if the Calico token required for communication with the kube-apiserver becomes out of sync due to NTP skew or issues refreshing the token.

Procedural Changes: Delete the calico-node pod (causing it to automatically restart) using the following commands:

$ kubectl get pods -n kube-system --show-labels | grep calico

$ kubectl delete pods -n kube-system -l k8s-app=calico-node

Application Pods with SRIOV Interfaces

Application pods with SR-IOV interfaces require a restart-on-reboot: "true" label in their pod spec template.

Pods with SR-IOV interfaces may fail to start after a platform restore or Simplex upgrade, and may persist in the ContainerCreating state due to missing PCI address information in the CNI configuration.

Procedural Changes: Application pods that require SR-IOV should add the label restart-on-reboot: "true" to their pod spec template metadata. All pods with this label will be deleted and recreated after system initialization; therefore, all pods must be restartable and managed by a Kubernetes controller (i.e., DaemonSet, Deployment, or StatefulSet) for auto recovery.

Pod Spec template example:

template:
    metadata:
      labels:
        tier: node
        app: sriovdp
        restart-on-reboot: "true"

PTP O-RAN Spec Compliant Timing API Notification

The v2 API conforms to O-RAN.WG6.O-Cloud Notification API-v02.01 with the following exceptions, which are not supported in StarlingX:

  • O-RAN SyncE Lock-Status-Extended notifications

  • O-RAN SyncE Clock Quality Change notifications

  • O-RAN Custom cluster names

Procedural Changes: See the PTP-notification v2 document for further details: https://docs.starlingx.io/api-ref/ptp-notification-armada-app/api_ptp_notifications_definition_v2.html

ptp4l error “timed out while polling for tx timestamp” reported for NICs using the Intel ice driver

NICs using the Intel® ice driver may report the following error in the ptp4l logs, which results in a PTP port switching to FAULTY before re-initializing.

Note

PTP ports frequently switching to FAULTY may degrade the accuracy of the PTP timing.

ptp4l[80330.489]: timed out while polling for tx timestamp
ptp4l[80330.489]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug

Note

This is due to a limitation with the Intel® ice driver as the driver cannot guarantee the time interval to return the timestamp to the ptp4l user space process which results in the occasional timeout error message.

Procedural Changes: Intel recommends increasing the tx_timestamp_timeout parameter in the ptp4l config. The increased timeout value gives the ice driver more time to provide the timestamp to the ptp4l user space process. Timeout values of 50 ms and 700 ms have been validated; however, a different value can be used if it is more suitable for your system.

~(keystone_admin)]$ system ptp-instance-parameter-add <instance_name> tx_timestamp_timeout=700
~(keystone_admin)]$ system ptp-instance-apply

Note

The ptp4l timeout error log may also be caused by other underlying issues, such as NIC port instability. Therefore, it is recommended to confirm the NIC port is stable before adjusting the timeout values.

PTP is not supported on Broadcom 57504 NIC

PTP is not supported on the Broadcom 57504 NIC.

Procedural Changes: Do not configure PTP instances on the Broadcom 57504 NIC.

synce4l CLI options are not supported

The SyncE configuration using synce4l is not supported in StarlingX.

The service type of synce4l in the ptp-instance-add command is not supported in StarlingX.

Procedural Changes: N/A.

ptp-notification application is not supported during bootstrap

  • Deployment of ptp-notification during bootstrap time is not supported due to dependencies on the system PTP configuration which is handled post-bootstrap.

    Procedural Changes: N/A.

  • The helm-chart-attribute-modify command is not supported for ptp-notification because the application consists of a single chart. Disabling the chart would render ptp-notification non-functional.

    Procedural Changes: N/A.

The ptp-notification-demo App is not a System-Managed Application

The ptp-notification-demo app is provided for demonstration purposes only. Therefore, it is not supported on typical platform operations such as Upgrades and Backup and Restore.

Procedural Changes: NA

Silicom TimeSync (STS) Card limitations

  • Silicom and Intel based Time Sync NICs may not be deployed on the same system due to conflicting time sync services and operations.

    PTP configuration for Silicom TimeSync (STS) cards is handled separately from StarlingX host PTP configuration and may result in configuration conflicts if both are used at the same time.

    The sts-silicom application provides a dedicated phc2sys instance which synchronizes the local system clock to the Silicom TimeSync (STS) card. Users should ensure that phc2sys is not configured via StarlingX PTP Host Configuration when the sts-silicom application is in use.

    Additionally, if StarlingX PTP Host Configuration is being used in parallel for non-STS NICs, users should ensure that all ptp4l instances do not use conflicting domainNumber values.

  • When the Silicom TimeSync (STS) card is configured in timing mode using the sts-silicom application, the card goes through an initialization process on application apply and server reboots. The ports will bounce up and down several times during the initialization process, causing network traffic disruption. Therefore, configuring the platform networks on the Silicom TimeSync (STS) card is not supported since it will cause platform instability.

Procedural Changes: N/A.

N3000 Image in the containerd cache

A StarlingX system without the N3000 image in the containerd cache fails to configure during a reboot cycle, resulting in a failed / disabled node.

The N3000 device requires a reset early in the startup sequence. The reset is done by the n3000-opae image. The image is automatically downloaded on bootstrap and is expected to be in the cache so that the reset can succeed. If the image is not in the cache for any reason, it cannot be downloaded because registry.local is not yet up at this point in the startup. This will result in the impacted host going through multiple reboot cycles and coming up in an enabled/degraded state. To avoid this issue:

  1. Ensure that the docker filesystem is properly engineered so that the image is not automatically removed by the system when flagged as unused. For instructions to resize the filesystem, see Increase Controller Filesystem Storage Allotments Using the CLI.

  2. Do not manually prune the N3000 image.

Procedural Changes: Use the procedure below.

Procedure

  1. Lock the node.

    ~(keystone_admin)]$ system host-lock controller-0
    
  2. Pull the (N3000) required image into the containerd cache.

    ~(keystone_admin)]$ crictl pull registry.local:9001/docker.io/starlingx/n3000-opae:stx.8.0-v1.0.2
    
  3. Unlock the node.

    ~(keystone_admin)]$ system host-unlock controller-0
    

Deploying an App using nginx controller fails with internal error after controller.name override

A Helm override of controller.name for the nginx-ingress-controller app may result in errors when ingress resources are created later.

Example of Helm override:
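
A minimal illustration of such an override; the controller name shown is hypothetical, not taken from a validated configuration:

```yaml
# Hypothetical user-supplied override for the nginx-ingress-controller app.
# Overriding controller.name in this way may later cause internal errors
# when ingress resources are created.
controller:
  name: custom-ingress-controller
```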

Procedural Changes: N/A.

Operating System Noise on Application-Isolated cores / Cyclic Test Performance Degradation

Current analysis indicates no confirmed application impact. Due to limited visibility into specific application requirements and tolerances, a definitive assessment cannot be made.

While internal tests show some performance degradation in cyclictest metrics when validated against StarlingX internal benchmarks, no corresponding external application requirements for Cascade Lake or Ice Lake platforms are being violated. Requirements for GNR-D continue to be met. The observed degradation is acceptable for most use cases, though this is not guaranteed.

An impact related to osnoise has been observed during the system initialization phase. This is transient in nature, and there is currently no evidence confirming a sustained impact on application performance. Ongoing analysis is focused on validating duration, root cause, and ensuring stable performance during steady-state operation.

Note

Current analysis indicates that upstream Linux changes contributed to the observed degradation.

Procedural Changes: Allow up to 50 seconds after launching processes on application-isolated cores before performing time-sensitive or latency-sensitive operations to ensure OS noise has stabilized.

BPF is disabled

BPF cannot be used in the PREEMPT_RT/low latency kernel due to the inherent incompatibility between PREEMPT_RT and BPF; see https://lwn.net/Articles/802884/.

Some packages might be affected when PREEMPT_RT and BPF are used together. These include, but are not limited to, the following packages:

  • libpcap

  • libnet

  • dnsmasq

  • qemu

  • nmap-ncat

  • libv4l

  • elfutils

  • iptables

  • tcpdump

  • iproute

  • gdb

  • valgrind

  • kubernetes

  • cni

  • strace

  • mariadb

  • libvirt

  • dpdk

  • libteam

  • libseccomp

  • binutils

  • libbpf

  • dhcp

  • lldpd

  • containernetworking-plugins

  • golang

  • i40e

  • ice

Procedural Changes: Wind River recommends not using BPF with the real-time kernel. If required, it can still be used in limited scenarios, for example, for debugging only.

Control Group parameter

The control group (cgroup) parameter kmem.limit_in_bytes has been deprecated, and results in the following message in the kernel's log buffer (dmesg) during boot-up and/or during the Ansible bootstrap procedure: "kmem.limit_in_bytes is deprecated and will be removed. Please report your use case to linux-mm@kvack.org if you depend on this functionality." This parameter is used by a number of software packages in StarlingX, including, but not limited to, systemd, docker, containerd, and libvirt.

Procedural Changes: NA. This is only a warning message about the future deprecation of an interface.

Subcloud Reconfig may fail due to missing inventory file

The dcmanager subcloud reconfig command may fail due to a missing file /var/opt/dc/ansible/<subcloud_name>_inventory.yml.

Procedural Changes: Provide the floating OAM IP address of the subcloud using the --bootstrap-address argument. For example:

~(keystone_admin)]$ dcmanager subcloud reconfig --sysadmin-password <password> --deploy-config deployment-config.yaml --bootstrap-address <floating_OAM_IP_address> <subcloud_name>

Increased CPU Usage After Removal of intel_idle.max_cstate=0

CPU usage may increase after removal of the intel_idle.max_cstate=0 setting. The increase may be observed across multiple processes.

No performance impact is expected under active workloads, as the system does not spend time in idle states during normal operation.

See: Configurable Power Manager

Procedural Changes: Intel recommends using the intel_idle driver.

Console Session Issues during Installation

After bootstrap and before unlocking the controller, if the console session times out (or the user logs out), systemd does not work properly; fm, sysinv, and mtcAgent do not initialize.

Procedural Changes: If the console times out or the user logs out between bootstrap and unlock of controller-0, then, to recover from this issue, you must re-install the ISO.

Power Metrics Application in Real Time Kernels

When executing the Power Metrics application on real-time kernels, the overall scheduling latency may increase due to inter-core interruptions caused by MSR (Model-Specific Register) reads.

Under intensive workloads, the kernel may not be able to handle the MSR read interruptions, which can stall data collection when the collector is not scheduled on the affected core.

Dell iDRAC Virtual Media Issues During Remote Installation

On Dell PowerEdge XR8720t systems with Intel Granite Rapids-D XCC processors and iDRAC version 1.30.10.50 (Build 25), the Dell iDRAC virtual media may experience intermittent failures during remote ISO installation.

These failures include CD mount errors, USB device resets, I/O errors on the virtual media device (sr0), and critical medium errors, which can lead to kernel panic during the initial installation, failed ostree repository synchronization, or installation failure.

Additionally, it was observed that there is an impact on the performance of the iDRAC virtual media, resulting in file transfer times longer than reference numbers during remote ISO installation.

Procedural Changes: Retry the remote ISO installation until it completes successfully.

Dell iDRAC Boot Failure on System Reboot

On Dell PowerEdge XR8720t systems with Intel Granite Rapids-D XCC processors and iDRAC version 1.30.10.50 (Build 25), the system may intermittently fail to boot after a reboot.

After a reboot, the boot process fails and the message “Boot Failed: starlingx” is displayed. The system then gets stuck while attempting alternative boot options. Dell iDRAC lifecycle logs may show SSD failures around the same time.

Procedural Changes: If the system fails to boot after a reboot, perform an additional reboot to allow the system to properly boot.

Dell GNSS Issues

In some Dell PowerEdge XR8720t system samples with Intel Granite Rapids-D XCC processors used during laboratory testing, GNSS issues were observed. These issues are suspected to be caused by mechanical problems.

Due to these problems, the GNSS module was either not detected or operating incorrectly.

Procedural Changes: Contact Dell support for hardware assistance.

Software Delete Operations on the System Controller

On System Controllers, the software delete operation can only be performed on the previous release after all subclouds have been successfully upgraded to the target release.

Procedural Changes: Before executing software delete on the System Controller, ensure that every subcloud in the system has completed its upgrade or update process. Attempting to delete a release earlier is not supported and may impact subcloud operations.

Warning

Do not delete any previous release from the System Controller until all subclouds have been upgraded or updated.

Restart Required for containerd to Apply Config Changes for AIO-SX

On AIO-SX systems, certain container images were removed from the registry due to the image garbage collector and changes introduced during the Kubernetes upgrade. This may impact workloads that rely on specific image versions.

Procedural Changes: Increasing the Docker filesystem size helps retain the image in the containerd cache. Additionally, on AIO-SX only, it is recommended to restart containerd after the Kubernetes upgrade. For more details, see Docker Size.

BMC Password

To update the BMC password, the BMC must be de-provisioned and then re-provisioned.

Procedural Changes: In order to update the BMC password, de-provision the BMC, and then re-provision it again with the new password.
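
A sketch of the sequence using the system host-update CLI; the host name, BMC type, and address values below are illustrative, not from a validated configuration:

```
# De-provision the BMC, clearing the stored credentials (illustrative values).
~(keystone_admin)]$ system host-update compute-0 bm_type=none

# Re-provision the BMC with the new password (illustrative values).
~(keystone_admin)]$ system host-update compute-0 bm_type=dynamic bm_ip=10.10.10.100 bm_username=admin bm_password=<new_password>
```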

Sub-Numa Cluster Configuration not Supported on Skylake Servers

Sub-Numa cluster configuration is not supported on Skylake servers.

Procedural Changes: For servers with Skylake Gold or Platinum CPUs, Sub-NUMA clustering must be disabled in the BIOS.

Backup and Restore Playbook fails due to self-triggered "backup in progress" / "restore in progress" flag

The Backup and Restore playbook may fail due to a self-triggered "backup in progress" / "restore in progress" flag.

Procedural Changes: If more than 10 minutes have elapsed (per the error message below), manually remove the /etc/platform/.backup_in_progress flag and retry the backup:

"backup has already been started less than x minutes ago.
Wait to start a new backup or manually remove the backup flag in
/etc/platform/.backup_in_progress "

For a "restore in progress" flag, reinstall and retry the restore operation.
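
The age check for the backup flag can be sketched as follows; the flag path is parameterized here only so the logic can be exercised outside a real controller:

```shell
# Sketch of the stale-flag check. FLAG is parameterized for illustration;
# on a controller the path is /etc/platform/.backup_in_progress.
FLAG="${FLAG:-/etc/platform/.backup_in_progress}"
if [ -n "$(find "$FLAG" -mmin +10 2>/dev/null)" ]; then
  rm -f "$FLAG"
  echo "stale flag removed; retry the backup"
else
  echo "flag absent or recent; wait before retrying"
fi
```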

RSA required for the system-local-ca issuer

The system-local-ca issuer must use an RSA-based certificate and key. Other key types are not supported during bootstrap or when running the Update system-local-ca or Migrate Platform Certificates to use Cert Manager procedures.

Procedural Changes: N/A.
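
One way to confirm a certificate/key pair is RSA-based is to inspect the public key algorithm with openssl. The sketch below (which assumes the openssl CLI is available) generates a throwaway self-signed CA purely for illustration:

```shell
# Illustrative check: generate a throwaway RSA CA, then confirm the key type
# before using a certificate/key pair for system-local-ca.
openssl req -x509 -newkey rsa:4096 -nodes -keyout ca.key -out ca.crt \
  -days 365 -subj "/CN=example-local-ca" 2>/dev/null
openssl x509 -in ca.crt -noout -text | grep "Public Key Algorithm"
```

An RSA-based certificate reports rsaEncryption; any other algorithm (for example, id-ecPublicKey) indicates a key type that is not supported for system-local-ca.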

Multiple trusted CA certificates with same Distinguished Name are not supported

Trusted CA (ssl_ca) certificates must have unique Distinguished Names (DNs). When a new trusted CA certificate is installed with a DN that matches an existing certificate, the system treats it as a replacement and overwrites the existing certificate.

Procedural Changes: N/A.
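
To check for a DN collision before installing, the subject of the candidate certificate can be printed and compared against already-installed ssl_ca certificates. A throwaway certificate is generated here for illustration (assumes the openssl CLI):

```shell
# Generate a throwaway CA certificate (illustrative) and print its subject DN.
# If the DN matches an already-installed ssl_ca certificate, installing this
# certificate will replace the existing one.
openssl req -x509 -newkey rsa:2048 -nodes -keyout new-ca.key -out new-ca.crt \
  -days 365 -subj "/CN=example-trusted-ca" 2>/dev/null
openssl x509 -in new-ca.crt -noout -subject
```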

Kubernetes Root CA Certificates

The Kubernetes Root CA certificate and key are automatically generated with a default 10-year expiration, and are intended for internal use only.

For external access to kube-apiserver, the proxy (HAproxy) authenticates itself using the Rest API/GUI certificate (system-restapi-gui-certificate), which supports Intermediate CAs. The issuer (system-local-ca) can be customized at bootstrap. See Ansible Bootstrap Configurations for more information.

Procedural Changes: N/A.

External Authentication to kube-apiserver Using Client Certificates

SSL termination for external connections to kube-apiserver is now handled by HAProxy, which establishes a new connection to the API server on behalf of the external client. As a result, client certificate authentication is now restricted to the admin user (kubernetes-admin). Token-based authentication remains fully supported and unchanged.

Procedural Changes: N/A.

Password Expiry does not work on LDAP user login

On Debian, the warning message is not displayed for Active Directory users when a user logs in and the password is nearing expiry. Similarly, when a user's password has already expired, the password change prompt is not displayed on login.

Procedural Changes: It is recommended that users rely on directory administration tools for Windows Active Directory servers to handle password updates, reminders, and expiration. It is also recommended that passwords be updated every 3 months.

Note

The expired password can be reset via Active Directory by IT administrators.

Upgrade activation: cert-manager does not start issuing certificates after upversion

During upgrade activation and upversioning, cert-manager usually takes less than a minute to be available and start issuing certificates. Occasionally, cert-manager can take more time than expected. This behavior is associated with an open source issue. For more details, see https://github.com/cert-manager/cert-manager/issues/7138#issuecomment-2422983418.

Since the cert-manager application is required, the upgrade activation will fail if the app takes too long to become available after the upversion. The following log will be displayed in /var/log/software.log:

Error from server (NotFound): secrets "stx-test-cm" not found
certificate.cert-manager.io "stx-test-cm" deleted
software-controller-daemon: software_controller.py(837): INFO: 15 received
from deploy-activate with deploy-state activate-failed
software-controller-daemon: software_controller.py(870): INFO: Received
deploy state changed to DEPLOY_STATES.ACTIVATE_FAILED, agent deploy-activate

Procedural Changes: Cert-manager should recover by itself after a few minutes. If required, the following certificate used for test purposes in the upgrade activation can be created manually to ensure cert-manager is ready before reattempting the upgrade.

cat <<eof> cm_test_cert.yml
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  creationTimestamp: null
  name: system-local-ca
spec:
  ca:
    secretName: system-local-ca
status: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  creationTimestamp: null
  name: stx-test-cm
  namespace: cert-manager
spec:
  commonName: stx-test-cm
  issuerRef:
    kind: ClusterIssuer
    name: system-local-ca
  secretName: stx-test-cm
status: {}
eof
$ kubectl apply -f cm_test_cert.yml

$ rm cm_test_cert.yml

$ kubectl wait certificate -n cert-manager stx-test-cm --for=condition=Ready --timeout 20m

# Verify that the TLS secret associated with the cert was created, using the following:

$ kubectl get secret -n cert-manager stx-test-cm

cert-manager cm-acme-http-solver pod fails

On a multinode setup, when you deploy an acme issuer to issue a certificate, the cm-acme-http-solver pod might fail and stay in the "ImagePullBackOff" state due to the following defect: https://github.com/cert-manager/cert-manager/issues/5959.

Procedural Changes:

  1. If you are using the namespace "test", create a docker-registry secret "testkey" with local registry credentials in the "test" namespace.

    ~(keystone_admin)]$ kubectl create secret docker-registry testkey --docker-server=registry.local:9001 --docker-username=admin --docker-password=Password*1234 -n test
    
  2. Use the secret "testkey" in the issuer spec as follows:

    apiVersion: cert-manager.io/v1
    kind: Issuer
    metadata:
     name: stepca-issuer
     namespace: test
    spec:
     acme:
       server: https://test.com:8080/acme/acme/directory
       skipTLSVerify: true
       email: test@test.com
       privateKeySecretRef:
         name: stepca-issuer
       solvers:
       - http01:
           ingress:
             podTemplate:
               spec:
                 imagePullSecrets:
                 - name: testkey
             class:  nginx
    

Vault application is not supported during bootstrap

The Vault application cannot be configured during bootstrap.

Procedural Changes:

The application must be configured after the platform nodes are unlocked / enabled / available, a storage backend is configured, and platform-integ-apps is applied. If Vault is to be run in HA configuration (3 vault server pods) then at least three controller / worker nodes must be unlocked / enabled / available.

Vault application support for running on application cores

By default, the Vault application's pods run on platform cores. When changing the core selection from platform cores to application cores, the following additional procedure is required for the Vault application.

Procedural Changes:

If static kube-cpu-mgr-policy is selected and the label app.starlingx.io/component is overridden for the Vault namespace or pods, there are two requirements:

  • The Vault server pods need to be restarted as directed by Hashicorp Vault documentation. Restart each of the standby server pods in turn, then restart the active server pod.

  • Ensure that sufficient hosts with worker function are available to run the Vault server pods on application cores.

See: Kubernetes CPU Manager Policies.

Restart the Vault Server pods

The Vault server pods do not restart automatically.

Procedural Changes: If the pods are to be re-labelled to switch execution from platform to application cores, or vice-versa, then the pods need to be restarted.

Under Kubernetes, the pods are restarted using the kubectl delete pod command. See the Hashicorp Vault documentation for the recommended procedure for restarting server pods in an HA configuration: https://support.hashicorp.com/hc/en-us/articles/23744227055635-How-to-safely-restart-a-Vault-cluster-running-on-Kubernetes.
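
A sketch of the restart order, assuming the default vault namespace and standard Vault Helm labels; the pod names are illustrative:

```
# List the Vault server pods (the namespace and label are assumptions).
kubectl get pods -n vault -l app.kubernetes.io/name=vault

# Restart each standby server pod in turn, then the active pod last
# (pod names are illustrative).
kubectl delete pod -n vault sva-vault-1
kubectl delete pod -n vault sva-vault-2
kubectl delete pod -n vault sva-vault-0
```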

Ensure that sufficient hosts are available to run the server pods on application cores

A standard cluster with fewer than three worker nodes does not support Vault HA on the application cores. In this configuration (fewer than three cluster hosts with worker function):

Procedural Changes:

  • When setting label app.starlingx.io/component=application with the Vault app already applied in HA configuration (3 vault server pods), ensure that there are 3 nodes with worker function to support the HA configuration.

  • When applying Vault for the first time with app.starlingx.io/component set to "application": ensure that the server replica count is also set to 1 for a non-HA configuration. The replicas for the Vault server are overridden both for the Vault Helm chart and the Vault manager Helm chart:

    cat <<EOF > vault_overrides.yaml
    server:
      extraLabels:
        app.starlingx.io/component: application
      ha:
        replicas: 1
    injector:
      extraLabels:
        app.starlingx.io/component: application
    EOF
    
    cat <<EOF > vault-manager_overrides.yaml
    manager:
      extraLabels:
        app.starlingx.io/component: application
    server:
      ha:
        replicas: 1
    EOF
    
    $ system helm-override-update vault vault vault --values vault_overrides.yaml
    
    $ system helm-override-update vault vault-manager vault --values vault-manager_overrides.yaml
    

Platform and Kubernetes Upgrades fail if Portieris is applied

Platform and Kubernetes Upgrades fail if Portieris is applied.

Procedural Changes: Before performing platform or Kubernetes upgrades, you must remove the Portieris application. Once the upgrade is complete, you can reinstall Portieris as usual.

Authorization based on Local LDAP Groups is not supported for Harbor

When using Local LDAP for authentication of the Harbor system application, you cannot use Local LDAP Groups for authorization; you can only use individual Local LDAP users for authorization.

Procedural Changes: Use only individual Local LDAP users for specifying authorization.

Harbor cannot be deployed during bootstrap

The Harbor application cannot be deployed during bootstrap due to the bootstrap deployment dependencies such as early availability of storage class.

Procedural Changes: N/A.

Windows Active Directory

  • Limitation: The Kubernetes API does not support uppercase IPv6 addresses.

    Procedural Changes: The issuer_url IPv6 address must be specified as lowercase.

  • Limitation: The refresh token does not work.

    Procedural Changes: If the token expires, manually replace the ID token. For more information, see, Configure Kubernetes Client Access.

  • Limitation: TLS error logs are reported in the oidc-dex container on subclouds. These logs should not have any system impact.

    Procedural Changes: NA

Security Audit Logging for K8s API

A custom policy file can only be created at bootstrap in apiserver_extra_volumes. If a custom policy file was configured at bootstrap, then after bootstrap the user has the option to configure the parameter audit-policy-file to either this custom policy file (/etc/kubernetes/my-audit-policy-file.yml) or the default policy file /etc/kubernetes/default-audit-policy.yaml. If no custom policy file was configured at bootstrap, then the user can only configure the parameter audit-policy-file to the default policy file.
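
As a sketch, reconfiguring the audit-policy-file parameter at runtime might look like the following; the service-parameter section names shown are an assumption, and the custom policy file path is the example from above:

```
~(keystone_admin)]$ system service-parameter-add kubernetes kube_apiserver audit-policy-file=/etc/kubernetes/my-audit-policy-file.yml
~(keystone_admin)]$ system service-parameter-apply kubernetes
```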

Only the parameter audit-policy-file is configurable after bootstrap, so the other parameters (audit-log-path, audit-log-maxsize, audit-log-maxage and audit-log-maxbackup) cannot be changed at runtime.

Procedural Changes: NA

See: Kubernetes Operator Command Logging.

ISO/SIG Upload to Central Cloud Fails when Using sudo

To upload a software patch or major release to the System Controller region using the --os-region-name SystemController option, the upload command must be authenticated with Keystone.

Procedural Changes: Do not use sudo in combination with the --os-region-name SystemController option. For example, avoid using:

$ sudo software --os-region-name SystemController upload <software-release>

Instead, ensure the command is executed with proper authentication and without sudo.

Note

When using the --local option, you must provide the absolute path to the release files.

For more information, see Upload Software Releases Using the CLI.

RT Throttling Service not running after Lock/Unlock on Upgraded Subclouds

During the upgrade process, the USM post-upgrade script modifies systemd presets to define which services should be automatically enabled or disabled. As part of this process, any user-enabled custom services may be set to “disabled” after the upgrade completes.

Since this change occurs post-upgrade, systemd will not automatically re-enable the affected service during subsequent lock / unlock operations. By default, USM disables custom services not explicitly listed in the systemd presets. Since service definitions can vary between releases, USM relies on these presets to determine enablement status per host during the upgrade. If a custom service is not included in the presets, it will be marked as disabled and remain inactive after lock / unlock even following a successful upgrade.

Log message during the upgrade:

controller-0 usm-initialize[3061]: info Removed
/etc/systemd/system/multi-user.target.wants/sysctl-rt-sched-apply.service

Procedural Changes: Once the upgrade to StarlingX Release 12.0 completes, run the service-enable and service-start commands for all custom / user services before issuing the first lock / unlock (or reboot).

The enable and start commands for this service are required only once prior to the initial lock / unlock operation. After this step is completed, there is no further need to manually start or enable custom services, as the USM post-upgrade script has already run during the upgrade process.
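
For the service named in the log above, the re-enable sequence could look like the following sketch (substitute your own custom service names):

```
sudo systemctl enable sysctl-rt-sched-apply.service
sudo systemctl start sysctl-rt-sched-apply.service
sudo systemctl is-active sysctl-rt-sched-apply.service
```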

sw-manager sw-deploy-strategy apply fails

sw-manager apply fails to apply the patch.

Note

The Procedural Changes below apply only if the sw-manager sw-deploy-strategy fails with the following issues.

  1. To show the operation is in an aborted state due to a timeout, run the following command.

    ~(keystone_admin)]$ sw-manager sw-deploy-strategy show
    
    Strategy Patch Strategy:
      strategy-uuid:                          2082ab5e-a387-4b6a-be23-50ac23317725
      controller-apply-type:                  serial
      storage-apply-type:                     serial
      worker-apply-type:                      serial
      default-instance-action:                stop-start
      alarm-restrictions:                     strict
      current-phase:                          abort
      current-phase-completion:               100%
      state:                                  aborted
      apply-result:                           timed-out
      apply-reason:
      abort-result:                           success
      abort-reason:
    
  2. If step 1 shows a 'timed-out' result, check whether the timeout occurred at the 'wait-alarms-clear' step using the command below.

    To display the 'wait-alarms-clear' step that timed out, run the following command.

    ~(keystone_admin)]$ sw-manager sw-deploy-strategy show --details
    
    step-name:                    wait-alarms-clear
    timeout:                      2400 seconds
    start-date-time:              2024-03-27 19:21:15
    end-date-time:                2024-03-27 20:01:16
    result:                       timed-out
    
  3. To list the 750.006 alarm, use the following command.

    ~(keystone_admin)]$ fm alarm-list
    
    +----------+---------------------------+--------------------+----------+---------------+
    | Alarm ID | Reason Text               |      Entity ID     | Severity | Time Stamp    |
    +----------+---------------------------+--------------------+----------+---------------+
    | 750.006  | A configuration change    | platform-integ-apps| warning  |    2024-03-27T|
    |          | requires a reapply of the |                    |          |    19:21:15.  |
    |          | platform-k8s_application= |                    |          |    471422     |
    |          | integ-apps application.   |                    |          |               |
    +----------+---------------------------+--------------------+----------+---------------+
    
  4. VIM orchestrated patch strategy failed with the 900.103 alarm being triggered.

    ~(keystone_admin)]$ fm alarm-list
    
    +----------+---------------------------+--------------------+----------+---------------+
    | Alarm ID | Reason Text               |      Entity ID     | Severity | Time Stamp    |
    +----------+---------------------------+--------------------+----------+---------------+
    | 900.103  | Software patch auto-apply | orchestration=sw-  | critical | 2024-03-26T03T|
    |          | failed                    |                    |          |               |
    +----------+---------------------------+--------------------+----------+---------------+
    

Procedural Changes - Option 1

  1. Check the system for existing alarms using the fm alarm-list command. If the existing alarms can be ignored, use the sw-manager sw-deploy-strategy create --alarm-restrictions relaxed command to ignore any alarms during patch orchestration.

  2. If the alarms were not relaxed using the command in step 1 and the patch apply fails, check whether alarm ‘750.006’ is present on the system.

  3. Delete the failed strategy using the following command.

    ~(keystone_admin)]$ sw-manager sw-deploy-strategy delete
    
  4. Create a new strategy.

    ~(keystone_admin)]$ sw-manager sw-deploy-strategy create --alarm-restrictions relaxed
    
  5. Apply the strategy.

    ~(keystone_admin)]$ sw-manager sw-deploy-strategy apply
    

Procedural Changes - Option 2

  1. Create a new strategy (alarm-restrictions are not relaxed).

    ~(keystone_admin)]$ sw-manager sw-deploy-strategy create
    
  2. Apply the strategy.

    ~(keystone_admin)]$ sw-manager sw-deploy-strategy apply
    

    While the sw-deploy-strategy is in progress and at the ‘wait-alarms-clear’ step (this can be confirmed with sw-manager sw-deploy-strategy show --details | grep "step-name"), check whether alarm 750.006 is present. If it is, execute the steps below.

  3. Execute the command.

    ~(keystone_admin)]$ system application-apply platform-integ-apps
    

    This will re-apply the application and clear alarm 750.006.

  4. If the alarm still persists after step 3, manually delete the alarm using the fm alarm-delete <uuid of alarm 750.006> command.
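For step 4, the alarm UUID can be extracted from the fm alarm-list --uuid output and passed to fm alarm-delete. A minimal sketch, assuming a hypothetical row layout (the sample line below is illustrative, not real output):

```shell
# Extract the UUID of alarm 750.006 from a sample `fm alarm-list --uuid`
# row (hypothetical column layout: | UUID | Alarm ID | Reason Text | ... |).
row='| 8a5b2c1d-0000-4e6f-9abc-001122334455 | 750.006  | A configuration change ... |'
uuid=$(printf '%s\n' "$row" | awk -F'|' '$3 ~ /750\.006/ {gsub(/ /, "", $2); print $2}')
echo "$uuid"
# then: fm alarm-delete "$uuid"
```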

Possible Performance Degradation on Bare-metal Ceph

Preliminary performance testing of StarlingX Release 12.0 indicates a potential degradation in Ceph throughput performance compared to StarlingX Release 11.0. The observed impact ranges from 0% to 9% and exhibits high variability based on the traffic profile. These preliminary results are derived from testing with default platform CPU and memory allocations. Further investigation is ongoing to better understand the behavior.

Procedural Changes: Perform workload-specific performance validation prior to deployment. If throughput degradation is observed, consider tuning platform CPU and memory allocations or adjusting traffic profiles to mitigate the impact until further analysis and updated guidance are available.

Possible Performance Degradation on Rook Ceph

Preliminary performance testing of StarlingX Release 12.0 with Ceph version 18.2.7 indicates a potential degradation in Rook Ceph throughput performance compared to StarlingX Release 11.0. The observed impact ranges from 0% to 9% and shows high variability depending on the traffic profile. These preliminary results are based on default platform CPU and memory allocations. Further investigation is planned for a future release to better characterize the impact.

Procedural Changes: Validate Rook Ceph performance under representative traffic workloads prior to deployment. If throughput degradation is observed, consider tuning platform CPU and memory allocations or adjusting workload profiles to mitigate the impact until further guidance is available.

Warning

Ceph 18.2.7 includes critical stability and reliability fixes, making it a mandatory upgrade for StarlingX Release 12.0. See: https://ceph.io/en/news/blog/2025/v18-2-7-reef-released/.

Upgrade Failure on AIO-DX Ceph monitor after unlock

During a software upgrade on an AIO-DX system, a rare issue can affect Ceph monitors: the affected fixed monitor can get stuck in a reboot loop.

The problem and the affected monitor can be identified by finding both the alarm and a message in the logs:

Alarm sample:

200.006: <HOSTNAME> is degraded due to the failure of its 'ceph-fixed-mon (mon.<HOSTNAME>, )' process.

The message “Bad table magic number” appears in the faulty monitor's log:

$ grep "Bad table magic number" /var/log/ceph/ceph-mon.<HOSTNAME>.log

Procedural Changes: Recreate the faulty monitor:

Warning

This should only be executed on the host that has the problem with the faulty monitor.

Follow the steps to recreate the monitor:

$ sudo rm -f /etc/pmon.d/ceph-fixed-mon.conf
$ sudo /etc/init.d/ceph stop mon-${HOSTNAME}
$ sudo rm -rf /var/lib/ceph/data/ceph-${HOSTNAME}/

$ sudo ceph-mon --cluster ceph --mkfs \
--id ${HOSTNAME} \
--keyring /dev/null \
--fsid $(ceph fsid) \
--mon-data /var/lib/ceph/data/ceph-${HOSTNAME}

$ sudo /etc/init.d/ceph-init-wrapper start mon.${HOSTNAME}

$ sudo ln -s /etc/ceph/ceph-fixed-mon.conf.pmon /etc/pmon.d/ceph-fixed-mon.conf

Change in Credential Management for CephObjectStoreUser

A single CephObjectStoreUser can no longer contain multiple credentials. When upgrading, Rook Ceph will delete all undeclared credentials and retain only one.

Procedural Changes: Credentials must now be declared in the CephObjectStoreUser resource, each stored in its own secret.

For more information on how to perform this operation, see Rook Ceph official documentation:

https://rook.io/docs/rook/v1.17/Storage-Configuration/Object-Storage-RGW/object-storage/#managing-user-s3-credentials

Rook Ceph Configuration Update for Object Storage

Some ObjectBucketClaim options added in Rook Ceph v1.16 (present in StarlingX Release 11.0) were disabled by default to improve security. The disabled options are: bucketMaxObjects, bucketMaxSize, bucketPolicy, bucketLifecycle, and bucketOwner.

Procedural Changes: To re-enable these options, provide a Helm user override.

Procedure

  1. Create an override file file_with_changes.yaml containing the key:

    obcAllowAdditionalConfigFields: "maxObjects,maxSize"
    

    The options to be re-enabled should be added to the value list in a comma-separated format.

  2. Apply the overrides with:

    $ system helm-override-update rook-ceph rook-ceph rook-ceph --values <file_with_changes.yaml>
    
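The comma-separated value shown in step 1 can be assembled from a list of option names; a small sketch (option names taken from the override example above):

```shell
# Build the obcAllowAdditionalConfigFields value from a list of options
# to re-enable (names taken from the override example above).
options=("maxObjects" "maxSize")
value=$(IFS=','; printf '%s' "${options[*]}")
printf 'obcAllowAdditionalConfigFields: "%s"\n' "$value"
```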

Rook Ceph Application Limitation During Floating Monitor Removal

On an AIO-DX system, removing the floating monitor using system controllerfs-modify ceph-float --functions="" may lead to temporary system instability, including the possibility of uncontrolled swacts.

Procedural Changes: To avoid this issue, ensure that all finalizers are removed from the floating monitor Rook Ceph chart after its deletion, using the following command:

$ kubectl patch hr rook-ceph-floating-monitor -p '{"metadata":{"finalizers":[]}}' --type=merge

Host fails to lock during an upgrade

After adding multiple OSDs to the Ceph cluster simultaneously, some OSDs may remain in the 'configuring' state even though the cluster is healthy and the OSDs are deployed. This is an intermittent issue that occurs only on systems with a Ceph storage backend configured with more than one OSD per host. It causes the system host-lock command to fail with the following error:

$ system host-lock controller-<id>
controller-<id> : Rejected: Can not lock a controller with storage devices
in 'configuring' state.

Since system host-lock on the controller fails and the OSD is still in the configuring state, the upgrade is blocked from proceeding.

Procedural Changes: Use the following steps to proceed with the upgrade.

  1. List the OSD ID(s) in the ‘configuring’ state using the following command:

    $ system host-stor-list <hostname>
    
  2. Identify the OSD using the following command:

    $ ceph osd find osd.<OSDID>
    
  3. If the OSD is found, manually update the database inventory using the stor UUID:

    $ sudo -u postgres psql -U postgres -d sysinv -c "UPDATE i_istor SET state='configured' WHERE uuid='<STOR_UUID>';";
    
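Since this issue occurs with more than one OSD per host, several stors may need updating. A sketch that generates one UPDATE statement per affected stor UUID (the UUIDs below are placeholders; take the real ones from the system host-stor-list output):

```shell
# Generate the UPDATE statement for each stor uuid still in the
# 'configuring' state (placeholder uuids; use values reported by
# `system host-stor-list <hostname>`).
uuids="11111111-2222-3333-4444-555555555555 66666666-7777-8888-9999-000000000000"
sql=""
for uuid in $uuids; do
  sql+="UPDATE i_istor SET state='configured' WHERE uuid='${uuid}';"$'\n'
done
printf '%s' "$sql"
# then: sudo -u postgres psql -U postgres -d sysinv -c "$sql"
```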

Rook Ceph Application Limitation

After applying Rook Ceph application in an AIO-DX configuration the 800.001 - Storage Alarm Condition: HEALTH_WARN alarm may be triggered.

Procedural Changes: Restart the pod of the monitor associated with the slow operations reported by Ceph; identify the affected monitor in the ceph -s output.
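A sketch of the restart, assuming Rook Ceph's default pod labels (app=rook-ceph-mon, mon=<id>); the monitor id comes from the slow-ops message in the health output:

```shell
# Identify the slow monitor, e.g. "mon.a", in the health output.
ceph -s
# Restart its pod (label names assumed from Rook Ceph defaults;
# replace <id> with the monitor id, e.g. "a").
kubectl -n rook-ceph delete pod -l app=rook-ceph-mon,mon=<id>
```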

Remove all OSDs on a host on Rook Ceph

The procedure to remove OSDs does not work as expected when removing all OSDs from a host; the Ceph cluster gets stuck in a HEALTH_WARN state.

Note

Use the Procedural change only if the cluster is stuck in HEALTH_WARN state after removing all OSDs on a host.

Procedural Changes:

  1. Check the cluster health status.

  2. Check crushmap tree.

  3. Remove the host(s) shown as empty in the crush map tree output.

  4. Check the cluster health status.
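The four steps above can be sketched with the ceph CLI (<hostname> is a placeholder for a host bucket shown empty in the tree):

```shell
ceph -s                           # 1. check cluster health
ceph osd crush tree               # 2. inspect the crush map tree
ceph osd crush remove <hostname>  # 3. remove each empty host bucket
ceph -s                           # 4. confirm the cluster returns to HEALTH_OK
```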

Critical alarm 800.001 after Backup and Restore on AIO-SX Systems

A Critical alarm 800.001 may be triggered after running the Restore Playbook. The alarm details are as follows:

~(keystone_admin)]$ fm alarm-list
+-------+----------------------------------------------------------------------+--------------------------------------+----------+---------------+
| Alarm | Reason Text                                                          | Entity ID                            | Severity | Time Stamp    |
| ID    |                                                                      |                                      |          |               |
+-------+----------------------------------------------------------------------+--------------------------------------+----------+---------------+
| 800.  | Storage Alarm Condition: HEALTH_ERR. Please check 'ceph -s' for more | cluster=                             | critical | 2024-08-29T06 |
| 001   | details.                                                             | 96ebcfd4-3ea5-4114-b473-7fd0b4a65616 |          | :57:59.701792 |
|       |                                                                      |                                      |          |               |
+-------+----------------------------------------------------------------------+--------------------------------------+----------+---------------+

Procedural Changes: To clear this alarm run the following commands:

Note

Applies only to AIO-SX systems.

FS_NAME=kube-cephfs
METADATA_POOL_NAME=kube-cephfs-metadata
DATA_POOL_NAME=kube-cephfs-data

# Ensure that the Ceph MDS is stopped
sudo rm -f /etc/pmon.d/ceph-mds.conf
sudo /etc/init.d/ceph stop mds

# Recover MDS state from filesystem
ceph fs new ${FS_NAME} ${METADATA_POOL_NAME} ${DATA_POOL_NAME} --force

# Try to recover from some common errors
sudo ceph fs reset ${FS_NAME} --yes-i-really-mean-it

cephfs-journal-tool --rank=${FS_NAME}:0 event recover_dentries summary
cephfs-journal-tool --rank=${FS_NAME}:0 journal reset
cephfs-table-tool ${FS_NAME}:0 reset session
cephfs-table-tool ${FS_NAME}:0 reset snap
cephfs-table-tool ${FS_NAME}:0 reset inode
sudo /etc/init.d/ceph start mds

Intermittent installation of Rook-Ceph on Distributed Cloud

While installing rook-ceph, the installation may intermittently fail because ceph-mgr-provision is not provisioned correctly.

Procedural Changes: Run the system application-remove rook-ceph --force command, then re-apply the application to re-initiate the rook-ceph installation.

Storage Nodes are not considered part of the Kubernetes cluster

When running the system kube-host-upgrade-list command, the output displays only controller and worker hosts that have control-plane and kubelet components. Storage nodes do not have these components and so are not considered part of the Kubernetes cluster.

Procedural Changes: Do not include Storage nodes as part of the Kubernetes upgrade.

Optimization with a Large number of OSDs

Storage nodes are not optimized by default, so you may need to tune your Ceph configuration for balanced operation across deployments with a high number of OSDs. Until this is done, the following alarm may be raised even if the installation succeeds:

800.001 - Storage Alarm Condition: HEALTH_WARN. Please check ‘ceph -s’

Procedural Changes: To optimize your storage nodes with a large number of OSDs, it is recommended to use the following commands:

~(keystone_admin)]$ ceph osd pool set kube-rbd pg_num 256
~(keystone_admin)]$ ceph osd pool set kube-rbd pgp_num 256
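The value 256 follows the common Ceph sizing rule of thumb: target roughly (OSDs x 100) / replicas placement groups, rounded to a power of two. A sketch of that arithmetic (the OSD and replica counts are example values, not taken from this document):

```shell
# Rule-of-thumb PG sizing: (osds * 100) / replicas, rounded down to the
# nearest power of two. Example values: 8 OSDs, replication factor 2.
osds=8
replicas=2
target=$(( osds * 100 / replicas ))
pg=1
while [ $(( pg * 2 )) -le "$target" ]; do pg=$(( pg * 2 )); done
echo "$pg"   # → 256
```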

Storage Nodes Recovery on Power Outage

Storage nodes take 10-15 minutes longer to recover in the event of a full power outage.

Procedural Changes: NA

Ceph Recovery on an AIO-DX System

In certain instances, Ceph may not recover on an AIO-DX system and remains in the down state in the ceph -s output. This can occur, for example, if an OSD comes up after a controller reboot and a swact occurs, or due to other causes such as hardware failure of the disk or the entire host, a power outage, or a switch going down.

Procedural Changes: There is no specific command or procedure that solves the problem for all possible causes. Each case needs to be analyzed individually to find the root cause of the problem and the solution. It is recommended to contact Customer Support at http://www.windriver.com/support.

Restrictions on the Size of Persistent Volume Claims (PVCs)

There is a limitation on the size of Persistent Volume Claims (PVCs) that can be used for all StarlingX Releases.

Procedural Changes: It is recommended that all PVCs have a minimum size of 1GB. For more information, see https://bugs.launchpad.net/starlingx/+bug/1814595.
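A minimal PVC sketch that respects the 1GB minimum (the name is a placeholder):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc      # placeholder name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi       # keep PVCs at 1GB or larger
```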

Failure to clean up platform-integ-apps files/Helm release

If the System Controller does not have Ceph configured, platform-integ-apps is not installed on it and the images are not automatically downloaded to registry.central when upgrading the platform.

The missing images on the subclouds are:

registry.central:9001/docker.io/openstackhelm/ceph-config-helper:ubuntu_focal_18.2.0-1-20231013
registry.central:9001/quay.io/cephcsi/cephcsi:v3.10.1
registry.central:9001/registry.k8s.io/sig-storage/csi-attacher:v4.4.2
registry.central:9001/registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.9.1
registry.central:9001/registry.k8s.io/sig-storage/csi-provisioner:v3.6.2
registry.central:9001/registry.k8s.io/sig-storage/csi-resizer:v1.9.2
registry.central:9001/registry.k8s.io/sig-storage/csi-snapshotter:v6.3.2

If the System Controller does not have Ceph configured and the subclouds have Ceph configured, then the images need to be manually uploaded to the registry.central before starting the upgrade of the subclouds.

To push the images to the registry.central, run the following commands on the System Controller:

# Change the variables according to the setup
REGISTRY_PREFIX="server:port/path"
REGISTRY_USERNAME="admin"
REGISTRY_PASSWORD="password"

sudo docker login registry.local:9001 --username ${REGISTRY_USERNAME} --password ${REGISTRY_PASSWORD}
for image in \
docker.io/openstackhelm/ceph-config-helper:ubuntu_focal_18.2.0-1-20231013 \
registry.k8s.io/sig-storage/csi-attacher:v4.4.2 \
registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.9.1 \
registry.k8s.io/sig-storage/csi-provisioner:v3.6.2 \
registry.k8s.io/sig-storage/csi-resizer:v1.9.2 \
registry.k8s.io/sig-storage/csi-snapshotter:v6.3.2 \
quay.io/cephcsi/cephcsi:v3.10.1

do
sudo docker pull ${REGISTRY_PREFIX}/${image}
sudo docker tag ${REGISTRY_PREFIX}/${image} registry.local:9001/${image}
sudo docker push registry.local:9001/${image}
done

Procedural Changes: If the subcloud upgrade finishes without the correct images pushed to registry.central, it is still possible to recover the system by following the steps below.

After pushing the images to the registry.central, each subcloud must be recovered with the following steps (these commands should be run on the Subcloud):

source /etc/platform/openrc

# Remove old app manually
sudo rm -rf /opt/platform/helm/22.12/platform-integ-apps;
sudo rm -rf /opt/platform/fluxcd/22.12/platform-integ-apps;
sudo -u postgres psql postgres -d sysinv -c "DELETE from kube_app WHERE name = 'platform-integ-apps';";
sudo sm-restart service sysinv-inv && sudo sm-restart service sysinv-conductor;
sleep 15; # Wait for services to restart
system application-upload /usr/local/share/applications/helm/platform-integ-apps-22.12-72.tgz;
sleep 15; # Wait for the upload to fail (it is expected to fail here)
system application-delete platform-integ-apps;
system application-upload /usr/local/share/applications/helm/platform-integ-apps-22.12-72.tgz;
sleep 10; # Wait for the upload to succeed
system application-apply platform-integ-apps;

Note

The images need to be pushed to the registry.central registry before upgrading the subclouds.

Deprecated Notices in Stx 12.0

In-tree and Out-of-tree drivers

In StarlingX Release 12.0, only the out-of-tree versions of the Intel ice, i40e, and iavf drivers are supported. Switching between in-tree and out-of-tree driver versions is not supported.

The out_of_tree_drivers service parameter and the out-of-tree-drivers boot parameter are deprecated and should not be modified to switch to in-tree driver versions. The values will be ignored, and the system will always use the out-of-tree versions of the Intel ice, i40e, and iavf drivers.

See: Intel Driver Versions

Kubernetes Root CA bootstrap overrides

The overrides k8s_root_ca_cert, k8s_root_ca_key, and apiserver_cert_sans will be deprecated in a future release. External connections to kube-apiserver are now routed through a proxy that identifies itself using the REST API/GUI certificate issued by the platform issuer (system-local-ca).

See: Ansible Bootstrap Configurations

kubernetes-power-manager

Intel has stopped supporting the kubernetes-power-manager application. It is still supported by StarlingX but will be removed in a future release.

The cpu_busy_cycles metric is deprecated and must be replaced with cpu_c0_state_residency_percent for continued use (if the metrics are customized via Helm overrides).

For more information, see Configurable Power Manager.

Bare Metal Ceph

Host-based Ceph was deprecated in StarlingX Release 11.0 and will be removed in a future release.

Impact: Existing deployments using Host-based Ceph will require migration prior to upgrading to StarlingX future releases.

Recommendation: For new deployments, adopt Rook-Ceph to avoid service disruption during the transition from Bare Metal Ceph to Rook.

Migration: StarlingX does not support bare-metal to Rook migration and all existing StarlingX users will need to reinstall and deploy Rook Ceph.

Static Configuration for Hardware Accelerator Cards

Static configuration for hardware accelerator cards was deprecated in StarlingX Release 10.0 and will be discontinued in future releases. Use the FEC operator instead.

See Switch between Static Method Hardware Accelerator and SR-IOV FEC Operator

Kubernetes APIs

Kubernetes APIs that will be removed in K8s 1.27 are listed below:

See: https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-27

Containerd Schema 1 images

Support for Docker Schema 1 images has been disabled in containerd v2.0. Users should ensure all container images are using Docker Schema 2 or OCI image format. Docker Schema 1 images will no longer be supported and may fail to pull or run.

See: https://containerd.io/releases/#deprecated-features

cgroup v1 Deprecation and Transition to cgroup v2

Support for cgroup v1 is being deprecated in preparation for a full transition to cgroup v2. Applications that rely on cgroup v1 should be updated to use cgroup v2 when available, to ensure forward compatibility.

Support Lifecycle

StarlingX Release 12.0:

cgroup v1 is deprecated. cgroup v2 is not available.

Future Releases:

cgroup v1 will be fully removed. cgroup v2 will be enabled and supported.

O-RAN O2 IMS

In the context of hosting a RAN application on the Cloud Platform, the O-RAN O2 application provides and exposes the IMS and Deployment Management Service APIs of the O2 interface between the O-Cloud (Cloud Platform) and the Service Management and Orchestration (SMO) in the O-RAN Architecture.

As O2 specifications evolve, O2 IMS capability falls outside the current StarlingX design scope and is supported by Wind River Conductor. StarlingX continues to support the O2 Kubernetes Deployment Management Service (DMS) interface in alignment with O-RAN Alliance specifications.

Ingress nginx

The upstream Ingress nginx project has announced that no updates will be provided after March 2026. StarlingX currently uses Ingress nginx to implement the Kubernetes Ingress API and plans to continue using it until further notice. Since the Ingress API in Kubernetes is now frozen and the Gateway API is the recommended replacement, StarlingX is targeting a future release to introduce support for the Kubernetes Gateway API. Applications currently relying on the Ingress nginx API are encouraged to migrate to the Gateway API once it becomes available in StarlingX.

N3000 FPGA Firmware Update Orchestration

The N3000 FPGA Firmware Update Orchestration was deprecated in StarlingX Release 10.0. For more information, see N3000 FPGA Overview.

Alarms and Event Logs Deprecation

The following Alarms are deprecated:

900.101 - Software patch auto-apply in progress
900.102 - Software patch auto-apply aborting
900.103 - Software patch auto-apply failed

The following Event Logs are deprecated:

900.111 - Software patch auto-apply start
900.112 - Software patch auto-apply inprogress
900.113 - Software patch auto-apply rejected
900.114 - Software patch auto-apply cancelled
900.115 - Software patch auto-apply failed
900.116 - Software patch auto-apply completed
900.117 - Software patch auto-apply abort
900.118 - Software patch auto-apply aborting
900.119 - Software patch auto-apply abort rejected
900.120 - Software patch auto-apply abort failed
900.121 - Software patch auto-apply aborted

These will be removed in a future release. They are superseded by the USM Software 900.2xx series of alarms and event logs.

Removed in Stx 12.0

MacVTap Interfaces

MacVTap interfaces for KubeVirt VMs are not supported in StarlingX Release 12.0 and future releases.

ptp-notification v1 API

The ptp-notification v1 API is no longer supported in StarlingX Release 12.0 and has been removed. Only the O-RAN Compliant Notification API (ptp-notification v2 API) is supported.

Static Configuration for Hardware Accelerator Cards

Static configuration for hardware accelerator cards is no longer supported in StarlingX Release 12.0. Use FEC operator instead.

See Switch between Static Method Hardware Accelerator and SR-IOV FEC Operator

out_of_tree_drivers Service Parameter and out-of-tree-drivers Boot Parameter

By default, the out-of-tree drivers (ice/i40e/iavf) are loaded, so these parameters are not required. In StarlingX Release 12.0, the out_of_tree_drivers service parameter and the out-of-tree-drivers boot parameter have been removed. Users must update their configuration files to stop using these parameters.

Telegraf cpu_busy_cycles metric

The cpu_busy_cycles metric from the Telegraf Intel PowerStat Input Plugin in the power-metrics application has been removed; use the cpu_c0_state_residency_percent metric instead.

Kubernetes Root CA Customization

The option to provide a custom Kubernetes Root CA certificate and key has been removed from the interfaces used to update the certificate (system, sw-manager and dcmanager CLI and APIs, and from the Horizon GUI). The subject and expiration date customization options remain supported.

The deprecated bootstrap overrides used previously to customize the certificate (k8s_root_ca_cert, k8s_root_ca_key, and apiserver_cert_sans) have been removed as well.

The Kubernetes Root CA certificate will now always be auto-generated and used for internal connections. External connections to kube-apiserver are routed through HAproxy, which identifies itself using the REST API/GUI certificate (system-restapi-gui-certificate) issued by the platform issuer (system-local-ca).

Hardware Updates

See:

Bug status

Fixed bugs

This release provides fixes for a number of defects. Refer to the StarlingX bug database to review the Release 12.0 Fixed Bugs.