StarlingX Kubernetes 12.0¶
The sections below provide a detailed list of new features, enhancements, and updates, and links to the associated user guides (if applicable).
ISO image¶
The pre-built ISO (Debian) for StarlingX 12.0 is located at the
StarlingX mirror repo:
Source Code for StarlingX 12.0¶
The source code for StarlingX 12.0 is available on the r/stx.12.0 branch in the StarlingX repositories.
Deployment¶
To deploy StarlingX 12.0, see Consuming StarlingX.
For detailed installation instructions, see StarlingX 12.0 Installation Guides.
New Features / Enhancements / Limitations¶
This release introduces several new features designed to improve usability and efficiency, along with targeted enhancements to existing functionality. Performance, stability, and overall user experience have been enhanced. Some features may have known limitations or constraints in this release; these are documented to help set expectations and guide usage.
Kernel updates¶
Kernel version 6.12.57 is now supported in StarlingX Release 12.0.
Kubernetes Versions¶
Kubernetes versions supported in StarlingX Release 12.0 are 1.32.2 - 1.34.1. The default version is K8s 1.34.1 for fresh installs.
Platform Applications¶
All StarlingX containerized application components have been updated to the latest versions and are compatible with the Kubernetes releases supported in StarlingX Release 12.0.
For more information on all StarlingX Application version updates, see Platform Applications.
Bare-metal to Rook Migration¶
StarlingX does not support migration from bare-metal Ceph to Rook Ceph; existing StarlingX users will need to reinstall and deploy Rook Ceph.
Centralized IAM (Identity and Access Management) with OIDC compatible MFA¶
StarlingX adds support for an OIDC backend through the OIDC DEX Identity (IDP) Proxy. For example, a DEX OIDC backend could be configured for a remote Keycloak IDP (an OIDC-compliant IDP with native MFA support). StarlingX will continue to support an LDAP backend through the DEX IDP Proxy for:
StarlingX default local LDAP-based authentication, and customer environments that rely on LDAP solutions such as Windows Active Directory (WAD).
kubectl Support for oidc-login Plugin for Improved Usability¶
StarlingX adds support for the oidc-login plugin for kubectl. It provides an improved user experience when using OIDC authentication with Kubernetes CLI (kubectl).
With kubectl configured to use oidc-login, kubectl commands automatically initiate OIDC authentication when required. If there is no token, or an expired token, in the kubectl cache, the plugin opens a browser for OIDC login (with MFA if applicable). On successful login, the kubectl cache is updated automatically with the resulting token, and kubectl proceeds to send the command.
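A minimal sketch of a kubeconfig user entry configured for the oidc-login plugin is shown below. The issuer URL and client ID are placeholders, and the exact arguments depend on your DEX/OIDC configuration.

users:
- name: oidc-user
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: kubectl
      args:
        - oidc-login
        - get-token
        - --oidc-issuer-url=https://<oidc-issuer-address>:<port>/dex
        - --oidc-client-id=<client-id>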
oidc-auth-apps (Local DEX OIDC IDP) now Started up by Default on Install and Upgrade¶
The oidc-auth-apps application (Local DEX OIDC IdP) is now configured and
enabled by default during installation and upgrades. The deployed configuration
uses the StarlingX Local LDAP service as the IdP backend.
See:
To configure Frontend Dex OIDC, see Configure Frontend Dex OIDC
To configure Local LDAP Backend, see Configure Local LDAP Backend
To configure Remote WAD Backend, see Configure Remote WAD Backend
To configure Remote OIDC Backend, see Configure Remote OIDC Backend
To configure Multiple Backends, see Configure Multiple Backends
For default Helm overrides for the OIDC Auth Apps application, see Default Helm Overrides for OIDC Auth Apps application
Kubernetes OIDC Authentication now Configured by Default to use ‘oidc-auth-apps’ on Install¶
Kubernetes is configured by default to support OIDC Authentication using the ‘oidc-auth-apps’ OIDC DEX Proxy Identity Provider.
StarlingX Unified Identity Management Authentication with OIDC¶
The system / software / fm / sw-manager / dcmanager APIs and CLIs optionally support
OIDC authentication in StarlingX Release 12.0. Keystone authentication remains
the default.
Users can now enable OIDC authentication using a command line option.
See:
PTP Partial Timing Support (PTS) as Cloud Platform Timing Source¶
StarlingX Release 12.0 introduces Partial Timing Support (PTS) to the existing Precision Time Protocol (PTP) framework, complementing the current support for Full Timing Support (FTS) and local GNSS sources.
To deliver a robust and reliable PTS implementation, StarlingX enables the system to leverage frequency synchronization (SyncE) when available to further improve timing stability.
Additionally, to meet PTS accuracy requirements, the solution now utilizes hardware timestamp offload when supported by the NIC. This reduces packet delay variation introduced by the software network stack, improving timing precision.
See:
New Node Label sriovdp-rdma¶
This update adds support for an additional node label
sriovdp-rdma=enabled|disabled, allowing the SR-IOV Device Plugin to
conditionally set the isRdma parameter for Mellanox / InfiniBand devices.
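As an illustrative sketch, assuming the label is assigned with the standard host label workflow (the host name and value below are examples only):

~(keystone_admin)]$ system host-lock worker-0
~(keystone_admin)]$ system host-label-assign worker-0 sriovdp-rdma=enabled
~(keystone_admin)]$ system host-unlock worker-0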
See:
RINLINE Driver Life Cycle Management Solution¶
The app-kernel-module-management application enables dynamic loading of out-of-tree kernel modules at runtime, with modules distributed independently of the platform. Kernel modules can be optionally built on demand into container images or provided as pre-built container images. These images are used to deploy and load the required modules onto target Kubernetes nodes. Custom Kubernetes resources are used to define which kernel modules to load, from which container images, and the specific nodes on which the modules should be deployed.
Granite Rapids-D vCSR Solution with BMC always accessible (factory pre-installed vCSR Software)¶
Custom cloud-init Configuration of Subcloud Enrollment¶
StarlingX now supports deploying a Virtual Cell Site Router (vCSR) on AIO-SX subcloud sites where no onsite routing hardware is present. All host network traffic now traverses the vCSR, which is deployed as a containerized application during site installation.
Initial Subcloud installation relies on BMC access, with configuration performed through the subcloud enrollment workflow using a cloud-init no-cloud ISO. Enrollment now supports custom setup scripts and configuration files to automate platform and vCSR initialization and enable site connectivity.
To accommodate sites without direct network access, servers must be factory-installed with all required software. The factory install process has been extended to include vCSR setup using customizable install scripts.
The System Controller uses IPMI system event logs to monitor subcloud status over its BMC.
See:
Subcloud Auto-Restore Support for vCSR Deployments¶
StarlingX also supports auto-restore capabilities for disaster-recovery scenarios for subclouds using a virtual Cell Site Router (vCSR). While initial deployment is handled through the customized subcloud enrollment cloud-init process, additional mechanisms are required to recover the subcloud if the host or the vCSR pod becomes unavailable.
Adds auto-restore functionality to support disaster-recovery scenarios for subclouds using a virtual Cell Site Router (vCSR).
Complements the initial vCSR setup performed via the customized subcloud enrollment cloud-init process.
Supports three recovery scenarios:
Subcloud enrollment failure requiring factory-default restore.
Platform or vCSR service failure requiring local backup restore.
Hardware failure requiring server replacement and remote backup restore.
Introduces a local-only autonomous restore workflow, required because connectivity cannot be restored until the vCSR is operational.
Restore process is fully self-orchestrated on the host—no remote Ansible playbooks or remote APIs are used.
Only exception: BMC Redfish API is used to initiate installs or trigger recovery actions. Workflow includes:
Local system reinstall from prestaged software.
Automated restore of the backup archive to recover Subcloud and vCSR state.
Use of locally stored container images, either prestaged or restored from a local registry backup.
Ensures subcloud recovery is possible even at remote sites with no direct network access during outages.
See:
Cloud Platform Reboot Process Optimization¶
StarlingX Release 12.0 introduces enhancements to reduce StarlingX host reboot durations.
Note
The scale of these improvements depends on user configuration and hardware type.
Testing was performed in a lab environment with AIO-SX Granite Rapids-D, and the following results were achieved.
Server Reboot: 5-6 minutes
Pod with PV storage: recovers in 6-12 minutes (including the reboot time)
Subcloud Platform / Kernel Stall Watchdog¶
New configurability is now available to allow customization of selected
sysctl tunable parameters. This feature enables operators to adjust key
kernel behaviors to better align with deployment requirements. The following
parameters are now configurable:
kernel.hung_task_timeout_secs=600
kernel.hung_task_panic=1
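For example, the effective values can be verified on a host with the standard sysctl tool (illustrative only):

$ sysctl kernel.hung_task_timeout_secs kernel.hung_task_panic
kernel.hung_task_timeout_secs = 600
kernel.hung_task_panic = 1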
Additionally, support for custom kdump data collection has been added through new pre- and post-kdump hook capabilities. These hooks allow operators to collect additional diagnostic data before and after kdump execution to improve debugging and post-mortem analysis.
When a kdump crash dump is triggered, you can gather additional data beyond the vmcore by using custom hook scripts.
StarlingX now supports an extension to kexec-tools / kdump-tools that
incorporates customizable pre- and post-hook mechanisms. This is supported on
StarlingX Standalone and Distributed Cloud deployments.
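A minimal sketch of a post-kdump hook script that gathers extra diagnostics is shown below. The script contents and the location where hooks are registered are deployment-specific assumptions, not a definitive StarlingX interface.

#!/bin/bash
# Hypothetical post-kdump hook: copy extra diagnostics alongside the vmcore.
# The registration path depends on the kdump-tools hook mechanism in use.
OUT_DIR=/var/crash/extra-$(date +%Y%m%d-%H%M%S)
mkdir -p "${OUT_DIR}"
dmesg > "${OUT_DIR}/dmesg.txt" 2>/dev/null
cp /var/log/syslog "${OUT_DIR}/" 2>/dev/null || true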
See:
NetApp Trident with Fibre Channel (FC) and Internet SCSI (iSCSI) Protocols¶
StarlingX introduces support for NetApp Trident using FC and iSCSI external storage backends for both platform and CNF applications.
NetApp backend supports NetApp ONTAP NAS (NFS) and NetApp ONTAP SAN (iSCSI and Fibre Channel) configurations.
The solution is hardware-agnostic: any NetApp certified FC or iSCSI storage system is expected to work, and all StarlingX certified server models are supported with no server-specific dependencies. This applies to all configurations, including Standalone and Distributed Cloud.
See: Configure an External NetApp Deployment as the Storage Backend.
Multipath Configuration Enhancements¶
The multipath configuration has been enhanced to support full customization
through the /etc/multipath.conf file. Administrators can now directly modify
all key parameters including path selection policies, failover behavior, and
device-specific rules without being constrained by predefined defaults.
In addition, the previously enforced default device blacklist has been removed. This change broadens hardware compatibility and allows multipath settings to be customized to better align with the needs of each deployment environment.
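As an illustrative sketch only, a customized /etc/multipath.conf might adjust path selection and failover behavior as follows; the device entry is an example and must match your storage array's vendor and product strings.

defaults {
    user_friendly_names yes
    find_multipaths     yes
}
devices {
    device {
        vendor                "EXAMPLE"
        product               "EXAMPLE-LUN"
        path_selector         "service-time 0"
        path_grouping_policy  group_by_prio
        failback              immediate
        no_path_retry         12
    }
}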
Storage Configuration Management Enhancement¶
This release introduces greater flexibility for storage configuration by removing the managed StarlingX built-in Puppet template. With this change, customers and Field Support Engineering can implement custom storage configurations tailored to their specific business requirements.
This enhancement applies to all StarlingX supported iSCSI and Fibre Channel storage backends and is supported across both standalone and Distributed Cloud deployments.
Note
This update does not impact or degrade StarlingX performance.
ACPI (Advanced Configuration and Power Interface) Driver¶
The system idle driver has been switched from acpi_idle to intel_idle.
The intel_idle driver is the preferred idle driver for Intel platforms, as it
provides more efficient C-state management by leveraging Intel-specific
knowledge of the processor’s power states, resulting in improved power
efficiency and lower latency transitions compared to the generic acpi_idle driver.
Crane CLI Tool¶
Crane is a command-line tool for working with remote container images and registries. It offers functionality similar to Docker while providing additional advanced capabilities.
See Crane Recipes for more information.
See Crane Container Images for more information.
AMD CPU Siena Based Server Support for StarlingX Subcloud¶
StarlingX now supports servers based on AMD 4th Generation EPYC processors, providing users with an alternative CPU option alongside existing Intel-based platforms.
StarlingX on AMD delivers the same functionality, workflows, and CLI / GUI experience for all operations not tied to CPU-specific behavior.
Compatibility and Platform Support
All capabilities available on Intel CPU platforms are expected to run equivalently on AMD EPYC-based servers. Both Standalone and Distributed cloud StarlingX configurations currently supported on Intel hardware are now supported on AMD CPU platforms.
Known Limitations and Procedural Changes for StarlingX 12.0¶
Kubernetes Memory Manager Policies¶
The interaction between the kube-memory-mgr-policy=static
and the Topology Manager policy “restricted” can result in pods failing to be
scheduled or started even when there is sufficient memory. This
occurs due to the restrictive design of the NUMA-aware memory manager, which
prevents the same NUMA node from being used for both single and multi-NUMA
allocations.
Procedural Changes: It is important for users to understand the implications of these memory management policies and configure their systems accordingly to avoid unexpected failures.
For detailed configuration options and examples, refer to the Kubernetes documentation at https://kubernetes.io/docs/tasks/administer-cluster/memory-manager/.
Alarm 900.024 Raised When Uploading N-1 Patch Release to the System Controller¶
When uploading an N-1 patch release to the System Controller, alarm 900.024 (Obsolete Patch) will be triggered.
This behavior is specific to the System Controller and occurs only when uploading an N-1 patch.
Procedural Changes: This warning can be safely ignored.
Kubevirt Limitations¶
The following limitations apply to Kubevirt in StarlingX Release 12.0:
Limitation: Kubernetes does not provide CPU Manager detection.
Procedural Changes: Add cpumanager to the KubeVirt feature gates:

apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    developerConfiguration:
      featureGates:
        - LiveMigration
        - Macvtap
        - Snapshot
        - CPUManager

Check the label, using the following command:

~(keystone_admin)]$ kubectl describe node | grep cpumanager

The output should include cpumanager=true.
Limitation: Huge pages do not show up under cat /proc/meminfo inside a guest VM, although the resources are consumed on the host. For example, if a VM is using 4GB of huge pages, the host shows the same 4GB of huge pages used. The huge page memory is exposed as normal memory to the VM.
Procedural Changes: You need to configure Huge pages inside the guest OS.
See the Installation Guides at https://docs.windriver.com/ for more details.
Limitation: Virtual machines using Persistent Volume Claim (PVC) must have a shared ReadWriteMany (RWX) access mode to be live migrated.
Procedural Changes: Ensure PVC is created with RWX.
$ virtctl image-upload --pvc-name=cirros-vm-disk-test-2 --pvc-size=500Mi --storage-class=cephfs --access-mode=ReadWriteMany --image-path=/home/sysadmin/Kubevirt-GA-testing/latest-manifest/kubevirt-GA-testing/cirros-0.5.1-x86_64-disk.img --uploadproxy-url=https://10.111.54.246 --insecure
Note
Live migration is not allowed with a pod network binding of bridge interface type.
Live migration requires ports 49152 and 49153 to be available in the virt-launcher pod. If these ports are explicitly specified in the masquerade interface, live migration will not function.
For live migration with SR-IOV interface:
specify networkData: in cloud-init, so that when the VM moves to another node it does not lose the IP configuration
specify the nameserver and internal FQDNs to connect to the cluster metadata server; otherwise cloud-init will not work
fix the MAC address; otherwise, when the VM moves to another node, the MAC address will change and cause a problem establishing the link
Example:
cloudInitNoCloud:
  networkData: |
    ethernets:
      sriov-net1:
        addresses:
          - 128.224.248.152/23
        gateway: 128.224.248.1
        match:
          macAddress: "02:00:00:00:00:01"
        nameservers:
          addresses:
            - 10.96.0.10
          search:
            - default.svc.cluster.local
            - svc.cluster.local
            - cluster.local
        set-name: sriov-link-enabled
    version: 2

Limitation: Snapshot CRDs and controllers are not present by default and need to be installed on StarlingX.
Procedural Changes: To install snapshot CRDs and controllers on Kubernetes, see:
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
Additionally, create VolumeSnapshotClass for Cephfs and RBD:

cat <<EOF> cephfs-storageclass.yaml
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-cephfsplugin-snapclass
driver: cephfs.csi.ceph.com
parameters:
  clusterID: 60ee9439-6204-4b11-9b02-3f2c2f0a4344
  csi.storage.k8s.io/snapshotter-secret-name: ceph-pool-kube-cephfs-data
  csi.storage.k8s.io/snapshotter-secret-namespace: default
deletionPolicy: Delete
EOF

cat <<EOF> rbd-storageclass.yaml
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-rbdplugin-snapclass
driver: rbd.csi.ceph.com
parameters:
  clusterID: 60ee9439-6204-4b11-9b02-3f2c2f0a4344
  csi.storage.k8s.io/snapshotter-secret-name: ceph-pool-kube-rbd
  csi.storage.k8s.io/snapshotter-secret-namespace: default
deletionPolicy: Delete
EOF

Note

Get the cluster ID from: kubectl describe sc cephfs, rbd

Limitation: Live migration is not possible when using configmap as a filesystem. Currently, virtual machine instances (VMIs) cannot be live migrated, as virtiofs does not support live migration.

Procedural Changes: N/A.
Limitation: Live migration is not possible when a VM is using a secret exposed as a filesystem. Currently, virtual machine instances cannot be live migrated, since virtiofs does not support live migration.

Procedural Changes: N/A.
Limitation: Live migration will not work when a VM is using a ServiceAccount exposed as a filesystem. Currently, VMIs cannot be live migrated, since virtiofs does not support live migration.

Procedural Changes: N/A.
Upper Case Characters in Host Names Cause Issues with Kubernetes Labelling¶
Upper case characters in host names cause issues with Kubernetes labelling.
Procedural Changes: Host names should be in lower case.
Kubernetes Taint on Controllers for Standard Systems¶
In Standard systems, a Kubernetes taint is applied to controller nodes to prevent application pods from being scheduled on those nodes, since controllers in Standard systems are intended only for platform services. If application pods must run on controllers, a Kubernetes toleration of the taint can be specified in the application's pod specifications.
Procedural Changes: Customer applications that need to run on controllers in Standard systems must be enabled/configured with the Kubernetes toleration to ensure that the applications continue working after an upgrade to future StarlingX releases. It is suggested to add the Kubernetes toleration to your application prior to upgrading.
You can specify toleration for a pod through the pod specification (PodSpec). For example:
spec:
....
template:
....
spec:
tolerations:
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
- key: "node-role.kubernetes.io/control-plane"
operator: "Exists"
effect: "NoSchedule"
See: Taints and Tolerations.
Application Fails After Host Lock/Unlock¶
In some situations, an application may fail to apply after a host lock/unlock due to previously evicted pods.
Procedural Changes: Use the kubectl delete command to delete the evicted pods and reapply the application.
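For example (a sketch; the namespace, pod, and application names are placeholders):

$ kubectl get pods --all-namespaces | grep Evicted
$ kubectl delete pod -n <namespace> <evicted-pod-name>
~(keystone_admin)]$ system application-apply <application-name>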
Application Apply Failure if Host Reset¶
If an application apply is in progress and a host is reset it will likely fail. A re-apply attempt may be required once the host recovers and the system is stable.
Procedural Changes: Once the host recovers and the system is stable, a re-apply may be required.
Platform CPU Usage Alarms¶
Alarms may occur indicating platform cpu usage is greater than 90% if a large number of pods are configured using liveness probes that run every second.
Procedural Changes: To mitigate either reduce the frequency for the liveness probes or increase the number of platform cores.
Pods Using isolcpus¶
The isolcpus feature currently does not support allocation of thread siblings for CPU requests (i.e., a physical thread and its HT sibling).
Procedural Changes: For optimal results, if hyperthreading is enabled then isolcpus should be allocated in multiples of two in order to ensure that both SMT siblings are allocated to the same container.
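A hedged example of a container spec requesting isolated CPUs in a multiple of two is shown below. The extended resource name windriver.com/isolcpus is an assumption; confirm the exact resource name exposed on your system before use.

spec:
  containers:
  - name: app
    image: <application-image>
    resources:
      requests:
        windriver.com/isolcpus: 2
      limits:
        windriver.com/isolcpus: 2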
K8s Upgrade Abort Failure and Pod Creation Issues on Service Restart¶
During a Kubernetes upgrade, the kubelet upgrade step updates the
sandbox image in /etc/containerd/config.toml to match the pause image
version used by the upgraded k8s control plane. If the k8s upgrade
process is aborted after this step and later re-initiated, the pause
image version is incorrectly modified after the kubelet upgrade, resulting
in an invalid image tag, which causes subsequent abort failures.
Systems in this state may also fail to create new pods or restart
existing pods after a service restart.
- Initial k8s upgrade (1.32 -> 1.34)
registry.local:9001/registry.k8s.io/pause:3.10 -> registry.local:9001/registry.k8s.io/pause:3.10.1
- K8s Upgrade aborted and retried:
registry.local:9001/registry.k8s.io/pause:3.10.1 -> registry.local:9001/registry.k8s.io/pause:3.10.1.1 (invalid tag)
Procedural Changes: Update the invalid image tag
(i.e. change 3.10.1.1 to 3.10.1) in the /etc/containerd/config.toml file
and retry the abort operation.
After a Kubernetes upgrade is successfully re‑initiated, new pods may fail to be created, or existing pods may fail to restart following a service restart, particularly after a containerd service restart. Resolve this issue using the following steps:
Correct the invalid image tag in /etc/containerd/config.toml.
Restart the containerd service.
Recreate the affected pods if required.
Note
To prevent this issue, validate and correct the sandbox image tag before performing the abort operation during a re-tried upgrade.
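For illustration, the corrected sandbox image entry in /etc/containerd/config.toml looks similar to the following (the exact plugin section header may vary with the containerd version in use), followed by a containerd restart:

# /etc/containerd/config.toml (excerpt)
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.local:9001/registry.k8s.io/pause:3.10.1"

$ sudo systemctl restart containerd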
KubeVirt VMs are Not Scheduled After Backup and Restore with wipe_ceph_osds=true¶
When a backup and restore operation is executed with wipe_ceph_osds=true,
KubeVirt VMs are not scheduled.
Procedural Changes: After a backup and restore operation with wipe_ceph_osds=true,
the VM image needs to be uploaded to the DataVolume to reinitialize the VM.
Stop VMs Before Platform Rollback¶
StarlingX rollback will fail if KubeVirt VMs are not stopped before executing the activate-rollback operation.
Procedural Changes: If the KubeVirt application is installed, stop running VMs before initiating the platform rollback. Running VMs during a rollback will cause live migration and VM management failures after the rollback completes. For details, see Stop VMs Before Platform Rollback.
Subcloud Restore to N-1 Release with Additional Patches¶
Before restoring a subcloud to the latest or a specific patch level of the N-1 release, you must first upload the corresponding pre-patched ISO for that patch level.
Procedural Changes: N/A.
Subcloud install or restore to the previous release¶
If the System Controller is on StarlingX Release 12.0, subclouds can be deployed or restored to either StarlingX Release 11.0 or StarlingX Release 12.0.
The following operations have limited support for subclouds of the previous release:
Subcloud error reporting
The following operations are not supported for subclouds of the previous release:
Orchestrated subcloud kubernetes upgrade
Procedural Changes: N/A.
Subcloud Upgrade with Kubernetes Versions¶
Before upgrading the platform, update Kubernetes to the highest version supported by your current platform release, as the new platform version requires this specific Kubernetes version. Orchestrated Kubernetes upgrades are not supported for N-1 subclouds. For example, before upgrading to StarlingX Release 12.0, verify that the System Controller and all subclouds are running Kubernetes v1.32.2 (the highest version supported by StarlingX Release 11.0).
Procedural Changes: N/A.
Enhanced Parallel Operations for Distributed Cloud¶
No parallel operation should be performed while the System Controller is being patched.
Only one type of parallel operation can be performed at a time. For example, subcloud prestaging or upgrade orchestration should be postponed while batch subcloud deployment is still in progress.
Examples of parallel operation:
any type of dcmanager orchestration (prestage, sw-deploy, kube-upgrade, kube-rootca-update)
concurrent dcmanager subcloud add
dcmanager subcloud-backup / subcloud-backup restore with the --group option
Procedural Changes: N/A.
Subcloud Prestage Post Restore¶
Subcloud backups do not include prestaged software and container images for new release deployments. If you restore a subcloud from backup, you must prestage the subcloud again before deploying the new release.
See:
Procedural Changes: N/A.
IPsec Certificate Renewal Post Duplex/Standard Subcloud Rehoming¶
After rehoming an AIO-DX or Standard subcloud, the IPsec certificate must be renewed on all subcloud nodes and certain services must be restarted. This is required to ensure successful software updates and upgrades. Contact Wind River Customer Support at https://www.windriver.com/services#support for the Ansible playbook that automates these tasks across all applicable subclouds.
Procedural Changes: N/A.
Unable to create Kubernetes Upgrade Strategy for Subclouds using Horizon GUI¶
When creating a Kubernetes Upgrade Strategy for a subcloud using the Horizon GUI, it fails and displays the following error:
kube upgrade pre-check: Invalid kube version(s), left: (v1.24.4), right:
(1.24.4)
Procedural Changes: Use the following steps to create the strategy:
Procedure
Create a strategy for subcloud Kubernetes upgrade using the dcmanager kube-upgrade-strategy create --to-version <version> command.
Apply the strategy using the Horizon GUI or the CLI using the command dcmanager kube-upgrade-strategy apply.
Apply a Kubernetes Upgrade Strategy using Horizon
Procedural Changes: N/A.
k8s-coredump only supports lowercase annotation¶
Creating K8s pod core dump fails when setting the
starlingx.io/core_pattern parameter in upper case characters on the pod
manifest. This results in the pod being unable to find the target directory
and fails to create the coredump file.
Procedural Changes: The starlingx.io/core_pattern parameter only accepts
lower case characters for the path and file name where the core dump is saved.
Huge Page Limitation on Postgres¶
The Debian postgres version supports huge pages and, by default, uses 1 huge page if it is available on the system, decreasing the number of available huge pages by 1.
Procedural Changes: The huge page setting must be disabled by setting
huge_pages = off in /etc/postgresql/postgresql.conf. The postgres service
then needs to be restarted using the Service Manager command
sudo sm-restart service postgres.
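For example (a sketch of the two steps described above):

# In /etc/postgresql/postgresql.conf
huge_pages = off

# Then restart the service
$ sudo sm-restart service postgres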
Warning
This procedural change is not persistent; therefore, if the host is rebooted, it will need to be applied again. This will be fixed in a future release.
Quartzville Tools¶
The celo64e and nvmupdate64e commands are not supported in StarlingX due to a known issue in Quartzville tools that crashes the host.
Procedural Change: Reboot the host using the boot screen menu.
Connectivity Issues on E825 (Granite Rapids D Integrated NIC) when Enabling SyncE TX Signal¶
Enabling the SyncE TX clock on E825 (tx_clk=synce in the PTP clock instance) results in a single, expected link flap while the NIC physical interface adjusts to drive the signal.
When platform interfaces are also configured on the E825 NIC, the link interruption may result in system alarms, and in AIO-DX or Standard deployments, may trigger a SWACT if the configuration is applied during runtime. The following alarms are raised:
400.005 / 401.005: Communication failure detected with peer
100.106 - 100.111: port / interface failed
Note
The 100.xxx alarm ID depends on the affected interface.
Warning
If the E825 link is established over a 1G link, connectivity may be completely disrupted.
Procedural Changes: Avoid configuring tx_clk=synce on E825 ports. If the
SyncE TX signal is required, it is recommended to enable it only on specific
PTP ports, avoiding its use on all ports of each NIC.
Add / delete operations on pods results in errors¶
Under some circumstances, add / delete operations on pods result in the error "error getting ClusterInformation: connection is unauthorized: Unauthorized" and in pods staying in the ContainerCreating/Terminating state. This error may also prevent users from locking a host.
Procedural Changes: If this error occurs, run the kubectl describe pod -n <namespace> <pod name> command. The following message is displayed:
error getting ClusterInformation: connection is unauthorized: Unauthorized
Limitation: There is also a known issue with the Calico CNI that may occur in rare occasions if the Calico token required for communication with the kube-apiserver becomes out of sync due to NTP skew or issues refreshing the token.
Procedural Changes: Delete the calico-node pod (causing it to automatically restart) using the following commands:
$ kubectl get pods -n kube-system --show-labels | grep calico
$ kubectl delete pods -n kube-system -l k8s-app=calico-node
Application Pods with SRIOV Interfaces¶
Application Pods with SR-IOV Interfaces require a restart-on-reboot: “true” label in their pod spec template.
Pods with SR-IOV interfaces may fail to start after a platform restore or Simplex upgrade and persist in the Container Creating state due to missing PCI address information in the CNI configuration.
Procedural Changes: Application pods that require SR-IOV should add the label restart-on-reboot: "true" to their pod spec template metadata. All pods with this label will be deleted and recreated after system initialization; therefore, all pods must be restartable and managed by a Kubernetes controller (i.e., DaemonSet, Deployment, or StatefulSet) for auto recovery.
Pod Spec template example:
template:
metadata:
labels:
tier: node
app: sriovdp
restart-on-reboot: "true"
PTP O-RAN Spec Compliant Timing API Notification¶
The v2 API conforms to O-RAN.WG6.O-Cloud Notification API-v02.01 with the following exceptions, which are not supported in StarlingX:
O-RAN SyncE Lock-Status-Extended notifications
O-RAN SyncE Clock Quality Change notifications
O-RAN Custom cluster names
Procedural Changes: See the PTP-notification v2 document for further details: https://docs.starlingx.io/api-ref/ptp-notification-armada-app/api_ptp_notifications_definition_v2.html
ptp4l error “timed out while polling for tx timestamp” reported for NICs using the Intel ice driver¶
NICs using the Intel® ice driver may report the following error in the ptp4l
logs, which results in a PTP port switching to FAULTY before
re-initializing.
Note
PTP ports frequently switching to FAULTY may degrade the accuracy of
the PTP timing.
ptp4l[80330.489]: timed out while polling for tx timestamp
ptp4l[80330.489]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
Note
This is due to a limitation with the Intel® ice driver as the driver cannot
guarantee the time interval to return the timestamp to the ptp4l user
space process which results in the occasional timeout error message.
Procedural Changes: The procedural change recommended by Intel is to increase the
tx_timestamp_timeout parameter in the ptp4l config. The increased
timeout value gives the ice driver more time to provide the timestamp to
the ptp4l user space process. Timeout values of 50 ms and 700 ms have been
validated; however, you can use a different value if it is more suitable
for your system.
~(keystone_admin)]$ system ptp-instance-parameter-add <instance_name> tx_timestamp_timeout=700
~(keystone_admin)]$ system ptp-instance-apply
Note
The ptp4l timeout error log may also be caused by other underlying
issues, such as NIC port instability. Therefore, it is recommended to
confirm the NIC port is stable before adjusting the timeout values.
PTP is not supported on Broadcom 57504 NIC¶
PTP is not supported on the Broadcom 57504 NIC.
Procedural Changes: Do not configure PTP instances on the Broadcom 57504 NIC.
synce4l CLI options are not supported¶
The SyncE configuration using synce4l is not supported in StarlingX.
The service type of synce4l in the ptp-instance-add command
is not supported in StarlingX.
Procedural Changes: N/A.
ptp-notification application is not supported during bootstrap¶
Deployment of ptp-notification during bootstrap is not supported due to dependencies on the system PTP configuration, which is handled post-bootstrap.

Procedural Changes: N/A.
The helm-chart-attribute-modify command is not supported for ptp-notification because the application consists of a single chart. Disabling the chart would render ptp-notification non-functional.

Procedural Changes: N/A.
The ptp-notification-demo App is not a System-Managed Application¶
The ptp-notification-demo app is provided for demonstration purposes only. Therefore, it is not supported on typical platform operations such as Upgrades and Backup and Restore.
Procedural Changes: N/A.
Silicom TimeSync (STS) Card limitations¶
Silicom and Intel based Time Sync NICs may not be deployed on the same system due to conflicting time sync services and operations.
PTP configuration for Silicom TimeSync (STS) cards is handled separately from StarlingX host PTP configuration and may result in configuration conflicts if both are used at the same time.
The sts-silicom application provides a dedicated phc2sys instance which synchronizes the local system clock to the Silicom TimeSync (STS) card. Users should ensure that phc2sys is not configured via StarlingX PTP Host Configuration when the sts-silicom application is in use.

Additionally, if StarlingX PTP Host Configuration is being used in parallel for non-STS NICs, users should ensure that all ptp4l instances do not use conflicting domainNumber values.

When the Silicom TimeSync (STS) card is configured in timing mode using the sts-silicom application, the card goes through an initialization process on application apply and server reboots. The ports will bounce up and down several times during the initialization process, causing network traffic disruption. Therefore, configuring the platform networks on the Silicom TimeSync (STS) card is not supported since it will cause platform instability.
Procedural Changes: N/A.
N3000 Image in the containerd cache¶
The StarlingX system without an N3000 image in the containerd cache fails to configure during a reboot cycle, and results in a failed / disabled node.
The N3000 device requires a reset early in the startup sequence. The reset is
done by the n3000-opae image. The image is automatically downloaded on bootstrap
and is expected to be in the cache to allow the reset to succeed. If the image
is not in the cache for any reason, the image cannot be downloaded as
registry.local is not up yet at this point in the startup. This will result
in the impacted host going through multiple reboot cycles and coming up in an
enabled/degraded state. To avoid this issue:
Ensure that the docker filesystem is properly engineered to avoid the image being automatically removed by the system if flagged as unused. For instructions to resize the filesystem, see Increase Controller Filesystem Storage Allotments Using the CLI
Do not manually prune the N3000 image.
Procedural Changes: Use the procedure below.
Procedure
Lock the node.
~(keystone_admin)]$ system host-lock controller-0
Pull the required N3000 image into the containerd cache.

~(keystone_admin)]$ crictl pull registry.local:9001/docker.io/starlingx/n3000-opae:stx.8.0-v1.0.2
Unlock the node.
~(keystone_admin)]$ system host-unlock controller-0
Deploying an App using nginx controller fails with internal error after controller.name override¶
A Helm override of controller.name for the nginx-ingress-controller app may result in errors when creating ingress resources later on.
Example of Helm override:
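The original example is not reproduced here; as a hypothetical illustration only, an override of this kind might be applied as follows. The chart name, namespace, and value shown are assumptions and may differ on your system.

cat <<EOF > nginx-overrides.yaml
controller:
  name: custom-controller-name
EOF
~(keystone_admin)]$ system helm-override-update nginx-ingress-controller ingress-nginx kube-system --values nginx-overrides.yaml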
Procedural Changes: N/A.
Operating System Noise on Application-Isolated cores / Cyclic Test Performance Degradation¶
Current analysis indicates no confirmed application impact. Due to limited visibility into specific application requirements and tolerances, a definitive assessment cannot be made.
While internal tests show some performance degradation in cyclictest metrics, when validated against StarlingX internal benchmarks, there are no corresponding external application requirements for Cascade Lake or Ice Lake platforms that are being violated. Requirements for GNR-D continue to be met. The observed degradation is acceptable for most use cases, though this is not guaranteed.
An impact related to osnoise has been observed during the system initialization
phase. This is transient in nature, and there is currently no evidence
confirming a sustained impact on application performance. Ongoing analysis
is focused on validating duration, root cause, and ensuring stable
performance during steady-state operation.
Note
Current analysis indicates that upstream Linux changes contributed to the observed degradation.
See:
Procedural Changes: Allow up to 50 seconds after launching processes on application-isolated cores before performing time-sensitive or latency-sensitive operations to ensure OS noise has stabilized.
BPF is disabled¶
BPF cannot be used in the PREEMPT_RT/low latency kernel, due to the inherent incompatibility between PREEMPT_RT and BPF, see, https://lwn.net/Articles/802884/.
Some packages might be affected when PREEMPT_RT and BPF are used together. This includes, but is not limited to, the following packages:
libpcap
libnet
dnsmasq
qemu
nmap-ncat
libv4l
elfutils
iptables
tcpdump
iproute
gdb
valgrind
kubernetes
cni
strace
mariadb
libvirt
dpdk
libteam
libseccomp
binutils
libbpf
dhcp
lldpd
containernetworking-plugins
golang
i40e
ice
Procedural Changes: Wind River recommends not using BPF with the real-time kernel. If required, it can still be used, for example, for debugging only.
Control Group parameter¶
The control group (cgroup) parameter kmem.limit_in_bytes has been deprecated, and results in the following message in the kernel’s log buffer (dmesg) during boot-up and/or during the Ansible bootstrap procedure: “kmem.limit_in_bytes is deprecated and will be removed. Please report your use case to linux-mm@kvack.org if you depend on this functionality.” This parameter is used by a number of software packages in StarlingX, including, but not limited to, systemd, docker, containerd, libvirt etc.
Procedural Changes: N/A. This is only a warning message about the future deprecation of an interface.
Subcloud Reconfig may fail due to missing inventory file¶
The dcmanager subcloud reconfig command may fail due to a missing file /var/opt/dc/ansible/<subcloud_name>_inventory.yml.
Procedural Changes: Provide the floating OAM IP address of the subcloud using the --bootstrap-address argument. For example:
~(keystone_admin)]$ dcmanager subcloud reconfig --sysadmin-password <password> --deploy-config deployment-config.yaml --bootstrap-address <floating_OAM_IP_address> <subcloud_name>
Increased CPU Usage After Removal of intel_idle.max_cstate=0¶
CPU usage may increase after removal of the intel_idle.max_cstate=0 setting.
The increase may be observed across multiple processes.
No performance impact is expected under active workloads, as the system does not spend time in idle states during normal operation.
See: Configurable Power Manager
Procedural Changes: Intel recommends using the intel_idle driver.
Console Session Issues during Installation¶
After bootstrap and before unlocking the controller, if the console session times
out (or the user logs out), systemd does not work properly: fm, sysinv, and
mtcAgent do not initialize.
Procedural Changes: If the console times out or the user logs out between bootstrap and unlock of controller-0, then, to recover from this issue, you must re-install the ISO.
Power Metrics Application in Real Time Kernels¶
When executing Power Metrics application in Real Time kernels, the overall scheduling latency may increase due to inter-core interruptions caused by the MSR (Model-specific Registers) reading.
Due to intensive workloads, the kernel may not be able to handle the MSR reading interruptions, resulting in stalled data collection because collection is not scheduled on the affected core.
Dell iDRAC Virtual Media Issues During Remote Installation¶
On Dell PowerEdge XR8720t systems with Intel Granite Rapids-D XCC processors and iDRAC version 1.30.10.50 (Build 25), the Dell iDRAC virtual media may experience intermittent failures during remote ISO installation.
These failures include CD mount errors, USB device resets, I/O errors on the virtual media device (sr0), and critical medium errors, which can lead to kernel panic during the initial installation, failed ostree repository synchronization, or installation failure.
Additionally, it was observed that there is an impact on the performance of the iDRAC virtual media, resulting in file transfer times longer than reference numbers during remote ISO installation.
Procedural Changes: Retry the remote ISO installation until it completes successfully.
Dell iDRAC Boot Failure on System Reboot¶
On Dell PowerEdge XR8720t systems with Intel Granite Rapids-D XCC processors and iDRAC version 1.30.10.50 (Build 25), the system may intermittently fail to boot after a reboot.
After a reboot, the boot process fails and the message “Boot Failed: starlingx” is displayed. The system then gets stuck while attempting alternative boot options. Dell iDRAC lifecycle logs may show SSD failures around the same time.
Procedural Changes: If the system fails to boot after a reboot, perform an additional reboot to allow the system to properly boot.
Dell GNSS Issues¶
In some Dell PowerEdge XR8720t system samples with Intel Granite Rapids-D XCC processors used during laboratory testing, GNSS issues were observed. These issues are suspected to be caused by mechanical problems.
Due to these problems, the GNSS module was either not detected or operating incorrectly.
Procedural Changes: Contact Dell support for hardware assistance.
Software Delete Operations on the System Controller¶
On System Controllers, the software delete operation can only be performed on the previous release after all subclouds have been successfully upgraded to the target release.
Procedural Changes: Before executing software delete on the System Controller, ensure that every subcloud in the system has completed its upgrade or update process. Attempting to delete a release earlier is not supported and may impact subcloud operations.
Warning
Do not delete any previous release from the System Controller until all subclouds have been upgraded or updated.
Restart Required for containerd to Apply Config Changes for AIO-SX¶
On AIO-SX systems, certain container images were removed from the registry due to the image garbage collector and changes introduced during the Kubernetes upgrade. This may impact workloads that rely on specific image versions.
Procedural Changes: Increasing the Docker filesystem size will help retain the
image in the containerd cache. Additionally, only for AIO-SX it is
recommended to restart containerd after the Kubernetes upgrade. For more
details, see Docker Size.
BMC Password¶
To update the BMC password, the BMC must be de-provisioned and then re-provisioned.
Procedural Changes: In order to update the BMC password, de-provision the BMC, and then re-provision it again with the new password.
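As an illustrative sketch, assuming the standard system host-update BMC attributes (the host name, bm_type value, and credentials are placeholders; verify the exact options supported by your release):

~(keystone_admin)]$ system host-update <hostname> bm_type=none
~(keystone_admin)]$ system host-update <hostname> bm_type=dynamic bm_ip=<bmc-ip> bm_username=<bmc-user> bm_password=<new-password>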
Sub-Numa Cluster Configuration not Supported on Skylake Servers¶
Sub-Numa cluster configuration is not supported on Skylake servers.
Procedural Changes: For servers with Skylake Gold or Platinum CPUs, Sub-NUMA clustering must be disabled in the BIOS.
Backup and Restore Playbook fails due to self-triggered “backup in progress”/”restore in progress” flag¶
A Backup and Restore operation can cause the playbook to fail due to a self-triggered “backup in progress” / “restore in progress” flag.
Procedural Changes: Retry the backup after manually removing the flag /etc/platform/.backup_in_progress if it has been more than 10 minutes based on the error message:
"backup has already been started less than x minutes ago.
Wait to start a new backup or manually remove the backup flag in
/etc/platform/.backup_in_progress "
For a “restore in progress” flag, reinstall and retry the restore operation.
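For example, the stale flag can be removed before retrying the backup (sketch):

$ sudo rm /etc/platform/.backup_in_progress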
RSA required for the system-local-ca issuer¶
The system-local-ca issuer must use an RSA-based certificate and key.
Other key types are not supported during bootstrap or when running the
Update system-local-ca or Migrate Platform Certificates to use Cert Manager
procedures.
Procedural Changes: N/A.
Multiple trusted CA certificates with same Distinguished Name are not supported¶
Trusted CA (ssl_ca) certificates must have unique Distinguished Names (DNs). When a new trusted CA certificate is installed with a DN that matches an existing certificate, the system treats it as a replacement and overwrites the existing certificate.
Procedural Changes: N/A.
Kubernetes Root CA Certificates¶
The Kubernetes Root CA certificate and key are automatically generated with a default 10-year expiration, and are intended for internal use only.
For external access to kube-apiserver, the proxy (HAproxy) authenticates itself using the Rest API/GUI certificate (system-restapi-gui-certificate), which supports Intermediate CAs. The issuer (system-local-ca) can be customized at bootstrap. See Ansible Bootstrap Configurations for more information.
Procedural Changes: N/A.
External Authentication to kube-apiserver Using Client Certificates¶
SSL termination for external connections to kube-apiserver is now handled
by HAProxy, which establishes a new connection to the API server on behalf of
the external client. As a result, client certificate authentication is now
restricted to the admin user (kubernetes-admin). Token-based authentication
remains fully supported and unchanged.
Procedural Changes: N/A.
Password Expiry does not work on LDAP user login¶
On Debian, the warning message is not displayed for Active Directory users when a user logs in and the password is nearing expiry. Similarly, on login, when a user’s password has already expired, the password change prompt is not displayed.
Procedural Changes: It is recommended that users rely on Directory administration tools for “Windows Active Directory” servers to handle password updates, reminders and expiration. It is also recommended that passwords should be updated every 3 months.
Note
The expired password can be reset via Active Directory by IT administrators.
Upgrade activation: cert-manager does not start issuing certificates after upversion¶
During upgrade activation and upversioning, cert-manager usually takes less than a minute to become available and start issuing certificates. Occasionally, cert-manager can take more time than expected. This behavior is associated with an open source issue. For more details see https://github.com/cert-manager/cert-manager/issues/7138#issuecomment-2422983418.
Since the cert-manager application is required, the upgrade activation will fail if the app takes too long to become available after the upversion. The following log will be displayed in /var/log/software.log:
Error from server (NotFound): secrets "stx-test-cm" not found
certificate.cert-manager.io "stx-test-cm" deleted
software-controller-daemon: software_controller.py(837): INFO: 15 received
from deploy-activate with deploy-state activate-failed
software-controller-daemon: software_controller.py(870): INFO: Received
deploy state changed to DEPLOY_STATES.ACTIVATE_FAILED, agent deploy-activate
Procedural Changes: Cert-manager should recover by itself after a few minutes. If required, the following certificate used for test purposes in the upgrade activation can be created manually to ensure cert-manager is ready before reattempting the upgrade.
cat <<eof> cm_test_cert.yml
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
creationTimestamp: null
name: system-local-ca
spec:
ca:
secretName: system-local-ca
status: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
creationTimestamp: null
name: stx-test-cm
namespace: cert-manager
spec:
commonName: stx-test-cm
issuerRef:
kind: ClusterIssuer
name: system-local-ca
secretName: stx-test-cm
status: {}
eof
$ kubectl apply -f cm_test_cert.yml
$ rm cm_test_cert.yml
$ kubectl wait certificate -n cert-manager stx-test-cm --for=condition=Ready --timeout 20m
# Verify that the TLS secret associated with the cert was created, using the following:
$ kubectl get secret -n cert-manager stx-test-cm
cert-manager cm-acme-http-solver pod fails¶
On a multinode setup, when you deploy an acme issuer to issue a certificate,
the cm-acme-http-solver pod might fail and stay in the “ImagePullBackOff” state
due to the following defect https://github.com/cert-manager/cert-manager/issues/5959.
Procedural Changes:
If you are using the namespace “test”, create a docker-registry secret “testkey” with local registry credentials in the “test” namespace.
~(keystone_admin)]$ kubectl create secret docker-registry testkey --docker-server=registry.local:9001 --docker-username=admin --docker-password=Password*1234 -n test
Use the secret “testkey” in the issuer spec as follows:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: stepca-issuer
  namespace: test
spec:
  acme:
    server: https://test.com:8080/acme/acme/directory
    skipTLSVerify: true
    email: test@test.com
    privateKeySecretRef:
      name: stepca-issuer
    solvers:
      - http01:
          ingress:
            class: nginx
            podTemplate:
              spec:
                imagePullSecrets:
                  - name: testkey
Vault application is not supported during bootstrap¶
The Vault application cannot be configured during bootstrap.
Procedural Changes:
The application must be configured after the platform nodes are unlocked /
enabled / available, a storage backend is configured, and platform-integ-apps
is applied. If Vault is to be run in HA configuration (3 vault server pods)
then at least three controller / worker nodes must be unlocked / enabled / available.
Vault application support for running on application cores¶
By default the Vault application’s pods will run on platform cores. When changing the core selection from platform cores to application cores the following additional procedure is required for the vault application.
Procedural Changes:
If static kube-cpu-mgr-policy is selected and the label
app.starlingx.io/component is overridden for the Vault namespace or pods, there are two
requirements:
The Vault server pods need to be restarted as directed by Hashicorp Vault documentation. Restart each of the standby server pods in turn, then restart the active server pod.
Ensure that sufficient hosts with worker function are available to run the Vault server pods on application cores.
See: Kubernetes CPU Manager Policies.
Restart the Vault Server pods¶
The Vault server pods do not restart automatically.
Procedural Changes: If the pods are to be re-labelled to switch execution from platform to application cores, or vice-versa, then the pods need to be restarted.
Under kubernetes the pods are restarted using the kubectl delete pod command. See, Hashicorp Vault documentation for the recommended procedure for restarting server pods in HA configuration, https://support.hashicorp.com/hc/en-us/articles/23744227055635-How-to-safely-restart-a-Vault-cluster-running-on-Kubernetes.
Ensure that sufficient hosts are available to run the server pods on application cores¶
The standard cluster with less than 3 worker nodes does not support Vault HA on the application cores. In this configuration (less than three cluster hosts with worker function):
Procedural Changes:
When setting label app.starlingx.io/component=application with the Vault app already applied in HA configuration (3 vault server pods), ensure that there are 3 nodes with worker function to support the HA configuration.
When applying Vault for the first time with app.starlingx.io/component set to "application": ensure that the server replicas is also set to 1 for non-HA configuration. The replicas for the Vault server are overridden both for the Vault Helm chart and the Vault manager Helm chart:

cat <<EOF > vault_overrides.yaml
server:
  extraLabels:
    app.starlingx.io/component: application
  ha:
    replicas: 1
injector:
  extraLabels:
    app.starlingx.io/component: application
EOF

cat <<EOF > vault-manager_overrides.yaml
manager:
  extraLabels:
    app.starlingx.io/component: application
server:
  ha:
    replicas: 1
EOF

$ system helm-override-update vault vault vault --values vault_overrides.yaml
$ system helm-override-update vault vault-manager vault --values vault-manager_overrides.yaml
Platform and Kubernetes Upgrades fail if Portieris is applied¶
Platform and Kubernetes Upgrades fail if Portieris is applied.
Procedural Changes: Before performing platform or Kubernetes upgrades, you must remove the Portieris application. Once the upgrade is complete, you can reinstall Portieris as usual.
Harbor cannot be deployed during bootstrap¶
The Harbor application cannot be deployed during bootstrap due to the bootstrap deployment dependencies such as early availability of storage class.
Procedural Changes: N/A.
Windows Active Directory¶
Limitation: The Kubernetes API does not support uppercase IPv6 addresses.
Procedural Changes: The issuer_url IPv6 address must be specified as lowercase.
Limitation: The refresh token does not work.
Procedural Changes: If the token expires, manually replace the ID token. For more information, see, Configure Kubernetes Client Access.
Limitation: TLS error logs are reported in the oidc-dex container on subclouds. These logs should not have any system impact.
Procedural Changes: N/A.
Security Audit Logging for K8s API¶
A custom policy file can only be created at bootstrap in apiserver_extra_volumes.
If a custom policy file was configured at bootstrap, then after bootstrap the
user has the option to configure the parameter audit-policy-file to either
this custom policy file (/etc/kubernetes/my-audit-policy-file.yml) or the
default policy file /etc/kubernetes/default-audit-policy.yaml. If no
custom policy file was configured at bootstrap, then the user can only
configure the parameter audit-policy-file to the default policy file.
Only the parameter audit-policy-file is configurable after bootstrap, so
the other parameters (audit-log-path, audit-log-maxsize,
audit-log-maxage and audit-log-maxbackup) cannot be changed at
runtime.
Procedural Changes: N/A.
Software Delete Operations on the System Controller¶
On System Controllers, the software delete operation can only be performed on the previous release after all subclouds have been successfully upgraded to the target release.
Procedural Changes: Before executing software delete on the System Controller, ensure that every subcloud in the system has completed its upgrade or update process. Attempting to delete a release earlier is not supported and may impact subcloud operations.
Warning
Do not delete any previous release from the System Controller until all subclouds have been upgraded or updated.
ISO/SIG Upload to Central Cloud Fails when Using sudo¶
To upload a software patch or major release to the System Controller region
using the --os-region-name SystemController option, the upload command must be
authenticated with Keystone.
Procedural Changes: Do not use sudo with the --os-region-name SystemController
option. For example, avoid using sudo software upload <software release>
command.
Note
When using the -local option, you must provide the absolute path to the
release files.
Note
When using software upload commands with --os-region-name SystemController
to upload a software patch or major release to the System Controller
region, Keystone authentication is required.
Important
Do not use sudo in combination with the --os-region-name SystemController
option. For example, avoid using:
$ sudo software --os-region-name SystemController upload <software-release>
Instead, ensure the command is executed with proper authentication and without sudo.
For more information, see Upload Software Releases Using the CLI.
RT Throttling Service not running after Lock/Unlock on Upgraded Subclouds¶
During the upgrade process, the USM post-upgrade script modifies systemd
presets to define which services should be automatically enabled or disabled.
As part of this process, any user-enabled custom services may be set to
“disabled” after the upgrade completes.
Since this change occurs post-upgrade, systemd will not automatically
re-enable the affected service during subsequent lock / unlock operations.
By default, USM disables custom services not explicitly listed in the systemd
presets. Since service definitions can vary between releases, USM relies on
these presets to determine enablement status per host during the upgrade.
If a custom service is not included in the presets, it will be marked as
disabled and remain inactive after lock / unlock even following a successful
upgrade.
Log message during the upgrade:
controller-0 usm-initialize[3061]: info Removed
/etc/systemd/system/multi-user.target.wants/sysctl-rt-sched-apply.service
Procedural Changes: Once the upgrade to StarlingX Release 11.0 completes, re-enable and restart all custom / user services (for example, using systemctl enable and systemctl start) before issuing the first lock / unlock (or reboot).
The enable and start commands for this service are required only once prior to the initial lock / unlock operation. After this step is completed, there is no further need to manually start or enable custom services, as the USM post-upgrade script has already run during the upgrade process.
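A minimal sketch, assuming the sysctl-rt-sched-apply.service custom service shown in the log message above; substitute the names of your own custom services:
# Run once on the upgraded host, before the first lock/unlock or reboot
sudo systemctl enable sysctl-rt-sched-apply.service
sudo systemctl start sysctl-rt-sched-apply.service
systemctl is-enabled sysctl-rt-sched-apply.service   # should report "enabled"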
sw-manager sw-deploy-strategy apply fails¶
sw-manager apply fails to apply the patch.
Note
The Procedural Changes are applicable only if the sw-manager sw-deploy-strategy apply
fails with the following issues.
To show the operation is in an aborted state due to a timeout, run the following command.
~(keystone_admin)]$ sw-manager sw-deploy-strategy show
Strategy Patch Strategy:
  strategy-uuid:             2082ab5e-a387-4b6a-be23-50ac23317725
  controller-apply-type:     serial
  storage-apply-type:        serial
  worker-apply-type:         serial
  default-instance-action:   stop-start
  alarm-restrictions:        strict
  current-phase:             abort
  current-phase-completion:  100%
  state:                     aborted
  apply-result:              timed-out
  apply-reason:
  abort-result:              success
  abort-reason:
If step 1 fails with ‘timed-out’ results, check if the timeout has occurred due to step-name ‘wait-alarms-clear’ using the command below.
To display the ‘wait-alarms-clear’ step that has timed out, run the following command.
~(keystone_admin)]$ sw-manager sw-deploy-strategy show --details
  step-name:        wait-alarms-clear
  timeout:          2400 seconds
  start-date-time:  2024-03-27 19:21:15
  end-date-time:    2024-03-27 20:01:16
  result:           timed-out
To list the 750.006 alarm, use the following command.
~(keystone_admin)]$ fm alarm-list
+----------+----------------------------+----------------------+----------+-----------------+
| Alarm ID | Reason Text                | Entity ID            | Severity | Time Stamp      |
+----------+----------------------------+----------------------+----------+-----------------+
| 750.006  | A configuration change     | k8s_application=     | warning  | 2024-03-27T     |
|          | requires a reapply of the  | platform-integ-apps  |          | 19:21:15.471422 |
|          | platform-integ-apps        |                      |          |                 |
|          | application.               |                      |          |                 |
+----------+----------------------------+----------------------+----------+-----------------+
VIM orchestrated patch strategy failed with the 900.103 alarm being triggered.
~(keystone_admin)]$ fm alarm-list
+----------+---------------------------+--------------------+----------+----------------+
| Alarm ID | Reason Text               | Entity ID          | Severity | Time Stamp     |
+----------+---------------------------+--------------------+----------+----------------+
| 900.103  | Software patch auto-apply | orchestration=sw-  | critical | 2024-03-26T03  |
|          | failed                    |                    |          |                |
+----------+---------------------------+--------------------+----------+----------------+
Procedural Changes - Option 1
Check the system for existing alarms using the fm alarm-list command. If the existing alarms can be ignored, use the sw-manager sw-deploy-strategy create --alarm-restrictions relaxed command to ignore any alarms during patch orchestration.
If the alarms were not ignored using the command in step 1 and a patch apply failure is encountered, check whether alarm ‘750.006’ is present on the system.
Delete the failed strategy using the following command.
~(keystone_admin)]$ sw-manager sw-deploy-strategy delete
Create a new strategy.
~(keystone_admin)]$ sw-manager sw-deploy-strategy create --alarm-restrictions relaxed
Apply the strategy.
~(keystone_admin)]$ sw-manager sw-deploy-strategy apply
Procedural Changes - Option 2
Create a new strategy (alarm-restrictions are not relaxed).
~(keystone_admin)]$ sw-manager sw-deploy-strategy create
Apply the strategy.
~(keystone_admin)]$ sw-manager sw-deploy-strategy apply
When the sw-deploy-strategy is in progress and at the ‘wait-alarms-clear’ step (this can be found using sw-manager sw-deploy-strategy show --details | grep "step-name"), check if alarm 750.006 is present. If it is, execute the following command.
~(keystone_admin)]$ system application-apply platform-integ-apps
This will re-apply the application and clear the alarm ‘750.006’.
If the alarm still persists after step 3, manually delete the alarm using fm alarm-delete <uuid of alarm 750.006> command.
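A hedged example of locating and deleting the alarm, assuming fm alarm-list supports the --uuid option to display alarm UUIDs:
~(keystone_admin)]$ fm alarm-list --uuid | grep 750.006
~(keystone_admin)]$ fm alarm-delete <uuid-of-750.006-alarm>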
Possible Performance Degradation on Bare-metal Ceph¶
Preliminary performance testing of StarlingX Release 12.0 indicates a potential degradation in Ceph throughput performance compared to StarlingX Release 11.0. The observed impact ranges from 0% to 9% and exhibits high variability based on the traffic profile. These results are derived from testing with default platform CPU and memory allocations. Further investigation is ongoing to better understand the behavior.
Procedural Changes: Perform workload-specific performance validation prior to deployment. If throughput degradation is observed, consider tuning platform CPU and memory allocations or adjusting traffic profiles to mitigate the impact until further analysis and updated guidance are available.
Possible Performance Degradation on Rook Ceph¶
Preliminary performance testing of StarlingX Release 12.0 with Ceph version 18.2.7 indicates a potential degradation in Rook Ceph throughput performance compared to StarlingX Release 11.0. The observed impact ranges from 0% to 9% and shows high variability depending on the traffic profile. These results are based on default platform CPU and memory allocations. Further investigation in a future release is being carried out to better characterize the impact.
Procedural Changes: Validate Rook Ceph performance under representative traffic workloads prior to deployment. If throughput degradation is observed, consider tuning platform CPU and memory allocations or adjusting workload profiles to mitigate the impact until further guidance is available.
Warning
Ceph 18.2.7 includes critical stability and reliability fixes, making it a mandatory upgrade for StarlingX Release 12.0. See: https://ceph.io/en/news/blog/2025/v18-2-7-reef-released/.
Upgrade Failure on AIO-DX Ceph monitor after unlock¶
During a software upgrade on an AIO-DX system, a rare issue can affect Ceph monitors: the affected fixed monitor can get stuck in a reboot loop.
The problem and the affected monitor can be identified by finding both the alarm and a message in the logs:
Alarm sample:
200.006: <HOSTNAME> is degraded due to the failure of its 'ceph-fixed-mon (mon.<HOSTNAME>, )' process.
Message “Bad table magic number” to be found on the faulty monitor log:
$ grep "Bad table magic number" /var/log/ceph/ceph-mon.<HOSTNAME>.log
Procedural Changes: Recreate the faulty monitor:
Warning
This should only be executed on the host that has the problem with the faulty monitor.
Follow the steps to recreate the monitor:
$ sudo rm -f /etc/pmon.d/ceph-fixed-mon.conf
$ sudo /etc/init.d/ceph stop mon-${HOSTNAME}
$ sudo rm -rf /var/lib/ceph/data/ceph-${HOSTNAME}/
$ sudo ceph-mon --cluster ceph --mkfs \
--id ${HOSTNAME} \
--keyring /dev/null \
--fsid $(ceph fsid) \
--mon-data /var/lib/ceph/data/ceph-${HOSTNAME}
$ sudo /etc/init.d/ceph-init-wrapper start mon.${HOSTNAME}
$ sudo ln -s /etc/ceph/ceph-fixed-mon.conf.pmon /etc/pmon.d/ceph-fixed-mon.conf
Change in Credential Management for CephObjectStoreUser¶
A single CephObjectStoreUser can no longer contain multiple credentials. When upgrading, Rook Ceph will delete all undeclared credentials and retain only one.
Procedural Changes: Each credential must now be declared in the CephObjectStoreUser and stored in its own individual Secret.
For more information on how to perform this operation, see the Rook Ceph official documentation.
Rook Ceph Configuration Update for Object Storage¶
Some ObjectBucketClaim options added in Rook Ceph v1.16 (present in StarlingX Release 11.0) are disabled by default to improve security. The disabled options are: bucketMaxObjects, bucketMaxSize, bucketPolicy, bucketLifecycle, and bucketOwner.
Procedural Changes: To re-enable these options, provide a user override through Helm user overrides.
Procedure
Create an override file file_with_changes.yaml containing the key obcAllowAdditionalConfigFields. The options to be re-enabled should be added to its value as a comma-separated list, for example:
obcAllowAdditionalConfigFields: "maxObjects,maxSize"
Apply the overrides with:
$ system helm-override-update rook-ceph rook-ceph rook-ceph --values <file_with_changes.yaml>
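Optionally, the change can be verified and the application re-applied so the override takes effect; a hedged sketch:
$ system helm-override-show rook-ceph rook-ceph rook-ceph   # confirm obcAllowAdditionalConfigFields appears in the user overrides
$ system application-apply rook-ceph                        # re-apply the application to activate the override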
Rook Ceph Application Limitation During Floating Monitor Removal¶
On an AIO-DX system, removing the floating monitor using system controllerfs-modify ceph-float --functions="" may lead to temporary system instability, including the possibility of uncontrolled swacts.
Procedural Changes: To avoid this issue, ensure that all finalizers are removed from the floating monitor Rook Ceph chart after its deletion, using the following command:
$ kubectl patch hr rook-ceph-floating-monitor -p '{"metadata":{"finalizers":[]}}' --type=merge
Host fails to lock during an upgrade¶
After adding multiple OSDs to the Ceph cluster simultaneously, some OSDs may remain in a configuring state even though the cluster is healthy and the OSDs are deployed. This is an intermittent issue that only occurs on systems with a Ceph storage backend configured with more than one OSD per host. This causes the system host-lock command to fail with the following error:
$ system host-lock controller-<id>
controller-<id> : Rejected: Can not lock a controller with storage devices
in 'configuring' state.
Since system host-lock on the controller fails and the OSD is still in
the configuring state, the upgrade is blocked from proceeding.
Procedural Changes: Use the following steps to proceed with the upgrade.
List the OSDID in the ‘configuring’ state using the following command:
$ system host-stor-list <hostname>
Identify the OSD using the following command:
$ ceph osd find osd.<OSDID>
If the OSD is found, manually update the database inventory using the stor UUID:
$ sudo -u postgres psql -U postgres -d sysinv -c "UPDATE i_istor SET state='configured' WHERE uuid='<STOR_UUID>';";
Rook Ceph Application Limitation¶
After applying Rook Ceph application in an AIO-DX configuration the
800.001 - Storage Alarm Condition: HEALTH_WARN alarm may be triggered.
Procedural Changes: Restart the pod of the monitor associated with the
slow operations detected by Ceph; use ceph -s to identify the affected monitor.
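A hedged sketch of restarting the affected monitor pod, assuming the default rook-ceph namespace and the standard app=rook-ceph-mon label:
$ ceph -s                                              # identify the monitor reporting slow operations
$ kubectl -n rook-ceph get pods -l app=rook-ceph-mon   # list the monitor pods
$ kubectl -n rook-ceph delete pod <rook-ceph-mon-pod>  # the pod is recreated automatically by its deployment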
Remove all OSDs on a host on Rook Ceph¶
The procedure to remove OSDs will not work as expected when removing all
OSDs from a host. The Ceph cluster gets stuck with a HEALTH_WARN state.
Note
Use the Procedural change only if the cluster is stuck in HEALTH_WARN
state after removing all OSDs on a host.
Procedural Changes:
Check the cluster health status.
Check crushmap tree.
Remove the host(s) that appear empty in the CRUSH map output from the previous step (see the sketch after this list).
Check the cluster health status.
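A minimal sketch of these steps using standard Ceph CLI commands (host names are placeholders):
$ ceph -s                       # check the cluster health status
$ ceph osd crush tree           # inspect the CRUSH map for empty host buckets
$ ceph osd crush rm <hostname>  # remove each empty host bucket
$ ceph -s                       # confirm the cluster returns to HEALTH_OK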
Critical alarm 800.001 after Backup and Restore on AIO-SX Systems¶
A Critical alarm 800.001 may be triggered after running the Restore Playbook. The alarm details are as follows:
~(keystone_admin)]$ fm alarm-list
+-------+----------------------------------------------------------------------+--------------------------------------+----------+---------------+
| Alarm | Reason Text | Entity ID | Severity | Time Stamp |
| ID | | | | |
+-------+----------------------------------------------------------------------+--------------------------------------+----------+---------------+
| 800. | Storage Alarm Condition: HEALTH_ERR. Please check 'ceph -s' for more | cluster= | critical | 2024-08-29T06 |
| 001 | details. | 96ebcfd4-3ea5-4114-b473-7fd0b4a65616 | | :57:59.701792 |
| | | | | |
+-------+----------------------------------------------------------------------+--------------------------------------+----------+---------------+
Procedural Changes: To clear this alarm run the following commands:
Note
Applies only to AIO-SX systems.
FS_NAME=kube-cephfs
METADATA_POOL_NAME=kube-cephfs-metadata
DATA_POOL_NAME=kube-cephfs-data
# Ensure that the Ceph MDS is stopped
sudo rm -f /etc/pmon.d/ceph-mds.conf
sudo /etc/init.d/ceph stop mds
# Recover MDS state from filesystem
ceph fs new ${FS_NAME} ${METADATA_POOL_NAME} ${DATA_POOL_NAME} --force
# Try to recover from some common errors
sudo ceph fs reset ${FS_NAME} --yes-i-really-mean-it
cephfs-journal-tool --rank=${FS_NAME}:0 event recover_dentries summary
cephfs-journal-tool --rank=${FS_NAME}:0 journal reset
cephfs-table-tool ${FS_NAME}:0 reset session
cephfs-table-tool ${FS_NAME}:0 reset snap
cephfs-table-tool ${FS_NAME}:0 reset inode
sudo /etc/init.d/ceph start mds
Intermittent installation of Rook-Ceph on Distributed Cloud¶
If the rook-ceph installation fails, this is due to
ceph-mgr-provision not being provisioned correctly.
Procedural Changes: Remove the application using system application-remove rook-ceph --force and then re-apply it to retry the rook-ceph installation.
Storage Nodes are not considered part of the Kubernetes cluster¶
When running the system kube-host-upgrade-list command, the output displays only controller and worker hosts that have control-plane and kubelet components. Storage nodes do not have these components and so are not considered part of the Kubernetes cluster.
Procedural Changes: Do not include Storage nodes as part of the Kubernetes upgrade.
Optimization with a Large number of OSDs¶
Because storage nodes are not optimized by default, you may need to tune your Ceph configuration for balanced operation across deployments with a high number of OSDs. Until this is done, the following alarm may be generated even if the installation succeeds:
800.001 - Storage Alarm Condition: HEALTH_WARN. Please check ‘ceph -s’
Procedural Changes: To optimize your storage nodes with a large number of OSDs, it is recommended to use the following commands:
~(keystone_admin)]$ ceph osd pool set kube-rbd pg_num 256
~(keystone_admin)]$ ceph osd pool set kube-rbd pgp_num 256
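Optionally, the new values can be verified afterwards, for example:
~(keystone_admin)]$ ceph osd pool get kube-rbd pg_num
~(keystone_admin)]$ ceph osd pool get kube-rbd pgp_num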
Storage Nodes Recovery on Power Outage¶
Storage nodes take 10-15 minutes longer to recover in the event of a full power outage.
Procedural Changes: NA
Ceph Recovery on an AIO-DX System¶
In certain instances, Ceph may not recover on an AIO-DX system and remains in the down state when viewed using the ceph -s command. This can happen, for example, if an OSD comes up after a controller reboot and a swact occurs, or due to other causes such as hardware failure of the disk or the entire host, a power outage, or a switch going down.
Procedural Changes: There is no specific command or procedure that solves the problem for all possible causes. Each case needs to be analyzed individually to find the root cause of the problem and the solution. It is recommended to contact Customer Support at, http://www.windriver.com/support.
Restrictions on the Size of Persistent Volume Claims (PVCs)¶
There is a limitation on the size of Persistent Volume Claims (PVCs) that can be used for all StarlingX Releases.
Procedural Changes: It is recommended that all PVCs be a minimum size of 1 GB. For more information, see https://bugs.launchpad.net/starlingx/+bug/1814595.
Failure to clean up platform-integ-apps files/Helm release¶
If the System Controller does not have Ceph configured,
platform-integ-apps is not installed and the images are not
automatically downloaded to registry.central when upgrading the platform.
The missing images on the subclouds are:
registry.central:9001/docker.io/openstackhelm/ceph-config-helper:ubuntu_focal_18.2.0-1-20231013
registry.central:9001/quay.io/cephcsi/cephcsi:v3.10.1
registry.central:9001/registry.k8s.io/sig-storage/csi-attacher:v4.4.2
registry.central:9001/registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.9.1
registry.central:9001/registry.k8s.io/sig-storage/csi-provisioner:v3.6.2
registry.central:9001/registry.k8s.io/sig-storage/csi-resizer:v1.9.2
registry.central:9001/registry.k8s.io/sig-storage/csi-snapshotter:v6.3.2
If the System Controller does not have Ceph configured and the subclouds have Ceph configured, then the images need to be manually uploaded to the registry.central before starting the upgrade of the subclouds.
To push the images to the registry.central, run the following commands on the System Controller:
# Change the variables according to the setup
REGISTRY_PREFIX="server:port/path"
REGISTRY_USERNAME="admin"
REGISTRY_PASSWORD="password"
sudo docker login registry.local:9001 --username ${REGISTRY_USERNAME} --password ${REGISTRY_PASSWORD}
for image in \
docker.io/openstackhelm/ceph-config-helper:ubuntu_focal_18.2.0-1-20231013 \
registry.k8s.io/sig-storage/csi-attacher:v4.4.2 \
registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.9.1 \
registry.k8s.io/sig-storage/csi-provisioner:v3.6.2 \
registry.k8s.io/sig-storage/csi-resizer:v1.9.2 \
registry.k8s.io/sig-storage/csi-snapshotter:v6.3.2 \
quay.io/cephcsi/cephcsi:v3.10.1
do
sudo docker pull ${REGISTRY_PREFIX}/${image}
sudo docker tag ${REGISTRY_PREFIX}/${image} registry.local:9001/${image}
sudo docker push registry.local:9001/${image}
done
Procedural Changes: In case the subcloud upgrade finishes without the correct images pushed to the registry.central, it is still possible to recover the system following the steps below.
After pushing the images to the registry.central, each subcloud must be recovered with the following steps (these commands should be run on the Subcloud):
source /etc/platform/openrc
# Remove old app manually
sudo rm -rf /opt/platform/helm/22.12/platform-integ-apps;
sudo rm -rf /opt/platform/fluxcd/22.12/platform-integ-apps;
sudo -u postgres psql postgres -d sysinv -c "DELETE from kube_app WHERE name = 'platform-integ-apps';";
sudo sm-restart service sysinv-inv && sudo sm-restart service sysinv-conductor;
sleep 15; # Wait services to restart
system application-upload /usr/local/share/applications/helm/platform-integ-apps-22.12-72.tgz;
sleep 15; # Wait for the upload to fail (it is expected to fail here)
system application-delete platform-integ-apps;
system application-upload /usr/local/share/applications/helm/platform-integ-apps-22.12-72.tgz;
sleep 10; # Wait for the upload to succeed
system application-apply platform-integ-apps;
Note
The images need to be pushed to the registry.central registry before upgrading the subclouds.
Deprecation Notices in Stx 12.0¶
In-tree and Out-of-tree drivers¶
In StarlingX Release 12.0, only the out-of-tree versions of the Intel ice,
i40e, and iavf drivers are supported. Switching between in-tree and
out-of-tree driver versions is not supported.
The out_of_tree_drivers service parameter and the out-of-tree-drivers boot
parameter are deprecated and should not be modified to switch to in-tree driver
versions. The values will be ignored, and the system will always use the
out-of-tree versions of the Intel ice, i40e, and iavf drivers.
Kubernetes Root CA bootstrap overrides¶
The overrides k8s_root_ca_cert, k8s_root_ca_key, and apiserver_cert_sans
will be deprecated in a future release. External connections to kube-apiserver
are now routed through a proxy that identifies itself using the REST API/GUI
certificate issued by the platform issuer (system-local-ca).
kubernetes-power-manager¶
Intel has stopped support for the kubernetes-power-manager application. This
is still being supported by StarlingX and will be removed in a future release.
cpu_busy_cycles metric is deprecated and must be replaced with
cpu_c0_state_residency_percent for continued usage
(if the metrics are customized via helm overrides).
For more information, see Configurable Power Manager.
Bare Metal Ceph¶
Host-based Ceph is deprecated in StarlingX Release 11.0 and will be removed in a future release.
Impact: Existing deployments using Host-based Ceph will require migration prior to upgrading to StarlingX future releases.
Recommendation: For new deployments, adopt Rook-Ceph to avoid service disruption during the transition from Bare Metal Ceph to Rook.
Migration: StarlingX does not support bare-metal to Rook migration and all existing StarlingX users will need to reinstall and deploy Rook Ceph.
Static Configuration for Hardware Accelerator Cards¶
Static configuration for hardware accelerator cards is deprecated in StarlingX Release 10.0 and will be discontinued in future releases. Use FEC operator instead.
See Switch between Static Method Hardware Accelerator and SR-IOV FEC Operator
Kubernetes APIs¶
Kubernetes APIs that will be removed in K8s 1.27 are listed below:
See: https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-27
Containerd Schema 1 images¶
Support for Docker Schema 1 images has been disabled in containerd v2.0.
Users should ensure all container images are using Docker Schema 2 or
OCI image format. Docker Schema 1 images will no longer be supported
and may fail to pull or run.
cgroup v1 Deprecation and Transition to cgroup v2¶
Support for cgroup v1 is being deprecated in preparation for a full
transition to cgroup v2. Applications that rely on cgroup v1
should be updated to use cgroup v2 when available, to ensure forward
compatibility.
Support Lifecycle
StarlingX Release 12.0:
cgroup v1 is deprecated. cgroup v2 is not available.
Future Releases:
cgroup v1 will be fully removed. cgroup v2 will be enabled and supported.
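As an illustrative check (not StarlingX-specific), the cgroup version in use on a host can be identified from the filesystem type mounted at /sys/fs/cgroup:
$ stat -fc %T /sys/fs/cgroup   # "cgroup2fs" indicates cgroup v2; "tmpfs" indicates the cgroup v1 hierarchy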
O-RAN O2 IMS¶
In the context of hosting a RAN Application on Cloud Platform, the O-RAN O2 Application provides and exposes the IMS and Deployment Management Service APIs of the O2 interface between the O-Cloud (Cloud Platform) and the Service Management and Orchestration (SMO), in the O-RAN Architecture.
As O2 specifications evolve, O2 IMS capability falls outside the current StarlingX design scope and is supported by Wind River Conductor. StarlingX continues to support the O2 Kubernetes Deployment Management Service (DMS) interface in alignment with O-RAN Alliance specifications.
Ingress nginx¶
The upstream Ingress nginx project has announced that no updates will be provided after March 2026. StarlingX currently uses Ingress nginx to implement the Kubernetes Ingress API and plans to continue using it until further notice. Since the Ingress API in Kubernetes is now frozen and the Gateway API is the recommended replacement, StarlingX is targeting a future release to introduce support for the Kubernetes Gateway API. Applications currently relying on the Ingress nginx API are encouraged to migrate to the Gateway API once it becomes available in StarlingX.
N3000 FPGA Firmware Update Orchestration¶
The N3000 FPGA Firmware Update Orchestration was deprecated in StarlingX Release 10.0. For more information, see N3000 FPGA Overview.
Alarms and Event Logs Deprecation¶
- The following Alarms are deprecated:
900.101 - Software patch auto-apply in progress
900.102 - Software patch auto-apply aborting
900.103 - Software patch auto-apply failed
- The following Event Logs are deprecated:
900.111 - Software patch auto-apply start
900.112 - Software patch auto-apply inprogress
900.113 - Software patch auto-apply rejected
900.114 - Software patch auto-apply cancelled
900.115 - Software patch auto-apply failed
900.116 - Software patch auto-apply completed
900.117 - Software patch auto-apply abort
900.118 - Software patch auto-apply aborting
900.119 - Software patch auto-apply abort rejected
900.120 - Software patch auto-apply abort failed
900.121 - Software patch auto-apply aborted
These will be removed in a future release. They are superseded by the USM Software 900.2xx series of alarms and event logs.
Removed in Stx 12.0¶
MacVTap Interfaces¶
MacVTap interfaces for KubeVirt VMs are not supported in StarlingX Release 12.0 and future releases.
ptp-notification v1 API¶
The ptp-notification v1 API is no longer supported in StarlingX Release 12.0 and has been removed. Only the O-RAN Compliant Notification API (ptp-notification v2 API) is supported.
Static Configuration for Hardware Accelerator Cards¶
Static configuration for hardware accelerator cards is no longer supported in StarlingX Release 12.0. Use FEC operator instead.
See Switch between Static Method Hardware Accelerator and SR-IOV FEC Operator
out_of_tree_drivers Service Parameter and out-of-tree-drivers Boot Parameter¶
By default, the out-of-tree drivers (ice/i40e/iavf) are loaded and these parameters are not required. Therefore, in StarlingX Release 12.0, the out_of_tree_drivers service parameter and the out-of-tree-drivers boot parameter have been removed. Users must update their configuration files to avoid using these parameters.
Telegraf cpu_busy_cycles metric¶
The cpu_busy_cycles metric from the Telegraf Intel PowerStat Input Plugin in the
power-metrics application has been removed; the cpu_c0_state_residency_percent
metric should be used instead.
Kubernetes Root CA Customization¶
The option to provide a custom Kubernetes Root CA certificate and key has been
removed from the interfaces used to update the certificate (system,
sw-manager and dcmanager CLI and APIs, and from the Horizon GUI). The
subject and expiration date customization options remain supported.
The deprecated bootstrap overrides used previously to customize the certificate
(k8s_root_ca_cert, k8s_root_ca_key, and apiserver_cert_sans) have been
removed as well.
The Kubernetes Root CA certificate will now always be auto-generated and used for
internal connections. External connections to kube-apiserver are routed through
HAproxy, which identifies itself using the REST API/GUI certificate
(system-restapi-gui-certificate) issued by the platform issuer (system-local-ca).
Hardware Updates¶
See:
Bug status¶
Fixed bugs¶
This release provides fixes for a number of defects. Refer to the StarlingX bug database to review the Release 12.0 Fixed Bugs.