Stein Series Release Notes

8.2.0

New Features

  • k8s_fedora_atomic_v1: Add a PodSecurityPolicy for privileged pods. The privileged PSP is used for Calico and node-problem-detector, and a PSP for flannel is added from upstream.

8.1.0

New Features

  • Add Nginx as an additional Ingress controller option for Kubernetes. Installation is done via the upstream nginx-ingress Helm chart, and selection is done via the label ingress_controller=nginx.
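
    A minimal example selecting Nginx at template creation time (the template name, image and external network are placeholders):

      openstack coe cluster template create k8s-nginx-template \
        --coe kubernetes \
        --image fedora-atomic-latest \
        --external-network public \
        --labels ingress_controller=nginx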

  • Added the label traefik_ingress_controller_tag to enable specifying the Traefik container version.

  • Support auto healing for Kubernetes clusters using Node Problem Detector, Draino and AutoScaler. Users can use a new label ‘auto_healing_enabled’ to turn it on or off.

    Meanwhile, a new label ‘auto_scaling_enabled’ is also introduced to let the Kubernetes cluster auto scale based on its workload (see the example below).
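
    A minimal example turning both features on at cluster creation time (the cluster and template names are placeholders):

      openstack coe cluster create my-k8s-cluster \
        --cluster-template k8s-template \
        --labels auto_healing_enabled=true,auto_scaling_enabled=true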

  • Support multiple DNS servers when creating a cluster template. Users can specify multiple DNS servers as a comma-delimited IPv4 address list, for example “8.8.8.8,114.114.114.114”.
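
    For example, assuming the standard openstack CLI (the template name, image and external network are placeholders):

      openstack coe cluster template create k8s-multi-dns-template \
        --coe kubernetes \
        --image fedora-atomic-latest \
        --external-network public \
        --dns-nameserver 8.8.8.8,114.114.114.114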

Bug Fixes

  • Fixed an issue where applications running on master nodes which rely on the network connection kept restarting because of timeouts or lost connections, by making Calico devices unmanaged in the NetworkManager config on master nodes.

  • The Traefik container now defaults to a fixed tag (v1.7.10) instead of the tag latest.

8.0.0

Prelude

Added new tool magnum-status upgrade check.

New Features

  • Deploy kubelet on master nodes for the k8s_fedora_atomic driver. Previously this was done only for Calico; now kubelet runs in all cases. This is really useful for monitoring the master nodes (e.g. deploying fluentd) or running the Kubernetes control plane self-hosted.

  • Add the Octavia client code for Magnum to interact with the Octavia component of OpenStack.

  • A new framework for the magnum-status upgrade check command is added. This framework allows adding various checks which can be run before a Magnum upgrade to ensure the upgrade can be performed safely.
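
    The command can be run on the node hosting the Magnum services before upgrading, for example:

      magnum-status upgrade check

    A non-zero exit code indicates that one or more checks raised warnings or failures.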

  • To get better cluster template versioning and relieve the pain of maintaining public cluster templates, the name of a cluster template can now be changed.

  • Add the tiller_enabled label to install tiller in k8s_fedora_atomic clusters. Defaults to false. Add the tiller_tag label to select the version of tiller. If the tag is not set, the tag that matches the helm client version in the heat-agent will be picked. The tiller image can be stored in a private registry and the cluster can pull it using the container_infra_prefix label. Add the tiller_namespace label to select in which namespace to install tiller. Tiller is installed with a Kubernetes job. This job runs with a container that includes the helm client. This image is maintained by the magnum team and lives in docker.io/openstackmagnum/helm-client. This container follows the same versions as helm and tiller.
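
    For example, enabling tiller with a pinned version and a dedicated namespace (a minimal sketch; the template name, tag and namespace values are illustrative):

      openstack coe cluster template create k8s-tiller-template \
        --coe kubernetes \
        --image fedora-atomic-latest \
        --external-network public \
        --labels tiller_enabled=true,tiller_tag=v2.12.3,tiller_namespace=magnum-tiller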

  • For k8s_fedora_atomic, run flannel as a CNI plugin. The deployment method is taken from the flannel upstream documentation. One more label, flannel_cni_tag, is added for the CNI container, quay.io/repository/coreos/flannel-cni. The flannel container is taken from flannel upstream as well, quay.io/repository/coreos/flannel.

  • Add ‘grafana_tag’ and ‘prometheus_tag’ labels for the k8s_fedora_atomic driver. Grafana defaults to 5.1.5 and Prometheus defaults to v1.8.2.

  • Add the heat_container_agent_tag label to allow users to select the heat-agent tag. Stein default: stein-dev.

  • Add Heat container agent into Kubernetes cluster worker nodes to support cluster rolling upgrade.

  • Installs the metrics-server service that replaces the deprecated Kubernetes Heapster as a cluster-wide metrics reporting service used by scheduling, HPA and others. This service is installed and configured using Helm, so the tiller_enabled flag must be true. The Heapster service is kept active for compatibility.

  • Added the monitoring_enabled label to install the prometheus-operator monitoring solution by means of the public Helm stable/prometheus-operator chart. Defaults to false. The grafana_admin_passwd label can be used to set the Grafana dashboard admin access password. If grafana_admin_passwd is not set, the password defaults to prom_operator.
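
    For example (a minimal sketch; the template name and password are placeholders):

      openstack coe cluster template create k8s-monitoring-template \
        --coe kubernetes \
        --image fedora-atomic-latest \
        --external-network public \
        --labels monitoring_enabled=true,grafana_admin_passwd=s3cret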

  • Start the Kubernetes worker installation right after the master instances are created, rather than waiting for all the services inside the masters, which can decrease the Kubernetes cluster launch time significantly.

  • A new label named master_lb_floating_ip_enabled is introduced which controls whether Magnum allocates a floating IP for the load balancer of master nodes. This label only takes effect when master_lb_enabled is set. The default value is the same as floating_ip_enabled. The floating_ip_enabled property now only controls whether Magnum should allocate the floating IPs for the master and worker nodes.
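
    For example, creating a load-balanced control plane that is only reachable on the internal network (a minimal sketch; the template name, image and external network are placeholders):

      openstack coe cluster template create k8s-internal-lb-template \
        --coe kubernetes \
        --image fedora-atomic-latest \
        --external-network public \
        --master-lb-enabled \
        --labels master_lb_floating_ip_enabled=false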

  • The Kubernetes cloud-provider-openstack now has a webhook to support Keystone authorization and authentication. With this feature, users can use a new label ‘keystone_auth_enabled’ to enable Keystone authN and authZ.

  • Add a new option ‘octavia’ for the label ‘ingress_controller’ and a new label ‘octavia_ingress_controller_tag’ to enable the deployment of octavia-ingress-controller in the Kubernetes cluster. The ‘ingress_controller_role’ label is not used for this option.
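
    For example (a minimal sketch; the template name and the tag value are placeholders):

      openstack coe cluster template create k8s-octavia-ingress-template \
        --coe kubernetes \
        --image fedora-atomic-latest \
        --external-network public \
        --labels ingress_controller=octavia,octavia_ingress_controller_tag=v1.5.0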

  • Use ClusterIP as the default Prometheus service type, because a NodePort-type service requires an extra security group rule to be properly configured. Kubernetes cluster administrators can still change the service type after cluster creation.

  • Use the external cloud provider in k8s_fedora_atomic. The cloud_provider_tag label can be used to select the container tag for it, together with the cloud_provider_enabled label. The cloud provider runs as a DaemonSet on all master nodes.

  • This makes the keypair optional. Users should not have to include a keypair, because they may use some other access method, such as SSSD, preconfigured on the image.
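
    For example, creating a cluster without passing a keypair (assuming the image is preconfigured with another access method; the cluster and template names are placeholders):

      openstack coe cluster create my-keyless-cluster \
        --cluster-template k8s-template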

  • Add Kubernetes cluster pre-delete support to remove the cloud resources before deleting the cluster. For now, only load balancers for Kubernetes services of LoadBalancer type are deleted.

  • An OpenStack driver for the Kubernetes Cluster Autoscaler is being proposed to support autoscaling when running a k8s cluster on top of OpenStack. However, currently there is no way in Magnum to let an external consumer control which node will be removed. The alternative is calling the Heat API directly, but that is not a good solution and it is confusing for the k8s community. So this new API is added to Magnum: POST <ClusterID>/actions/resize
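
    A minimal example using the openstack CLI (assuming a python-magnumclient that supports resize; the cluster name and node UUID are placeholders):

      openstack coe cluster resize my-k8s-cluster 2 \
        --nodes-to-remove <node-uuid>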

  • Magnum only has one server group for all master and worker nodes per cluster, which is not very flexible for small clouds. A cluster with 3+ masters can easily hit the group capacity when using the hard anti-affinity policy. Now one server group is added for each master and worker node group to provide better flexibility.

Upgrade Notes

  • The etcd service for the Kubernetes cluster is no longer allocated a floating IP.

  • The cloud config for Kubernetes has been renamed from /etc/kubernetes/kube_openstack_config to /etc/kubernetes/cloud-config, as the kubelet expects this exact name when the external cloud provider is used. A copy of /etc/kubernetes/kube_openstack_config is kept in place for applications developed for previous versions of Magnum.

Deprecation Notes

  • Currently, Magnum runs periodic tasks to collect k8s cluster metrics and send them to the message bus. Unfortunately, it collects pod info only from the “default” namespace, which makes this function almost useless. What’s more, even if Magnum could get all pods from all namespaces, it does not make much sense to keep this function in Magnum, because operators only care about the health of the cluster nodes. If they want to know the status of pods, they can use heapster or other tools to get that. So the feature is deprecated now and will be removed in the Stein release, and the default value is changed to False, which means the metrics will not be sent.

Security Issues

  • Defines more strict security group rules for Kubernetes worker nodes. The ports that are open by default: the default port range (30000-32767) for external service ports; the kubelet healthcheck port; Calico BGP network ports; flannel overlay network ports. The cluster admin should manually configure the security group on the nodes where Traefik is allowed. To allow traffic to the default ports (80, 443) that the Traefik ingress controller exposes, users will need to create additional rules or expose Traefik with a Kubernetes service of type: LoadBalancer (see the example below). Finally, the SSH port on worker nodes is closed as well. If SSH access is required, users will need to create a rule for port 22 as well.
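
    For example, re-opening the Traefik ports and SSH on the worker nodes’ security group (a minimal sketch; the security group name is a placeholder):

      for port in 80 443 22; do
        openstack security group rule create \
          --protocol tcp --dst-port $port --ingress \
          my-cluster-worker-secgroup
      done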

Bug Fixes

  • Add a new label service_cluster_ip_range for Kubernetes so that users can set the IP range for service portals to avoid conflicts with the pod IP range.

  • Allow overriding cluster template labels for swarm mode clusters - this functionality was missed from this COE when it was introduced.

  • When doing a cluster update, Magnum now passes the existing parameter to Heat, which makes Heat use the Heat templates stored in the Heat DB. This change prevents Heat from replacing all nodes when the Heat templates change, for example after an upgrade of the Magnum server code. https://storyboard.openstack.org/#!/story/1722573

  • Add an iptables -P FORWARD ACCEPT systemd unit. On node reboot, kubelet and kube-proxy set iptables -P FORWARD DROP, which does not work with flannel in the way we use it. The new systemd unit sets the rule back to ACCEPT after flannel, docker, kubelet and kube-proxy have started.

  • In a Kubernetes cluster, a floating IP is created and associated with the VIP of the load balancer that is created for each service of LoadBalancer type inside Kubernetes. These floating IPs are now deleted when the cluster is deleted.

  • Return the instance ID of the worker node in the k8s minion template so that a consumer can send an API request to Heat to remove a particular node with removal_policies. Otherwise, the consumer (e.g. the AutoScaler) has to use the index to do the removal, which is confusing outside of the OpenStack world. https://storyboard.openstack.org/#!/story/2005054

  • Fixed a bug where --live-restore was passed to the Docker daemon, causing swarm init to fail. Magnum now ensures --live-restore is not passed to the Docker daemon if it is the default in an image.