By default, all Kubernetes pods are non-isolated and accept traffic from any source. A “network policy” is a Kubernetes specification that defines how groups of pods are allowed to communicate with each other and with other network endpoints [1].
This spec proposes a design for supporting Kubernetes network policies in Kuryr.
Kubernetes network policies define which traffic is allowed to be sent or received by a group of pods.
Each network policy has three main parts [2]:
- A pod-selector that defines the group of pods the policy applies to.
- The policy types: Ingress, Egress, or both.
- Lists of ingress/egress rules that define the allowed peers (ipBlock, podSelector, namespaceSelector) and ports.
In order to support network policies, kuryr-kubernetes should handle all events that relate to network policies and translate them into Neutron objects, so that the applied network topology is equivalent to the one defined by the Kubernetes policies. Neutron does not have a security API that is equivalent to the Kubernetes network policy, so the translation should be done carefully in order to converge on the required topology while avoiding corner cases and race conditions.
This spec suggests implementing Kubernetes network policies by leveraging Neutron security-groups [4]. There is some similarity between security-groups and network policies, but some definitions inside a network policy require more complex translation work.
In order to provide a full translation from Kubernetes policies to security-groups, there are three main issues that need to be considered:
- Translating each policy definition into an equivalent set of security-group rules.
- Applying the resulting security-groups on the ports of the pods selected by each policy.
- Keeping the Neutron objects in sync when policies, pods, or namespaces change.
The next paragraphs describe the implementation proposal for each of these tasks, including the new Handler and Drivers that should be added to the Kuryr controller.
A network policy that allows all traffic should be translated to a security-group with a single rule that allows all traffic.
Example of a rule that allows all egress traffic:
    {
        "security_group_rule": {
            "direction": "egress",
            "protocol": null,
            "security_group_id": "[id]"
        }
    }
The ingress/egress policy types can be translated to the “direction” trait of security-group-rules.
ipBlock definitions can be translated using the “remote_ip_prefix” trait of security-group-rules, as both use CIDRs. When an ipBlock contains exceptions (inner CIDRs that should be excluded from the rule), the IP range should be broken into pieces that cover the whole ipBlock except the excluded parts. For example, for the ipBlock “1.1.1.0/24 except 1.1.1.0/26”, Kuryr should create security-group-rules for 1.1.1.64/26 and 1.1.1.128/25.
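As a minimal sketch, this CIDR-splitting step can be done with Python's standard ipaddress module (split_ip_block is an illustrative name, not an existing Kuryr function):

    import ipaddress

    def split_ip_block(cidr, exceptions):
        """Return the minimal set of CIDRs covering `cidr` minus `exceptions`."""
        remaining = [ipaddress.ip_network(cidr)]
        for exc in map(ipaddress.ip_network, exceptions):
            covering = []
            for net in remaining:
                if exc.subnet_of(net):
                    # address_exclude yields the subnets of `net` that do not
                    # overlap with `exc`.
                    covering.extend(net.address_exclude(exc))
                else:
                    covering.append(net)
            remaining = covering
        return sorted(remaining)

    # The ipBlock "1.1.1.0/24 except 1.1.1.0/26" from the example above:
    print(split_ip_block('1.1.1.0/24', ['1.1.1.0/26']))
    # -> [IPv4Network('1.1.1.64/26'), IPv4Network('1.1.1.128/25')]

One security-group-rule would then be created per resulting CIDR, each carried in the rule's “remote_ip_prefix”.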
The pod-selector uses Kubernetes label-selectors [6] for choosing a set of pods. It is used in the policy for two purposes:
- The top-level spec.podSelector defines the pods the policy should be applied on.
- A podSelector inside an ingress “from” or egress “to” section selects the peer pods whose traffic is allowed.
The first purpose defines on which ports the policy should be applied, so it is discussed in the next section. For the second, the translation mechanism can use the “remote_group_id” trait of security-group-rules, which allows defining as a valid source all ports that belong to another security-group. This means that Kuryr can create a security-group with no rules for each network-policy selector and attach all ports corresponding to the pods selected by the query to this security-group. We assume that each port attached to this security-group is also attached to at least one other security-group (the default security-group), so the attachment will not entirely block traffic to the port.
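For illustration, a single ingress rule on a policy's security-group admitting traffic from every port in the selector's security-group could look like the following request body, here written as a Python dict (the IDs are placeholders):

    rule = {
        "security_group_rule": {
            "direction": "ingress",
            # Any port attached to this (otherwise rule-less) group is a
            # valid traffic source:
            "remote_group_id": "<selector-security-group-id>",
            "security_group_id": "<policy-security-group-id>",
        }
    }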
The namespace-selector is used for choosing all the pods that belong to the namespaces selected by the query, in order to allow their ingress/egress traffic. It should use the same security-group mechanism as the pod-selector to allow ingress/egress traffic from the selected namespaces.
The ports section of a rule can be translated to the port and protocol traits of a security-group-rule.
When a rule specifies both peers and ports, a security-group-rule should be created for each tuple of peer and port; the number of rules is the Cartesian product of the ports and peers (e.g., two peers and three ports yield six rules).
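A sketch of that expansion, assuming each peer was already reduced to a remote_ip_prefix (ipBlock) or remote_group_id (selector) entry as described above; expand_rules is an illustrative name:

    import itertools

    def expand_rules(peers, ports, direction, sg_id):
        """Create one security-group-rule body per (peer, port) pair."""
        rules = []
        for peer, (protocol, port) in itertools.product(peers, ports):
            rule = {
                'direction': direction,
                'protocol': protocol,
                'port_range_min': port,
                'port_range_max': port,
                'security_group_id': sg_id,
            }
            rule.update(peer)  # adds remote_ip_prefix or remote_group_id
            rules.append(rule)
        return rules

    # 2 peers x 2 ports -> 4 rules:
    expand_rules(
        peers=[{'remote_ip_prefix': '10.0.0.0/24'},
               {'remote_group_id': '<peers-sg-id>'}],
        ports=[('tcp', 6379), ('tcp', 5978)],
        direction='ingress',
        sg_id='<policy-sg-id>')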
A security-group derived from a Kubernetes policy should be tagged [7] with the network-policy UID. In case of length or character-set restrictions, hashing should be applied so that a unique security-group tag can still be derived from the policy UID.
For defining on which pods the policy should be applied, Kubernetes defines a pod-selector query [5]. For applying the policy on the relevant ports, Kuryr needs to know at any given moment which pods belong to that group; the membership can change whenever a pod is created, updated, or re-labeled.
When a policy is created, Kuryr should trigger a GET query in order to apply the policy on all pods that already match it, and set a watch to be notified whenever a pod is added to or removed from the network policy, so that the translated policy can be applied to or removed from the pod's port.
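A bare-bones sketch of such a watch against the kubernetes-api-server (the endpoint and the watch_pods/on_event names are assumptions for illustration; in practice Kuryr's existing watcher infrastructure would be reused):

    import json
    import requests

    K8S_API = 'http://localhost:8080/api/v1'  # assumed API endpoint

    def watch_pods(label_selector, on_event):
        """Stream pod events matching `label_selector` and dispatch them."""
        resp = requests.get('%s/pods' % K8S_API,
                            params={'labelSelector': label_selector,
                                    'watch': 'true'},
                            stream=True)
        for line in resp.iter_lines():
            if line:
                event = json.loads(line)
                # event['type'] is ADDED, MODIFIED or DELETED
                on_event(event['type'], event['object'])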
For applying the policy on a pod, an annotation with the security-group-id will be added to the pod. This causes an “update pod” event, and the VIFHandler, via the security-group driver, will attach the security-group to the pod's port.
We cannot attach the security-group directly in the watch callback, as that would create a race condition between the watch and the VIFHandler: the watch could be called before Kuryr is notified that the pod was created. With the annotation, when a new pod is created, if the watch fired before the VIFHandler processed the pod creation, the VIFHandler will see the pod already carrying the annotation. Otherwise, it will see a pod with no security-group annotation and will attach it to the default security-group; when the watch later updates the annotation, the pod will be updated with the correct security-group.
When a policy is updated and its pod-selectors changed, a diff between the old and new selected pod sets should be computed, and the pods' security-group annotations should be updated accordingly. The selector watches should also be updated with the new queries.
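A sketch of that reconciliation, where old_pods/new_pods are the pod sets selected before and after the update; the annotation helpers are hypothetical:

    def reconcile_selector_change(old_pods, new_pods, sg_id):
        """Update pod annotations after a policy's pod-selector changed."""
        for pod in old_pods - new_pods:
            remove_sg_annotation(pod, sg_id)  # hypothetical helper
        for pod in new_pods - old_pods:
            add_sg_annotation(pod, sg_id)     # hypothetical helper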
As mentioned above, “remote_group_id” will be used to allow ingress and egress traffic from pods selected by the pod/namespace selectors.
For the pod-selector and namespace-selector we need to create a security-group per policy and direction (one for ingress and one for egress). Each security-group should be tagged with a tag derived from the policy UID and the traffic direction (for example, [policy_UID]_EG for egress traffic). In case of character-set or allowed-length issues, hashing should be applied when deriving these tags.
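A sketch of the tag derivation (the length limit shown is an assumption; the actual limit is set by the Neutron tag API):

    import hashlib

    MAX_TAG_LEN = 60  # assumption; check the Neutron tag API limits

    def sg_tag(policy_uid, direction):
        """Derive a unique, length-safe tag, e.g. "<uid>_EG" for egress."""
        tag = '%s_%s' % (policy_uid, direction)
        if len(tag) > MAX_TAG_LEN:
            tag = hashlib.sha1(tag.encode()).hexdigest()
        return tag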
For each selector (namespace or pod) a watch should be set. The watch callback will add/remove the relevant pods to/from the security-group.
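A sketch of such a callback; the port helpers are hypothetical:

    def selector_watch_callback(event_type, pod, remote_sg_id):
        """Keep the remote security-group's membership in sync with the query."""
        port_id = get_pod_port_id(pod)                 # hypothetical helper
        sgs = set(get_port_security_groups(port_id))   # hypothetical helper
        if event_type in ('ADDED', 'MODIFIED'):
            sgs.add(remote_sg_id)
        elif event_type == 'DELETED':
            sgs.discard(remote_sg_id)
        update_port_security_groups(port_id, sgs)      # hypothetical helper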
To support network policies, a Handler that watches network_policy events will be added.
Two new drivers will be added:
- A network-policy driver that translates a policy into Neutron security-groups and security-group-rules.
- A security-groups driver that determines the security-groups of a pod from its annotations, used by the VIFHandler when handling the pod's port.
The following should be done on Controller startup:
- Fetch all existing network policies and ensure each has up-to-date, tagged security-groups.
- Re-create the selector watches for every existing policy.
- Reconcile the pods' security-group annotations with the currently selected pod sets.
Changes in network policies can have a negative impact on the port-pools [9]. The combination of security-groups on a port is part of the pool key, and changes in a network policy can make some pools irrelevant.
For example, assume we have two policies, “a” and “b”, and both should be applied on pods with the label “role: db”. When the first pod with the label “role: db” is created, a new port-pool is created, and its pool key is composed from the security-groups of “a” and “b”. If policy “b” is then changed so that pods with the label “role: db” are no longer selected by it, the port-pool that was created for the combination of “a” and “b” is no longer useful.
This can lead to a port leak, as such pools hold many ports that are no longer usable. To handle this issue, a new cleanup task should be added that releases all ports from pools that are no longer in use.
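A sketch of the cleanup task, assuming the pool key exposes its security-group combination; all helpers are hypothetical:

    def cleanup_stale_pools(pools, active_sg_combinations):
        """Release ports held by pools whose security-group set is stale."""
        for key in list(pools):
            if frozenset(key.security_groups) not in active_sg_combinations:
                for port in pools.pop(key):
                    release_neutron_port(port)  # hypothetical helper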
Another issue that needs to be handled is that the policies of a pod can change while the pod is running. Currently, when a pod is deleted, its port is returned to the pool it was taken from; if the pod's policies have changed, this behaviour is incorrect. When a port is released, it should be returned to the pool that matches the current state of the pod.
Kubernetes label-selectors are divided into two types of queries, “match-labels” and “match-expression” [10]. “match-labels” selects by a fixed list of labels, while “match-expression” selects all pods that match a particular expression.
This spec suggests creating a watch for each label-selector query, because for “match-expression” queries it is not possible to determine whether a pod matches without implementing a parser for each expression. By setting a watch, the kubernetes-api-server performs the matching between pods and queries for us.
The spec treats “match-labels” and “match-expression” queries in the same way for simplicity. A future optimization may distinguish between the query types: the watches for “match-labels” queries could be removed, and the matching between a pod and its “match-labels” queries could be done directly in the vif-handler.
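This optimization is possible because a “match-labels” query is a simple subset check that can be evaluated locally, e.g.:

    def matches_labels(pod_labels, match_labels):
        """True if the pod carries every label the query requires."""
        return all(pod_labels.get(key) == value
                   for key, value in match_labels.items())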
Security-group support depends on the networking backend and the VIF type. In the case of special interfaces (SR-IOV, macvlan, etc.), the network policy will be applied on the interface if and only if the networking backend enables security-groups on those interfaces.
This section describes the system execution flow in the following scenarios:
Pod is deployed on an empty system:
- The VIFHandler is notified of the pod creation; since no network policy matches the pod, the pod carries no security-group annotation.
- A port is created (or taken from a matching pool) and attached to the default security-group.
Network policy is deployed:
Let's assume that the following policy is created (taken from the k8s tutorial [8]):
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: test-network-policy
      namespace: default
    spec:
      podSelector:
        matchLabels:
          role: db
      policyTypes:
      - Ingress
      - Egress
      ingress:
      - from:
        - ipBlock:
            cidr: 172.17.0.0/16
            except:
            - 172.17.1.0/24
        - namespaceSelector:
            matchLabels:
              project: myproject
        - podSelector:
            matchLabels:
              role: frontend
        ports:
        - protocol: TCP
          port: 6379
      egress:
      - to:
        - ipBlock:
            cidr: 10.0.0.0/24
        ports:
        - protocol: TCP
          port: 5978

On creation of this policy:
- The network-policy Handler translates it into tagged ingress/egress security-groups and rules, as described above.
- Watches are set for the spec.podSelector and for the peer pod/namespace selectors.
- All pods that already match spec.podSelector (label “role: db”) are annotated with the policy security-group-ids, and the VIFHandler updates their ports accordingly.
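Under the translation described above, this policy would roughly yield the following security-group rules (a sketch; the security-group IDs are placeholders and the ipBlock exception is split into covering CIDRs as explained earlier):

    # Ingress: TCP 6379 from 172.17.0.0/16 except 172.17.1.0/24,
    # split into the CIDRs that cover the block minus the exception:
    ipblock_cidrs = ['172.17.0.0/24', '172.17.2.0/23', '172.17.4.0/22',
                     '172.17.8.0/21', '172.17.16.0/20', '172.17.32.0/19',
                     '172.17.64.0/18', '172.17.128.0/17']
    ingress_rules = [
        {'direction': 'ingress', 'protocol': 'tcp',
         'port_range_min': 6379, 'port_range_max': 6379,
         'remote_ip_prefix': cidr, 'security_group_id': '<ingress-sg-id>'}
        for cidr in ipblock_cidrs]
    # ...plus one rule for the namespace/pod selector peers via their
    # shared remote security-group:
    ingress_rules.append(
        {'direction': 'ingress', 'protocol': 'tcp',
         'port_range_min': 6379, 'port_range_max': 6379,
         'remote_group_id': '<peers-sg-id>',
         'security_group_id': '<ingress-sg-id>'})
    # Egress: TCP 5978 to 10.0.0.0/24:
    egress_rules = [
        {'direction': 'egress', 'protocol': 'tcp',
         'port_range_min': 5978, 'port_range_max': 5978,
         'remote_ip_prefix': '10.0.0.0/24',
         'security_group_id': '<egress-sg-id>'}]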
Second pod is created:
- If the new pod matches a policy selector, the watch fires and annotates the pod with the relevant security-group-id.
- The VIFHandler processes the pod creation; depending on the ordering, it either sees the annotation immediately or first attaches the default security-group and updates the port once the annotation arrives.
- The pod's port is taken from (and, on deletion, returned to) the pool that matches the pod's current security-groups.
[1] https://kubernetes.io/docs/concepts/services-networking/network-policies/
[2] https://kubernetes.io/docs/api-reference/v1.8/#networkpolicy-v1-networking/
[3] https://kubernetes.io/docs/concepts/services-networking/network-policies/#default-allow-all-ingress-traffic
[4] https://developer.openstack.org/api-ref/network/v2/index.html#security-group-rules-security-group-rules
[5] https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
[6] https://kubernetes.io/docs/concepts/services-networking/network-policies/#default-deny-all-ingress-traffic
[7] https://docs.openstack.org/neutron/latest/contributor/internals/tag.html
[8] https://kubernetes.io/docs/concepts/services-networking/network-policies/#the-networkpolicy-resource
[9] https://github.com/openstack/kuryr-kubernetes/blob/master/doc/source/devref/port_manager.rst
[10] https://v1-8.docs.kubernetes.io/docs/api-reference/v1.8/#labelselector-v1-meta
Except where otherwise noted, this document is licensed under Creative Commons Attribution 3.0 License. See all OpenStack Legal Documents.