Scenario - Using Open vSwitch w/ ASAP² (Direct Mode)
Overview
With appropriate hardware, operators can use ASAP²-accelerated Open vSwitch instead of unaccelerated Open vSwitch for the Neutron virtual network infrastructure. ASAP² offloads packet processing to hardware built into the NIC rather than using the host CPU. Because it depends on specific NICs and on SR-IOV, plan the deployment carefully before implementing it. This document outlines how to set it up in your environment.
Note
ASAP² is a proprietary feature provided with certain Mellanox NICs, including the ConnectX-4 Lx and ConnectX-5. Future support is not guaranteed. This feature is considered EXPERIMENTAL and should not be used for production workloads. There is no guarantee of upgradability or backwards compatibility.
Note
Hardware offloading is not compatible with the openvswitch firewall driver. To ensure flows are offloaded, port security must be disabled. Information on disabling port security is discussed later in this document.
Recommended reading
This guide is a variation of the standard Open vSwitch and SR-IOV deployment guides available at:
The following resources may also be helpful:
Prerequisites
To enable SR-IOV and PCI passthrough capabilities on a Linux platform, ensure that VT-d/VT-x are enabled for Intel processors and AMD-V/AMD-Vi are enabled for AMD processors. Such features are typically enabled in the BIOS.
On an Intel platform, the following kernel parameters are required and can be added to the GRUB configuration:
GRUB_CMDLINE_LINUX="... iommu=pt intel_iommu=on"
On an AMD platform, use these parameters instead:
GRUB_CMDLINE_LINUX="... iommu=pt amd_iommu=on"
Update GRUB and reboot the host(s).
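After the reboot, it can be useful to confirm that the parameters actually reached the running kernel. The following is a minimal sketch, not part of the original guide: `has_iommu_params` is a hypothetical helper name, and it only inspects a command-line string such as the contents of `/proc/cmdline`.

```shell
#!/bin/sh
# Hypothetical helper (not from the guide): succeeds if the given kernel
# command line contains both iommu=pt and the vendor-specific IOMMU switch.
has_iommu_params() {
    cmdline="$1"   # e.g. the contents of /proc/cmdline
    vendor="$2"    # "intel" or "amd"

    # Pad with spaces so each parameter is matched as a whole word.
    case " $cmdline " in
        *" iommu=pt "*) ;;
        *) return 1 ;;
    esac
    case " $cmdline " in
        *" ${vendor}_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live host:
#   has_iommu_params "$(cat /proc/cmdline)" intel && echo "IOMMU parameters present"
```

The check is purely textual; it does not prove that the IOMMU is active in firmware, only that the kernel was booted with the expected parameters.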
SR-IOV provides virtual functions (VFs) that can be presented to instances as network interfaces and are used in lieu of tuntap interfaces. Configuration of VFs is outside the scope of this guide. The following links may be helpful:
Deployment
Configure your networking according to the Open vSwitch implementation docs:
Note
At this time, only a single (non-bonded) interface is supported.
An example provider network configuration has been provided below:
- network:
    container_bridge: "br-provider"
    container_type: "veth"
    type: "vlan"
    range: "700:709"
    net_name: "physnet1"
    network_interface: "ens4f0"
    group_binds:
      - neutron_openvswitch_agent
Add a nova_pci_passthrough_whitelist entry to user_variables.yml, where devname is the name of the interface connected to the provider bridge and physical_network is the name of the provider network.
nova_pci_passthrough_whitelist: '{"devname":"ens4f0","physical_network":"physnet1"}'
Note
In the respective network block configured in openstack_user_config.yml, devname corresponds to network_interface and physical_network corresponds to net_name.
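To make the mapping between the two files concrete, the sketch below builds the whitelist line from the `network_interface` and `net_name` values of the provider network block. `whitelist_entry` is a hypothetical helper introduced only for illustration; OpenStack-Ansible does not provide it.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenStack-Ansible): print the
# user_variables.yml line for a given interface and provider network.
whitelist_entry() {
    iface="$1"     # network_interface from openstack_user_config.yml
    physnet="$2"   # net_name from openstack_user_config.yml
    printf "nova_pci_passthrough_whitelist: '{\"devname\":\"%s\",\"physical_network\":\"%s\"}'\n" \
        "$iface" "$physnet"
}

# Usage with the values from this guide's example:
#   whitelist_entry ens4f0 physnet1
```

Generating the line from the same two values helps keep `devname` and `physical_network` consistent with the network block when either one changes.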
To enable the openvswitch firewall driver rather than the default iptables_hybrid firewall driver, add the following overrides to user_variables.yml:
neutron_ml2_conf_ini_overrides:
  securitygroup:
    firewall_driver: openvswitch
neutron_openvswitch_agent_ini_overrides:
  securitygroup:
    firewall_driver: openvswitch
Note
Hardware-offloaded flows are not activated for ports utilizing security groups or port security. Be sure to disable port security and security groups on individual ports or networks when hardware offloading is required.
Once the OpenStack cluster is configured, start the OpenStack deployment as listed in the OpenStack-Ansible Install guide by running all playbooks in sequence on the deployment host.
Post-Deployment
Once the deployment is complete, create the VFs that will be used for SR-IOV.
In this example, the physical function (PF) is ens4f0. It will simultaneously be connected to the Neutron provider bridge br-provider.
On each compute node, determine the maximum number of VFs a PF can support:
# cat /sys/class/net/ens4f0/device/sriov_totalvfs
Note
To adjust sriov_totalvfs, please refer to the Mellanox documentation.
On each compute node, create the VFs:
# echo '8' > /sys/class/net/ens4f0/device/sriov_numvfs
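The two steps above can be combined so that the requested VF count never exceeds what the PF reports. This is a small sketch under that assumption; `clamp_vfs` is a hypothetical helper name, and the PF name is the one used in this guide's example.

```shell
#!/bin/sh
# Hypothetical helper: print the number of VFs to request, capped at the
# maximum the PF reports in sriov_totalvfs.
clamp_vfs() {
    requested="$1"
    total="$2"
    if [ "$requested" -gt "$total" ]; then
        echo "$total"
    else
        echo "$requested"
    fi
}

# Usage on a compute node (run as root):
#   pf=ens4f0
#   total=$(cat /sys/class/net/$pf/device/sriov_totalvfs)
#   clamp_vfs 8 "$total" > /sys/class/net/$pf/device/sriov_numvfs
```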
Configure Open vSwitch hardware offloading
Unbind the VFs from the Mellanox driver:
# for vf in `grep PCI_SLOT_NAME /sys/class/net/ens4f0/device/virtfn*/uevent | cut -d'=' -f2`
do
    echo "$vf" > /sys/bus/pci/drivers/mlx5_core/unbind
done
Enable the switch in the NIC:
# PCI_ADDR=`grep PCI_SLOT_NAME /sys/class/net/ens4f0/device/uevent | sed 's:.*PCI_SLOT_NAME=::'`
# devlink dev eswitch set pci/$PCI_ADDR mode switchdev
Enable hardware offload filters with TC:
# ethtool -K ens4f0 hw-tc-offload on
Rebind the VFs to the Mellanox driver:
# for vf in `grep PCI_SLOT_NAME /sys/class/net/ens4f0/device/virtfn*/uevent | cut -d'=' -f2`
do
    echo "$vf" > /sys/bus/pci/drivers/mlx5_core/bind
done
Enable hardware offloading in OVS:
# ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
# ovs-vsctl set Open_vSwitch . other_config:max-idle=30000
Restart Open vSwitch:
# systemctl restart openvswitch-switch
Restart the Open vSwitch agent:
# systemctl restart neutron-openvswitch-agent
Restart the Nova compute service:
# systemctl restart nova-compute
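Before moving on, the settings applied above can be read back and checked. The helpers below are a sketch with hypothetical names; they parse the text output of `devlink dev eswitch show`, `ethtool -k`, and `ovs-vsctl get`, and the exact output format is an assumption that may differ between tool versions.

```shell
#!/bin/sh
# Print the eswitch mode from `devlink dev eswitch show` output on stdin.
# Assumes a line of the form "pci/<addr>: mode <mode> ...".
eswitch_mode() {
    sed -n 's/.*[[:space:]]mode[[:space:]]\{1,\}\([a-z]*\).*/\1/p'
}

# Succeed if `ethtool -k <pf>` output on stdin reports hw-tc-offload as on.
tc_offload_on() {
    grep -q '^[[:space:]]*hw-tc-offload: on'
}

# Succeed if the value printed by
# `ovs-vsctl get Open_vSwitch . other_config:hw-offload` (on stdin) is true.
ovs_offload_on() {
    [ "$(tr -d '"')" = "true" ]
}

# Usage on a compute node, with PCI_ADDR set as in the steps above:
#   devlink dev eswitch show "pci/$PCI_ADDR" | eswitch_mode   # should print switchdev
#   ethtool -k ens4f0 | tc_offload_on && echo "TC offload on"
#   ovs-vsctl get Open_vSwitch . other_config:hw-offload | ovs_offload_on \
#       && echo "OVS hardware offload enabled"
```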
Warning
Changes to sriov_numvfs and to the NIC's built-in switch mode do not persist across reboots and must be reapplied every time the server is started.
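One way to handle this is to collect the non-persistent steps into a single function that a boot-time script can call. The following is a sketch only, assuming the `mlx5_core` driver and the sysfs paths used earlier in this guide; `reapply_offload`, `run`, `write_file`, `vf_addrs`, and the `DRY_RUN` variable are hypothetical names, and how the script is started at boot is left to the operator. It covers only the steps the warning names and does not repeat the `ovs-vsctl` settings or the service restarts.

```shell
#!/bin/sh
# Sketch: reapply the non-persistent SR-IOV and switchdev settings for one PF.
# Set DRY_RUN=1 to print the steps instead of executing them.

# Run a command, or print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Write a value to a file, or print the equivalent command when DRY_RUN=1.
write_file() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "echo $1 > $2"
    else
        echo "$1" > "$2"
    fi
}

# List the PCI addresses of the VFs that currently exist under a PF.
vf_addrs() {
    grep PCI_SLOT_NAME "/sys/class/net/$1/device"/virtfn*/uevent 2>/dev/null \
        | cut -d'=' -f2
}

reapply_offload() {
    pf="$1"
    numvfs="$2"

    # Create the VFs, then unbind them from the Mellanox driver.
    write_file "$numvfs" "/sys/class/net/$pf/device/sriov_numvfs"
    for vf in $(vf_addrs "$pf"); do
        write_file "$vf" /sys/bus/pci/drivers/mlx5_core/unbind
    done

    # Switch the NIC to switchdev mode and enable TC hardware offload.
    pci_addr=$(grep PCI_SLOT_NAME "/sys/class/net/$pf/device/uevent" 2>/dev/null \
        | cut -d'=' -f2)
    run devlink dev eswitch set "pci/$pci_addr" mode switchdev
    run ethtool -K "$pf" hw-tc-offload on

    # Rebind the VFs to the Mellanox driver.
    for vf in $(vf_addrs "$pf"); do
        write_file "$vf" /sys/bus/pci/drivers/mlx5_core/bind
    done
}

# Usage (as root on a compute node):
#   reapply_offload ens4f0 8
```

Running it first with `DRY_RUN=1` prints the commands it would issue, which makes it easier to review the sequence before applying it to a host.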
Verify operation
To verify operation of hardware-offloaded Open vSwitch, you must create a virtual machine instance using an image with the proper network drivers.
The following images are known to contain working drivers:
Before creating an instance, a Neutron port must be created that has the following characteristics:
--vnic-type direct --binding-profile '{"capabilities": ["switchdev"]}'
To ensure flows are offloaded, disable port security with the --disable-port-security argument.
An example of the full command can be seen here:
# openstack port create \
    --network <network> \
    --vnic-type direct --binding-profile '{"capabilities": ["switchdev"]}' \
    --disable-port-security \
    <name>
The port can then be attached to the instance at boot. Once booted, the port will be updated to reflect the PCI address of the corresponding virtual function:
root@aio1-utility-container-8c0b0916:~# openstack port show -c binding_profile testport2
+-----------------+------------------------------------------------------------------------------------------------------------------+
| Field           | Value                                                                                                            |
+-----------------+------------------------------------------------------------------------------------------------------------------+
| binding_profile | capabilities='[u'switchdev']', pci_slot='0000:21:00.6', pci_vendor_info='15b3:1016', physical_network='physnet1' |
+-----------------+------------------------------------------------------------------------------------------------------------------+
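The `pci_slot` value in this output identifies the VF on the compute node. As a small illustration, the sketch below extracts it from binding profile text in the format shown above; `pci_slot_from_profile` is a hypothetical helper, and the format of `binding_profile` is an assumption that may differ between client versions.

```shell
#!/bin/sh
# Hypothetical helper: extract the pci_slot value from binding_profile text
# formatted as key='value' pairs, as in the example output above.
pci_slot_from_profile() {
    printf '%s\n' "$1" | sed -n "s/.*pci_slot='\([^']*\)'.*/\1/p"
}

# Usage:
#   profile=$(openstack port show -f value -c binding_profile testport2)
#   pci_slot_from_profile "$profile"
```

On the compute node, the resulting address can then be compared against the targets of the `virtfn*` links under `/sys/class/net/ens4f0/device/` to identify which VF the instance is using.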
Observing traffic
From the compute node, perform a packet capture on the representor port that corresponds to the virtual function attached to the instance. In this example, the interface is eth1.
root@compute1:~# tcpdump -nnn -i eth1 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
Perform a ping from another host and observe the traffic at the representor port:
root@infra2:~# ping 192.168.88.151 -c5
PING 192.168.88.151 (192.168.88.151) 56(84) bytes of data.
64 bytes from 192.168.88.151: icmp_seq=1 ttl=64 time=48.3 ms
64 bytes from 192.168.88.151: icmp_seq=2 ttl=64 time=1.52 ms
64 bytes from 192.168.88.151: icmp_seq=3 ttl=64 time=0.586 ms
64 bytes from 192.168.88.151: icmp_seq=4 ttl=64 time=0.688 ms
64 bytes from 192.168.88.151: icmp_seq=5 ttl=64 time=0.775 ms
--- 192.168.88.151 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4045ms
rtt min/avg/max/mdev = 0.586/10.381/48.335/18.979 ms
root@compute1:~# tcpdump -nnn -i eth1 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
19:51:09.684957 IP 192.168.88.254 > 192.168.88.151: ICMP echo request, id 11168, seq 1, length 64
19:51:09.685448 IP 192.168.88.151 > 192.168.88.254: ICMP echo reply, id 11168, seq 1, length 64
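When offloading is handled in the NIC, only the first packet(s) of the flow will be visible in the packet capture. In the capture above, only the first of the five echo request/reply pairs appears at the representor port.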
The following command can be used to dump flows in the kernel datapath:
# ovs-dpctl dump-flows type=ovs
The following command can be used to dump flows that are offloaded:
# ovs-dpctl dump-flows type=offloaded
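To compare the two dumps at a glance, the number of flows in each can be counted. This is a minimal sketch that assumes each flow is printed on one line containing an `actions:` field; `count_flows` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count flow entries in `ovs-dpctl dump-flows` output
# read from stdin, assuming one flow per line with an "actions:" field.
count_flows() {
    # grep -c exits non-zero when the count is 0, so ignore its status.
    grep -c 'actions:' || true
}

# Usage on a compute node:
#   ovs-dpctl dump-flows type=ovs | count_flows
#   ovs-dpctl dump-flows type=offloaded | count_flows
```

While traffic is passing to or from the instance, a non-zero count for `type=offloaded` suggests that flows are being handled by the NIC.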