OVS auxiliary bridge to reduce live migration networking disruption in OVN¶
Launchpad bug: https://bugs.launchpad.net/neutron/+bug/1933517
The goal of this spec is to propose a change in the VM - backend (OVN) connectivity to improve the live migration process.
Problem Description¶
Live migration is a sensitive process: a destination host must be configured to run a virtual machine that is already running elsewhere. Once the destination hypervisor is prepared, the source virtual machine is paused and immediately unpaused on the destination host.
Since [1] and [2], Nova can plug the port and create the interface on the destination host during pre-live migration. However, the TAP device is created by libvirt only when the VM is unpaused. Only then is the port interface assigned an OpenFlow port ID.
When the port is created, the network backend detects the newly attached port and informs the OVN Northbound database; Neutron receives this event. The network backend is then instructed to configure the rules needed for the new port.
The problem with this sequence of actions is that when the virtual
machine is unpaused, the network backend is not ready to continue any
network communication until the OVN controller (in the compute node) has set
all needed OpenFlow rules and chassis configuration, using the OpenFlow port
ID (interface.ofport) assigned when the TAP interface is created.
Proposed Change¶
The scope of this spec is limited to the OVN backend. It does not consider
HW offloaded ports, which use a different plug process, and only considers
os_vif.objects.vif.VIFOpenVSwitch VIF types within the related network
backend.
This spec proposes to create an intermediate OVS bridge between the integration bridge and the virtual machine tap interface. This OVS bridge will be connected to the integration bridge with a patch port. The TAP interface will be plugged into the intermediate bridge when the virtual machine is created or unpaused (that happens during the live migration). This architecture is very similar to OVS hybrid plugging, but with an OVS bridge in between.
The advantage of this approach is that the integration bridge port, in this
case the patch port, that is created during the pre-live migration process, has
a valid OpenFlow port ID (interface.ofport), needed to provide the correct
OpenFlow rules in the integration bridge.
After the port is plugged, Nova instructs libvirt to copy the guest memory to the destination host. If “live_migration_permit_post_copy” is used [3], the virtual machine on the destination host will be activated before all its memory has been copied. However, there is a period during which the port is bound to the destination host while the virtual machine is still running on the source host. During this period the virtual machine won’t be able to communicate, the same as without this feature.
When the virtual machine is unpaused on the destination host, the OVN backend is ready to immediately continue transmitting packets from the guest.
Implementation¶
The VIF plug and unplug process is executed by Nova, using the different
backend implementations provided by the os-vif library. The bridge creation
and deletion will be done during the plug and unplug processes [4], as in the
OVS hybrid plug strategy.
The proposed architecture and naming, implemented in [5], are as follows:
      ┌────────┐
      │tap-xxx │        TAP interface port
┌─────┴────────┴─────┐
│      pbr-xxx       │  Port bridge
└─────┬────────┬─────┘
      │pbp-xxx │        Port bridge patch port (port bridge side)
      └────┬───┘
      ┌────┴───┐
      │ipb-xxx │        Port bridge patch port (integration bridge side)
┌─────┴────────┴─────┐
│    integration     │
│       bridge       │
└────────────────────┘
For each new VM TAP interface, an OVS bridge is created along with the patch port to connect this bridge with the integration bridge. This port bridge will have a default OpenFlow rule, allowing all traffic from the TAP interface port to the patch port:
cookie=0x0, duration=84162.020s, table=0, n_packets=444223, n_bytes=832399534, priority=0 actions=NORMAL
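As a rough illustration of the layout above, the following sketch builds the ovs-vsctl invocations that would create the port bridge and its two patch ports. It is not the os-vif implementation referenced in [5]: the helper names, the 14-character truncation of device names, and the default integration bridge name br-int are assumptions made for this example. The commands are only returned as argument lists, not executed.

```python
from typing import List

# Assumption: 14 characters keeps names under the Linux interface name limit.
NIC_NAME_LEN = 14


def dev_name(prefix: str, port_id: str) -> str:
    """Hypothetical helper: prefix plus port ID, truncated to NIC_NAME_LEN."""
    return (prefix + port_id)[:NIC_NAME_LEN]


def port_bridge_commands(port_id: str,
                         int_bridge: str = "br-int") -> List[List[str]]:
    """Return ovs-vsctl argument lists that would build the port bridge."""
    pbr = dev_name("pbr", port_id)  # port bridge
    pbp = dev_name("pbp", port_id)  # patch port, port bridge side
    ipb = dev_name("ipb", port_id)  # patch port, integration bridge side
    return [
        # 1. Create the per-port bridge.
        ["ovs-vsctl", "--may-exist", "add-br", pbr],
        # 2. Patch port on the port bridge, peered with the integration side.
        ["ovs-vsctl", "--may-exist", "add-port", pbr, pbp,
         "--", "set", "Interface", pbp, "type=patch",
         "options:peer=" + ipb],
        # 3. Patch port on the integration bridge; it carries the Neutron
        #    port ID in external_ids:iface-id.
        ["ovs-vsctl", "--may-exist", "add-port", int_bridge, ipb,
         "--", "set", "Interface", ipb, "type=patch",
         "options:peer=" + pbp,
         "external_ids:iface-id=" + port_id],
    ]
```

Joining each returned list with spaces gives a command line that can be inspected before running it on a test host. The TAP device is deliberately absent: it is added to the port bridge later, when the virtual machine is created or unpaused.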
OVN links the OVS ports (Open vSwitch DB) with the Logical Switch Ports (OVN
NB database) using the Neutron DB port ID. The OVN NB Logical Switch Port,
created by Neutron, will have this port ID in the lsp.name field. The OVS
port ipb-xxx (the integration bridge side patch port) is created with
interface.external_ids={iface-id: port_id}. The ovn-controller will detect
this new port and set the corresponding chassis in the OVN SB Port Binding.
The interface.external_ids information is added by os-vif during the
port plug process [5], called by Nova.
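The matching rule described above can be sketched with plain dictionaries standing in for OVS Interface rows and OVN NB Logical Switch Port rows. The function name and the dictionary layout are assumptions for illustration only; ovn-controller performs this association internally and does not expose it in this form.

```python
from typing import List, Optional


def lsp_for_interface(interface: dict,
                      logical_switch_ports: List[dict]) -> Optional[dict]:
    """Return the LSP whose name equals the interface's iface-id, if any."""
    iface_id = interface.get("external_ids", {}).get("iface-id")
    if iface_id is None:
        # Interfaces without iface-id (for example pbp-xxx) are not bound.
        return None
    for lsp in logical_switch_ports:
        if lsp["name"] == iface_id:
            return lsp
    return None
```

With this rule, only the integration bridge side patch port (ipb-xxx) resolves to a Logical Switch Port, because it is the only interface that carries the Neutron port ID.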
Note: OVS and OVN ports are represented in os-vif with the same VIF type,
objects.vif.VIFOpenVSwitch. The os-vif implementation could be used
both for OVS and OVN backends. However, the scope of this spec is limited to
the OVN backend only.
Nova - Neutron events¶
Nova and Neutron have a mechanism to communicate the state of the VIFs [6].
This is a unidirectional API that allows Neutron to inform Nova when a VIF has
been plugged or unplugged. Nova expects a network-vif-plugged event from
Neutron; depending on the backend, it expects the event either when the port
has been bound or when the port has been plugged [7]. For example, in OVS with
hybrid plug, Nova expects the event when the port has been plugged into OVS.
In OVN, Nova expects this event when the port has been bound during the live
migration process [8].
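For reference, the notification has roughly the following shape when posted to Nova's os-server-external-events API. This is a minimal sketch that only builds the request body; the function name is hypothetical, and authentication and the HTTP call itself are left out.

```python
def vif_plugged_event(server_uuid: str, port_id: str,
                      status: str = "completed") -> dict:
    """Build the body of a network-vif-plugged external event."""
    return {
        "events": [
            {
                "name": "network-vif-plugged",
                "server_uuid": server_uuid,  # instance that owns the port
                "tag": port_id,              # Neutron port ID
                "status": status,
            }
        ]
    }
```

What changes between backends is not the body of the event but the moment at which Neutron decides to send it.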
Once Nova creates the intermediate bridge [1] [2], Neutron can send the
network-vif-plugged event when the port is plugged and the network backend
is configured.
This spec proposes to use the existing OVN event infrastructure to capture the
patch port creation event, using LogicalSwitchPortCreateUpEvent, and to push
the event to Nova from the handler method. Along with this change, Neutron will
inform Nova about the backend used by the port by adding a field called
backend to port.vif_details. The value will be “ovn”. This port update will
be done in [9], in the mech driver port binding method.
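The vif_details change amounts to one extra key in the binding details. The sketch below shows the intended result on a plain dictionary; the helper name is hypothetical and the existing port_filter entry is included only as an example of other binding details that must be preserved.

```python
def with_backend(vif_details: dict, backend: str = "ovn") -> dict:
    """Return a copy of vif_details with the proposed backend field set."""
    updated = dict(vif_details)  # do not mutate the caller's dictionary
    updated["backend"] = backend
    return updated


# Example: existing details are kept and the backend field is added.
details = with_backend({"port_filter": True})
```

A consumer such as Nova could then check details.get("backend") == "ovn" to learn which backend bound the port.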
Neutron configuration¶
As explained in the previous section, this spec proposes to change when
Neutron emits the event. The behaviour depends on whether os-vif is
configured to create this intermediate bridge between the integration bridge
and the TAP device.
Because Neutron cannot read the os-vif configuration, this spec proposes
to add the same config flag in the ML2 OVN plugin section: “per_port_bridge”.
If enabled, Neutron will send the network-vif-plugged event when the port is
plugged, not when the port binding is updated.
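The effect of the flag during live migration can be summarised as a small decision function. This is only a sketch of the rule stated above: the trigger names and the function are hypothetical and do not correspond to existing Neutron code.

```python
# Hypothetical trigger names for the two moments discussed in this spec.
PORT_BOUND = "port_bound"      # port binding updated to the destination host
PORT_PLUGGED = "port_plugged"  # patch port created, backend configured


def should_send_vif_plugged(trigger: str, per_port_bridge: bool) -> bool:
    """Decide whether network-vif-plugged is sent for the given trigger."""
    expected = PORT_PLUGGED if per_port_bridge else PORT_BOUND
    return trigger == expected
```

With per_port_bridge disabled the current behaviour is kept, so the event follows the port binding update; with it enabled the event waits until the port is plugged.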
QoS¶
With this feature, the Logical Switch Port is now bound to the patch port that connects the integration bridge to the port bridge, not to the TAP device port. In OVS, QoS rules can only be applied to external ports. However, in OVN the QoS rules are applied using the “match” field in the QoS record [10]. This is a string in the same expression language used in the Logical_Flow table. The traffic shaping applied to the TAP device is the same when using the patch port reference because the same traffic flows through both ports.
Performance¶
The intermediate bridge will have one single rule, with action NORMAL. That means all traffic coming from the TAP port will be sent to the patch port and vice versa.
The datapath will collapse this bridge, resulting in the same flow as without it. In other words, the new intermediate bridge won’t have any effect on packet processing performance.
OVN DPDK¶
As with other bridges (physical bridges, tunnel bridges, external bridges), each port bridge will be connected to the integration bridge with a patch port. That keeps one single datapath (“netdev” in the DPDK case), as noted in the previous section.
Upgrade Impact¶
This feature can be enabled in os-vif using the configuration variable
“per_port_bridge” [11]. If this option is enabled, any OVN port will be
plugged and unplugged using the implementation described in this spec.
Currently this implementation does not handle mixing both plugging methods
(connecting the TAP port directly to the integration bridge or creating the
port bridge as described in this spec). When this option is enabled, the
compute node should not be hosting any virtual machine. From that point on, as
described in the proposed change, any virtual machine created on or migrated
to this host will be plugged using the intermediate port bridge. In the case
of migration, that will solve or mitigate the connectivity problems when the
virtual machine is unpaused on the destination host.
This spec does not cover any scenario mixing TAP ports using both plug strategies on the same host (same as with OVS hybrid plugging strategy).
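Because the flag lives in two places (os-vif on the compute node and the ML2 OVN section in Neutron), a deployment tool might check that both agree before enabling the feature. The following is a minimal sketch using only the Python standard library. The section names os_vif_ovs and ovn are assumptions for this example; the option name per_port_bridge comes from this spec.

```python
import configparser


def read_flag(ini_text: str, section: str,
              option: str = "per_port_bridge") -> bool:
    """Read a boolean option from INI text, defaulting to False."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getboolean(section, option, fallback=False)


def flags_match(os_vif_ini: str, neutron_ini: str) -> bool:
    """True when os-vif and Neutron agree on per_port_bridge."""
    return (read_flag(os_vif_ini, "os_vif_ovs")
            == read_flag(neutron_ini, "ovn"))


# Sample configuration fragments (section names are assumptions).
OS_VIF_SAMPLE = "[os_vif_ovs]\nper_port_bridge = true\n"
NEUTRON_SAMPLE = "[ovn]\nper_port_bridge = true\n"
```

A mismatch would mean Neutron sends network-vif-plugged at a moment that does not correspond to how the port is actually plugged on the host.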
Testing¶
Tempest Tests¶
Live migration tempest tests are covered in the Nova CI. In order to test this
feature, a set of tests from tempest.api.compute.admin.test_live_migration
should be executed on the neutron-ovn-tempest-slow CI job.
Documentation Impact¶
User Documentation¶
Document the new architecture and the configuration flag to change the event emission from Neutron.
Implementation¶
Assignee(s)¶
Sean K Mooney (smooney@redhat.com, IRC: sean-k-mooney)
Rodolfo Alonso Hernandez (ralonsoh@redhat.com, IRC: ralonsoh)