Open vSwitch: High availability using VRRP

This architecture example augments the self-service deployment example with a high-availability mechanism using the Virtual Router Redundancy Protocol (VRRP) via keepalived and provides failover of routing for self-service networks. It requires a minimum of two network nodes because VRRP creates one master (active) instance and at least one backup instance of each router.

During normal operation, keepalived on the master router periodically transmits heartbeat packets over a hidden network that connects all VRRP routers for a particular project. Each project with VRRP routers uses a separate hidden network. By default, this network uses the first value in the tenant_network_types option in the ml2_conf.ini file. For additional control, you can specify the self-service network type and physical network name for the hidden network using the l3_ha_network_type and l3_ha_network_physical_name options in the neutron.conf file.
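To see which high-availability options a node actually has set, a small INI reader can help. The following is a minimal sketch, not part of the Networking service: the function name ini_get is hypothetical, and the path /etc/neutron/neutron.conf in the usage note is an assumption about where your distribution keeps the file.

```shell
# ini_get FILE SECTION KEY
# Print the value of KEY inside [SECTION] of an INI-style FILE.
# Commented-out lines are ignored because their key text starts with "#".
ini_get() {
  awk -v section="[$2]" -v key="$3" '
    # Section header: remember which section we are in.
    /^[ \t]*\[/ { line = $0; gsub(/[ \t]/, "", line); current = line; next }
    # key = value line inside the requested section.
    current == section && index($0, "=") > 0 {
      k = substr($0, 1, index($0, "=") - 1)
      v = substr($0, index($0, "=") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) print v
    }
  ' "$1"
}
```

For example, `ini_get /etc/neutron/neutron.conf DEFAULT l3_ha_network_type` prints the configured network type, or nothing if the option is unset and the default applies.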

If keepalived on the backup router stops receiving heartbeat packets, it assumes failure of the master router and promotes the backup router to master router by configuring IP addresses on the interfaces in the qrouter namespace. In environments with more than one backup router, keepalived on the backup router with the next highest priority promotes that backup router to master router.

Note

This high-availability mechanism configures VRRP using the same priority for all routers. Therefore, VRRP promotes the backup router with the highest IP address to the master router.
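The tie-break described in this note can be illustrated with a numeric sort. This is only a sketch of the comparison, assuming the addresses compared are the ones on the VRRP (ha-) interfaces; keepalived performs the real election, and the function name highest_ipv4 is hypothetical.

```shell
# Read IPv4 addresses on stdin, one per line, and print the numerically
# highest one. Each octet is sorted as a number, so 169.254.192.10 ranks
# above 169.254.192.2 (a plain text sort would get this wrong).
highest_ipv4() {
  sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n | tail -n 1
}
```

For example, `printf '169.254.192.1\n169.254.192.2\n' | highest_ipv4` prints 169.254.192.2, the address that would win promotion among backups with equal priority.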

Warning

A known bug in keepalived v1.2.15 and earlier can cause packet loss when max_l3_agents_per_router is set to 3 or more. Therefore, we recommend that you upgrade to keepalived v1.2.16 or later when using this feature.

Interruption of VRRP heartbeat traffic between network nodes, typically due to a network interface or physical network infrastructure failure, triggers a failover. Restarting the layer-3 agent, or a failure of the agent, does not trigger a failover provided that keepalived continues to operate.

Consider the following attributes of this high-availability mechanism to determine practicality in your environment:

  • Instance network traffic on self-service networks using a particular router only traverses the master instance of that router. Thus, resource limitations of a particular network node can impact all master instances of routers on that network node without triggering failover to another network node. However, you can configure the scheduler to distribute the master instance of each router uniformly across a pool of network nodes to reduce the chance of resource contention on any particular network node.

  • The mechanism only supports self-service networks using a router. Provider networks operate at layer-2 and rely on physical network infrastructure for redundancy.

  • For instances with a floating IPv4 address, the mechanism maintains the state of network connections during failover as a side effect of 1:1 static NAT. It does not actually implement connection tracking.

For production deployments, we recommend at least three network nodes, each with sufficient resources that the remaining nodes can handle network traffic for the entire environment if one network node fails. With three nodes, the two that remain after a failure still provide redundancy for each other.
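The sizing recommendation can be worked through with simple arithmetic. The sketch below assumes that traffic spreads evenly across the surviving network nodes, which real deployments only approximate; the function name and the traffic figures are hypothetical.

```shell
# per_node_capacity TOTAL NODES
# Print the capacity each network node needs so that NODES - 1 nodes can
# carry TOTAL units of traffic after one node fails (rounded up).
per_node_capacity() {
  total=$1
  nodes=$2
  if [ "$nodes" -lt 2 ]; then
    echo "at least two network nodes are required" >&2
    return 1
  fi
  # Integer ceiling of total / (nodes - 1).
  echo $(( (total + nodes - 2) / (nodes - 1) ))
}
```

For example, `per_node_capacity 9000 3` prints 4500: with three nodes and 9000 Mbit/s of total routed traffic, each node should be able to carry 4500 Mbit/s.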

Prerequisites

Add one network node with the following components:

  • Three network interfaces: management, provider, and overlay.

  • OpenStack Networking layer-2 agent, layer-3 agent, and any dependencies.

Note

You can keep the DHCP and metadata agents on each compute node or move them to the network nodes.

Architecture

High-availability using VRRP with Open vSwitch - overview

The following figure shows components and connectivity for one self-service network and one untagged (flat) network. The master router resides on network node 1. In this particular case, the instance resides on the same compute node as the DHCP agent for the network. If the DHCP agent resides on another compute node, the latter only contains a DHCP namespace with a port on the OVS integration bridge.

High-availability using VRRP with Open vSwitch - components and connectivity - one network

Example configuration

Use the following example configuration as a template to add support for high-availability using VRRP to an existing operational environment that supports self-service networks.

Controller node

  1. In the neutron.conf file:

    • Enable VRRP.

      [DEFAULT]
      l3_ha = True
      
  2. Restart the following services:

    • Server

Network node 1

No changes.

Network node 2

  1. Install the Networking service OVS layer-2 agent and layer-3 agent.

  2. Install OVS.

  3. In the neutron.conf file, configure common options:

    [DEFAULT]
    core_plugin = ml2
    auth_strategy = keystone
    
    [database]
    # ...
    
    [keystone_authtoken]
    # ...
    
    [nova]
    # ...
    
    [agent]
    # ...
    

    See the Installation Tutorials and Guides and Configuration Reference for your OpenStack release to obtain the appropriate additional configuration for the [DEFAULT], [database], [keystone_authtoken], [nova], and [agent] sections.

  4. Start the following services:

    • OVS

  5. Create the OVS provider bridge br-provider:

    $ ovs-vsctl add-br br-provider
    
  6. Add the provider network interface as a port on the OVS provider bridge br-provider:

    $ ovs-vsctl add-port br-provider PROVIDER_INTERFACE
    

    Replace PROVIDER_INTERFACE with the name of the underlying interface that handles provider networks. For example, eth1.

  7. In the openvswitch_agent.ini file, configure the layer-2 agent.

    [ovs]
    bridge_mappings = provider:br-provider
    local_ip = OVERLAY_INTERFACE_IP_ADDRESS
    
    [agent]
    tunnel_types = vxlan
    l2_population = true
    
    [securitygroup]
    firewall_driver = iptables_hybrid
    

    Replace OVERLAY_INTERFACE_IP_ADDRESS with the IP address of the interface that handles VXLAN overlays for self-service networks.

  8. In the l3_agent.ini file, configure the layer-3 agent.

    [DEFAULT]
    interface_driver = openvswitch
    
  9. Start the following services:

    • Open vSwitch agent

    • Layer-3 agent
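Steps 5 and 6 of this procedure fail if the bridge or port already exists, for example when re-running the setup. A minimal sketch of an idempotent variant follows, using the --may-exist option of ovs-vsctl. The function name and the OVS_VSCTL variable are hypothetical; the variable exists only so the sequence can be dry-run with OVS_VSCTL=echo.

```shell
# setup_provider_bridge BRIDGE INTERFACE
# Create the provider bridge and attach the provider interface.
# --may-exist turns each call into a no-op when the bridge or port is
# already present, so the function is safe to run more than once.
setup_provider_bridge() {
  bridge=$1
  iface=$2
  cmd=${OVS_VSCTL:-ovs-vsctl}
  "$cmd" --may-exist add-br "$bridge" &&
    "$cmd" --may-exist add-port "$bridge" "$iface"
}
```

For example, `setup_provider_bridge br-provider eth1`, where eth1 stands in for PROVIDER_INTERFACE.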

Compute nodes

No changes.

Verify service operation

  1. Source the administrative project credentials.

  2. Verify presence and operation of the agents.

    $ openstack network agent list
    +--------------------------------------+--------------------+----------+-------------------+-------+-------+---------------------------+
    | ID                                   | Agent Type         | Host     | Availability Zone | Alive | State | Binary                    |
    +--------------------------------------+--------------------+----------+-------------------+-------+-------+---------------------------+
    | 1236bbcb-e0ba-48a9-80fc-81202ca4fa51 | Metadata agent     | compute2 | None              | True  | UP    | neutron-metadata-agent    |
    | 457d6898-b373-4bb3-b41f-59345dcfb5c5 | Open vSwitch agent | compute2 | None              | True  | UP    | neutron-openvswitch-agent |
    | 71f15e84-bc47-4c2a-b9fb-317840b2d753 | DHCP agent         | compute2 | nova              | True  | UP    | neutron-dhcp-agent        |
    | 8805b962-de95-4e40-bdc2-7a0add7521e8 | L3 agent           | network1 | nova              | True  | UP    | neutron-l3-agent          |
    | a33cac5a-0266-48f6-9cac-4cef4f8b0358 | Open vSwitch agent | network1 | None              | True  | UP    | neutron-openvswitch-agent |
    | a6c69690-e7f7-4e56-9831-1282753e5007 | Metadata agent     | compute1 | None              | True  | UP    | neutron-metadata-agent    |
    | af11f22f-a9f4-404f-9fd8-cd7ad55c0f68 | DHCP agent         | compute1 | nova              | True  | UP    | neutron-dhcp-agent        |
    | bcfc977b-ec0e-4ba9-be62-9489b4b0e6f1 | Open vSwitch agent | compute1 | None              | True  | UP    | neutron-openvswitch-agent |
    | 7f00d759-f2c9-494a-9fbf-fd9118104d03 | Open vSwitch agent | network2 | None              | True  | UP    | neutron-openvswitch-agent |
    | b28d8818-9e32-4888-930b-29addbdd2ef9 | L3 agent           | network2 | nova              | True  | UP    | neutron-l3-agent          |
    +--------------------------------------+--------------------+----------+-------------------+-------+-------+---------------------------+
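Because VRRP needs at least two layer-3 agents, it can be useful to count the live ones rather than read the table by eye. The following sketch parses the table layout shown above; it assumes the same column order and that the Alive column prints True, which may differ between client releases. The function name is hypothetical.

```shell
# Read `openstack network agent list` table output on stdin and print the
# number of L3 agents whose Alive column is True.
count_alive_l3_agents() {
  awk -F '|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Data rows split into: empty, ID, Agent Type, Host, AZ, Alive, State, Binary, empty.
    NF >= 8 && trim($3) == "L3 agent" && trim($6) == "True" { n++ }
    END { print n + 0 }
  '
}
```

For example, `openstack network agent list | count_alive_l3_agents` should print at least 2 in this deployment.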
    

Create initial networks

Similar to the self-service deployment example, this configuration supports multiple VXLAN self-service networks. After enabling high-availability, all additional routers use VRRP. The following procedure creates an additional self-service network and router. The Networking service also supports adding high-availability to existing routers. However, that procedure requires administratively disabling and then enabling each router, which temporarily interrupts network connectivity for self-service networks with interfaces on that router.

  1. Source regular (non-administrative) project credentials.

  2. Create a self-service network.

    $ openstack network create selfservice2
    +-------------------------+--------------+
    | Field                   | Value        |
    +-------------------------+--------------+
    | admin_state_up          | UP           |
    | mtu                     | 1450         |
    | name                    | selfservice2 |
    | port_security_enabled   | True         |
    | router:external         | Internal     |
    | shared                  | False        |
    | status                  | ACTIVE       |
    +-------------------------+--------------+
    
  3. Create an IPv4 subnet on the self-service network.

    $ openstack subnet create --subnet-range 198.51.100.0/24 \
      --network selfservice2 --dns-nameserver 8.8.4.4 selfservice2-v4
    +-------------------+------------------------------+
    | Field             | Value                        |
    +-------------------+------------------------------+
    | allocation_pools  | 198.51.100.2-198.51.100.254  |
    | cidr              | 198.51.100.0/24              |
    | dns_nameservers   | 8.8.4.4                      |
    | enable_dhcp       | True                         |
    | gateway_ip        | 198.51.100.1                 |
    | ip_version        | 4                            |
    | name              | selfservice2-v4              |
    +-------------------+------------------------------+
    
  4. Create an IPv6 subnet on the self-service network.

    $ openstack subnet create --subnet-range fd00:198:51:100::/64 --ip-version 6 \
      --ipv6-ra-mode slaac --ipv6-address-mode slaac --network selfservice2 \
      --dns-nameserver 2001:4860:4860::8844 selfservice2-v6
    +-------------------+--------------------------------------------------------+
    | Field             | Value                                                  |
    +-------------------+--------------------------------------------------------+
    | allocation_pools  | fd00:198:51:100::2-fd00:198:51:100:ffff:ffff:ffff:ffff |
    | cidr              | fd00:198:51:100::/64                                   |
    | dns_nameservers   | 2001:4860:4860::8844                                   |
    | enable_dhcp       | True                                                   |
    | gateway_ip        | fd00:198:51:100::1                                     |
    | ip_version        | 6                                                      |
    | ipv6_address_mode | slaac                                                  |
    | ipv6_ra_mode      | slaac                                                  |
    | name              | selfservice2-v6                                        |
    +-------------------+--------------------------------------------------------+
    
  5. Create a router.

    $ openstack router create router2
    +-----------------------+---------+
    | Field                 | Value   |
    +-----------------------+---------+
    | admin_state_up        | UP      |
    | name                  | router2 |
    | status                | ACTIVE  |
    +-----------------------+---------+
    
  6. Add the IPv4 and IPv6 subnets as interfaces on the router.

    $ openstack router add subnet router2 selfservice2-v4
    $ openstack router add subnet router2 selfservice2-v6
    

    Note

    These commands provide no output.

  7. Add the provider network as a gateway on the router.

    $ openstack router set --external-gateway provider1 router2
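For a router that existed before VRRP was enabled, the disable and enable cycle mentioned at the start of this section can be sketched as follows. This is an illustration rather than a tested migration procedure: it assumes administrative credentials are sourced, and the function name and the OPENSTACK variable are hypothetical (the variable exists only so the sequence can be dry-run with OPENSTACK=echo).

```shell
# convert_router_to_ha ROUTER
# Disable the router, switch it to high-availability mode, and enable it
# again. Connectivity through the router is interrupted while it is disabled.
convert_router_to_ha() {
  router=$1
  cli=${OPENSTACK:-openstack}
  "$cli" router set --disable "$router" &&
    "$cli" router set --ha "$router" &&
    "$cli" router set --enable "$router"
}
```

For example, `convert_router_to_ha router1`. Each step runs only if the previous one succeeded, so a failure leaves the router disabled for inspection rather than half-converted and live.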
    

Verify network operation

  1. Source the administrative project credentials.

  2. Verify creation of the internal high-availability network that handles VRRP heartbeat traffic.

    $ openstack network list
    +--------------------------------------+----------------------------------------------------+--------------------------------------+
    | ID                                   | Name                                               | Subnets                              |
    +--------------------------------------+----------------------------------------------------+--------------------------------------+
    | 1b8519c1-59c4-415c-9da2-a67d53c68455 | HA network tenant f986edf55ae945e2bef3cb4bfd589928 | 6843314a-1e76-4cc9-94f5-c64b7a39364a |
    +--------------------------------------+----------------------------------------------------+--------------------------------------+
    
  3. On each network node, verify creation of a qrouter namespace with the same ID.

    Network node 1:

    # ip netns
    qrouter-b6206312-878e-497c-8ef7-eb384f8add96
    

    Network node 2:

    # ip netns
    qrouter-b6206312-878e-497c-8ef7-eb384f8add96
    

    Note

    The namespace for router 1 from Open vSwitch: Self-service networks should only appear on network node 1 because it was created before VRRP was enabled.

  4. On each network node, show the IP addresses of interfaces in the qrouter namespace. Only the namespace belonging to the master router instance has IP addresses on its router interfaces. In the backup namespace, only the VRRP (ha-) interface has an address.

    Network node 1:

    # ip netns exec qrouter-b6206312-878e-497c-8ef7-eb384f8add96 ip addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: ha-eb820380-40@if21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
        link/ether fa:16:3e:78:ba:99 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-eb820380-40
           valid_lft forever preferred_lft forever
        inet 169.254.0.1/24 scope global ha-eb820380-40
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:fe78:ba99/64 scope link
           valid_lft forever preferred_lft forever
    3: qr-da3504ad-ba@if24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
        link/ether fa:16:3e:dc:8e:a8 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 198.51.100.1/24 scope global qr-da3504ad-ba
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:fedc:8ea8/64 scope link
           valid_lft forever preferred_lft forever
    4: qr-442e36eb-fc@if27: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
        link/ether fa:16:3e:ee:c8:41 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet6 fd00:198:51:100::1/64 scope global nodad
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:feee:c841/64 scope link
           valid_lft forever preferred_lft forever
    5: qg-33fedbc5-43@if28: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        link/ether fa:16:3e:03:1a:f6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 203.0.113.21/24 scope global qg-33fedbc5-43
           valid_lft forever preferred_lft forever
        inet6 fd00:203:0:113::21/64 scope global nodad
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:fe03:1af6/64 scope link
           valid_lft forever preferred_lft forever
    

    Network node 2:

    # ip netns exec qrouter-b6206312-878e-497c-8ef7-eb384f8add96 ip addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: ha-7a7ce184-36@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
        link/ether fa:16:3e:16:59:84 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 169.254.192.2/18 brd 169.254.255.255 scope global ha-7a7ce184-36
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:fe16:5984/64 scope link
           valid_lft forever preferred_lft forever
    3: qr-da3504ad-ba@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
        link/ether fa:16:3e:dc:8e:a8 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    4: qr-442e36eb-fc@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    5: qg-33fedbc5-43@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        link/ether fa:16:3e:03:1a:f6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    

    Note

    The master router may reside on network node 2.

  5. Launch an instance with an interface on the additional self-service network. For example, a CirrOS image using flavor ID 1.

    $ openstack server create --flavor 1 --image cirros --nic net-id=NETWORK_ID selfservice-instance2
    

    Replace NETWORK_ID with the ID of the additional self-service network.

  6. Determine the IPv4 and IPv6 addresses of the instance.

    $ openstack server list
    +--------------------------------------+-----------------------+--------+----------------------------------------------------------------+--------+---------+
    | ID                                   | Name                  | Status | Networks                                                       | Image  | Flavor  |
    +--------------------------------------+-----------------------+--------+----------------------------------------------------------------+--------+---------+
    | bde64b00-77ae-41b9-b19a-cd8e378d9f8b | selfservice-instance2 | ACTIVE | selfservice2=fd00:198:51:100:f816:3eff:fe71:e93e, 198.51.100.4 | cirros | m1.tiny |
    +--------------------------------------+-----------------------+--------+----------------------------------------------------------------+--------+---------+
    
  7. Create a floating IPv4 address on the provider network.

    $ openstack floating ip create provider1
    +-------------+--------------------------------------+
    | Field       | Value                                |
    +-------------+--------------------------------------+
    | fixed_ip    | None                                 |
    | id          | 0174056a-fa56-4403-b1ea-b5151a31191f |
    | instance_id | None                                 |
    | ip          | 203.0.113.17                         |
    | pool        | provider1                            |
    +-------------+--------------------------------------+
    
  8. Associate the floating IPv4 address with the instance.

    $ openstack server add floating ip selfservice-instance2 203.0.113.17
    

    Note

    This command provides no output.
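Step 4 of this procedure identifies the master router by the presence of IP addresses on the qr- and qg- interfaces. A minimal sketch of that check follows, based only on the output format shown in that step; the function name router_role is hypothetical.

```shell
# Read `ip addr show` output for one qrouter namespace on stdin.
# Print "master" if any qr- or qg- interface carries an IPv4 address or a
# global-scope IPv6 address, otherwise print "backup".
router_role() {
  awk '
    /^[0-9]+: / { iface = $2 }                       # interface header line
    iface ~ /^q[rg]-/ && $1 == "inet" { found = 1 }  # IPv4 on a router port
    iface ~ /^q[rg]-/ && $1 == "inet6" && /scope global/ { found = 1 }
    END { print (found ? "master" : "backup") }
  '
}
```

For example, on each network node run `ip netns exec qrouter-ROUTER_ID ip addr show | router_role`, replacing ROUTER_ID with the ID of the router. Addresses on the lo and ha- interfaces are ignored because both master and backup instances have them.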

Verify failover operation

  1. Begin a continuous ping of both the floating IPv4 address and IPv6 address of the instance. While performing the next three steps, you should see minimal, if any, interruption of connectivity to the instance.

  2. On the network node with the master router, administratively disable the overlay network interface.

  3. On the other network node, verify promotion of the backup router to master router by noting addition of IP addresses to the interfaces in the qrouter namespace.

  4. On the original network node from step 2, administratively enable the overlay network interface. Note that the master router remains on the network node from step 3.
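To put a number on the interruption seen during the failover test, the ping output can be scanned for gaps in the sequence numbers. The sketch below assumes the common Linux ping reply format ("... bytes from ...: icmp_seq=N ..."). It does not count replies lost after the last one received and ignores sequence wrap-around. The function name is hypothetical.

```shell
# Read ping output on stdin and print how many echo replies are missing,
# judged by gaps between the icmp_seq values of the replies that arrived.
count_missed_pings() {
  awk '
    /bytes from/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^icmp_seq=[0-9]+$/) {
          seq = substr($i, 10) + 0          # text after "icmp_seq="
          if (seen && seq > last + 1) missed += seq - last - 1
          last = seq
          seen = 1
        }
      }
    }
    END { print missed + 0 }
  '
}
```

For example, `ping -c 60 203.0.113.17 | count_missed_pings` run across steps 2 through 4 prints the number of lost replies. With the default one-second ping interval, that number roughly equals the outage in seconds.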

Keepalived VRRP health check

The health of your keepalived instances can be automatically monitored via a bash script that verifies connectivity to all available and configured gateway addresses. In the event that connectivity is lost, the master router is rescheduled to another node.

If all routers lose connectivity simultaneously, the process of selecting a new master router will be repeated in a round-robin fashion until one or more routers have their connectivity restored.

To enable this feature, edit the l3_agent.ini file:

    [DEFAULT]
    ha_vrrp_health_check_interval = 30

Here, ha_vrrp_health_check_interval specifies how often, in seconds, the health check runs. The default value is 0, which disables the check.

Network traffic flow

This high-availability mechanism simply augments Open vSwitch: Self-service networks with failover of layer-3 services to another router if the master router fails. Thus, you can reference Self-service network traffic flow for normal operation.