
Example of multi-AZ environment configuration

On this page, we will provide an example configuration that can be used in production environments with multiple Availability Zones.

It is an extended and more specific version of the Routed environment example, so it is expected that you are aware of the concepts and approaches defined there.

To better understand why some configuration options are applied in the examples, it is also recommended to look through Configuring the inventory.

Generic design

The following design decisions were made in the example below:

  • Three Availability Zones (AZs)

  • Three infrastructure (control plane) hosts, each placed in a different Availability Zone

  • Eight compute hosts: two compute hosts in each Availability Zone, with the first Availability Zone having two extra compute hosts for the pinned CPU aggregate

  • Three Ceph storage clusters provisioned with Ceph Ansible.

  • Compute hosts act as OVN gateway hosts

  • Tunnel networks which are reachable between Availability Zones

  • Public API, OpenStack external and management networks are represented as stretched L2 networks between Availability Zones.

[Diagram: general multi-AZ layout (az-layout-general.drawio.png)]

Load Balancing

A load balancer (HAProxy) is usually deployed on the infrastructure hosts. With the infrastructure hosts spread across Availability Zones, we need to come up with a more complex design aimed at addressing the following challenges:

  • Withstand a single Availability Zone failure

  • Reduce the amount of cross-AZ traffic

  • Spread load across Availability Zones

To address these challenges, the following changes to the basic design were made:

  • Leverage DNS Round Robin for the Public API, with an A/AAAA record per AZ (see the example record set after this list)

  • Define Internal API FQDN through /etc/hosts overrides, which are unique per Availability Zone

  • Define 6 keepalived instances: 3 for public and 3 for internal VIPs

  • Ensure HAProxy prioritizes backends from its own Availability Zone over “remote” ones
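
As an illustration of the DNS Round Robin record set for the Public API (a sketch only; it assumes the example.cloud external FQDN and the per-AZ public VIPs 203.0.113.5, 203.0.113.6 and 203.0.113.7 that are assigned to Keepalived later in this example, with an arbitrary TTL):

example.cloud.  300  IN  A  203.0.113.5
example.cloud.  300  IN  A  203.0.113.6
example.cloud.  300  IN  A  203.0.113.7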

The example also deploys HAProxy with Keepalived in their own LXC containers, in contrast to the more conventional bare metal deployment. You can check HAProxy and Keepalived in LXC containers for more details on how to do that.

[Diagram: load balancing across Availability Zones (az-balancing-diagram.drawio.png)]

Storage design complications

There are multiple complications related to organizing storage when it is not stretched between Availability Zones.

First, there is only a single controller in any given Availability Zone, while multiple copies of cinder_volume need to run for each storage provider to provide High Availability. As cinder_volume needs access to the storage network, one of the best places for it is the ceph-mon hosts.

Another challenge is organizing shared storage for Glance images, as rbd can no longer be used consistently. While the Glance Interoperable Import interface could be leveraged to sync images between rbd backends, not all clients and services can work with Glance's import API. One of the most obvious solutions is to use the Swift API, while configuring a Ceph RadosGW policy to replicate the bucket between the independent RadosGW instances located in their respective Availability Zones.

The last, but not least, complication is Nova scheduling when cross_az_attach is disabled. Nova does not add an Availability Zone to an instance's request_specs when the instance is created from a volume directly, in contrast to creating the volume manually in advance and supplying the volume UUID to the instance create API call. The problem with that behavior is that Nova will attempt to live migrate or re-schedule instances without an Availability Zone in their request_specs to other AZs, which will fail because cross_az_attach is disabled. You can read more about this in a Nova bug report. In order to work around this bug, you need to set a default_schedule_zone for Nova and Cinder, which ensures that the AZ is always defined in request_specs. You can also go further and define an actual Availability Zone as the default_schedule_zone, making each controller have its own default. As the load balancer will attempt to send requests only to “local” backends first, this approach does work to distribute new VMs across all AZs when the user does not supply an AZ explicitly. Otherwise, the “default” AZ would accept significantly more new instances.
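
A minimal sketch of that workaround is shown below (the complete variant used by this example appears in the User variables section, where the value is derived per controller from the az_name group variable; az1 is used here purely as an illustration):

# Nova: ensure new instances always get an AZ recorded in request_specs
nova_nova_conf_overrides:
  DEFAULT:
    default_schedule_zone: az1
  cinder:
    cross_az_attach: false

# Cinder: matching default AZ for new volumes
cinder_default_availability_zone: az1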

Configuration examples

Network configuration

Network CIDR/VLAN assignments

The following CIDR assignments are used for this environment.

Network                      CIDR             VLAN
Management Network           172.29.236.0/22  10
AZ1 Storage Network          172.29.244.0/24  20
AZ1 Tunnel (Geneve) Network  172.29.240.0/24  30
AZ2 Storage Network          172.29.245.0/24  21
AZ2 Tunnel (Geneve) Network  172.29.241.0/24  31
AZ3 Storage Network          172.29.246.0/24  22
AZ3 Tunnel (Geneve) Network  172.29.242.0/24  32
Public API VIPs              203.0.113.0/28   400

IP assignments

The following host name and IP address assignments are used for this environment.

Host name           Management IP     Tunnel (Geneve) IP  Storage IP
infra1              172.29.236.11
infra2              172.29.236.12
infra3              172.29.236.13
az1_ceph1           172.29.237.201                        172.29.244.201
az1_ceph2           172.29.237.202                        172.29.244.202
az1_ceph3           172.29.237.203                        172.29.244.203
az2_ceph1           172.29.238.201                        172.29.245.201
az2_ceph2           172.29.238.202                        172.29.245.202
az2_ceph3           172.29.238.203                        172.29.245.203
az3_ceph1           172.29.239.201                        172.29.246.201
az3_ceph2           172.29.239.202                        172.29.246.202
az3_ceph3           172.29.239.203                        172.29.246.203
az1_compute1        172.29.237.11     172.29.240.11       172.29.244.11
az1_compute2        172.29.237.12     172.29.240.12       172.29.244.12
az1_pin_compute1    172.29.237.13     172.29.240.13       172.29.244.13
az1_pin_compute2    172.29.237.14     172.29.240.14       172.29.244.14
az2_compute1        172.29.238.11     172.29.241.11       172.29.245.11
az2_compute2        172.29.238.12     172.29.241.12       172.29.245.12
az3_compute1        172.29.239.11     172.29.242.11       172.29.246.11
az3_compute2        172.29.239.12     172.29.242.12       172.29.246.12

Host network configuration

Each host requires the correct network bridges to be implemented. In this example, we leverage the systemd_networkd role, which performs the configuration for us during the openstack_hosts playbook execution and creates all required VLANs and bridges. The only prerequisite is an SSH connection to the host, so that Ansible can manage it.

Note

The example assumes that the default gateway is set through the bond0 interface, which aggregates the eth0 and eth1 links. If your environment does not have eth0, but instead has p1p1 or some other interface name, ensure that references to eth0 are replaced with the appropriate name. The same applies to any additional network interfaces.

---
# VLAN Mappings
_az_vlan_mappings:
  az1:
    management: 10
    storage: 20
    tunnel: 30
    public-api: 400
  az2:
    management: 10
    storage: 21
    tunnel: 31
    public-api: 400
  az3:
    management: 10
    storage: 22
    tunnel: 32
    public-api: 400

# Bonding interfaces
_bond0_interfaces:
  - eth0
  - eth1

# NETDEV definition
_systemd_networkd_default_devices:
  - NetDev:
      Name: vlan-mgmt
      Kind: vlan
    VLAN:
      Id: "{{ _az_vlan_mappings[az_name]['management'] }}"
    filename: 10-openstack-vlan-mgmt
  - NetDev:
      Name: bond0
      Kind: bond
    Bond:
      Mode: 802.3ad
      TransmitHashPolicy: layer3+4
      LACPTransmitRate: fast
      MIIMonitorSec: 100
    filename: 05-general-bond0
  - NetDev:
      Name: "{{ management_bridge }}"
      Kind: bridge
    Bridge:
      ForwardDelaySec: 0
      HelloTimeSec: 2
      MaxAgeSec: 12
      STP: off
    filename: "11-openstack-{{ management_bridge }}"

_systemd_networkd_storage_devices:
  - NetDev:
      Name: vlan-stor
      Kind: vlan
    VLAN:
      Id: "{{ _az_vlan_mappings[az_name]['storage'] }}"
    filename: 12-openstack-vlan-stor
  - NetDev:
      Name: br-storage
      Kind: bridge
    Bridge:
      ForwardDelaySec: 0
      HelloTimeSec: 2
      MaxAgeSec: 12
      STP: off
    filename: 13-openstack-br-storage

_systemd_networkd_tunnel_devices:
  - NetDev:
      Name: vlan-tunnel
      Kind: vlan
    VLAN:
      Id: "{{ _az_vlan_mappings[az_name]['tunnel'] }}"
    filename: 16-openstack-vlan-tunnel

_systemd_networkd_pub_api_devices:
  - NetDev:
      Name: vlan-public-api
      Kind: vlan
    VLAN:
      Id: "{{ _az_vlan_mappings[az_name]['public-api'] }}"
    filename: 17-openstack-vlan-public-api
  - NetDev:
      Name: br-public-api
      Kind: bridge
    Bridge:
      ForwardDelaySec: 0
      HelloTimeSec: 2
      MaxAgeSec: 12
      STP: off
    filename: 18-openstack-br-public-api

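# NOTE: _systemd_networkd_cluster_devices is referenced below for Ceph hosts but is
#       not shown in this example; define it in a similar way if a dedicated Ceph
#       cluster (replication) network is used.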
openstack_hosts_systemd_networkd_devices: |-
  {% set devices = [] %}
  {% if is_metal %}
  {%   set _ = devices.extend(_systemd_networkd_default_devices) %}
  {%   if inventory_hostname in (groups['compute_hosts'] + groups['storage_hosts']) %}
  {%     set _ = devices.extend(_systemd_networkd_storage_devices) %}
  {%   endif %}
  {%   if inventory_hostname in (groups[az_name ~ '_ceph_mon_hosts'] + groups[az_name ~ '_ceph_osd_hosts']) %}
  {%     set _ = devices.extend(_systemd_networkd_cluster_devices) %}
  {%   endif %}
  {%   if inventory_hostname in groups['compute_hosts'] %}
  {%     set _ = devices.extend(_systemd_networkd_tunnel_devices) %}
  {%   endif %}
  {%   if inventory_hostname in groups['haproxy_hosts'] %}
  {%     set _ = devices.extend(_systemd_networkd_pub_api_devices) %}
  {%   endif %}
  {% endif %}
  {{ devices }}

# NETWORK definition

# NOTE: this can work only in case the management network has the same netmask as all other networks
#       while in the example management is /22 while the rest are /24
# _management_rank: "{{ management_address | ansible.utils.ipsubnet(hostvars[inventory_hostname]['cidr_networks']['management']) }}"
_management_rank: "{{ (management_address | split('.'))[-1] }}"
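# Example: az1_compute1 has management_address 172.29.237.11, so its rank is 11 and
# its br-storage address becomes 172.29.244.11 (the 11th address of 172.29.244.0/24).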

# NOTE: `05` is prefixed to filename to have precedence over netplan
_systemd_networkd_bonded_networks: |-
  {% set struct = [] %}
  {% for interface in _bond0_interfaces %}
  {%   set interface_data = ansible_facts[interface | replace('-', '_')] %}
  {%   set _ = struct.append({
                'interface': interface_data['device'],
                'filename' : '05-general-' ~ interface_data['device'],
                'bond': 'bond0',
                'link_config_overrides': {
                  'Match': {
                    'MACAddress': interface_data['macaddress']
                  }
                }
               })
  %}
  {% endfor %}
  {% set bond_vlans = ['vlan-mgmt'] %}
  {% if inventory_hostname in (groups['compute_hosts'] + groups['storage_hosts']) %}
  {%   set _ = bond_vlans.append('vlan-stor') %}
  {% endif %}
  {% if inventory_hostname in groups['haproxy_hosts'] %}
  {%   set _ = bond_vlans.append('vlan-public-api') %}
  {% endif %}
  {% if inventory_hostname in groups['compute_hosts'] %}
  {%   set _ = bond_vlans.append('vlan-tunnel') %}
  {% endif %}
  {% set _ = struct.append({
              'interface': 'bond0',
              'filename': '05-general-bond0',
              'vlan': bond_vlans
            })
  %}
  {{ struct }}

_systemd_networkd_mgmt_networks:
  - interface: "vlan-mgmt"
    bridge: "{{ management_bridge }}"
    filename: 10-openstack-vlan-mgmt
  - interface: "{{ management_bridge }}"
    address: "{{ management_address }}"
    netmask: "{{ cidr_networks['management'] | ansible.utils.ipaddr('netmask') }}"
    filename: "11-openstack-{{ management_bridge }}"

_systemd_networkd_storage_networks:
  - interface: "vlan-stor"
    bridge: "br-storage"
    filename: 12-openstack-vlan-stor
  - interface: "br-storage"
    address: "{{ cidr_networks['storage_' ~ az_name] | ansible.utils.ipmath(_management_rank) }}"
    netmask: "{{ cidr_networks['storage_' ~ az_name] | ansible.utils.ipaddr('netmask') }}"
    filename: "13-openstack-br-storage"

_systemd_networkd_tunnel_networks:
  - interface: "vlan-tunnel"
    filename: 16-openstack-vlan-tunnel
    address: "{{ cidr_networks['tunnel_' ~ az_name] | ansible.utils.ipmath(_management_rank) }}"
    netmask: "{{ cidr_networks['tunnel_' ~ az_name] | ansible.utils.ipaddr('netmask') }}"
    static_routes: |-
      {% set routes = [] %}
      {% set tunnel_cidrs = cidr_networks | dict2items | selectattr('key', 'match', 'tunnel_az[0-9]') | map(attribute='value') %}
      {% set gateway = cidr_networks['tunnel_' ~ az_name] | ansible.utils.ipaddr('1') | ansible.utils.ipaddr('address') %}
      {% for cidr in tunnel_cidrs | reject('eq', cidr_networks['tunnel_' ~ az_name]) %}
      {%   set _ = routes.append({'cidr': cidr, 'gateway': gateway}) %}
      {% endfor %}
      {{ routes }}

_systemd_networkd_pub_api_networks:
  - interface: "vlan-public-api"
    bridge: "br-public-api"
    filename: 17-openstack-vlan-public-api
  - interface: "br-public-api"
    filename: "18-openstack-br-public-api"

openstack_hosts_systemd_networkd_networks: |-
  {% set networks = [] %}
  {% if is_metal %}
  {%   set _ = networks.extend(_systemd_networkd_mgmt_networks + _systemd_networkd_bonded_networks) %}
  {%   if inventory_hostname in (groups['compute_hosts'] + groups['storage_hosts']) %}
  {%     set _ = networks.extend(_systemd_networkd_storage_networks) %}
  {%   endif %}
  {%   if inventory_hostname in groups['compute_hosts'] %}
  {%     set _ = networks.extend(_systemd_networkd_tunnel_networks) %}
  {%   endif %}
  {%   if inventory_hostname in groups['haproxy_hosts'] %}
  {%     set _ = networks.extend(_systemd_networkd_pub_api_networks) %}
  {%   endif %}
  {% endif %}
  {{ networks }}

Deployment configuration

Environment customizations

Deployed files in /etc/openstack_deploy/env.d allow the customization of Ansible groups.

To deploy HAProxy in a container, we need to create a file /etc/openstack_deploy/env.d/haproxy.yml with the following content:

---
# This file contains an example to show how to set
# the haproxy service to run in a container.
#
# Important note:
# In most cases you need to ensure that default route inside of the
# container doesn't go through eth0, which is part of lxcbr0 and
# SRC nat-ed. You need to pass "public" VIP interface inside of the
# container and ensure "default" route presence on it.

container_skel:
  haproxy_container:
    properties:
      is_metal: false

As we are using Ceph for this environment, the cinder-volume service runs in a container on the Ceph Monitor hosts. To achieve this, implement /etc/openstack_deploy/env.d/cinder.yml with the following content:

---
# This file contains an example to show how to set
# the cinder-volume service to run in a container.
#
# Important note:
# When using LVM or any iSCSI-based cinder backends, such as NetApp with
# iSCSI protocol, the cinder-volume service *must* run on metal.
# Reference: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1226855

container_skel:
  cinder_volumes_container:
    properties:
      is_metal: false

In order to be able to execute a playbook only against hosts in a single Availability Zone, as well as to set AZ-specific variables, we need to define the corresponding group definitions. For that, create a file /etc/openstack_deploy/env.d/az.yml with the following content:

---

component_skel:
  az1_containers:
    belongs_to:
      - az1_all
  az1_hosts:
    belongs_to:
      - az1_all

  az2_containers:
    belongs_to:
      - az2_all
  az2_hosts:
    belongs_to:
      - az2_all

  az3_containers:
    belongs_to:
      - az3_all
  az3_hosts:
    belongs_to:
      - az3_all

container_skel:
  az1_containers:
    properties:
      is_nest: true

  az2_containers:
    properties:
      is_nest: true

  az3_containers:
    properties:
      is_nest: true

The above example will create the following groups:

  • azN_hosts, which will contain only bare metal nodes

  • azN_containers, which will contain all containers that are spawned on bare metal nodes belonging to the Availability Zone

  • azN_all, which will contain both azN_hosts and azN_containers members

We also need to define a completely new set of groups for Ceph in order to deploy multiple independent Ceph clusters.

For that, create a file /etc/openstack_deploy/env.d/ceph.yml with the following content:

---
component_skel:
  # Ceph MON
  ceph_mon_az1:
    belongs_to:
      - ceph-mon
      - ceph_all
      - az1_all
  ceph_mon_az2:
    belongs_to:
      - ceph-mon
      - ceph_all
      - az2_all
  ceph_mon_az3:
    belongs_to:
      - ceph-mon
      - ceph_all
      - az3_all

  # Ceph OSD
  ceph_osd_az1:
    belongs_to:
      - ceph-osd
      - ceph_all
      - az1_all
  ceph_osd_az2:
    belongs_to:
      - ceph-osd
      - ceph_all
      - az2_all
  ceph_osd_az3:
    belongs_to:
      - ceph-osd
      - ceph_all
      - az3_all

  # Ceph RGW
  ceph_rgw_az1:
    belongs_to:
      - ceph-rgw
      - ceph_all
      - az1_all
  ceph_rgw_az2:
    belongs_to:
      - ceph-rgw
      - ceph_all
      - az2_all
  ceph_rgw_az3:
    belongs_to:
      - ceph-rgw
      - ceph_all
      - az3_all

container_skel:
  # Ceph MON
  ceph_mon_container_az1:
    belongs_to:
      - az1_ceph_mon_containers
    contains:
      - ceph_mon_az1
  ceph_mon_container_az2:
    belongs_to:
      - az2_ceph_mon_containers
    contains:
      - ceph_mon_az2
  ceph_mon_container_az3:
    belongs_to:
      - az3_ceph_mon_containers
    contains:
      - ceph_mon_az3

  # Ceph RGW
  ceph_rgw_container_az1:
    belongs_to:
      - az1_ceph_rgw_containers
    contains:
      - ceph_rgw_az1
  ceph_rgw_container_az2:
    belongs_to:
      - az2_ceph_rgw_containers
    contains:
      - ceph_rgw_az2
  ceph_rgw_container_az3:
    belongs_to:
      - az3_ceph_rgw_containers
    contains:
      - ceph_rgw_az3

  # Ceph OSD
  ceph_osd_container_az1:
    belongs_to:
      - az1_ceph_osd_containers
    contains:
      - ceph_osd_az1
    properties:
      is_metal: true
  ceph_osd_container_az2:
    belongs_to:
      - az2_ceph_osd_containers
    contains:
      - ceph_osd_az2
    properties:
      is_metal: true
  ceph_osd_container_az3:
    belongs_to:
      - az3_ceph_osd_containers
    contains:
      - ceph_osd_az3
    properties:
      is_metal: true


physical_skel:
  # Ceph MON
  az1_ceph_mon_containers:
    belongs_to:
      - all_containers
  az1_ceph_mon_hosts:
    belongs_to:
      - hosts
  az2_ceph_mon_containers:
    belongs_to:
      - all_containers
  az2_ceph_mon_hosts:
    belongs_to:
      - hosts
  az3_ceph_mon_containers:
    belongs_to:
      - all_containers
  az3_ceph_mon_hosts:
    belongs_to:
      - hosts

  # Ceph OSD
  az1_ceph_osd_containers:
    belongs_to:
      - all_containers
  az1_ceph_osd_hosts:
    belongs_to:
      - hosts
  az2_ceph_osd_containers:
    belongs_to:
      - all_containers
  az2_ceph_osd_hosts:
    belongs_to:
      - hosts
  az3_ceph_osd_containers:
    belongs_to:
      - all_containers
  az3_ceph_osd_hosts:
    belongs_to:
      - hosts

  # Ceph RGW
  az1_ceph_rgw_containers:
    belongs_to:
      - all_containers
  az1_ceph_rgw_hosts:
    belongs_to:
      - hosts
  az2_ceph_rgw_containers:
    belongs_to:
      - all_containers
  az2_ceph_rgw_hosts:
    belongs_to:
      - hosts
  az3_ceph_rgw_containers:
    belongs_to:
      - all_containers
  az3_ceph_rgw_hosts:
    belongs_to:
      - hosts

Environment layout

The /etc/openstack_deploy/openstack_user_config.yml file defines the environment layout.

For each AZ, a group will need to be defined containing all hosts within that AZ.

Within the defined provider networks, address_prefix is used to override the prefix of the key added to each host that contains IP address information. We use AZ-specific prefixes for the container, tunnel, or storage networks. reference_group contains the name of a defined AZ group and is used to limit the scope of each provider network to that group.

YAML anchors and aliases are used heavily in the example below to populate all groups that might come in handy, without repeating the host definitions each time. You can read more about the topic in the Ansible Documentation.
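
For instance, the pattern used in the layout below defines a host mapping once under an anchor (&name) and then merges it into other groups through the YAML merge key (<<: *name). A simplified excerpt:

az1_controller_hosts: &controller_az1
  infra1:
    ip: 172.29.236.11

shared_infra_hosts:
  <<: *controller_az1   # shared_infra_hosts now also contains infra1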

The following configuration describes the layout for this environment.

---

cidr_networks: &os_cidrs
  management: 172.29.236.0/22
  tunnel_az1: 172.29.240.0/24
  tunnel_az2: 172.29.241.0/24
  tunnel_az3: 172.29.242.0/24
  storage_az1: 172.29.244.0/24
  storage_az2: 172.29.245.0/24
  storage_az3: 172.29.246.0/24
  public_api_vip: 203.0.113.0/28

used_ips:
  # management network - openstack VIPs
  - "172.29.236.1,172.29.236.30"
  # management network - other hosts not managed by OSA dynamic inventory
  - "172.29.238.0,172.29.239.255"
  # storage network - reserved for ceph hosts
  - "172.29.244.200,172.29.244.250"
  - "172.29.245.200,172.29.245.250"
  - "172.29.246.200,172.29.246.250"
  # public_api
  - "203.0.113.1,203.0.113.10"

global_overrides:
  internal_lb_vip_address: internal.example.cloud
  external_lb_vip_address: example.cloud
  management_bridge: "br-mgmt"
  cidr_networks: *os_cidrs
  provider_networks:
    - network:
        group_binds:
          - all_containers
          - hosts
        type: "raw"
        container_bridge: "br-mgmt"
        container_interface: "eth1"
        container_type: "veth"
        ip_from_q: "management"
        is_management_address: true
    - network:
        container_bridge: "br-storage"
        container_type: "veth"
        container_interface: "eth2"
        ip_from_q: "storage_az1"
        address_prefix: "storage_az1"
        type: "raw"
        group_binds:
          - cinder_volume
          - nova_compute
          - ceph_all
        reference_group: "az1_all"
    - network:
        container_bridge: "br-storage"
        container_type: "veth"
        container_interface: "eth2"
        ip_from_q: "storage_az2"
        address_prefix: "storage_az2"
        type: "raw"
        group_binds:
          - cinder_volume
          - nova_compute
          - ceph_all
        reference_group: "az2_all"
    - network:
        container_bridge: "br-storage"
        container_type: "veth"
        container_interface: "eth2"
        ip_from_q: "storage_az3"
        address_prefix: "storage_az3"
        type: "raw"
        group_binds:
          - cinder_volume
          - nova_compute
          - ceph_all
        reference_group: "az3_all"
    - network:
        container_bridge: "vlan-tunnel"
        container_type: "veth"
        container_interface: "eth4"
        ip_from_q: "tunnel_az1"
        address_prefix: "tunnel"
        type: "raw"
        group_binds:
          - neutron_ovn_controller
        reference_group: "az1_all"
    - network:
        container_bridge: "vlan-tunnel"
        container_type: "veth"
        container_interface: "eth4"
        ip_from_q: "tunnel_az2"
        address_prefix: "tunnel"
        type: "raw"
        group_binds:
          - neutron_ovn_controller
        reference_group: "az2_all"
    - network:
        container_bridge: "vlan-tunnel"
        container_type: "veth"
        container_interface: "eth4"
        ip_from_q: "tunnel_az3"
        address_prefix: "tunnel"
        type: "raw"
        group_binds:
          - neutron_ovn_controller
        reference_group: "az3_all"
    - network:
        group_binds:
          - haproxy
        type: "raw"
        container_bridge: "br-public-api"
        container_interface: "eth20"
        container_type: "veth"
        ip_from_q: public_api_vip
        static_routes:
          - cidr: 0.0.0.0/0
            gateway: 203.0.113.1

### conf.d configuration ###

# Control plane
az1_controller_hosts: &controller_az1
  infra1:
    ip: 172.29.236.11

az2_controller_hosts: &controller_az2
  infra2:
    ip: 172.29.236.12

az3_controller_hosts: &controller_az3
  infra3:
    ip: 172.29.236.13

# Computes

## AZ1
az1_shared_compute_hosts: &shared_computes_az1
  az1_compute1:
    ip: 172.29.237.11
  az1_compute2:
    ip: 172.29.237.12

az1_pinned_compute_hosts: &pinned_computes_az1
  az1_pin_compute1:
    ip: 172.29.237.13
  az1_pin_compute2:
    ip: 172.29.237.14

## AZ2

az2_shared_compute_hosts: &shared_computes_az2
  az2_compute1:
    ip: 172.29.238.11
  az2_compute2:
    ip: 172.29.238.12

## AZ3
az3_shared_compute_hosts: &shared_computes_az3
  az3_compute1:
    ip: 172.29.239.11
  az3_compute2:
    ip: 172.29.239.12

# Storage

## AZ1
az1_storage_hosts: &storage_az1
  az1_ceph1:
    ip: 172.29.237.201
  az1_ceph2:
    ip: 172.29.237.202
  az1_ceph3:
    ip: 172.29.237.203

## AZ2
az2_storage_hosts: &storage_az2
  az2_ceph1:
    ip: 172.29.238.201
  az2_ceph2:
    ip: 172.29.238.202
  az2_ceph3:
    ip: 172.29.238.203

## AZ3
az3_storage_hosts: &storage_az3
  az3_ceph1:
    ip: 172.29.239.201
  az3_ceph2:
    ip: 172.29.239.202
  az3_ceph3:
    ip: 172.29.239.203

# AZ association

az1_compute_hosts: &compute_hosts_az1
  <<: *shared_computes_az1
  <<: *pinned_computes_az1

az2_compute_hosts: &compute_hosts_az2
  <<: *shared_computes_az2

az3_compute_hosts: &compute_hosts_az3
  <<: *shared_computes_az3

az1_hosts:
  <<: *compute_hosts_az1
  <<: *controller_az1
  <<: *storage_az1

az2_hosts:
  <<: *compute_hosts_az2
  <<: *controller_az2
  <<: *storage_az2

az3_hosts:
  <<: *compute_hosts_az3
  <<: *controller_az3
  <<: *storage_az3

# Final mappings
shared_infra_hosts: &controllers
  <<: *controller_az1
  <<: *controller_az2
  <<: *controller_az3

repo-infra_hosts: *controllers
memcaching_hosts: *controllers
database_hosts: *controllers
mq_hosts: *controllers
operator_hosts: *controllers
identity_hosts: *controllers
image_hosts: *controllers
dashboard_hosts: *controllers
compute-infra_hosts: *controllers
placement-infra_hosts: *controllers
storage-infra_hosts: *controllers
network-infra_hosts: *controllers
network-northd_hosts: *controllers
coordination_hosts: *controllers

compute_hosts: &computes
  <<: *compute_hosts_az1
  <<: *compute_hosts_az2
  <<: *compute_hosts_az3

pinned_compute_hosts:
  <<: *pinned_computes_az1

shared_compute_hosts:
  <<: *shared_computes_az1
  <<: *shared_computes_az2
  <<: *shared_computes_az3

network-gateway_hosts: *computes

storage_hosts: &storage
  <<: *storage_az1
  <<: *storage_az2
  <<: *storage_az3

az1_ceph_osd_hosts:
  <<: *storage_az1

az2_ceph_osd_hosts:
  <<: *storage_az2

az3_ceph_osd_hosts:
  <<: *storage_az3

az1_ceph_mon_hosts:
  <<: *storage_az1

az2_ceph_mon_hosts:
  <<: *storage_az2

az3_ceph_mon_hosts:
  <<: *storage_az3

az1_ceph_rgw_hosts:
  <<: *storage_az1

az2_ceph_rgw_hosts:
  <<: *storage_az2

az3_ceph_rgw_hosts:
  <<: *storage_az3

User variables

In order to properly configure Availability Zones, we need to leverage group_vars and define the Availability Zone name for each AZ there. For this, create the following files:

  • /etc/openstack_deploy/group_vars/az1_all.yml

  • /etc/openstack_deploy/group_vars/az2_all.yml

  • /etc/openstack_deploy/group_vars/az3_all.yml

Each file should have content like below, where N is the AZ number corresponding to the file:

az_name: azN
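
For example, /etc/openstack_deploy/group_vars/az1_all.yml would contain just:

az_name: az1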

As in this environment the load balancer runs in LXC containers on the infrastructure hosts, we need to ensure the absence of a default route on the eth0 interface. To prevent it from appearing, we override lxc_container_networks in the /etc/openstack_deploy/group_vars/haproxy/lxc_network.yml file:

---
lxc_container_networks:
  lxcbr0_address:
    bridge: "{{ lxc_net_bridge | default('lxcbr0') }}"
    bridge_type: "{{ lxc_net_bridge_type | default('linuxbridge') }}"
    interface: eth0
    type: veth
    dhcp_use_routes: False

Next, we want to ensure that HAProxy always points to the backend that is considered “local” to it. For that, we switch the balancing algorithm to first and re-order the backends so that the one from the current Availability Zone appears first in the list. This can be done by creating the file /etc/openstack_deploy/group_vars/haproxy/backend_overrides.yml with the following content:

---

haproxy_drain: True
haproxy_ssl_all_vips: True
haproxy_bind_external_lb_vip_interface: eth20
haproxy_bind_internal_lb_vip_interface: eth1
haproxy_bind_external_lb_vip_address: "*"
haproxy_bind_internal_lb_vip_address: "*"
haproxy_vip_binds:
  - address: "{{ haproxy_bind_external_lb_vip_address }}"
    interface: "{{ haproxy_bind_external_lb_vip_interface }}"
    type: external
  - address: "{{ haproxy_bind_internal_lb_vip_address }}"
    interface: "{{ haproxy_bind_internal_lb_vip_interface }}"
    type: internal

haproxy_cinder_api_service_overrides:
  haproxy_backend_nodes: "{{ groups['cinder_api'] | select('in', groups[az_name ~ '_containers']) | union(groups['cinder_api']) | unique | default([]) }}"
  haproxy_balance_alg: first
  haproxy_limit_hosts: "{{ groups['haproxy_all'] | intersect(groups[az_name ~ '_all']) }}"

haproxy_horizon_service_overrides:
  haproxy_backend_nodes: "{{ groups['horizon_all'] | select('in', groups[az_name ~ '_containers']) | union(groups['horizon_all']) | unique | default([]) }}"
  haproxy_balance_alg: first
  haproxy_limit_hosts: "{{ groups['haproxy_all'] | intersect(groups[az_name ~ '_all']) }}"

haproxy_keystone_service_overrides:
  haproxy_backend_nodes: "{{ groups['keystone_all'] | select('in', groups[az_name ~ '_containers']) | union(groups['keystone_all']) | unique | default([]) }}"
  haproxy_balance_alg: first
  haproxy_limit_hosts: "{{ groups['haproxy_all'] | intersect(groups[az_name ~ '_all']) }}"

haproxy_neutron_server_service_overrides:
  haproxy_backend_nodes: "{{ groups['neutron_server'] | select('in', groups[az_name ~ '_containers']) | union(groups['neutron_server']) | unique | default([]) }}"
  haproxy_balance_alg: first
  haproxy_limit_hosts: "{{ groups['haproxy_all'] | intersect(groups[az_name ~ '_all']) }}"

haproxy_nova_api_compute_service_overrides:
  haproxy_backend_nodes: "{{ groups['nova_api_os_compute'] | select('in', groups[az_name ~ '_containers']) | union(groups['nova_api_os_compute']) | unique | default([]) }}"
  haproxy_balance_alg: first
  haproxy_limit_hosts: "{{ groups['haproxy_all'] | intersect(groups[az_name ~ '_all']) }}"

haproxy_nova_api_metadata_service_overrides:
  haproxy_backend_nodes: "{{ groups['nova_api_metadata'] | select('in', groups[az_name ~ '_containers']) | union(groups['nova_api_metadata']) | unique | default([]) }}"
  haproxy_balance_alg: first
  haproxy_limit_hosts: "{{ groups['haproxy_all'] | intersect(groups[az_name ~ '_all']) }}"

haproxy_placement_service_overrides:
  haproxy_backend_nodes: "{{ groups['placement_all'] | select('in', groups[az_name ~ '_containers']) | union(groups['placement_all']) | unique | default([]) }}"
  haproxy_balance_alg: first
  haproxy_limit_hosts: "{{ groups['haproxy_all'] | intersect(groups[az_name ~ '_all']) }}"

haproxy_repo_service_overrides:
  haproxy_backend_nodes: "{{ groups['repo_all'] | select('in', groups[az_name ~ '_containers']) | union(groups['repo_all']) | unique | default([]) }}"
  haproxy_balance_alg: first
  haproxy_limit_hosts: "{{ groups['haproxy_all'] | intersect(groups[az_name ~ '_all']) }}"

We also need to define a couple of extra Keepalived instances in order to support the DNS Round Robin approach, along with configuring Keepalived in unicast mode. For that, create a file /etc/openstack_deploy/group_vars/haproxy/keepalived.yml with the following content:

---

haproxy_keepalived_external_vip_cidr_az1: 203.0.113.5/32
haproxy_keepalived_external_vip_cidr_az2: 203.0.113.6/32
haproxy_keepalived_external_vip_cidr_az3: 203.0.113.7/32
haproxy_keepalived_internal_vip_cidr_az1: 172.29.236.21/32
haproxy_keepalived_internal_vip_cidr_az2: 172.29.236.22/32
haproxy_keepalived_internal_vip_cidr_az3: 172.29.236.23/32
haproxy_keepalived_external_interface: "{{ haproxy_bind_external_lb_vip_interface }}"
haproxy_keepalived_internal_interface: "{{ haproxy_bind_internal_lb_vip_interface }}"

keepalived_unicast_peers:
  internal: |-
    {% set peers = [] %}
    {% for addr in groups['haproxy'] | map('extract', hostvars, ['container_networks', 'management_address']) %}
    {%   set _ = peers.append((addr['address'] ~ '/' ~ addr['netmask']) | ansible.utils.ipaddr('host/prefix')) %}
    {% endfor %}
    {{ peers }}
  external: |-
    {% set peers = [] %}
    {% for addr in groups['haproxy'] | map('extract', hostvars, ['container_networks', 'public_api_vip_address']) %}
    {%   set _ = peers.append((addr['address'] ~ '/' ~ addr['netmask']) | ansible.utils.ipaddr('host/prefix')) %}
    {% endfor %}
    {{ peers }}

keepalived_internal_unicast_src_ip: >-
  {{ (management_address ~ '/' ~ container_networks['management_address']['netmask']) | ansible.utils.ipaddr('host/prefix') }}
keepalived_external_unicast_src_ip: >-
  {{ (container_networks['public_api_vip_address']['address'] ~ '/' ~ container_networks['public_api_vip_address']['netmask']) | ansible.utils.ipaddr('host/prefix') }}

keepalived_instances:
  az1-external:
    interface: "{{ haproxy_keepalived_external_interface | default(management_bridge) }}"
    state: "{{ (inventory_hostname in groups['az1_all']) | ternary('MASTER', 'BACKUP') }}"
    virtual_router_id: 40
    priority: "{{ (inventory_hostname in groups['az1_all']) | ternary(200, (groups['haproxy']|length-groups['haproxy'].index(inventory_hostname))*50) }}"
    vips:
      - "{{ haproxy_keepalived_external_vip_cidr_az1 | default('169.254.1.1/24')  }} dev {{ haproxy_keepalived_external_interface | default(management_bridge) }}"
    track_scripts: "{{ keepalived_scripts | dict2items | json_query('[*].{name: key, instance: value.instance}') | rejectattr('instance', 'equalto', 'internal') | map(attribute='name') | list }}"
    unicast_src_ip: "{{ keepalived_external_unicast_src_ip }}"
    unicast_peers: "{{ keepalived_unicast_peers['external'] | difference([keepalived_external_unicast_src_ip]) }}"
  az1-internal:
    interface: "{{ haproxy_keepalived_internal_interface | default(management_bridge) }}"
    state: "{{ (inventory_hostname in groups['az1_all']) | ternary('MASTER', 'BACKUP') }}"
    virtual_router_id: 41
    priority: "{{ (inventory_hostname in groups['az1_all']) | ternary(200, (groups['haproxy']|length-groups['haproxy'].index(inventory_hostname))*50) }}"
    vips:
      - "{{ haproxy_keepalived_internal_vip_cidr_az1 | default('169.254.2.1/24') }} dev {{ haproxy_keepalived_internal_interface | default(management_bridge) }}"
    track_scripts: "{{ keepalived_scripts | dict2items | json_query('[*].{name: key, instance: value.instance}') | rejectattr('instance', 'equalto', 'external') | map(attribute='name') | list }}"
    unicast_src_ip: "{{ keepalived_internal_unicast_src_ip }}"
    unicast_peers: "{{ keepalived_unicast_peers['internal'] | difference([keepalived_internal_unicast_src_ip]) }}"

  az2-external:
    interface: "{{ haproxy_keepalived_external_interface | default(management_bridge) }}"
    state: "{{ (inventory_hostname in groups['az2_all']) | ternary('MASTER', 'BACKUP') }}"
    virtual_router_id: 42
    priority: "{{ (inventory_hostname in groups['az2_all']) | ternary(200, (groups['haproxy']|length-groups['haproxy'].index(inventory_hostname))*50) }}"
    vips:
      - "{{ haproxy_keepalived_external_vip_cidr_az2 | default('169.254.1.1/24')  }} dev {{ haproxy_keepalived_external_interface | default(management_bridge) }}"
    track_scripts: "{{ keepalived_scripts | dict2items | json_query('[*].{name: key, instance: value.instance}') | rejectattr('instance', 'equalto', 'internal') | map(attribute='name') | list }}"
    unicast_src_ip: "{{ keepalived_external_unicast_src_ip }}"
    unicast_peers: "{{ keepalived_unicast_peers['external'] | difference([keepalived_external_unicast_src_ip]) }}"
  az2-internal:
    interface: "{{ haproxy_keepalived_internal_interface | default(management_bridge) }}"
    state: "{{ (inventory_hostname in groups['az2_all']) | ternary('MASTER', 'BACKUP') }}"
    virtual_router_id: 43
    priority: "{{ (inventory_hostname in groups['az2_all']) | ternary(200, (groups['haproxy']|length-groups['haproxy'].index(inventory_hostname))*50) }}"
    vips:
      - "{{ haproxy_keepalived_internal_vip_cidr_az2 | default('169.254.2.1/24') }} dev {{ haproxy_keepalived_internal_interface | default(management_bridge) }}"
    track_scripts: "{{ keepalived_scripts | dict2items | json_query('[*].{name: key, instance: value.instance}') | rejectattr('instance', 'equalto', 'external') | map(attribute='name') | list }}"
    unicast_src_ip: "{{ keepalived_internal_unicast_src_ip }}"
    unicast_peers: "{{ keepalived_unicast_peers['internal'] | difference([keepalived_internal_unicast_src_ip]) }}"

  az3-external:
    interface: "{{ haproxy_keepalived_external_interface | default(management_bridge) }}"
    state: "{{ (inventory_hostname in groups['az3_all']) | ternary('MASTER', 'BACKUP') }}"
    virtual_router_id: 44
    priority: "{{ (inventory_hostname in groups['az3_all']) | ternary(200, (groups['haproxy']|length-groups['haproxy'].index(inventory_hostname))*50) }}"
    vips:
      - "{{ haproxy_keepalived_external_vip_cidr_az3 | default('169.254.1.1/24')  }} dev {{ haproxy_keepalived_external_interface | default(management_bridge) }}"
    track_scripts: "{{ keepalived_scripts | dict2items | json_query('[*].{name: key, instance: value.instance}') | rejectattr('instance', 'equalto', 'internal') | map(attribute='name') | list }}"
    unicast_src_ip: "{{ keepalived_external_unicast_src_ip }}"
    unicast_peers: "{{ keepalived_unicast_peers['external'] | difference([keepalived_external_unicast_src_ip]) }}"
  az3-internal:
    interface: "{{ haproxy_keepalived_internal_interface | default(management_bridge) }}"
    state: "{{ (inventory_hostname in groups['az3_all']) | ternary('MASTER', 'BACKUP') }}"
    virtual_router_id: 45
    priority: "{{ (inventory_hostname in groups['az3_all']) | ternary(200, (groups['haproxy']|length-groups['haproxy'].index(inventory_hostname))*50) }}"
    vips:
      - "{{ haproxy_keepalived_internal_vip_cidr_az3 | default('169.254.2.1/24') }} dev {{ haproxy_keepalived_internal_interface | default(management_bridge) }}"
    track_scripts: "{{ keepalived_scripts | dict2items | json_query('[*].{name: key, instance: value.instance}') | rejectattr('instance', 'equalto', 'external') | map(attribute='name') | list }}"
    unicast_src_ip: "{{ keepalived_internal_unicast_src_ip }}"
    unicast_peers: "{{ keepalived_unicast_peers['internal'] | difference([keepalived_internal_unicast_src_ip]) }}"

In order to add support for multiple compute tiers (that is, with CPU overcommit and with pinned CPUs), you need to create a file /etc/openstack_deploy/group_vars/pinned_compute_hosts with the following content:

nova_cpu_allocation_ratio: 1.0
nova_ram_allocation_ratio: 1.0

The rest of the variables can be defined in /etc/openstack_deploy/user_variables.yml, but a lot of them reference the az_name variable, so its presence (along with the corresponding groups) is vital for this scenario.

---
# Set a different scheduling AZ name on each controller
# You can change that to a specific AZ name which will be used as default one
default_availability_zone: "{{ az_name }}"

# Defining unique internal VIP in hosts per AZ
_openstack_internal_az_vip: "{{ hostvars[groups['haproxy'][0]]['haproxy_keepalived_internal_vip_cidr_' ~ az_name] | ansible.utils.ipaddr('address') }}"
openstack_host_custom_hosts_records: "{{ _openstack_services_fqdns['internal'] | map('regex_replace', '^(.*)$', _openstack_internal_az_vip ~ ' \\1') }}"

# Use local to AZ memcached inside of AZ
memcached_servers: >-
  {{
    groups['memcached'] | intersect(groups[az_name ~ '_containers'])
      | map('extract', hostvars, 'management_address')
      | map('regex_replace', '(.+)', '\1:' ~ memcached_port)
      | list | join(',')
  }}

# Ceph-Ansible variables
ceph_cluster_name: "ceph-{{ az_name }}"
ceph_keyrings_dir: "/etc/openstack_deploy/ceph/{{ ceph_cluster_name }}"
ceph_conf_file: "{{ lookup('file', ceph_keyrings_dir ~ '/ceph.conf') }}"
cluster: "{{ ceph_cluster_name }}"
cluster_network: "{{ public_network }}"
monitor_address: "{{ container_networks['storage_address']['address'] }}"
mon_group_name: "ceph_mon_{{ az_name }}"
mgr_group_name: "{{ mon_group_name }}"
osd_group_name: "ceph_osd_{{ az_name }}"
public_network: "{{ cidr_networks['storage_' ~ az_name] }}"
rgw_group_name: "ceph_rgw_{{ az_name }}"
rgw_zone: "{{ az_name }}"

# Cinder variables
cinder_active_active_cluster_name: "{{ ceph_cluster_name }}"
cinder_default_availability_zone: "{{ default_availability_zone }}"
cinder_storage_availability_zone: "{{ az_name }}"

# Glance to use Swift as a backend
glance_default_store: swift
glance_use_uwsgi: False

# Neutron variables
neutron_availability_zone: "{{ az_name }}"
neutron_default_availability_zones:
  - az1
  - az2
  - az3
neutron_ovn_distributed_fip: True
neutron_plugin_type: ml2.ovn
neutron_plugin_base:
  - ovn-router
  - qos
  - auto_allocate
neutron_ml2_drivers_type: geneve,vlan
neutron_provider_networks:
  network_types: "{{ neutron_ml2_drivers_type }}"
  network_geneve_ranges: "1:65000"
  network_vlan_ranges: >-
    vlan:100:200
  network_mappings: "vlan:br-vlan"
  network_interface_mappings: "br-vlan:bond0"

# Nova variables
nova_cinder_rbd_inuse: True
nova_glance_rbd_inuse: false
nova_libvirt_images_rbd_pool: ""
nova_libvirt_disk_cachemodes: network=writeback,file=directsync
nova_libvirt_hw_disk_discard: unmap
nova_nova_conf_overrides:
  DEFAULT:
    default_availability_zone: "{{ default_availability_zone }}"
    default_schedule_zone: "{{ default_availability_zone }}"
  cinder:
    cross_az_attach: false

# Create required aggregates and flavors
cpu_pinned_flavors:
  specs:
    - name: pinned.small
      vcpus: 2
      ram: 2048
    - name: pinned.medium
      vcpus: 4
      ram: 8192
  extra_specs:
    hw:cpu_policy: dedicated
    hw:vif_multiqueue_enabled: 'true'
    trait:CUSTOM_PINNED_CPU: required

cpu_shared_flavors:
  specs:
    - name: shared.small
      vcpus: 1
      ram: 1024
    - name: shared.medium
      vcpus: 2
      ram: 4096

openstack_user_compute:
  flavors:
    - "{{ cpu_shared_flavors }}"
    - "{{ cpu_pinned_flavors }}"
  aggregates:
    - name: az1-shared
      hosts: "{{ groups['az1_shared_compute_hosts'] }}"
      availability_zone: az1
    - name: az1-pinned
      hosts: "{{ groups['az1_pinned_compute_hosts'] }}"
      availability_zone: az1
      metadata:
        trait:CUSTOM_PINNED_CPU: required
        pinned-cpu: 'true'
    - name: az2-shared
      hosts: "{{ groups['az2_shared_compute_hosts'] }}"
      availability_zone: az2
    - name: az3-shared
      hosts: "{{ groups['az3_shared_compute_hosts'] }}"
      availability_zone: az3