This chapter is intended to help troubleshoot and resolve operational issues in an OpenStack-Ansible deployment.
This section focuses on troubleshooting general host-to-host communication required for the OpenStack control plane to function properly.
This does not cover any networking related to instance connectivity.
These instructions assume an OpenStack-Ansible installation using LXC containers, VXLAN overlay, and the Linuxbridge ml2 driver.
The following networks are referenced in this section:

HOST_NET (physical host management and access to the Internet)
CONTAINER_NET (LXC container network used by OpenStack services)
OVERLAY_NET (VXLAN overlay network)

Useful network utilities and commands:
# ip link show [dev INTERFACE_NAME]
# arp -n [-i INTERFACE_NAME]
# ip [-4 | -6] address show [dev INTERFACE_NAME]
# ping <TARGET_IP_ADDRESS>
# tcpdump [-n -nn] < -i INTERFACE_NAME > [host SOURCE_IP_ADDRESS]
# brctl show [BRIDGE_ID]
# iptables -nL
# arping [-c NUMBER] [-d] <TARGET_IP_ADDRESS>
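For example, to confirm that a peer host on the container management network answers ARP and ICMP (172.29.236.44 is one of the example addresses used below; substitute your own):
# arping -c 3 172.29.236.44
# ping -c 3 172.29.236.44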
Perform the following checks:
IP addresses should be applied to the physical interface, bond interface, tagged sub-interface, or, in some cases, the bridge interface:
# ip address show dev bond0
14: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500..UP...
link/ether a0:a0:a0:a0:a0:01 brd ff:ff:ff:ff:ff:ff
inet 10.240.0.44/22 brd 10.240.3.255 scope global bond0
valid_lft forever preferred_lft forever
...
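If the address is missing or the interface is down, check the bond members first. Assuming the Linux bonding driver and a bond named bond0, per-member link status is available here:
# cat /proc/net/bonding/bond0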
Perform the following checks on the container management network (br-mgmt):

An IP address should be applied to br-mgmt:
# ip address show dev br-mgmt
18: br-mgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500...UP...
link/ether a0:a0:a0:a0:a0:01 brd ff:ff:ff:ff:ff:ff
inet 172.29.236.44/22 brd 172.29.239.255 scope global br-mgmt
valid_lft forever preferred_lft forever
...
An IP address should be applied to eth1 inside the LXC container:
# ip address show dev eth1
59: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500...UP...
link/ether b1:b1:b1:b1:b1:01 brd ff:ff:ff:ff:ff:ff
inet 172.29.236.55/22 brd 172.29.239.255 scope global eth1
valid_lft forever preferred_lft forever
...
br-mgmt should contain veth-pair ends from all containers and a physical interface or tagged sub-interface:
# brctl show br-mgmt
bridge name bridge id STP enabled interfaces
br-mgmt 8000.abcdef12345 no 11111111_eth1
22222222_eth1
...
bond0.100
99999999_eth1
...
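As a quick end-to-end check, the container management addresses shown in these examples should answer ICMP from the host and from other hosts on CONTAINER_NET (substitute your own addresses):
# ping -c 3 172.29.236.55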
Perform the following checks on the VXLAN overlay network (br-vxlan):

An IP address should be applied to br-vxlan:
# ip address show dev br-vxlan
21: br-vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500...UP...
link/ether a0:a0:a0:a0:a0:02 brd ff:ff:ff:ff:ff:ff
inet 172.29.240.44/22 brd 172.29.243.255 scope global br-vxlan
valid_lft forever preferred_lft forever
...
An IP address should be applied to eth10 inside the required LXC containers:
# ip address show dev eth10
67: eth10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500...UP...
link/ether b1:b1:b1:b1:b1:02 brd ff:ff:ff:ff:ff:ff
inet 172.29.240.55/22 brd 172.29.243.255 scope global eth10
valid_lft forever preferred_lft forever
...
br-vxlan should contain veth-pair ends from the required LXC containers and a physical interface or tagged sub-interface:
# brctl show br-vxlan
bridge name bridge id STP enabled interfaces
br-vxlan 8000.ghijkl123456 no bond1.100
3333333_eth10
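An optional sanity check is to confirm the underlay MTU between VTEP addresses, since VXLAN adds roughly 50 bytes of overhead on top of instance traffic. Pinging with the do-not-fragment bit set and a 1472-byte payload exercises a full 1500-byte underlay packet (1472 + 28 bytes of IP/ICMP headers); the address here is the example value from above:
# ping -c 3 -M do -s 1472 172.29.240.55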
You can check the status of an OpenStack service by accessing every controller node and running the service <SERVICE_NAME> status command.
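For example, to check a single service on a controller node (the service name here is illustrative):
# service nova-scheduler status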
For additional information on verifying OpenStack services, see the verification sections of the OpenStack documentation.
Restart your OpenStack services by accessing every controller node. Some OpenStack services will require a restart from other nodes in your environment.
The following list shows the commands to restart each OpenStack service.

Image service:
# service glance-registry restart
# service glance-api restart

Compute service (controller node):
# service nova-api-os-compute restart
# service nova-consoleauth restart
# service nova-scheduler restart
# service nova-conductor restart
# service nova-api-metadata restart
# service nova-novncproxy restart (if using novnc)
# service nova-spicehtml5proxy restart (if using spice)

Compute service (compute node):
# service nova-compute restart

Networking service:
# service neutron-server restart
# service neutron-dhcp-agent restart
# service neutron-l3-agent restart
# service neutron-metadata-agent restart
# service neutron-linuxbridge-agent restart

Networking service (compute node):
# service neutron-linuxbridge-agent restart

Block Storage service:
# service cinder-api restart
# service cinder-backup restart
# service cinder-scheduler restart
# service cinder-volume restart

Object Storage service:
# service swift-account-auditor restart
# service swift-account-server restart
# service swift-account-reaper restart
# service swift-account-replicator restart
# service swift-container-auditor restart
# service swift-container-server restart
# service swift-container-reconciler restart
# service swift-container-replicator restart
# service swift-container-sync restart
# service swift-container-updater restart
# service swift-object-auditor restart
# service swift-object-expirer restart
# service swift-object-server restart
# service swift-object-reconstructor restart
# service swift-object-replicator restart
# service swift-object-updater restart
# service swift-proxy-server restart
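Because most of these services run inside LXC containers, you can also restart them from the deployment host with an Ansible ad-hoc command rather than logging in to each node (run it from the playbooks directory so the OpenStack-Ansible inventory is used). The group name below is an example from the OpenStack-Ansible inventory and may differ in your environment:
# ansible nova_conductor -m service -a "name=nova-conductor state=restarted"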
This section focuses on troubleshooting general instance (VM) connectivity. These instructions assume an OpenStack-Ansible installation using LXC containers, VXLAN overlay, and the Linuxbridge ml2 driver.
Data flow example
COMPUTE NODE

If VXLAN: Instance -> brq bridge -> br-vxlan -> bond#.#00 -> physical network
If VLAN:  Instance -> brq bridge -> br-vlan  -> bond1     -> physical network

NETWORK NODE

If VXLAN: physical network -> bond#.#00 -> br-vxlan -> container eth10 -> brq bridge -> Neutron DHCP/Router
If VLAN:  physical network -> bond1     -> br-vlan  -> container eth11 -> brq bridge -> Neutron DHCP/Router
If VLAN:

Does the physical interface show link, and are all VLANs properly trunked across the physical network?

Important
Do not continue until the physical network is properly configured.

Does the instance's IP address respond to ping from the network's DHCP namespace or from other instances in the same network? If not, check the security-group-rules and consider adding an allow-ICMP rule for testing (see the namespace example after this checklist).

Important
Do not continue until the instance has an IP address and can reach local network resources like DHCP.

Does the instance's IP address respond to ping from the gateway device (Neutron router namespace or another gateway device)? If not, check the security-group-rules and consider adding an allow-ICMP rule for testing.

Important
Do not continue until the instance can reach its gateway.
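A minimal way to run the DHCP-namespace ping mentioned above, assuming the Neutron DHCP agent runs inside the networking agents container and substituting your own network ID and instance address:
# ip netns list
# ip netns exec qdhcp-<NETWORK_ID> ping -c 3 <INSTANCE_IP_ADDRESS>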
If VXLAN:

Does the physical interface show link, and are all VLANs properly trunked across the physical network?

Important
Do not continue until the physical network is properly configured.

Are the VXLAN VTEP addresses able to ping each other? Check the br-vxlan interface on the compute node and eth10 inside the Neutron network agent container.

Important
Do not continue until the VXLAN endpoints have reachability to each other.

Does the instance's IP address respond to ping from the network's DHCP namespace or from other instances in the same network? If not, check the security-group-rules and consider adding an allow-ICMP rule for testing.

Important
Do not continue until the instance has an IP address and can reach local network resources.
Does the instance's IP address respond to ping from the gateway device (Neutron router namespace or another gateway device)? If not, check the security-group-rules, consider adding an allow-ICMP rule for testing, and review any relevant access-control-lists on the gateway device (see the router namespace example below).
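Similarly, the gateway check can be performed from the Neutron router namespace, substituting your own router ID and instance address:
# ip netns exec qrouter-<ROUTER_ID> ping -c 3 <INSTANCE_IP_ADDRESS>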
The glance-registry handles the database operations for managing the storage of the image index and properties. The glance-api handles the API interactions and image store.
To troubleshoot problems or errors with the Image service, refer to
/var/log/glance-api.log
and /var/log/glance-registry.log
inside
the glance api container.
You can also conduct the following activities, which may generate logs to help identify problems:
Run the openstack image list command to ensure that the API and registry are working.

For an example and more information, see Verify operation (https://docs.openstack.org/newton/install-guide-ubuntu/glance-verify.html) and Manage Images (https://docs.openstack.org/user-guide/common/cli-manage-images.html).
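For example, from the utility container, which in a default OpenStack-Ansible deployment ships admin credentials in /root/openrc (adjust the path if your credentials live elsewhere):
# source /root/openrc
# openstack image list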
Ubuntu kernel packages newer than version 3.13 contain a change in
module naming from nf_conntrack
to br_netfilter
. After
upgrading the kernel, run the openstack-hosts-setup.yml
playbook against those hosts. For more information, see
OSA bug 157996.
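A typical invocation, assuming the default repository checkout location used by OpenStack-Ansible and limiting the run to the upgraded hosts, might look like this:
# cd /opt/openstack-ansible/playbooks
# openstack-ansible openstack-hosts-setup.yml --limit <HOST_NAME>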
At the beginning of a playbook run, information about each host is gathered, such as its Linux distribution, kernel version, and network interfaces.
To improve performance, particularly in large deployments, you can cache host facts and information.
OpenStack-Ansible enables fact caching by default. The facts are
cached in JSON files within /etc/openstack_deploy/ansible_facts
.
Fact caching can be disabled by running
export ANSIBLE_CACHE_PLUGIN=memory
.
To set this permanently, set this variable in
/usr/local/bin/openstack-ansible.rc
.
Refer to the Ansible documentation on fact caching for more details.
Cached facts may be incorrect after a kernel upgrade or when new network interfaces are added. Newly created bridges also invalidate cached facts.
This can lead to unexpected errors while running playbooks and require the cached facts to be regenerated.
Run the following command to remove all currently cached facts for all hosts:
# rm /etc/openstack_deploy/ansible_facts/*
New facts will be gathered and cached during the next playbook run.
To clear facts for a single host, find its file within
/etc/openstack_deploy/ansible_facts/
and remove it. Each host has
a JSON file that is named after its hostname. The facts for that host
will be regenerated on the next playbook run.
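For example, to clear the facts for a hypothetical host named aio1 (check the directory listing for the exact filename used in your deployment):
# ls /etc/openstack_deploy/ansible_facts/
# rm /etc/openstack_deploy/ansible_facts/aio1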
All LXC containers on the host have at least two virtual Ethernet interfaces: eth0 in the container (connected to lxcbr0 on the host) and eth1 in the container (connected to br-mgmt on the host).
Note
Some containers, such as cinder, glance, neutron_agents, and swift_proxy, have more than two interfaces to support their functions.
On the host, all virtual Ethernet devices are named based on their container as well as the name of the interface inside the container:
${CONTAINER_UNIQUE_ID}_${NETWORK_DEVICE_NAME}
As an example, an all-in-one (AIO) build might provide a utility container called aio1_utility_container-d13b7132. That container will have two network interfaces: d13b7132_eth0 and d13b7132_eth1.
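To confirm the naming convention on the host side, filter the host's link list by the container's unique ID (using the example ID above):
# ip link show | grep d13b7132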
Another option would be to use the LXC tools to retrieve information about the utility container. For example:
# lxc-info -n aio1_utility_container-d13b7132
Name: aio1_utility_container-d13b7132
State: RUNNING
PID: 8245
IP: 10.0.3.201
IP: 172.29.237.204
CPU use: 79.18 seconds
BlkIO use: 678.26 MiB
Memory use: 613.33 MiB
KMem use: 0 bytes
Link: d13b7132_eth0
TX bytes: 743.48 KiB
RX bytes: 88.78 MiB
Total bytes: 89.51 MiB
Link: d13b7132_eth1
TX bytes: 412.42 KiB
RX bytes: 17.32 MiB
Total bytes: 17.73 MiB
The Link:
lines will show the network interfaces that are attached
to the utility container.
To dump traffic on the br-mgmt
bridge, use tcpdump
to see all
communications between the various containers. To narrow the focus,
run tcpdump
only on the desired network interface of the
containers.
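For example, to watch all traffic on the bridge, or to narrow the capture to a single container's interface (names and addresses are taken from the utility container example above; substitute your own):
# tcpdump -n -i br-mgmt
# tcpdump -n -i d13b7132_eth1 host 172.29.237.204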