Database¶
Introduction¶
The general recommendation is to run a galera cluster (using mysql or mariadb) for your database. Its active-active design allows for fast and easy failover.
Note that OpenStack does not play well with writing in parallel to multiple galera nodes; see the config recommendation below.
One database to rule them all?¶
You can consider deploying the database in two ways:
one galera cluster for each OpenStack service
one single big galera cluster for all OpenStack services
The recommendation is to split your databases into separate galera clusters, one per service, for multiple reasons:
Reduce the impact when a galera cluster is down
Allow intervention on smaller part of infrastructure
Also, there is no benefit to colocating multiple services on the same galera cluster.
Config recommendation¶
This section is split into three parts:
the configuration for galera itself
the configuration for the reverse proxy in front of galera
the configuration for the OpenStack services
Galera configuration¶
All of these settings need to be consistent on all nodes of the galera cluster.
Note that this guide does not include the general requirements to get the galera cluster set up in the first place. For this please see https://mariadb.com/kb/en/getting-started-with-mariadb-galera-cluster/
General health configs¶
[mysqld]
max_connections=4000
max_statement_time=20
In order to ensure your cluster runs smoothly, we recommend limiting the number of connections and the time statements are allowed to run.
The value for max_connections should be set based on actual tests (testing with a lot of idle connections is fine).
The value of 20 seconds for max_statement_time is enough for all normal use-cases we know of. You might only run into issues with regular Nova cleanup jobs if they do not run often enough.
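If you want to sanity-check these limits against a running cluster, a minimal sketch (assuming a mysql client with credentials for the cluster) is to compare the standard status counters against the configured limits:

# peak and current connection usage since the last restart
mysql -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections'"
mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'"
# confirm the statement time limit is actually applied
mysql -e "SHOW GLOBAL VARIABLES LIKE 'max_statement_time'"

If Max_used_connections gets close to max_connections during your tests, raise the limit before the OpenStack services start seeing connection errors.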
Replication stability¶
[galera]
wsrep_provider_options=gcomm.thread_prio=rr:2;gcs.fc_limit=160;gcs.fc_factor=0.8;gcache.size=2G
When you have a large number of connections to your galera cluster, these connections might starve the galera replication thread. If the replication thread does not get enough CPU time, the galera cluster will lose its members and break.
This setting moves the replication thread to realtime scheduling on the kernel side.
If you run galera as a non-privileged user (as you hopefully do), galera will need CAP_SYS_NICE in order to be allowed to change the priority.
If you run inside a container environment, you might need to set kernel.sched_rt_runtime_us=-1 (although that is suboptimal).
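How you grant CAP_SYS_NICE depends on your deployment. As a minimal sketch, assuming mariadb runs as an unprivileged user under a systemd unit named mariadb.service (adjust the unit name and paths for your distribution):

# grant CAP_SYS_NICE to the mariadb service via a systemd drop-in
sudo mkdir -p /etc/systemd/system/mariadb.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/mariadb.service.d/cap_sys_nice.conf
[Service]
AmbientCapabilities=CAP_SYS_NICE
EOF
sudo systemctl daemon-reload
sudo systemctl restart mariadb.service

# only needed in container environments where realtime scheduling is refused
# (suboptimal, see above)
sudo sysctl -w kernel.sched_rt_runtime_us=-1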
Performance¶
[mysqld]
tmp_table_size=64M
max_heap_table_size=64M
optimizer_switch=derived_merge=off
The temporary table sizes and the derived_merge optimizer switch are important settings if you have a large number of Neutron RBAC rules.
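If you are unsure whether these sizes are large enough for your dataset, a quick check (again assuming a mysql client with access to the cluster) is to see how often temporary tables spill to disk; a high ratio of on-disk to total temporary tables suggests the values above still need to be raised:

mysql -e "SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables'"
mysql -e "SHOW GLOBAL STATUS LIKE 'Created_tmp_tables'"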
Reverse Proxy Configuration¶
You will need to run a reverse proxy in front of your galera cluster to ensure OpenStack only ever communicates with a single cluster node for write requests.
This is required because OpenStack does not cope well with the deadlocks that occur when writing to different nodes in parallel.
If you choose to run haproxy for this, you can use something like the following config:
defaults
    timeout client 300s

listen db_master
    bind 0.0.0.0:3306
    balance first
    option mysql-check
    server server-1 server-1.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster id 1
    server server-2 server-2.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster backup id 2
    server server-3 server-3.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster backup id 3

listen db_slave
    bind 0.0.0.0:3308
    balance roundrobin
    option mysql-check
    server server-1 server-1.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster id 3 backup
    server server-2 server-2.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster id 1 weight 10
    server server-3 server-3.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster id 2 weight 10
By using two blocks, we can separate the read-only SQL requests (listening on port 3308 here) from the read/write requests (listening on port 3306 here) and slightly lower the load on the first (master) mysql backend.
You should note the timeout client setting here, as it is relevant to the OpenStack configuration.
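To verify the routing behaves as intended, a small sketch (assuming the proxy is reachable under the hostname proxy, as in the OpenStack configuration below) is to ask each port which galera node answered; port 3306 should always return the same node, while port 3308 should rotate across the remaining nodes:

# write endpoint: expect the same node on every invocation
mysql -h proxy -P 3306 -e "SHOW VARIABLES LIKE 'wsrep_node_name'"
# read-only endpoint: expect the answer to alternate between the other nodes
mysql -h proxy -P 3308 -e "SHOW VARIABLES LIKE 'wsrep_node_name'"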
OpenStack Configuration¶
Database Connection Settings¶
The database configuration is normally in the [database] section of the configuration. You should set the following:
connection = mysql+pymysql://login:pass@proxy:3306/db?charset=utf8
slave_connection = mysql+pymysql://login:pass@proxy:3308/db?charset=utf8
connection_recycle_time = 280
max_pool_size = 15
max_overflow = 25
The connection is used by OpenStack services for read and write requests.
The slave_connection is used by OpenStack services for read-only requests.
The connection_recycle_time should be a bit smaller (5% to 10%) than the timeout client in the reverse proxy.
This ensures connections are recreated on the OpenStack side before the reverse proxy forces them to terminate.
The max_pool_size and max_overflow define the number of connections an individual thread is allowed to have.
You will need to set these based on experience (although the above should be a good start).
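As a rough, illustrative sizing check (the worker and controller counts here are made up, replace them with your own): each worker process can open up to max_pool_size + max_overflow connections, and the total across all services has to stay below max_connections on the galera side.

# (15 + 25) connections per worker * 10 API workers * 8 controller nodes
echo $(( (15 + 25) * 10 * 8 ))   # 3200, still below the max_connections=4000 suggested above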
Database cleanup¶
Nova and Cinder use soft deletes inside their databases.
This means deleted entries remain in the database and just get a deleted flag set.
In order to prevent the database tables from growing forever, these deleted entries need to be removed regularly. For this you can use:
nova-manage db archive_deleted_rows and nova-manage db purge for Nova (https://docs.openstack.org/nova/latest/cli/nova-manage.html)
cinder-manage db purge for Cinder (https://docs.openstack.org/cinder/latest/cli/cinder-manage.html)
If you have never run these cleanups before (or if your environment has a high number of resources being deleted), you might run into a timeout due to the max_statement_time on the database cluster.
To work around this, the nova-manage commands support a --max-rows argument.
For Cinder you might need to run the SQL statements manually and add a limit 1000 to them (the statements are part of the error output of the command).
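As a hedged sketch of what a periodic cleanup could look like (verify the exact flag spelling and arguments against the nova-manage and cinder-manage documentation linked above for your release):

# archive soft-deleted Nova rows in batches, then purge the archive (shadow) tables
nova-manage db archive_deleted_rows --max-rows 1000
nova-manage db purge --all
# remove Cinder rows that were soft-deleted more than 30 days ago
cinder-manage db purge 30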