5.8. Measuring performance of Cinder with Ceph backend¶
status: ready
version: 1.0
Abstract: This document describes a test plan for quantifying the performance of block storage devices provided by OpenStack Cinder with Ceph used as a back end. The plan includes the collection of several resource utilization metrics, which will be used to analyze and understand the overall performance of the storage technologies in use. In particular, resource bottlenecks will either be fixed, or best practices will be developed for system and hardware requirements.
5.8.1. Test Plan¶
This test plan aims to identify the storage performance of Cinder with a Ceph back end and its dependence on the number of concurrently attached consumers.
5.8.1.1. Test Environment¶
5.8.1.1.1. Preparation¶
To run the tests we need:
- A Ceph cluster installed and configured
- A K8s cluster installed and configured with Calico
- An OpenStack cloud with Heat and Cinder installed on top of the K8s cluster
- An image filled with random data created and uploaded into Glance, to be used for prefilling Cinder volumes
software | version | source |
---|---|---|
Ceph | jewel | Debian jessie ceph package repository |
Kargo | master | From sources |
Kubernetes | 1.4.3 | quay.io/coreos/hyperkube:v1.4.3_coreos.0 |
Calico | 0.22.0 | Docker Hub |
calicoctl | 1.0.0-beta | Docker Hub |
OpenStack | newton | From sources |
5.8.1.1.2. Environment description¶
Test results MUST include a description of the environment used. The following items should be included:
- Hardware configuration of each server. If virtual machines are used then both physical and virtual hardware should be fully documented. An example format is given below:
Ceph cluster member:
component | attribute | value |
---|---|---|
server | name | |
server | role | |
server | vendor, model | |
server | operating_system | |
CPU | vendor, model | |
CPU | processor_count | |
CPU | core_count | |
CPU | frequency_MHz | |
RAM | vendor, model | |
RAM | amount_MB | |
NETWORK | interface_name | |
NETWORK | vendor, model | |
NETWORK | bandwidth | |
STORAGE | dev_name | |
STORAGE | vendor, model | |
STORAGE | SSD/HDD | |
STORAGE | size | |
- Configuration of the physical network: a description of the physical and logical connectivity.
- Configuration of virtual machines and virtual networks (if used). The configuration files can be attached, along with the mapping of virtual machines to host machines.
- Ceph cluster configuration: deployment scheme and configuration of Ceph components.
- Ceph node configuration and roles
- number of Ceph monitor nodes
- number of Ceph OSDs and placement groups
- Kubernetes + Calico configuration: deployment scheme and configuration of servers used within the testing environment.
- K8s node configuration and roles
- K8s networking configuration (Calico)
- OpenStack deployment configuration used by fuel-ccp: OpenStack services configuration and topology.
- OpenStack services and roles topology
- OpenStack Cinder + Ceph configuration
5.8.1.2. Test Cases¶
- Case group 1 - average time of creation, attachment, and deletion of Cinder volumes
- Case group 2 - concurrent read, write, and mixed read/write IOPS depending on the number of VMs
5.8.1.2.1. Description¶
This test plan contains test cases that need to be run on environments that differ by the parameters listed below. Two kinds of metrics are measured:
- OpenStack control-plane tests of Cinder with a Ceph back end, such as execution time of basic Cinder operations.
- Load tests of the VM storage subsystem provided by Cinder with a Ceph back end. These tests will show the dependence of IOPS on the number of consumers and on the type of disk operations.
5.8.1.2.2. Parameters¶
Parameters depend on the Ceph and OpenStack configurations.
Case group 1:
Parameter name | Value |
---|---|
vms + volumes | 30, 60, 90 |
concurrency | 10, 20, 30 |
operation | create, attach, delete |
Case group 2:
Parameter name | Value |
---|---|
VMs per rw mode | 1, 2, 5, 10, 20 |
read/write mode | randread, randwrite |
block size | 4k - constant |
io depth (queue) | 64 |
test duration in seconds | 600 |
filesize | 40G |
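Under these parameters, the in-VM load generator reduces to a single fio invocation per read/write mode. A minimal sketch of the command line implied by the table, assuming the Cinder volume is attached as /dev/vdc (the device name and job name are illustrative, not prescribed by this plan):

```shell
#!/bin/sh
# Assemble the fio call implied by the case group 2 parameters.
# TARGET is an assumption; adjust to the actual attached device.
BS="4k"; IODEPTH=64; RUNTIME=600; FILESIZE="40G"; RWMODE="randread"
TARGET="/dev/vdc"
FIO_CMD="fio --name=cg2-job --filename=${TARGET} --ioengine=libaio \
--direct=1 --bs=${BS} --iodepth=${IODEPTH} --rw=${RWMODE} \
--runtime=${RUNTIME} --time_based=1 --size=${FILESIZE} \
--output-format=terse --terse-version=3"
echo "${FIO_CMD}"
```

For the randwrite pass only RWMODE changes; all other parameters stay fixed across runs so that results remain comparable.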
5.8.1.2.3. List of performance metrics¶
The tables below show the list of test metrics to be collected.
Case group 1:
Value | Measurement Units | Description |
---|---|---|
Time | seconds | time spent on the requested operation |
Case group 2:
Value | Measurement Units | Description |
---|---|---|
READ_IO | operations/second | number of input/output operations per second during random reads from the storage subsystem |
WRITE_IO | operations/second | number of input/output operations per second during random writes to the storage subsystem |
READ_LATENCY | milliseconds | time between submission of the read IO to the kernel and its completion |
WRITE_LATENCY | milliseconds | time between submission of the write IO to the kernel and its completion |
VMs_COUNT | number | number of simultaneously launched VMs with attached Cinder volumes producing storage load |
5.8.1.2.4. Measuring performance values¶
Case group 1:
“Control plane” tests will be executed using OpenStack Rally scenarios.
Action | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
Where:
- operation is one of create, attach, or delete
- volume size also matters; all operations mentioned above will be repeated for each group of volumes
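The summary columns above can be reproduced from raw per-operation durations. A minimal post-processing sketch for the Min/Max/Avg columns, using made-up sample timings (the numbers are illustrative, not measured results):

```shell
#!/bin/sh
# Compute Min/Max/Avg over a set of per-operation durations (seconds).
# The durations below are fabricated for illustration only.
durations="12.1 9.8 14.3 10.5 11.0"
stats=$(echo "${durations}" | tr ' ' '\n' | awk '
NR==1 {min=$1; max=$1}
{sum+=$1; if($1<min)min=$1; if($1>max)max=$1; n++}
END {printf "min=%.1f max=%.1f avg=%.1f", min, max, sum/n}')
echo "${stats}"
```

The median and percentile columns require sorting the full sample set; Rally computes all of these itself, so this is only useful when re-checking a report from raw data.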
Case group 2:
Storage performance testing will be based on test scripts delivered and launched inside VMs using OpenStack Heat templates.
Heat templates can be launched with different sets of parameters. These parameters serve two purposes:
- 1. Parameters for the OpenStack environment:
- key_name - SSH key name that will be injected into instances
- flavor - flavor to be used for instances
- image - image to be used for instances
- network_name - internal network to be used for instances
- volume_size - volume size to be created and attached to instance
- vm_count - amount of VMs with volumes to be spawned
- 2. Parameters for the test script:
- test_mode - test termination condition (time or size)
- test_rw - read/write mode (randread, randwrite, randrw)
- test_runtime - test duration in seconds (default 600)
- test_filesize - amount of data to transfer (default 4G)
- test_iodepth - IO queue depth generated by the test (default 64)
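A hypothetical launch of the parent Heat template with these parameters. The stack name, template file name, and all parameter values here are placeholders for illustration:

```shell
#!/bin/sh
# Assemble an `openstack stack create` call for the Heat template
# in the Applications section. All values are illustrative assumptions.
STACK_ARGS="--parameter image=test-image --parameter flavor=m1.medium \
--parameter key_name=testkey --parameter network_name=private \
--parameter volume_size=40 --parameter vm_count=10 \
--parameter test_mode=time --parameter test_rw=randread \
--parameter test_runtime=600 --parameter test_filesize=40G \
--parameter test_iodepth=64"
CMD="openstack stack create -t io-test.yaml ${STACK_ARGS} io-test-10vm"
echo "${CMD}"
```

One stack per (vm_count, test_rw) pair keeps the result files separable per run.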
nodes count | test duration time in sec | average IOPS READ | average IOPS WRITE | average latency READ | average latency WRITE |
---|---|---|---|---|---|
2 | | | | | |
4 | | | | | |
10 | | | | | |
20 | | | | | |
40 | | | | | |
nodes count | test duration time in sec | SUM IOPS READ | SUM IOPS WRITE |
---|---|---|---|
2 | | | |
4 | | | |
10 | | | |
20 | | | |
40 | | | |
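The SUM rows can be filled by aggregating the per-VM CSV files produced by the test script (in the layout emitted by its parse_terse function, field 4 is read IOPS and field 13 is write IOPS). A sketch with fabricated sample rows:

```shell
#!/bin/sh
# Aggregate read/write IOPS across per-VM result CSVs.
# The two rows below are made-up examples in the parse_terse column layout.
cat > /tmp/sample_results.csv <<'EOF'
t0,t1,vm-a_fio,1200,4800,1.2,0.3,9.9,2.1,3.4,1.3,4700,800,3200,1.5,0.4,12.0,2.5,4.0,1.6,3100
t0,t1,vm-b_fio,1100,4400,1.3,0.3,8.7,2.2,3.5,1.4,4300,900,3600,1.4,0.4,11.0,2.4,3.9,1.5,3500
EOF
sums=$(awk -F',' '{r+=$4; w+=$13} \
END {printf "SUM_READ=%d SUM_WRITE=%d", r, w}' /tmp/sample_results.csv)
echo "${sums}"
```

The per-row averages for the first table come directly from the latency and IOPS fields of each CSV line, so no extra processing is needed there.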
5.8.2. Applications¶
5.8.2.1. Rally jobs templates:¶
---
CinderVolumes.create_and_attach_volume:
-
args:
size: 10
image:
name: "cirros_vm"
flavor:
name: "m1.tiny"
runner:
type: "constant"
times: 30
concurrency: 10
context:
users:
tenants: 3
users_per_tenant: 3
quotas:
cinder:
volumes: -1
gigabytes: -1
snapshots: -1
api_versions:
cinder:
version: 2
service_type: volumev2
sla:
failure_rate:
max: 10
---
CinderVolumes.create_and_attach_volume:
-
args:
size: 10
image:
name: "cirros_vm"
flavor:
name: "m1.tiny"
runner:
type: "constant"
times: 60
concurrency: 20
context:
users:
tenants: 3
users_per_tenant: 3
quotas:
cinder:
volumes: -1
gigabytes: -1
snapshots: -1
api_versions:
cinder:
version: 2
service_type: volumev2
sla:
failure_rate:
max: 10
---
CinderVolumes.create_and_attach_volume:
-
args:
size: 10
image:
name: "cirros_vm"
flavor:
name: "m1.tiny"
runner:
type: "constant"
times: 120
concurrency: 40
context:
users:
tenants: 3
users_per_tenant: 3
quotas:
cinder:
volumes: -1
gigabytes: -1
snapshots: -1
api_versions:
cinder:
version: 2
service_type: volumev2
sla:
failure_rate:
max: 10
5.8.2.2. Heat templates¶
heat_template_version: newton
parameters:
image:
type: string
flavor:
type: string
key_name:
type: string
vm_count:
type: string
volume_size:
type: string
network_name:
type: string
test_iodepth:
type: number
default: 64
test_runtime:
type: number
default: 600
test_filesize:
type: string
default: 4G
test_mode:
type: string
default: size
test_rw:
type: string
default: randrw
resources:
server_resources:
type: OS::Heat::ResourceGroup
properties:
count: { get_param: vm_count }
resource_def:
type: vm-with-vol.yaml
properties:
image: { get_param: image }
flavor: { get_param: flavor }
key_name: { get_param: key_name }
network_name: { get_param: network_name }
volume_size: { get_param: volume_size }
test_iodepth: { get_param: test_iodepth }
test_filesize: { get_param: test_filesize }
test_runtime: { get_param: test_runtime }
test_mode: { get_param: test_mode }
test_rw: { get_param: test_rw }
index:
list_join: ['-', [ {get_param: test_mode},'vm', {get_param: test_rw}, '%index%' ]]
outputs:
script_result:
value: { get_attr: [server_resources, result] }
vm-with-vol.yaml (the nested template referenced by the server_resources group above):
heat_template_version: newton
parameters:
image:
type: string
flavor:
type: string
key_name:
type: string
volume_size:
type: string
network_name:
type: string
volume_image:
type: string
default: 40g-urandom
test_iodepth:
type: number
test_filesize:
type: string
index:
type: string
test_mode:
type: string
default: size
test_rw:
type: string
default: randrw
test_runtime:
type: number
default: 300
resources:
server:
type: OS::Nova::Server
properties:
name: { get_param: index }
image: { get_param: image }
flavor: { get_param: flavor }
key_name: { get_param: key_name }
networks:
- network: { get_param: network_name }
user_data_format: RAW
user_data:
str_replace:
template: |
#!/bin/bash
export IODEPTH=iodepth
export SIZE=filesize
export RWMODE=rwmode
export RUNMOD=runmode
export RUNTIME=runtime
scriptfile
params:
scriptfile: { get_file: vmScript.sh }
iodepth: { get_param: test_iodepth }
filesize: { get_param: test_filesize }
runmode: { get_param: test_mode }
rwmode: { get_param: test_rw }
runtime: { get_param: test_runtime }
volume:
type: OS::Cinder::Volume
properties:
size: { get_param: volume_size }
image: { get_param: volume_image }
attachment:
type: OS::Cinder::VolumeAttachment
properties:
instance_uuid: { get_resource: server }
volume_id: { get_resource: volume }
outputs:
result:
value: stub-attribute
5.8.2.3. Test script for Heat¶
#!/bin/bash
# Script for IO testing
WORKDIR="$(cd "$(dirname ${0})" && pwd)"
WORKSPACE="${WORKDIR}/workspace"
USER_NAME="${USER_NAME:-root}"
USER_PASS="${USER_PASS:-r00tme}"
REMOTE_HOST="${REMOTE_HOST:-172.20.9.15}"
STARTTIME=""
STOPTIME=""
function prepare()
{
local ec=0
mkdir -p ${WORKSPACE}
export DEBIAN_FRONTEND=noninteractive
apt update > /dev/null 2>&1 || ec=$?
apt install -y fio sshpass bc > /dev/null 2>&1 || ec=$?
return ${ec}
}
function check_vol()
{
local volpath
local retval
local maxretry
local counter
retval=1
counter=0
maxretry=60
volpath=${TARGET}
while true
do
if [ -e ${volpath} ]; then
retval=0
break
fi
counter=$(( counter + 1 ))
sleep 2
if [ "${counter}" -ge "${maxretry}" ]; then
break
fi
done
return ${retval}
}
function u2m_sec()
{
local input
local output
input=${1}
output=$(echo "scale=4;${input}/1000" | bc)
if echo ${output} | grep -q '^\..'; then
output="0${output}"
fi
echo "${output}"
}
function parse_terse()
{
# msec = 1000 usec, 1s = 1000 msec
local input=$*
local jobname #3
local read_iops #8
local read_bw #7 #KB/s
local read_clat_min #14 #usec
local read_clat_max #15 #usec
local read_clat_mean #16 #usec
local read_clat_95 #29 #usec
local read_clat_99 #30 #usec
local read_total_lat_avg #40 #usec
local read_bw_avg #45 #KB/s
local write_iops #49
local write_bw #48 #KB/s
local write_clat_min #55 #usec
local write_clat_max #56 #usec
local write_clat_mean #57 #usec
local write_clat_95 #70 #usec
local write_clat_99 #71 #usec
local write_total_lat_avg #81 #usec
local write_bw_avg #86 #KB/s
jobname="$(echo "${input}" | cut -d';' -f3)"
read_iops="$(echo "${input}" | cut -d';' -f8)"
read_bw="$(echo "${input}" | cut -d';' -f7)"
read_clat_min="$(u2m_sec "$(echo "${input}" | cut -d';' -f14)")"
read_clat_max="$(u2m_sec "$(echo "${input}" | cut -d';' -f15)")"
read_clat_mean="$(u2m_sec "$(echo "${input}" | cut -d';' -f16)")"
read_clat_95="$(u2m_sec "$(echo "${input}" | cut -d';' -f29 | cut -d'=' -f2)")"
read_clat_99="$(u2m_sec "$(echo "${input}" | cut -d';' -f30 | cut -d'=' -f2)")"
read_total_lat_avg="$(u2m_sec "$(echo "${input}" | cut -d';' -f40)")"
read_bw_avg="$(echo "${input}" | cut -d';' -f45)"
write_iops="$(echo "${input}" | cut -d';' -f49)"
write_bw="$(echo "${input}" | cut -d';' -f48)"
write_clat_min="$(u2m_sec "$(echo "${input}" | cut -d';' -f55)")"
write_clat_max="$(u2m_sec "$(echo "${input}" | cut -d';' -f56)")"
write_clat_mean="$(u2m_sec "$(echo "${input}" | cut -d';' -f57)")"
write_clat_95="$(u2m_sec "$(echo "${input}" | cut -d';' -f70 | cut -d'=' -f2)")"
write_clat_99="$(u2m_sec "$(echo "${input}" | cut -d';' -f71 | cut -d'=' -f2)")"
write_total_lat_avg="$(u2m_sec "$(echo "${input}" | cut -d';' -f81)")"
write_bw_avg="$(echo "${input}" | cut -d';' -f86)"
echo "${STARTTIME},${STOPTIME},${jobname},${read_iops},${read_bw},${read_clat_mean},${read_clat_min},${read_clat_max},${read_clat_95},${read_clat_99},${read_total_lat_avg},${read_bw_avg},${write_iops},${write_bw},${write_clat_mean},${write_clat_min},${write_clat_max},${write_clat_95},${write_clat_99},${write_total_lat_avg},${write_bw_avg}"
}
function run_fio()
{
local iodepth
local bs
local ioengine
local direct
local buffered
local jobname
local filename
local size
local readwrite
local runtime
bs="4k"
direct=1
buffered=0
ioengine="libaio"
jobname="$(hostname)_fio"
iodepth="${IODEPTH}"
filename="${TARGET}"
size="--size=${SIZE}"
readwrite="${RWMODE}"
STARTTIME=$(date +%Y.%m.%d-%H:%M:%S)
if [[ "${RUNMOD}" == "time" ]]; then runtime="--runtime=${RUNTIME} --time_based=1"; size='';fi
fio --ioengine=${ioengine} --direct=${direct} --buffered=${buffered} \
--name=${jobname} --filename=${filename} --bs=${bs} --iodepth=${iodepth} ${size} \
--readwrite=${readwrite} ${runtime} --output-format=terse --terse-version=3 --output=${WORKSPACE}/"$(hostname)"_terse.out 2>&1 | tee ${WORKSPACE}/"$(hostname)"_raw_fio_terse.log
STOPTIME="$(date +%Y.%m.%d-%H:%M:%S)"
if [ "$(stat ${WORKSPACE}/"$(hostname)"_raw_fio_terse.log | grep -oP '(?<=(Size:))(.[0-9]+\s)')" -eq 0 ]; then
rm ${WORKSPACE}/"$(hostname)"_raw_fio_terse.log
fi
}
function put_results()
{
local remotehost
local remotepath
remotehost="${1}"
remotepath="/${USER_NAME}/results"
if [ -f ${WORKSPACE}/"$(hostname)"_results.csv ]; then
sshpass -p ${USER_PASS} ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no ${USER_NAME}@${remotehost} "mkdir -p ${remotepath}"
sshpass -p ${USER_PASS} scp -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -r ${WORKSPACE}/*.* ${USER_NAME}@${remotehost}:${remotepath}/
else
exit 1
fi
}
# Main
IODEPTH="${IODEPTH:-64}"
TARGET="${TARGET:-/dev/vdc}"
SIZE="${SIZE:-4G}"
RUNTIME="${RUNTIME:-600}" # 10min
RWMODE="${RWMODE:-randrw}"
RUNMOD="${RUNMOD}"
PARSEONLY="${PARSEONLY:-false}"
# Output format (matches parse_terse):
# starttime, stoptime, jobname, read IOPS, read bandwidth KB/s, mean/min/max read
# complete latency msec, 95th/99th percentile read latency msec, avg read total
# latency msec, avg read bandwidth KB/s, write IOPS, write bandwidth KB/s,
# mean/min/max write complete latency msec, 95th/99th percentile write latency
# msec, avg write total latency msec, avg write bandwidth KB/s
if [[ "${PARSEONLY}" == "true" ]]; then
while read -r tline
do
parse_terse "${tline}"
done < "${1}"
exit 0
fi
prepare || exit $?
check_vol || exit $?
run_fio
parse_terse "$(cat ${WORKSPACE}/"$(hostname)"_terse.out)" > ${WORKSPACE}/"$(hostname)"_results.csv
put_results "${REMOTE_HOST}"
5.8.3. Reports¶
- Test plan execution reports: