5.8. Measuring performance of Cinder with Ceph backend¶

status: ready

version: 1.0

Abstract

This document describes a test plan for quantifying the performance of block storage devices provided by OpenStack Cinder with Ceph as the back end. The plan includes the collection of several resource utilization metrics, which will be used to analyze and understand the overall performance of the storage technologies used. In particular, resource bottlenecks will either be fixed, or best practices will be developed for system and hardware requirements.

Conventions

Kubernetes: an open-source system for automating deployment, scaling, and management of containerized applications.
Calico: an approach to virtual networking and network security for containers, VMs, and bare metal services that provides a rich set of security enforcement capabilities running on top of a highly scalable and efficient virtual network fabric. Calico includes pre-integration with Kubernetes and Mesos (as a CNI network plugin), Docker (as a libnetwork plugin) and OpenStack (as a Neutron plugin).
fuel-ccp: CCP stands for “Containerized Control Plane”. The goal of this project is to make building, running and managing production-ready OpenStack containers on top of Kubernetes an easy task for operators.
OpenStack: OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface.
Cinder: The Block Storage service provides block storage devices to guest instances. The method in which the storage is provisioned and consumed is determined by the Block Storage driver, or drivers in the case of a multi-backend configuration. There are a variety of drivers that are available: NAS/SAN, NFS, iSCSI, Ceph, and more.
Heat: Heat is a service to orchestrate composite cloud applications using a declarative template format through an OpenStack-native REST API.
Ceph: Ceph is a massively scalable, open source, distributed storage system. It is comprised of an object store, block store, and a POSIX-compliant distributed file system. The platform can auto-scale to the exabyte level and beyond. It runs on commodity hardware, is self-healing and self-managing, and has no single point of failure. Ceph is in the Linux kernel and is integrated with the OpenStack cloud operating system.
Nodes: servers used for workloads.
IOPS: Input/output operations per second is a performance measurement used to characterize computer storage devices like hard disk drives (HDD), solid state drives (SSD), and storage area networks (SAN).
Completion latency: This is the time that passes between submission to the kernel and when the IO is complete, not including submission latency.

5.8.1. Test Plan¶

This test plan aims to measure Cinder + Ceph storage performance and its dependence on the number of concurrently attached consumers.

5.8.1.1. Test Environment¶

5.8.1.1.1. Preparation¶

To run the tests we need:
- Ceph cluster installed and configured
- K8s cluster installed and configured with Calico
- OpenStack cloud with Heat and Cinder installed on top of the K8s cluster
- An image with random data created and uploaded into Glance, which will be used for prefilling Cinder volumes
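
The prefill image from the last item could be prepared roughly as follows. This is a minimal sketch, not the procedure used for the original tests: the image name 40g-urandom is taken from the default of volume_image in the Heat template in section 5.8.2.2, while the scratch file path, the 40 GiB size, and the prefill_image_cmds helper are assumptions. The helper only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: print the commands that would build a raw image
# filled with random data and upload it to Glance. Nothing is executed.
prefill_image_cmds() {
  name="$1"                     # Glance image name, e.g. 40g-urandom
  size_gib="$2"                 # image size in GiB (assumed value)
  file="/tmp/${name}.raw"       # assumed scratch location
  count=$(( size_gib * 1024 ))  # number of 1 MiB blocks to write
  echo "dd if=/dev/urandom of=${file} bs=1M count=${count}"
  echo "openstack image create --disk-format raw --container-format bare --file ${file} ${name}"
}

prefill_image_cmds 40g-urandom 40
```

Random data is used so that the back end cannot shortcut reads of never-written or trivially compressible blocks; the size should be chosen to match the volume size passed to the Heat template.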

Software to be installed¶
software	version	source
Ceph	jewel	Debian jessie ceph package repository
Kargo	master	From sources
Kubernetes	1.4.3	quay.io/coreos/hyperkube:v1.4.3_coreos.0
Calico	0.22.0	docker hub
calicoctl	1.0.0-beta	docker hub
OpenStack	newton	From sources

5.8.1.1.2. Environment description¶

Test results MUST include a description of the environment used. The following items should be included:

Hardware configuration of each server. If virtual machines are used then both physical and virtual hardware should be fully documented. An example format is given below:

Ceph cluster member:

Description of server hardware¶
server	name
	role
	vendor,model
	operating_system
CPU	vendor,model
	processor_count
	core_count
	frequency_MHz
RAM	vendor,model
RAM	amount_MB
NETWORK	interface_name
	vendor,model
	bandwidth
STORAGE	dev_name
	vendor,model
	SSD/HDD
	size

Configuration of physical network. The description of physical and logical connectivity.
Configuration of virtual machines and virtual networks (if used). The configuration files can be attached, along with the mapping of virtual machines to host machines.
Ceph cluster configuration. Deployment scheme and configuration of Ceph components:
- Ceph node configuration and roles
- number of Ceph monitor nodes
- number of Ceph OSDs and placement groups
Kubernetes + Calico configuration. Deployment scheme and configuration of servers used within the testing environment:
- K8s node configuration and roles
- K8s networking configuration (Calico)
OpenStack deployment configuration used by fuel-ccp. OpenStack services configuration and topology used by fuel-ccp:
- OpenStack services and roles topology
- OpenStack Cinder + Ceph configuration

5.8.1.2. Test Cases¶

Case group 1 - average time of creation, attachment and deletion of Cinder volumes
Case group 2 - number of concurrent read, write, and simultaneous read/write IOPS depending on the number of VMs

5.8.1.2.1. Description¶

This test plan contains test cases that need to be run on environments that differ in the parameters listed below. Two kinds of metrics are measured:

OpenStack control plane tests of Cinder with Ceph back end, such as the execution time of basic Cinder operations.
Load tests of the VM storage subsystem provided by Cinder with Ceph back end. These tests will show the dependence of IOPS on the number of consumers and on the type of disk operations.

5.8.1.2.2. Parameters¶

Parameters depend on ceph and OpenStack configurations.

Case group 1:

Parameter name	Value
vms + volumes	30, 60, 90
concurrency	10, 20, 30
operation	create, attach, delete

Case group 2:

Parameter name	Value
VMs per rw mode	1, 2, 5, 10, 20
read/write mode	randread, randwrite
block size	4k - constant
io depth (queue)	64
test duration in seconds	600
filesize	40G
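
For illustration, the case group 2 values above map onto an fio command line roughly as sketched here. The fio_cmd helper and the job name prefix are hypothetical, and /dev/vdc is assumed as the target because it is the default in the test script in section 5.8.2.3. The helper prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an fio command line for one
# read/write mode using the case group 2 parameter values.
fio_cmd() {
  rw="$1"                  # randread or randwrite
  target="${2:-/dev/vdc}"  # attached Cinder volume (assumed device name)
  echo "fio --name=cg2_${rw} --filename=${target} --ioengine=libaio --direct=1" \
       "--rw=${rw} --bs=4k --iodepth=64 --runtime=600 --time_based --size=40G"
}

for mode in randread randwrite; do
  fio_cmd "${mode}"
done
```

With --time_based, fio keeps issuing IO for the full 600 seconds even if the 40G region has already been covered once.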

5.8.1.2.3. List of performance metrics¶

The tables below show the list of test metrics to be collected. The priority is the relative ranking of the importance of each metric in evaluating the performance of the system.

Case group 1:

List of performance metrics for cinder functionality¶
Value	Measurement Units	Description
Time	seconds	time spent on requested operation

Case group 2:

List of performance metrics for storage subsystem¶
Value	Measurement Units	Description
READ_IO	operations/second	amount of input/output operations per second during random read from storage subsystem
WRITE_IO	operations/second	amount of input/output operations per second during random write to storage subsystem
READ_LATENCY	milliseconds	time that passes between submission to the kernel and when the IO is complete
WRITE_LATENCY	milliseconds	time that passes between submission to the kernel and when the IO is complete
VMs_COUNT	number	amount of simultaneously launched VMs with attached cinder volumes producing storage loads

5.8.1.2.4. Measuring performance values¶

Case group 1:

“Control plane” test will be executed using OpenStack Rally scenarios.

Maximum values of performance metrics from Rally¶
Action	Min (sec)	Median (sec)	90%ile (sec)	95%ile (sec)	Max (sec)	Avg (sec)	Success	Count

Where:

operation will be one of create, attach or delete
volume size also matters, and all operations mentioned above will be repeated for each group of volumes
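
The Rally runs for the different load levels could be driven as sketched below. This is an assumption-laden illustration: the load levels are taken from the case group 1 parameter table and paired by position, the task file name is a placeholder, and the task file is assumed to be a Jinja2 template that reads times and concurrency from --task-args. The rally_cmds helper is hypothetical and only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print one `rally task start` command per load level.
# Assumes a templated task file that consumes `times` and `concurrency`.
rally_cmds() {
  task_file="$1"
  for pair in 30:10 60:20 90:30; do
    times="${pair%%:*}"        # total iterations (VMs + volumes)
    concurrency="${pair##*:}"  # parallel iterations
    echo "rally task start ${task_file} --task-args '{\"times\": ${times}, \"concurrency\": ${concurrency}}'"
  done
}

rally_cmds cinder-attach.yaml
```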

Case group 2:

Storage performance testing will be based on test scripts which will be delivered and launched inside VMs using OpenStack Heat templates.

Heat templates can be launched with different sets of parameters. These parameters serve two purposes:

1. Parameters for OpenStack environment:
   1. key_name - SSH key name that will be injected into instances
   2. flavor - flavor to be used for instances
   3. image - image to be used for instances
   4. network_name - internal network to be used for instances
   5. volume_size - size of the volume to be created and attached to each instance
   6. vm_count - number of VMs with volumes to be spawned
2. Parameters for test script:
   1. test_mode - condition of test (time or size)
   2. test_rw - read or write mode (randread, randwrite, randrw)
   3. test_runtime - test duration in seconds (default 600)
   4. test_filesize - amount of data to process (default 4G)
   5. test_iodepth - IO queue depth generated by the test (default 64)
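
A stack using these parameters might be launched as sketched below. Every value here (key, flavor, image and network names, the template file name main.yaml, the volume size) is a placeholder rather than the configuration used in the original tests, and the stack_create_cmd helper is hypothetical. It prints the command so the parameters can be checked first.

```shell
#!/bin/sh
# Hypothetical helper: print an `openstack stack create` command that
# passes the parameters listed above. All values are placeholders.
stack_create_cmd() {
  stack_name="$1"
  vm_count="$2"
  test_rw="$3"
  echo "openstack stack create -t main.yaml" \
       "--parameter key_name=test-key --parameter flavor=m1.small" \
       "--parameter image=ubuntu-xenial --parameter network_name=int-net" \
       "--parameter volume_size=40 --parameter vm_count=${vm_count}" \
       "--parameter test_mode=time --parameter test_rw=${test_rw}" \
       "--parameter test_runtime=600 --parameter test_iodepth=64" \
       "${stack_name}"
}

stack_create_cmd cinder-ceph-randread 10 randread
```

Launching one stack per read/write mode keeps the VM count per mode explicit, which matches the "VMs per rw mode" parameter of case group 2.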

Average values of performance metrics from cinder + ceph¶
nodes count	test duration time in sec	average IOPS READ	average IOPS WRITE	average latency READ	average latency WRITE
2
4
10
20
40

Summary values of performance metrics from cinder + ceph¶
nodes count	test duration time in sec	SUM IOPS READ	SUM IOPS WRITE
2
4
10
20
40
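
The average and summary IOPS columns in the two tables above can be derived from the per-VM CSV lines written by the test script. The sketch below assumes one result line per VM in the column order printed by parse_terse (read IOPS in column 4, write IOPS in column 13); the summarize_results helper and the sample lines are made up for illustration, and latency columns are left out for brevity.

```shell
#!/bin/sh
# Hypothetical helper: aggregate per-VM result lines (one CSV line per VM)
# into a single summary line:
# vm_count,avg_read_iops,avg_write_iops,sum_read_iops,sum_write_iops
summarize_results() {
  awk -F',' '
    NF >= 13 { n++; r += $4; w += $13 }
    END {
      if (n > 0) printf "%d,%.1f,%.1f,%d,%d\n", n, r / n, w / n, r, w
    }'
}

# Two made-up result lines, for illustration only.
printf '%s\n' \
  't0,t1,vm-0_fio,1000,4000,1.5,0,0,0,0,1.6,4000,500,2000,3.0,0,0,0,0,3.1,2000' \
  't0,t1,vm-1_fio,3000,12000,0.5,0,0,0,0,0.6,12000,700,2800,2.0,0,0,0,0,2.1,2800' \
  | summarize_results
```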

5.8.2. Applications¶

5.8.2.1. Rally jobs templates:¶

---
  CinderVolumes.create_and_attach_volume:
    -
      args:
          size: 10
          image:
            name: "cirros_vm"
          flavor:
            name: "m1.tiny"
      runner:
        type: "constant"
        times: 30
        concurrency: 10
      context:
        users:
          tenants: 3
          users_per_tenant: 3
        quotas:
          cinder:
            volumes: -1
            gigabytes: -1
            snapshots: -1
        api_versions:
            cinder:
                version: 2
                service_type: volumev2
      sla:
        failure_rate:
          max: 10

---
  CinderVolumes.create_and_attach_volume:
    -
      args:
          size: 10
          image:
            name: "cirros_vm"
          flavor:
            name: "m1.tiny"
      runner:
        type: "constant"
        times: 60
        concurrency: 20
      context:
        users:
          tenants: 3
          users_per_tenant: 3
        quotas:
          cinder:
            volumes: -1
            gigabytes: -1
            snapshots: -1
        api_versions:
            cinder:
                version: 2
                service_type: volumev2
      sla:
        failure_rate:
          max: 10

---
  CinderVolumes.create_and_attach_volume:
    -
      args:
          size: 10
          image:
            name: "cirros_vm"
          flavor:
            name: "m1.tiny"
      runner:
        type: "constant"
        times: 120
        concurrency: 40
      context:
        users:
          tenants: 3
          users_per_tenant: 3
        quotas:
          cinder:
            volumes: -1
            gigabytes: -1
            snapshots: -1
        api_versions:
            cinder:
                version: 2
                service_type: volumev2
      sla:
        failure_rate:
          max: 10

5.8.2.2. Heat templates¶

heat_template_version: newton

parameters:
  image:
    type: string
  flavor:
    type: string
  key_name:
    type: string
  vm_count:
    type: string
  volume_size:
    type: string
  network_name:
    type: string
  test_iodepth:
    type: number
    default: 64
  test_runtime:
    type: number
    default: 600
  test_filesize:
    type: string
    default: 4G
  test_mode:
    type: string
    default: size
  test_rw:
    type: string
    default: randrw


resources:
  server_resources:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: vm_count }
      resource_def:
        type: vm-with-vol.yaml
        properties:
          image: { get_param: image }
          flavor: { get_param: flavor }
          key_name: { get_param: key_name }
          network_name: { get_param: network_name }
          volume_size: { get_param: volume_size }
          test_iodepth: { get_param: test_iodepth }
          test_filesize: { get_param: test_filesize }
          test_runtime: { get_param: test_runtime }
          test_mode: { get_param: test_mode }
          test_rw: { get_param: test_rw }
          index:
            list_join: ['-', [ {get_param: test_mode},'vm', {get_param: test_rw}, '%index%' ]]

outputs:
  script_result:
    value: { get_attr: [server_resources, result] }

heat_template_version: newton
parameters:
  image:
    type: string
  flavor:
    type: string
  key_name:
    type: string
  volume_size:
    type: string
  network_name:
    type: string
  volume_image:
    type: string
    default: 40g-urandom
  test_iodepth:
    type: number
  test_filesize:
    type: string
  index:
    type: string
  test_mode:
    type: string
    default: size
  test_rw:
    type: string
    default: randrw
  test_runtime:
    type: number
    default: 300

resources:
  server:
    type: OS::Nova::Server
    properties:
      name: { get_param: index }
      image: { get_param: image }
      flavor: { get_param: flavor }
      key_name: { get_param: key_name }
      networks:
        - network: { get_param: network_name }
      user_data_format: RAW
      user_data:
        str_replace:
          template: |
            #!/bin/bash
            export IODEPTH=iodepth
            export SIZE=filesize
            export RWMODE=rwmode
            export RUNMOD=runmode
            export RUNTIME=runtime
            scriptfile
          params:
            scriptfile: { get_file: vmScript.sh }
            iodepth: { get_param: test_iodepth }
            filesize: { get_param: test_filesize }
            runmode: { get_param: test_mode }
            rwmode: { get_param: test_rw }
            runtime: { get_param: test_runtime }
  volume:
    type: OS::Cinder::Volume
    properties:
      size: { get_param: volume_size }
      image: { get_param: volume_image }

  attachment:
    type: OS::Cinder::VolumeAttachment
    properties:
      instance_uuid: { get_resource: server }
      volume_id: { get_resource: volume }

outputs:
  result:
    value: stub-attribute

5.8.2.3. Test script for heat¶

#!/bin/bash
# Script for IO testing
WORKDIR="$(cd "$(dirname ${0})" && pwd)"
WORKSPACE="${WORKDIR}/workspace"
USER_NAME="${USER_NAME:-root}"
USER_PASS="${USER_PASS:-r00tme}"
REMOTE_HOST="${REMOTE_HOST:-172.20.9.15}"
STARTTIME=""
STOPTIME=""
function prepare()
{
  local ec=0
  mkdir -p ${WORKSPACE}
  export DEBIAN_FRONTEND=noninteractive
  apt update > /dev/null 2>&1 || ec=$?
  apt install -y fio sshpass bc > /dev/null 2>&1 || ec=$?
  return ${ec}
}

function check_vol()
{
  local volpath
  local retval
  local maxretry
  local counter
  retval=1
  counter=0
  maxretry=60
  volpath=${TARGET}
  while true
  do
    if [ -e "${volpath}" ]; then
      retval=0
      break
    fi
    # Volume not visible yet: wait and retry, up to maxretry attempts
    counter=$(( counter + 1 ))
    sleep 2
    if [ "${counter}" -ge "${maxretry}" ]; then
      break
    fi
  done
  return ${retval}
}

function u2m_sec()
{
  local input
  local output
  input=${1}
  output=$(echo "scale=4;${input}/1000" | bc)
  if echo ${output} | grep -q '^\..'; then
    output="0${output}"
  fi
  echo "${output}"
}

parse_terse()
{
  # msec = 1000 usec, 1s = 1000 msec
  local input=$*
  local jobname #3
  local read_iops #8
  local read_bw #7 #KB/s
  local read_clat_min #14 #usec
  local read_clat_max #15 #usec
  local read_clat_mean #16 #usec
  local read_clat_95 #29 #usec
  local read_clat_99 #30 #usec
  local read_total_lat_avg #40 #usec
  local read_bw_avg #45 #KB/s
  local write_iops #49
  local write_bw #48 #KB/s
  local write_clat_min #55 #usec
  local write_clat_max #56 #usec
  local write_clat_mean #57 #usec
  local write_clat_95 #70 #usec
  local write_clat_99 #71 #usec
  local write_total_lat_avg #81 #usec
  local write_bw_avg #86 #KB/s
  jobname="$(echo "${input}" | cut -d';' -f3)"
  read_iops="$(echo "${input}" | cut -d';' -f8)"
  read_bw="$(echo "${input}" | cut -d';' -f7)"
  read_clat_min="$(u2m_sec "$(echo "${input}" | cut -d';' -f14)")"
  read_clat_max="$(u2m_sec "$(echo "${input}" | cut -d';' -f15)")"
  read_clat_mean="$(u2m_sec "$(echo "${input}" | cut -d';' -f16)")"
  read_clat_95="$(u2m_sec "$(echo "${input}" | cut -d';' -f29 | cut -d'=' -f2)")"
  read_clat_99="$(u2m_sec "$(echo "${input}" | cut -d';' -f30 | cut -d'=' -f2)")"
  read_total_lat_avg="$(u2m_sec "$(echo "${input}" | cut -d';' -f40)")"
  read_bw_avg="$(echo "${input}" | cut -d';' -f45)"
  write_iops="$(echo "${input}" | cut -d';' -f49)"
  write_bw="$(echo "${input}" | cut -d';' -f48)"
  write_clat_min="$(u2m_sec "$(echo "${input}" | cut -d';' -f55)")"
  write_clat_max="$(u2m_sec "$(echo "${input}" | cut -d';' -f56)")"
  write_clat_mean="$(u2m_sec "$(echo "${input}" | cut -d';' -f57)")"
  write_clat_95="$(u2m_sec "$(echo "${input}" | cut -d';' -f70 | cut -d'=' -f2)")"
  write_clat_99="$(u2m_sec "$(echo "${input}" | cut -d';' -f71 | cut -d'=' -f2)")"
  write_total_lat_avg="$(u2m_sec "$(echo "${input}" | cut -d';' -f81)")"
  write_bw_avg="$(echo "${input}" | cut -d';' -f86)"
  echo "${STARTTIME},${STOPTIME},${jobname},${read_iops},${read_bw},${read_clat_mean},${read_clat_min},${read_clat_max},${read_clat_95},${read_clat_99},${read_total_lat_avg},${read_bw_avg},${write_iops},${write_bw},${write_clat_mean},${write_clat_min},${write_clat_max},${write_clat_95},${write_clat_99},${write_total_lat_avg},${write_bw_avg}"
}

function run_fio()
{
  local iodepth
  local bs
  local ioengine
  local direct
  local buffered
  local jobname
  local filename
  local size
  local readwrite
  local runtime
  bs="4k"
  direct=1
  buffered=0
  ioengine="libaio"
  jobname="$(hostname)_fio"
  iodepth="${IODEPTH}"
  filename="${TARGET}"
  size="--size=${SIZE}"
  readwrite="${RWMODE}"
  STARTTIME=$(date +%Y.%m.%d-%H:%M:%S)
  if [[ "${RUNMOD}" == "time" ]]; then runtime="--runtime=${RUNTIME} --time_based=1"; size='';fi
  fio --ioengine=${ioengine} --direct=${direct} --buffered=${buffered} \
  --name=${jobname} --filename=${filename} --bs=${bs} --iodepth=${iodepth} ${size} \
  --readwrite=${readwrite} ${runtime} --output-format=terse --terse-version=3 --output=${WORKSPACE}/"$(hostname)"_terse.out 2>&1 | tee ${WORKSPACE}/"$(hostname)"_raw_fio_terse.log
  STOPTIME="$(date +%Y.%m.%d-%H:%M:%S)"
  if [ "$(stat ${WORKSPACE}/"$(hostname)"_raw_fio_terse.log | grep -oP '(?<=(Size:))(.[0-9]+\s)')" -eq 0 ]; then
    rm ${WORKSPACE}/"$(hostname)"_raw_fio_terse.log
  fi
}

function put_results()
{
  local remotehost
  local remotepath
  remotehost="${1}"
  remotepath="/${USER_NAME}/results"
  if [ -f ${WORKSPACE}/"$(hostname)"_results.csv ]; then
    sshpass -p ${USER_PASS} ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no ${USER_NAME}@${remotehost} "mkdir -p ${remotepath}"
    sshpass -p ${USER_PASS} scp -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -r ${WORKSPACE}/*.* ${USER_NAME}@${remotehost}:${remotepath}/
  else
    exit 1
  fi
}

# Main
IODEPTH="${IODEPTH:-64}"
TARGET="${TARGET:-/dev/vdc}"
SIZE="${SIZE:-4G}"
RUNTIME="${RUNTIME:-600}" # 10min
RWMODE="${RWMODE:-randrw}"
RUNMOD="${RUNMOD}"
PARSEONLY="${PARSEONLY:-false}"

# Output format:
# starttime, stoptime, jobname,
# read IOPS, read bandwidth KB/s, read completion latency mean/min/max/95th/99th msec, avg read total latency msec, avg read bandwidth KB/s,
# write IOPS, write bandwidth KB/s, write completion latency mean/min/max/95th/99th msec, avg write total latency msec, avg write bandwidth KB/s
if [[ "${PARSEONLY}" == "true" ]]; then
  # Read the terse file line by line so a line is never split on whitespace
  while IFS= read -r tline
  do
    parse_terse "${tline}"
  done < "${1}"
  exit 0
fi
prepare || exit $?
check_vol || exit $?
run_fio
parse_terse "$(cat ${WORKSPACE}/"$(hostname)"_terse.out)" > ${WORKSPACE}/"$(hostname)"_results.csv
put_results "${REMOTE_HOST}"
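
As a quick way to check the field numbering used by parse_terse, the sketch below extracts fields 8 and 49 (read and write IOPS in the script above) from a semicolon-separated line. The terse_iops helper is hypothetical, and the input is a synthetic line in which every field holds its own index, not real fio output.

```shell
#!/bin/sh
# Hypothetical helper: pull read and write IOPS out of one fio terse
# line using the same field numbers as parse_terse (8 and 49).
terse_iops() {
  awk -F';' '{ print $8 "," $49 }'
}

# Synthetic line f1;f2;...;f90, so the output shows which fields
# were selected.
awk 'BEGIN { for (i = 1; i <= 90; i++) printf "f%d%s", i, (i < 90 ? ";" : "\n") }' \
  | terse_iops
```

The test script offers a related path of its own: with PARSEONLY=true it parses the terse file given as its first argument instead of running fio.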

5.8.3. Reports¶

Test plan execution reports:

Results of measuring performance of Cinder with Ceph backend
