5.25. Telemetry Services resource consumption/scalability testing

status: ready

version: 1.0

Abstract

This document describes how scalability and performance testing is conducted on an OpenStack Cloud, with a focus on OpenStack Telemetry Services. Currently it covers the collection and processing of metrics by Telemetry Services; further test cases can be added to test the scale and performance of other aspects of the OpenStack Telemetry Services.

5.25.1. Test Plan

Characterize the resource consumption and application performance of OpenStack Telemetry Services on an OpenStack Cloud as a workload increases over time. As the workload increases, measure System Performance Metrics (CPU, Memory, Disk, IO) and Application Performance Metrics (responsiveness, health, utilization, functionality) until the desired load is reached or a system or application failure occurs.

5.25.1.1. Test Environment

5.25.1.1.1. Preparation

Ideally, each test is run on a newly deployed cloud for repeatability. The cloud deployment should be documented for each test case and run, because the deployment sets many configuration values that affect performance.

5.25.1.1.2. Environment description

The environment description includes hardware specs, software versions, tunings and configuration of the OpenStack Cloud under test.

5.25.1.1.2.1. Hardware

List details of hardware for each node type here.

Deployment node (Undercloud)

Parameter   Value
---------   -----
model       ex. Dell PowerEdge r610
CPU         ex. 2xIntel(R) Xeon(R) X5650 @ 2.67GHz (12Cores/24Threads)
Memory      ex. 64GiB (@1333MHz)
Disk        ex. 4 x 146GiB 15K SAS Drives in RAID 0
Network     ex. 2x1Gb/s Broadcom, 2x10Gb/s Intel X520

Controller

Parameter   Value
---------   -----
model       ex. Dell PowerEdge r610
CPU         ex. 2xIntel(R) Xeon(R) X5650 @ 2.67GHz (12Cores/24Threads)
Memory      ex. 64GiB (@1333MHz)
Disk        ex. 4 x 146GiB 15K SAS Drives in RAID 0
Network     ex. 2x1Gb/s Broadcom, 2x10Gb/s Intel X520

Compute

Parameter   Value
---------   -----
model       ex. Dell PowerEdge r610
CPU         ex. 2xIntel(R) Xeon(R) X5650 @ 2.67GHz (12Cores/24Threads)
Memory      ex. 64GiB (@1333MHz)
Disk        ex. 4 x 146GiB 15K SAS Drives in RAID 0
Network     ex. 2x1Gb/s Broadcom, 2x10Gb/s Intel X520

Additional Hardware for testing/monitoring/results

  • Performance Monitoring Host (Carbon/Graphite/Grafana)

  • Performance Results Host (Elasticsearch/Kibana)

5.25.1.1.2.2. Software

Record the versions of the Linux kernel, base operating system (ex. CentOS 7.3), OpenStack release (ex. Newton), OpenStack packages, testing harness/framework, and any other pertinent software.
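
One way to capture part of this record automatically is sketched below. It is a minimal illustration using only the Python standard library; the function name `gather_software_metadata` and the `openstack_release` key are assumptions made for this example, and versions that cannot be detected locally (OpenStack release, package versions, test harness) are expected to be supplied by the operator.

```python
import json
import platform


def gather_software_metadata(extra=None):
    """Collect basic host software versions for the test record."""
    uname = platform.uname()
    metadata = {
        "kernel": uname.release,
        "machine": uname.machine,
        "os_platform": platform.platform(),
        "python": platform.python_version(),
    }
    # Operator-supplied values (hypothetical keys), e.g. the OpenStack release.
    metadata.update(extra or {})
    return metadata


if __name__ == "__main__":
    record = gather_software_metadata({"openstack_release": "Newton"})
    print(json.dumps(record, indent=2, sort_keys=True))
```

Running this on each node type and storing the JSON next to the test results keeps the software record tied to the run it describes.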

5.25.1.1.2.3. Tuning/Configuration

Record the deployed configuration, including but not limited to the following:

  • Number of gnocchi-metricd processes

  • Number of API processes/threads

  • Is the API deployed in httpd? (If so, include the httpd configuration options)

  • Storage backend (file, Swift, Ceph)

  • Ceilometer polling interval

  • Other Services worker/process counts (Nova, Neutron, …)
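
The items above could be captured in a small structured record so that runs are easy to compare. The sketch below is only one possible layout: the class name `TelemetryTuning`, its field names, and the example values are all hypothetical and do not correspond to actual configuration option names.

```python
import json
from dataclasses import asdict, dataclass, field

VALID_BACKENDS = ("file", "swift", "ceph")


@dataclass
class TelemetryTuning:
    """Hypothetical record of the tuning values used for one test run."""
    metricd_processes: int
    api_processes: int
    api_threads: int
    api_in_httpd: bool
    storage_backend: str
    ceilometer_polling_interval_s: int
    # Worker/process counts of other services, keyed by service name.
    other_service_workers: dict = field(default_factory=dict)

    def to_json(self):
        """Validate the backend name and serialize the record."""
        if self.storage_backend not in VALID_BACKENDS:
            raise ValueError(f"unknown backend: {self.storage_backend!r}")
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder values for illustration only, not recommendations.
    tuning = TelemetryTuning(
        metricd_processes=4,
        api_processes=2,
        api_threads=1,
        api_in_httpd=True,
        storage_backend="file",
        ceilometer_polling_interval_s=600,
        other_service_workers={"nova": 4, "neutron": 4},
    )
    print(tuning.to_json())
```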

5.25.1.1.2.4. System Performance Monitoring

Record system performance metrics in a separate metrics collection, storage, and analysis system. A suggested setup is a separate machine running Carbon, Graphite, and Grafana, with dashboards for monitoring system resource utilization. To push metrics into the time-series database (TSDB), collectd should be installed on all monitored machines (Deployment, Controllers, and Computes).
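
collectd normally handles metric delivery itself, so the following is not needed for system metrics. It is a minimal sketch of how a test harness might push an additional ad-hoc value (for example, a marker when a boot period starts) to the same Carbon instance using Carbon's plaintext protocol, where each line has the form "path value timestamp". The function names, the metric path, and the host name are assumptions for illustration; port 2003 is Carbon's default plaintext listener port and may differ in a given deployment.

```python
import socket
import time


def format_carbon_line(path, value, timestamp=None):
    """Build one newline-terminated line of Carbon's plaintext protocol."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_to_carbon(lines, host, port=2003, timeout=5.0):
    """Send pre-formatted lines to a Carbon plaintext listener over TCP."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Hypothetical metric path; printed here instead of sent.
    print(format_carbon_line("perf.test_case_1.boot_period", 1), end="")
    # To actually send, supply the monitoring host (hypothetical name):
    # send_to_carbon([format_carbon_line("perf.test_case_1.boot_period", 1)],
    #                host="graphite.example.com")
```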

5.25.1.1.2.5. Test Diagram

Attach test diagram to display test topology.

5.25.1.2. Test Case 1

5.25.1.2.1. Description

Boot 50 persisting instances every 1200 seconds until 1000 instances are booted and running in the OpenStack cloud.

Parameters

  1. Number of instances to boot per period (ex. 50)

  2. Time to wait between booting periods (ex. 1200 seconds)

  3. Maximum number of instances desired for test (ex. 1000)

Depending upon the available hardware, the above parameters will need to be adjusted.
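
The three parameters fully determine the boot schedule, which can be worked out before the run to estimate its duration. The sketch below assumes the first period starts at time zero and that each period boots its whole batch; `boot_schedule` is a hypothetical helper name, not part of any test harness.

```python
def boot_schedule(per_period, wait_seconds, max_instances):
    """Return a list of (start_offset_seconds, instances_to_boot) tuples."""
    if per_period <= 0 or max_instances <= 0 or wait_seconds < 0:
        raise ValueError("parameters must be positive (wait may be zero)")
    schedule = []
    booted = 0
    offset = 0
    while booted < max_instances:
        # The final period may be smaller if the maximum is not a multiple.
        batch = min(per_period, max_instances - booted)
        schedule.append((offset, batch))
        booted += batch
        offset += wait_seconds
    return schedule


if __name__ == "__main__":
    plan = boot_schedule(per_period=50, wait_seconds=1200, max_instances=1000)
    last_start, _ = plan[-1]
    print(f"{len(plan)} periods, last period starts at {last_start} s")
```

With the example values (50 instances, 1200 seconds, 1000 instances), this gives 20 boot periods, with the last one starting 22800 seconds (6 hours 20 minutes) after the first.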

Stopping/Failure Conditions

  • Max number of instances achieved

  • Failure to boot instances

  • Failure for Telemetry Services to consume metrics

  • Other service failures/errors

  • System out of Resources (ex. CPU 100% utilized)
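
The conditions above can be evaluated after each boot period to decide whether to continue. The following sketch assumes the harness already collects a snapshot of the relevant values; the snapshot keys, the function name `stop_reasons`, and the CPU threshold parameter are hypothetical.

```python
def stop_reasons(snapshot, max_instances, cpu_limit_pct=100.0):
    """Return the list of stopping/failure conditions met by a snapshot."""
    reasons = []
    if snapshot["running_instances"] >= max_instances:
        reasons.append("max number of instances achieved")
    if snapshot["failed_boots"] > 0:
        reasons.append("failure to boot instances")
    if not snapshot["telemetry_consuming"]:
        reasons.append("telemetry services not consuming metrics")
    if snapshot["service_errors"] > 0:
        reasons.append("other service failures/errors")
    if snapshot["cpu_pct"] >= cpu_limit_pct:
        reasons.append("system out of resources (CPU)")
    return reasons


if __name__ == "__main__":
    # Illustrative snapshot values only.
    snapshot = {
        "running_instances": 350,
        "failed_boots": 0,
        "telemetry_consuming": True,
        "service_errors": 0,
        "cpu_pct": 62.0,
    }
    print(stop_reasons(snapshot, max_instances=1000) or "continue")
```

An empty list means no condition is met and the next boot period can proceed.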

5.25.1.2.2. Setup

  1. Deploy OpenStack Cloud

  2. Install testing and monitoring tooling

  3. Gather metadata on Cloud

  4. Run test

5.25.1.2.3. Analysis

Review the system performance metrics graphs for the duration of the test to check for stopping/failure conditions. Review the testing harness output for test failure conditions.

5.25.1.2.4. List of performance metrics

Performance

  • CPU utilization

  • Memory utilization

  • Disk IO utilization

  • Per-Process CPU/Memory/IO (Gnocchi, Ceilometer, Nova, Swift, Ceph …)

  • Time required to Boot Instances

  • Responsiveness of Gnocchi, Ceilometer, and other services

Failure Conditions

  • Errors in log files (Gnocchi, Ceilometer, Nova, Swift, …)
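
A simple way to check for this condition is to count log lines by severity. The sketch below assumes the log lines contain the level name (ERROR, CRITICAL, or TRACE) as a separate word, which should be verified against the actual log format of the deployment; `count_log_errors` is a hypothetical helper, and log file paths are passed on the command line.

```python
import re
import sys
from collections import Counter

# Assumed level names; adjust to match the deployed logging format.
LEVEL_RE = re.compile(r"\b(ERROR|CRITICAL|TRACE)\b")


def count_log_errors(lines):
    """Count lines per severity, using the first level name on each line."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            print(path, dict(count_log_errors(handle)))
```

A non-empty count for any service log is a prompt to inspect that log around the time the load was increased.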

5.25.2. Reports

Test plan execution reports: