Add an Elasticsearch v2 storage driver¶
https://storyboard.openstack.org/#!/story/2006332
Problem Description¶
For now, there is only one v2 storage driver: InfluxDB. However, each of CloudKitty’s modules should always offer several choices.
The following strengths make Elasticsearch a great candidate:
It’s a widespread solution. On most deployments, it is likely that an Elasticsearch cluster is available (for example for log centralization).
HA and clustering. Elasticsearch features HA and clustering by default, whereas they are only available in the paid version of InfluxDB.
It’s performant, and Elasticsearch allows some tuning by admins.
Data visualization. Data from the InfluxDB storage driver can be visualized with Grafana; the Elastic stack provides Kibana.
Proposed Change¶
A v2 storage driver for Elasticsearch, available through the
cloudkitty.storage.v2.backends.elasticsearch entrypoint.
Here’s a summary of the routes and aggregation methods that will be used for each of the v2 storage driver interface’s methods:
init: PUT /<index>
    See “Data model impact” for mapping details.
push: POST /<index>/<mapping>/_bulk
retrieve: GET /<index>/_search
    A standard search query with filters.
total: GET /<index>/_search
    The composite aggregation will be used: several terms aggregations in the must clause of a bool query will allow grouping data on specific attributes. A sum aggregation will then be applied to the buckets to obtain the qty and price for each of them.
delete: POST /<index>/_delete_by_query
    Same principle as the retrieve method, but for deletion.
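As an illustration of the total route above, the request body could be assembled along the following lines. This is only a sketch: the function name, the by_groupby, sum_qty and sum_price aggregation names, the page size, and the range filters on start and end are assumptions made for the example, not part of the driver. It also assumes that groupby attributes are stored under the groupby object of each document.

```python
def build_total_body(begin, end, groupby=None, page_size=500, after=None):
    """Build a search body for a hypothetical ``total`` call.

    All names here are illustrative assumptions, not the driver's API.
    """
    # Sum aggregations producing the qty and price of each bucket.
    sums = {
        "sum_qty": {"sum": {"field": "qty"}},
        "sum_price": {"sum": {"field": "price"}},
    }
    body = {
        "size": 0,  # only aggregation results are needed, no hits
        "query": {"bool": {"filter": [
            {"range": {"start": {"gte": begin}}},
            {"range": {"end": {"lte": end}}},
        ]}},
    }
    if not groupby:
        # A composite aggregation needs at least one source, so fall
        # back to plain sums when no grouping is requested.
        body["aggs"] = sums
        return body

    composite = {
        "size": page_size,
        # One "terms" source per attribute to group on.
        "sources": [
            {name: {"terms": {"field": "groupby." + name}}}
            for name in groupby
        ],
    }
    if after is not None:
        # Resume from the "after_key" returned with the previous page.
        composite["after"] = after
    body["aggs"] = {"by_groupby": {"composite": composite, "aggs": sums}}
    return body
```

For example, build_total_body("2019-08-01T00:00:00Z", "2019-09-01T00:00:00Z", ["project_id"]) yields a body that groups on groupby.project_id and sums qty and price per bucket.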
Warning
The “composite” aggregation has been stable since Elasticsearch
version 6.5. In order to be compatible with 6.x and 7.x, CloudKitty
will use the include_type_name parameter for mapping creation. This
parameter was added in Elasticsearch 6.8 and will be removed in
Elasticsearch 8. Thus, CloudKitty will require
Elasticsearch >= 6.5 and < 8.
Note
About pagination: Given that offset + size can’t exceed
10000 by default in the search API (the index.max_result_window
setting), the retrieve function will use scrolling. The
search_after feature will not be used, as it is stateless, which
means that consecutive requests may return unexpected results
depending on the index updates happening at the same time. The
duration for which scroll contexts should be kept open will be
configurable through a config file option marked as advanced.
The total function will use the after parameter of the
composite aggregation for pagination.
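The two pagination strategies described in this note could look roughly like the sketch below. The search and scroll arguments are hypothetical callables standing in for the HTTP requests, so the example makes no assumption about the client library. The response fields it reads (_scroll_id, hits.hits, aggregations, buckets, after_key) follow the shape of Elasticsearch responses, and the 30s keepalive is an arbitrary placeholder for the configurable option.

```python
def iter_scroll_hits(search, scroll, keepalive="30s"):
    """Yield the _source of every hit, following the scroll cursor.

    ``search(keepalive)`` is assumed to send the initial search with
    the ``scroll`` parameter set; ``scroll(scroll_id, keepalive)`` is
    assumed to fetch the next page. Both are hypothetical helpers.
    """
    resp = search(keepalive)
    while resp["hits"]["hits"]:
        for hit in resp["hits"]["hits"]:
            yield hit["_source"]
        # Each response carries the scroll id to use for the next page.
        resp = scroll(resp["_scroll_id"], keepalive)


def iter_composite_buckets(search, agg_name):
    """Yield every bucket of a composite aggregation, page by page.

    ``search(after)`` is a hypothetical helper that runs the query
    with the given ``after`` value (None for the first page).
    """
    after = None
    while True:
        agg = search(after)["aggregations"][agg_name]
        for bucket in agg["buckets"]:
            yield bucket
        # "after_key" is the key to pass as "after" on the next page.
        after = agg.get("after_key")
        if not agg["buckets"] or after is None:
            return
```

A real implementation would probably also clear the scroll context once iteration is finished instead of waiting for the keepalive to expire.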
Note
The CloudKitty storage driver will only require the OSS version of Elasticsearch to work. However, some X-Pack features of the Basic version, like authentication, will be supported (but not mandatory) in the future.
Alternatives¶
None.
Data model impact¶
The data model used in Elasticsearch will be as follows:
Each DataPoint will be a single document. An existing empty index is required (this will allow tuning by admins). In order to improve overall performance, a mapping with the following attributes will be created:
start: (date) The start of the period the datapoint applies to.
end: (date) The end of the period the datapoint applies to.
type: (keyword) The type of the datapoint.
unit: (keyword) The unit of the datapoint.
qty: (double) The qty of the datapoint.
price: (double) The price of the datapoint.
groupby: (object) Dict of the datapoint’s groupby attributes.
metadata: (object) Dict of the datapoint’s metadata attributes.
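To make the attributes listed above concrete, here is a minimal sketch of how a datapoint could be turned into a document and how several documents could be serialized for the _bulk route. The function names and argument order are assumptions made for the example. The payload follows the newline-delimited JSON format of the bulk API, with an empty index action because the index and mapping are already given in the URL.

```python
import json


def datapoint_to_doc(start, end, metric_type, unit, qty, price,
                     groupby, metadata):
    """Build one document per datapoint (hypothetical helper)."""
    return {
        "start": start,            # date
        "end": end,                # date
        "type": metric_type,       # keyword
        "unit": unit,              # keyword
        "qty": float(qty),         # double
        "price": float(price),     # double
        "groupby": dict(groupby),    # dynamic object
        "metadata": dict(metadata),  # dynamic object
    }


def build_bulk_payload(docs):
    """Serialize documents as an NDJSON body for POST .../_bulk."""
    lines = []
    for doc in docs:
        # Action line; the target index comes from the request URL.
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    # The bulk API requires every line, including the last one,
    # to be terminated by a newline.
    return "".join(line + "\n" for line in lines)
```

The resulting string would be sent with the Content-Type header set to application/x-ndjson.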
Note
In order to allow flexible groupby/metadata, the associated objects will be dynamic, i.e. they will accept new attributes.
Note
Given that we will only do exact value searches, every string
attribute will be converted to a keyword. This will be achieved
using dynamic templates.
Warning
By default, the _source field will be enabled. An option to
disable it in order to reduce storage size may be added,
but this should be done with care. See the link to the
Elasticsearch documentation in the references for details.
In the end, the mapping will be defined as follows:
{
  "mappings": {
    "_doc": {
      // cast all strings to keywords
      "dynamic_templates": [
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ],
      // we won't add any attribute to the base object, so dynamic must be false
      "dynamic": false,
      "properties": {
        "start": {"type": "date"},
        "end": {"type": "date"},
        "type": {"type": "keyword"},
        "unit": {"type": "keyword"},
        "qty": {"type": "double"},
        "price": {"type": "double"},
        // groupby and metadata will accept new attributes
        "groupby": {"dynamic": true, "type": "object"},
        // even though metadata should not be indexed, disabling it can't be
        // undone, and disabled objects are only available through the "_source"
        // field, which may also be disabled
        "metadata": {"dynamic": true, "type": "object"}
      }
    }
  }
}
Note
Given that a term to filter on may be part of groupby or
metadata, each filter will add two term queries to the
should part of the bool query (one for the groupby
section and one for the metadata section). Thus, the
minimum_should_match parameter of the bool query will be set
to half of the number of terms in the should query.
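The filter construction described in this note can be sketched as follows. The function name is an assumption, and the filters are taken to be a plain dict of attribute names to values. The sketch simply applies the rule stated above: two term queries per filter, and minimum_should_match set to half the number of should clauses.

```python
def build_filter_query(filters):
    """Build a bool query from a {attribute: value} dict.

    Hypothetical helper illustrating the rule from the note above.
    """
    should = []
    # Sorted only to make the generated query deterministic.
    for key, value in sorted(filters.items()):
        # The attribute may live in either section of the document.
        should.append({"term": {"groupby." + key: value}})
        should.append({"term": {"metadata." + key: value}})
    query = {"bool": {"should": should}}
    if should:
        # Two clauses per filter, so half of them must match.
        query["bool"]["minimum_should_match"] = len(should) // 2
    return query
```

Note that minimum_should_match counts matching clauses across the whole should list rather than per filter, which is a property of the bool query itself.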
REST API impact¶
None.
Security impact¶
In the first iteration, there will be no support for X-Pack authentication. It will be up to the admins to secure the connections between the Elasticsearch cluster and CloudKitty. Authentication will be introduced in future releases.
Notifications Impact¶
None.
Other end user impact¶
None.
Performance Impact¶
On most benchmarks (and from what could be determined from POCs), data insertion into Elasticsearch is slower than insertion into InfluxDB. Elasticsearch is, however, faster for aggregations, and once CloudKitty has caught up with the current timestamp, few insertions are required. Moreover, Elasticsearch’s support for clustering and for tuning should allow for better overall performance in the end.
Other deployer impact¶
The new backend will require more configuration from the admins:
Index aliases and lifecycles
Shards and replicas
Developer impact¶
None.
Implementation¶
Assignee(s)¶
- Primary assignee:
peschk_l
Work Items¶
Implement an Elasticsearch storage driver
Add support for the driver to the Devstack plugin.
Add a Tempest job where the Elasticsearch storage driver is used.
Dependencies¶
Elasticsearch >= 6.5 and < 8.
Testing¶
In addition to unit tests, this will be tested with Tempest.
Documentation Impact¶
The configuration options provided by this driver will be detailed in the documentation. There will also be a section dedicated to the configuration of the Elasticsearch index.
References¶
Dynamic templates: https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html
_source field: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html
search_after parameter: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-request-body.html#request-body-search-search-after
Elasticsearch and InfluxDB benchmarks: https://jolicode.com/blog/influxdb-vs-elasticsearch-for-time-series-and-metrics-data