Use the swift-dispersion-report tool
to measure overall cluster health. This tool checks if a
set of deliberately distributed containers and objects are
currently in their proper places within the cluster. For
instance, a common deployment has three replicas of each
object. The health of that object can be measured by
checking if each replica is in its proper place. If only 2
of the 3 are in place, the object’s health can be said to be
at 66.66%, where 100% would be perfect. A single object’s
health, especially an older object, usually reflects the
health of the entire partition that the object is in. If you
create enough objects on a distinct percentage of the
partitions in the cluster, you get a good estimate of the
overall cluster health. In practice, about 1% partition
coverage seems to balance well between accuracy and the
amount of time it takes to gather results. The first step
in providing this health value is to create a new account
solely for this purpose. Next, you need
to place the containers and objects throughout the system
so that they are on distinct partitions. The
swift-dispersion-populate tool does this by making up
random container and object names until they fall on
distinct partitions. Last, and repeatedly for the life of
the cluster, you must run the
swift-dispersion-report tool to
check the health of each of these containers and objects.
These tools need direct access to the entire cluster and
to the ring files (installing them on a proxy server
suffices). The swift-dispersion-populate and
swift-dispersion-report commands both use the same
configuration file, /etc/swift/dispersion.conf.
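The idea of making up random names until they fall on distinct partitions can be illustrated with a small sketch. This is not the code of swift-dispersion-populate: the function names, the `dispersion_` name prefix, and the simplified hashing (top bits of an MD5 digest, with no cluster-specific hash prefix or suffix) are assumptions made only for illustration.

```python
import hashlib
import random
import struct


def partition_for(name: str, part_power: int) -> int:
    """Map a name to a partition using the top bits of its MD5 digest.

    Simplified assumption: a real ring also mixes in cluster-specific
    hash path values, which are omitted here.
    """
    digest = hashlib.md5(name.encode("utf-8")).digest()
    top32 = struct.unpack_from(">I", digest)[0]
    return top32 >> (32 - part_power)


def pick_names(part_power: int, coverage_pct: float, seed: int = 0) -> dict[int, str]:
    """Generate random names until the requested share of partitions is covered."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    total_partitions = 2 ** part_power
    wanted = max(1, int(total_partitions * coverage_pct / 100.0))
    chosen: dict[int, str] = {}
    while len(chosen) < wanted:
        name = "dispersion_%032x" % rng.getrandbits(128)
        part = partition_for(name, part_power)
        # Keep only the first name that lands on each partition.
        chosen.setdefault(part, name)
    return chosen


names = pick_names(part_power=10, coverage_pct=1.0)
print(len(names), "names on", len(set(names)), "distinct partitions")
```

Each key of the returned dictionary is a partition number, so the names are on distinct partitions by construction.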
Example dispersion.conf file:
    [dispersion]
    auth_url = http://localhost:8080/auth/v1.0
    auth_user = test:tester
    auth_key = testing
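As a quick sanity check of such a file, the following sketch parses the example configuration with Python's standard `configparser` module. The fallback values are copied from the option table in this section. Reading from an inline string instead of /etc/swift/dispersion.conf is an assumption made so that the example is self-contained; it does not show how the tools themselves load the file.

```python
import configparser

# Inline copy of the example file; a real check would read
# /etc/swift/dispersion.conf instead (assumption for this sketch).
SAMPLE = """\
[dispersion]
auth_url = http://localhost:8080/auth/v1.0
auth_user = test:tester
auth_key = testing
"""

# Fallbacks taken from the option table in this section.
DEFAULTS = {
    "dispersion_coverage": "1.0",
    "retries": "5",
    "concurrency": "25",
}

parser = configparser.ConfigParser(defaults=DEFAULTS)
parser.read_string(SAMPLE)
section = parser["dispersion"]

auth_url = section["auth_url"]
coverage = section.getfloat("dispersion_coverage")
retries = section.getint("retries")
concurrency = section.getint("concurrency")

print(auth_url, coverage, retries, concurrency)
```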
There are also configuration options for the dispersion coverage (which defaults to 1%), retries, concurrency, and so on. However, the defaults are usually fine.

Once the configuration is in place, run swift-dispersion-populate to populate the containers and objects throughout the cluster. With those containers and objects in place, you can run swift-dispersion-report to get a dispersion report, which indicates the overall health of the cluster. Here is an example of a cluster in perfect health:
    $ swift-dispersion-report
    Queried 2621 containers for dispersion reporting, 19s, 0 retries
    100.00% of container copies found (7863 of 7863)
    Sample represents 1.00% of the container partition space
    Queried 2619 objects for dispersion reporting, 7s, 0 retries
    100.00% of object copies found (7857 of 7857)
    Sample represents 1.00% of the object partition space
Now, deliberately double the weight of a device in the object ring (with replication turned off) and re-run the dispersion report to show what impact that has:
    $ swift-ring-builder object.builder set_weight d0 200
    $ swift-ring-builder object.builder rebalance
    ...
    $ swift-dispersion-report
    Queried 2621 containers for dispersion reporting, 8s, 0 retries
    100.00% of container copies found (7863 of 7863)
    Sample represents 1.00% of the container partition space
    Queried 2619 objects for dispersion reporting, 7s, 0 retries
    There were 1763 partitions missing one copy.
    77.56% of object copies found (6094 of 7857)
    Sample represents 1.00% of the object partition space
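The object figures in this report follow from simple arithmetic, which the sketch below reproduces. It assumes three replicas per object, as in the common deployment described earlier. The function name and its parameters are illustrative and are not part of the tool.

```python
def copies_summary(queried: int, replicas: int, missing_one: int = 0,
                   missing_two: int = 0, missing_all: int = 0):
    """Return (copies_found, copies_expected, pct_found) for a sample."""
    expected = queried * replicas
    # Each partition missing one copy costs 1 copy, missing two costs 2,
    # and missing all costs every replica.
    missing = missing_one + 2 * missing_two + replicas * missing_all
    found = expected - missing
    return found, expected, 100.0 * found / expected


found, expected, pct = copies_summary(queried=2619, replicas=3, missing_one=1763)
print(f"{pct:.2f}% of object copies found ({found} of {expected})")
```

With 2619 objects sampled and 1763 partitions each missing one copy, this gives 6094 of 7857 copies, which matches the 77.56% reported above.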
You can see that the health of the objects in the cluster has gone down significantly. Of course, this test environment has just four devices; in a production environment with many devices, the impact of one device change is much less. Next, run the replicators to put everything back into place, and then rerun the dispersion report:
    ... start object replicators and monitor logs until they're caught up ...
    $ swift-dispersion-report
    Queried 2621 containers for dispersion reporting, 17s, 0 retries
    100.00% of container copies found (7863 of 7863)
    Sample represents 1.00% of the container partition space
    Queried 2619 objects for dispersion reporting, 7s, 0 retries
    100.00% of object copies found (7857 of 7857)
    Sample represents 1.00% of the object partition space
Alternatively, the dispersion report can be output in JSON format, which makes it easier for third-party utilities to consume:
    $ swift-dispersion-report -j
    {"object": {"retries:": 0, "missing_two": 0, "copies_found": 7863, "missing_one": 0, "copies_expected": 7863, "pct_found": 100.0, "overlapping": 0, "missing_all": 0}, "container": {"retries:": 0, "missing_two": 0, "copies_found": 12534, "missing_one": 0, "copies_expected": 12534, "pct_found": 100.0, "overlapping": 15, "missing_all": 0}}
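A third-party check might consume this output along the following lines. This is a minimal sketch that parses the sample JSON shown above from an inline string. The 99% threshold and the `unhealthy` helper are hypothetical, chosen only for illustration.

```python
import json

# Sample output copied from the example above. Note that the "retries:"
# key appears with a trailing colon in this output, so it is looked up as-is.
REPORT = (
    '{"object": {"retries:": 0, "missing_two": 0, "copies_found": 7863, '
    '"missing_one": 0, "copies_expected": 7863, "pct_found": 100.0, '
    '"overlapping": 0, "missing_all": 0}, '
    '"container": {"retries:": 0, "missing_two": 0, "copies_found": 12534, '
    '"missing_one": 0, "copies_expected": 12534, "pct_found": 100.0, '
    '"overlapping": 15, "missing_all": 0}}'
)

MIN_PCT_FOUND = 99.0  # hypothetical alert threshold


def unhealthy(report: dict, min_pct: float) -> list[str]:
    """Return the ring types whose sample looks degraded."""
    return [
        kind
        for kind, stats in report.items()
        if stats["pct_found"] < min_pct or stats["missing_all"] > 0
    ]


report = json.loads(REPORT)
problems = unhealthy(report, MIN_PCT_FOUND)
for kind, stats in report.items():
    print(kind, stats["pct_found"], stats["copies_found"], stats["copies_expected"])
print("degraded:", problems)
```

In a monitoring script, the inline string would be replaced by the captured output of `swift-dispersion-report -j`.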
Configuration option = Default value | Description |
---|---|
auth_url = http://localhost:8080/auth/v1.0 | Endpoint for auth server, such as keystone |
auth_user = test:tester | Default user for dispersion in this context |
auth_key = testing | No help text available for this option. |
auth_url = http://localhost:5000/v2.0/ | Endpoint for auth server, such as keystone |
auth_user = tenant:user | Default user for dispersion in this context |
auth_key = password | No help text available for this option. |
auth_version = 2.0 | Indicates which version of auth |
endpoint_type = publicURL | Indicates whether endpoint for auth is public or internal |
keystone_api_insecure = no | No help text available for this option. |
swift_dir = /etc/swift | Swift configuration directory |
dispersion_coverage = 1.0 | No help text available for this option. |
retries = 5 | No help text available for this option. |
concurrency = 25 | Number of replication workers to spawn |
container_populate = yes | No help text available for this option. |
object_populate = yes | No help text available for this option. |
container_report = yes | No help text available for this option. |
object_report = yes | No help text available for this option. |
dump_json = no | No help text available for this option. |