Use the swift-dispersion-report tool to measure overall cluster health. This tool checks whether a set of deliberately distributed containers and objects are currently in their proper places within the cluster. For instance, a common deployment has three replicas of each object. The health of that object can be measured by checking whether each replica is in its proper place. If only 2 of the 3 replicas are in place, the object's health can be said to be 66.67%, where 100% would be perfect. A single object's health, especially an older object's, usually reflects the health of the entire partition the object is in. If you create enough objects on a distinct percentage of the partitions in the cluster, you get a good estimate of the overall cluster health.
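The health arithmetic described above can be sketched in Python. The function names are hypothetical and are not part of Swift. The cluster-wide estimate pools copies across all sampled objects, which is how the report phrases its result (copies found out of copies expected).

```python
def object_health(copies_found: int, replicas: int = 3) -> float:
    """Percentage of one object's replicas that are in their proper places."""
    if replicas <= 0:
        raise ValueError("replicas must be positive")
    if not 0 <= copies_found <= replicas:
        raise ValueError("copies_found must be between 0 and replicas")
    return 100.0 * copies_found / replicas


def cluster_health(copies_found_per_object: list[int], replicas: int = 3) -> float:
    """Pool every sampled object's copies into one cluster-wide percentage."""
    if not copies_found_per_object:
        raise ValueError("need at least one sampled object")
    found = sum(copies_found_per_object)
    expected = replicas * len(copies_found_per_object)
    return 100.0 * found / expected


print(f"{object_health(2):.2f}%")              # 66.67%
print(f"{cluster_health([3, 3, 2, 3]):.2f}%")  # 91.67%
```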
In practice, about 1% partition coverage seems to balance well between accuracy and the time it takes to gather results. To provide this health value, you must first create an account used solely for this purpose. Next, you must place containers and objects throughout the system so that they are on distinct partitions. The swift-dispersion-populate tool does this by generating random container and object names until they fall on distinct partitions.
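A simplified sketch of this idea follows. It assumes that the ring's partition lookup can be approximated by an MD5 hash of the path, whose top part_power bits select the partition. Real Swift also salts the path with the hash path prefix and suffix from swift.conf, so these names would not land on the same partitions in a real cluster. The helper names and the dispersion_ name prefix are hypothetical. This is not the swift-dispersion-populate source.

```python
import hashlib
import struct
import uuid


def partition_for(path: str, part_power: int) -> int:
    """Approximate ring lookup: top part_power bits of the path's MD5 digest.

    Assumption: real Swift also salts the path with the hash path prefix and
    suffix from swift.conf; that salt is omitted here.
    """
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return struct.unpack_from(">I", digest)[0] >> (32 - part_power)


def pick_distinct_names(account: str, part_power: int,
                        coverage_pct: float = 1.0) -> dict[int, str]:
    """Draw random container names until coverage_pct of partitions hold one each."""
    target = max(1, int(2 ** part_power * coverage_pct / 100))
    chosen: dict[int, str] = {}
    while len(chosen) < target:
        name = f"dispersion_{uuid.uuid4().hex}"  # hypothetical naming scheme
        part = partition_for(f"/{account}/{name}", part_power)
        chosen.setdefault(part, name)  # keep only the first name seen per partition
    return chosen


names = pick_distinct_names("AUTH_test", part_power=10)
print(len(names))  # 10 names on 10 distinct partitions (1% of 1,024)
```

Keying the result by partition is what guarantees distinctness. A name whose partition is already taken is simply discarded, and the loop keeps drawing until the target coverage is reached.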
Last, and repeatedly for the life of the cluster, you must run the swift-dispersion-report tool to check the health of each container and object.
These tools need direct access to the entire cluster and to its ring files. Installing them on a proxy server suffices.
The swift-dispersion-populate and swift-dispersion-report commands both use the same /etc/swift/dispersion.conf configuration file. Example dispersion.conf file:
    [dispersion]
    auth_url = http://localhost:8080/auth/v1.0
    auth_user = test:tester
    auth_key = testing
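The file is INI-style, so a standalone script can read it with Python's configparser. This is a minimal sketch under that assumption; Swift itself reads the file with its own configuration helpers. The defaults below cover a subset of the options in the configuration table for this file. The set of accepted "true" spellings is an assumption modeled on common Swift usage. The function name is hypothetical.

```python
import configparser

# Defaults for a subset of options, taken from the dispersion.conf option table.
DEFAULTS = {
    "auth_version": "1.0",
    "dispersion_coverage": "1.0",
    "retries": "5",
    "concurrency": "25",
    "dump_json": "no",
}

TRUE_VALUES = {"1", "true", "yes", "on", "t", "y"}  # assumed boolean spellings


def load_dispersion_conf(text: str) -> dict:
    """Parse the [dispersion] section and apply defaults for unset options."""
    # interpolation=None so a '%' in auth_key is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    options = dict(DEFAULTS)
    options.update(parser["dispersion"])
    return {
        "auth_url": options["auth_url"],
        "auth_user": options["auth_user"],
        "auth_key": options["auth_key"],
        "auth_version": options["auth_version"],
        "dispersion_coverage": float(options["dispersion_coverage"]),
        "retries": int(options["retries"]),
        "concurrency": int(options["concurrency"]),
        "dump_json": options["dump_json"].strip().lower() in TRUE_VALUES,
    }
```

On a host with the file in place, you could call it as `load_dispersion_conf(open("/etc/swift/dispersion.conf").read())`.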
You can use configuration options to specify the dispersion coverage (1% by default), the number of retries, the concurrency, and so on; however, the defaults are usually fine. After the configuration is in place, run the swift-dispersion-populate tool to populate the containers and objects throughout the cluster. Once those containers and objects are in place, you can run the swift-dispersion-report tool to get a dispersion report, which shows the overall health of the cluster. Here is an example of a cluster in perfect health:
    $ swift-dispersion-report
    Queried 2621 containers for dispersion reporting, 19s, 0 retries
    100.00% of container copies found (7863 of 7863)
    Sample represents 1.00% of the container partition space
    Queried 2619 objects for dispersion reporting, 7s, 0 retries
    100.00% of object copies found (7857 of 7857)
    Sample represents 1.00% of the object partition space
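The expected-copy counts in this report are the queried counts times three replicas: 2621 × 3 = 7863 container copies and 2619 × 3 = 7857 object copies. If you want to consume the plain-text report from a script, a regular expression over the "copies found" lines is one option. This is a sketch that assumes the line wording shown above stays stable. The parser and its field names are hypothetical.

```python
import re

# Matches lines such as "100.00% of object copies found (7857 of 7857)".
COPIES_RE = re.compile(
    r"(?P<pct>\d+(?:\.\d+)?)% of (?P<kind>container|object) copies found "
    r"\((?P<found>\d+) of (?P<expected>\d+)\)"
)


def parse_text_report(text: str) -> dict:
    """Extract copies-found statistics per section from the plain-text report."""
    results = {}
    for match in COPIES_RE.finditer(text):
        results[match["kind"]] = {
            "pct_found": float(match["pct"]),
            "copies_found": int(match["found"]),
            "copies_expected": int(match["expected"]),
        }
    return results


sample = """\
Queried 2621 containers for dispersion reporting, 19s, 0 retries
100.00% of container copies found (7863 of 7863)
Sample represents 1.00% of the container partition space
Queried 2619 objects for dispersion reporting, 7s, 0 retries
100.00% of object copies found (7857 of 7857)
Sample represents 1.00% of the object partition space
"""
stats = parse_text_report(sample)
print(stats["object"]["pct_found"])  # 100.0
```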
Now, deliberately double the weight of a device in the object ring (with replication turned off) and re-run the dispersion report to show what impact that has:
    $ swift-ring-builder object.builder set_weight d0 200
    $ swift-ring-builder object.builder rebalance
    ...
    $ swift-dispersion-report
    Queried 2621 containers for dispersion reporting, 8s, 0 retries
    100.00% of container copies found (7863 of 7863)
    Sample represents 1.00% of the container partition space
    Queried 2619 objects for dispersion reporting, 7s, 0 retries
    There were 1763 partitions missing one copy.
    77.56% of object copies found (6094 of 7857)
    Sample represents 1.00% of the object partition space
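The object numbers in this report are internally consistent. With three replicas, 2619 sampled objects give 7857 expected copies. Each of the 1763 partitions missing one copy loses exactly one copy, leaving 6094. The function below recomputes this; its name and the weighting for partitions missing two or all copies are illustrative assumptions.

```python
def copies_found(queried: int, replicas: int, missing_one: int = 0,
                 missing_two: int = 0, missing_all: int = 0) -> tuple[int, int, float]:
    """Recompute found/expected copies and the percentage from the report's counts."""
    expected = queried * replicas
    # A partition missing one copy loses 1, missing two loses 2, missing all loses every replica.
    lost = missing_one + 2 * missing_two + replicas * missing_all
    found = expected - lost
    return found, expected, 100.0 * found / expected


found, expected, pct = copies_found(queried=2619, replicas=3, missing_one=1763)
print(found, expected, f"{pct:.2f}%")  # 6094 7857 77.56%
```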
You can see that the health of the objects in the cluster has gone down significantly. Of course, this test environment has just four devices; in a production environment with many devices, changing one device has much less impact. Next, run the replicators to get everything back into place, and then rerun the dispersion report:
    ... start object replicators and monitor logs until they're caught up ...
    $ swift-dispersion-report
    Queried 2621 containers for dispersion reporting, 17s, 0 retries
    100.00% of container copies found (7863 of 7863)
    Sample represents 1.00% of the container partition space
    Queried 2619 objects for dispersion reporting, 7s, 0 retries
    100.00% of object copies found (7857 of 7857)
    Sample represents 1.00% of the object partition space
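Instead of watching logs by hand, you could poll the report until every copy is found again. This sketch runs swift-dispersion-report with its -j (JSON output) option. It assumes the tool is on PATH and prints only the JSON document to stdout. The polling interval, the attempt limit, and both function names are hypothetical placeholders.

```python
import json
import subprocess
import time
from typing import Callable


def run_report_json() -> dict:
    """Run swift-dispersion-report -j and parse its stdout as JSON.

    Assumes the tool is on PATH and prints only the JSON document.
    """
    result = subprocess.run(
        ["swift-dispersion-report", "-j"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def wait_until_healthy(run_report: Callable[[], dict] = run_report_json,
                       interval_s: float = 60.0, max_attempts: int = 30) -> bool:
    """Re-run the report until every section reports 100% of copies found."""
    for attempt in range(max_attempts):
        report = run_report()
        if all(section["pct_found"] >= 100.0 for section in report.values()):
            return True
        if attempt < max_attempts - 1:
            time.sleep(interval_s)  # give the replicators time to catch up
    return False
```

Passing the report runner in as a parameter keeps the polling logic testable without a live cluster.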
The dispersion report can also be output in JSON format, which makes it easier for third-party utilities to consume:
    $ swift-dispersion-report -j
    {"object": {"retries:": 0, "missing_two": 0, "copies_found": 7863, "missing_one": 0, "copies_expected": 7863, "pct_found": 100.0, "overlapping": 0, "missing_all": 0}, "container": {"retries:": 0, "missing_two": 0, "copies_found": 12534, "missing_one": 0, "copies_expected": 12534, "pct_found": 100.0, "overlapping": 15, "missing_all": 0}}
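A monitoring script could turn this JSON into a coarse status per section. This is a sketch: the status labels and their ordering are hypothetical, not something Swift defines. The key names come from the sample output above. Note that the retries key appears there as "retries:" with a trailing colon, so code that reads it must use that exact spelling; this sketch does not read it.

```python
import json

SAMPLE = (
    '{"object": {"retries:": 0, "missing_two": 0, "copies_found": 7863, '
    '"missing_one": 0, "copies_expected": 7863, "pct_found": 100.0, '
    '"overlapping": 0, "missing_all": 0}, "container": {"retries:": 0, '
    '"missing_two": 0, "copies_found": 12534, "missing_one": 0, '
    '"copies_expected": 12534, "pct_found": 100.0, "overlapping": 15, '
    '"missing_all": 0}}'
)


def classify(report_json: str) -> dict[str, str]:
    """Map each report section to a hypothetical status label."""
    report = json.loads(report_json)
    status = {}
    for kind, stats in report.items():
        if stats.get("missing_all", 0) > 0:
            status[kind] = "critical"  # some sampled partitions have no copies at all
        elif stats.get("missing_two", 0) > 0:
            status[kind] = "degraded"  # with 3 replicas, only one copy remains
        elif stats.get("missing_one", 0) > 0 or stats.get("pct_found", 100.0) < 100.0:
            status[kind] = "warning"
        else:
            status[kind] = "ok"
    return status


print(classify(SAMPLE))  # {'object': 'ok', 'container': 'ok'}
```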
The following options can be set in the [dispersion] section of dispersion.conf:

| Configuration option = Default value | Description |
|---|---|
| auth_key = testing | Key (password) for auth_user |
| auth_url = http://localhost:8080/auth/v1.0 | Endpoint of the auth server, such as Keystone |
| auth_user = test:tester | User for the dedicated dispersion account |
| auth_version = 1.0 | Version of the auth API to use |
| concurrency = 25 | Number of concurrent requests the tools make |
| container_populate = yes | Whether swift-dispersion-populate creates dispersion containers |
| container_report = yes | Whether swift-dispersion-report checks dispersion containers |
| dispersion_coverage = 1.0 | Percentage of the partition space to populate and report on |
| dump_json = no | Whether swift-dispersion-report prints its report in JSON (as with -j) |
| endpoint_type = publicURL | Whether to use the public or internal auth endpoint |
| keystone_api_insecure = no | Allow access to an insecure Keystone server; its certificate is not verified |
| object_populate = yes | Whether swift-dispersion-populate creates dispersion objects |
| object_report = yes | Whether swift-dispersion-report checks dispersion objects |
| project_domain_name = project_domain | Project domain name for Keystone v3 authentication |
| project_name = project | Keystone project name used for authentication |
| retries = 5 | Number of times to retry a failed request |
| swift_dir = /etc/swift | Swift configuration directory |
| user_domain_name = user_domain | User domain name for Keystone v3 authentication |