Cinder Volume Active/Active support - Replication

https://blueprints.launchpad.net/cinder/+spec/cinder-volume-active-active-support

As one would expect, replication v2.1 only works in deployment configurations that were available and supported in Cinder at the time of its design and implementation.

Now that we also support Active-Active configurations, this means replication does not work properly in this newly supported configuration.

This spec extends replication v2.1 functionality to support Active-Active configurations while preserving backward compatibility for non-clustered configurations.

Problem description

In replication v2.1, failover is requested on a per-backend basis, so when a failover request is received by the API it is redirected to a specific volume service via an asynchronous RPC call using that service's topic message queue. The same happens for the freeze and thaw operations.

This works when there is a one-to-one relationship between volume services and storage backends, but not when there is a many-to-one relationship, because the failover RPC call will be received by only one of the services that form the cluster for the storage backend; the others will be oblivious to the change and will continue using the same replication site as before. As a result, some operations will succeed, those going to the service that performed the failover, while others will fail because they are directed at the site that is no longer available.
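The following minimal sketch, using assumed topic and argument names rather than the actual cinder RPC code, illustrates the routing behaviour at the root of the problem: an asynchronous cast on a topic queue is consumed by only one of the services listening on that topic.

from oslo_config import cfg
import oslo_messaging as messaging

# Illustration only (assumed topic and argument names): the cast below is
# delivered to a single consumer of the topic queue, so in an Active-Active
# cluster the remaining services never learn about the failover.
transport = messaging.get_rpc_transport(cfg.CONF)
target = messaging.Target(topic='cinder-volume.backend1')
client = messaging.RPCClient(transport, target)
client.cast({}, 'failover_host', secondary_backend_id='secondary_site')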

While that’s the primary issue, it’s not the only one, since we also have to track the replication status at the cluster level.

Use Cases

Users want to have highly available cinder services with disaster recovery using replication.

It is not enough for these features to be available individually, since users will want both at the same time; being able to use either Active-Active configurations without replication, or replication only when not deployed as Active-Active, is insufficient.

They could probably make it work by stopping all but one of the volume services in the cluster, issuing the failover request, and bringing the other services back up once it has completed, but this would not be a clean approach to the problem.

Proposed change

At its core, the proposed change is to divide the driver's failover operation into two individual operations: one that handles the storage backend side of things, for example force-promoting volumes to primary on the secondary site, and another that makes the driver start performing all operations against the secondary storage device.

As mentioned before, only one volume service will receive the request to do the failover. By splitting the operation, the manager will be able to ask the local driver to do the first part of the failover and, once that is done, signal all volume services in the cluster handling that backend that the failover has been completed and that they should start pointing to the failed-over secondary site, thus solving the problem of some services not knowing that a new site should be used.

This will also require two RPC calls in the volume manager, named after the drivers' new methods: failover and failover_completed.
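A rough sketch of the resulting manager-side flow follows. The failover and failover_completed names come from this spec; the helper methods, the RPC signature and the handling of volume updates are illustrative assumptions, not the final implementation.

def failover(self, context, secondary_backend_id=None):
    # Only the service that received the cast talks to the storage backend
    # and forces the promotion of replicated volumes on the secondary site.
    volumes = self._get_replicated_volumes(context)  # hypothetical helper
    active_backend_id, volume_updates = self.driver.failover(
        context, volumes, secondary_id=secondary_backend_id)
    self._apply_volume_updates(context, volume_updates)  # hypothetical helper
    # Signal every service in the cluster handling this backend, including
    # this one, that the failover finished and the new site must be used.
    self.rpcapi.failover_completed(context, self.cluster, active_backend_id)

def failover_completed(self, context, active_backend_id=None):
    # Received by all services in the cluster: switch the local driver to
    # the failed-over site and update the replication information.
    self.driver.failover_completed(context, active_backend_id)
    self._update_replication_status(context, active_backend_id)  # hypothetical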

We will also add the replication information to the clusters table to track replication at the cluster level for clustered services.

Given the current use of the freeze and thaw operations there does not seem to be a reason to do the same split, so these operations will be left as they are and will only be performed by one volume service when requested.

This change will require vendors to update their drivers to support replication on Active-Active configurations, so to avoid surprises we will prevent the volume service from starting in Active-Active configurations with replication enabled when the driver does not support the Active-Active mechanism.
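A minimal sketch of such a startup guard is shown below, assuming support is detected by the driver overriding the new failover method; the detection mechanism and the exception used are assumptions, not part of this spec.

def _check_replication_support(self):
    # Refuse to start a clustered (Active-Active) service whose driver has
    # replication configured but does not implement the new mechanism.
    replication_on = bool(self.configuration.safe_get('replication_device'))
    supports_aa_failover = (type(self.driver).failover is not
                            driver.BaseVD.failover)  # assumed detection
    if self.cluster and replication_on and not supports_aa_failover:
        raise exception.VolumeDriverException(
            message='Replication is enabled but the driver does not '
                    'support it on Active-Active deployments.')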

Alternatives

The splitting mechanism for the failover_host method is fairly straightforward; the only alternative to the proposed change would be to split the thaw and freeze operations as well.

Data model impact

Three new replication-related fields will be added to the clusters table. These will be the same fields we currently have in the services table and will hold the same meaning:

  • replication_status: String storing the replication status for the whole cluster.

  • active_backend_id: String storing which one of the replication sites is currently active.

  • frozen: Boolean reflecting whether the cluster is frozen or not.

These fields will be kept in sync between the clusters table and the services table for consistency.
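A rough sketch of the corresponding migration is shown below, assuming the sqlalchemy-migrate style used by existing cinder migrations; the exact column sizes and defaults are assumptions and should mirror the definitions in the services table.

from sqlalchemy import Boolean, Column, MetaData, String, Table

def upgrade(migrate_engine):
    meta = MetaData()
    meta.bind = migrate_engine
    clusters = Table('clusters', meta, autoload=True)
    # Same meaning as the equivalent columns in the services table.
    clusters.create_column(Column('replication_status', String(255),
                                  default='not-capable'))
    clusters.create_column(Column('active_backend_id', String(255)))
    clusters.create_column(Column('frozen', Boolean, nullable=False,
                                  default=False, server_default='0'))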

REST API impact

  • A new action called failover, equivalent to the existing failover_host, will be added; it will support a new cluster parameter in addition to the host field already available in failover_host (an illustrative request follows this list).

  • Cluster listing will accept replication_status, frozen and active_backend_id as filters.

  • Cluster listing will return additional replication_status, frozen and active_backend_id fields.
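For illustration, a request against the new action might look like the following; the endpoint path, body layout and microversion header are assumptions modelled on the existing failover_host action, not the final API definition.

import requests

token = '<keystone token>'
url = 'http://cinder.example.com/v3/<project_id>/os-services/failover'
# 'cluster' replaces the 'host' field accepted by failover_host.
body = {'cluster': 'cluster1@rbd', 'backend_id': 'secondary_site'}
requests.put(url, json=body,
             headers={'X-Auth-Token': token,
                      'OpenStack-API-Version': 'volume <new microversion>'})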

Security impact

None.

Notifications impact

None.

Other end user impact

The client will return the new fields when listing clusters using the new microversion, and the new filters will also be available.

Failover for this microversion will accept the cluster parameter.

Performance Impact

The new code should have no performance impact on existing deployments since it will only affect new Active-Active deployments.

Other deployer impact

None.

Developer impact

Drivers that wish to support replication on Active-Active deployments will have to implement the failover and failover_completed methods as well as the current failover_host method, since the latter is still used for backward compatibility with base replication v2.1.

The easiest way to support this with minimum code would be to implement failover and failover_completed and then create failover_host based on those:

def failover_host(self, context, volumes, secondary_id=None):
    # Fail over the backend, then switch this driver over to the new site
    # and return the new active backend id and per-volume model updates.
    active_backend_id, updates = self.failover(context, volumes, secondary_id)
    self.failover_completed(context, active_backend_id)
    return active_backend_id, updates
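For reference, a driver might split the work roughly as follows; the backend client attributes and helper names are hypothetical and are only meant to show which half of the work goes where.

def failover(self, context, volumes, secondary_id=None):
    # Backend-side half: force promote the replicated volumes on the
    # secondary site and build the per-volume model updates.
    volume_updates = self._promote_on_secondary(volumes, secondary_id)
    return secondary_id, volume_updates

def failover_completed(self, context, active_backend_id=None):
    # Local half, run by every service in the cluster: start sending all
    # requests to the now-active secondary storage device.
    self._active_client = self._clients[active_backend_id]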

Implementation

Assignee(s)

Primary assignee:

Gorka Eguileor (geguileo)

Other contributors:

None

Work Items

  • Change service start to use active_backend_id from the cluster or the service.

  • Add the new failover REST API.

  • Update the list REST API method to accept the new filtering fields and update the view to return the new information.

  • Update the DB model and create the migration.

  • Update the Cluster Versioned Object.

  • Make modifications to the manager to support the new RPC calls.

Dependencies

This work has no additional dependencies besides the basic Active-Active mechanism being in place, which it already is.

Testing

Only unit tests will be implemented, since there is no reference driver that implements replication and can be used at the gate.

We also lack a mechanism to verify that replication is actually working.

Documentation Impact

From a documentation perspective there will not be much to document besides the API changes.

References