Subcloud GEO Redundancy Error Root Cause and Correction Action
This section describes error scenarios that can occur while using the GEO Redundancy feature. The scenarios assume two distributed clouds, site A and site B, with the GEO Redundancy feature activated, site A designated as the primary site, and site B designated as the non-primary site. The GEO Redundancy feature allows subclouds to be migrated to the non-primary site when the primary site becomes unavailable, and migrated back to the primary site when it becomes available again.
The error scenarios are divided into the following categories: protection group setup, migration, and post migration.
Protection group setup
This category covers errors and issues detected during setup of the protection group.
Error scenarios | Recovery mechanism |
---|---|
Site A goes down temporarily in the middle of association. | Upon site A recovery, the peer group association will automatically change its sync status to The administrator can trigger re-sync from the Possible values of Possible values of |
Site A is down in the middle of synchronization and remains offline for an extended period of time. How does the user check the syncing status from site B to initiate the migration? | The administrator can check the peer group association sync status in the non-primary site to decide the next step. If the sync status is |
After initial sync is completed, site B goes down. How does site A sync to site B after site B comes back online? | Site A needs to keep track of subcloud group updates when site B is down. The sync status will go into unknown status in site A. The peer group association sync status in site A will change to If changes are made to the peer group while site B is offline, the sync status in site A will change to |
Site B is offline while creating peer group association to associate peer and a SPG. | Creation of association will be accepted but The administrator can re-sync the association after site B is online using the dcmanager peer-group-association sync <SiteA-Peer-Group-Association-ID> command. |
Swact occurs in site A while a peer group association is syncing. | Expected behavior should be similar to that of site A abrupt shutdown during sync. Re-sync needs to be done. |
Swact occurs in site B while a peer group association is syncing. | Expected behavior should be similar to that of site B abrupt shutdown during sync. Re-sync needs to be done. |
In the event of either site going down or swact occurring:
|
|
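Several of the recovery steps above end in a manual re-sync. As a minimal sketch, the re-sync command quoted in the table can be assembled and reviewed before it is run. The helper names `resync_command` and `run` are hypothetical, the association ID shown is a placeholder, and the sketch assumes `dcmanager` is on the PATH with administrator credentials already loaded.

```python
import shlex
import subprocess
from typing import List


def resync_command(association_id: str) -> List[str]:
    """Build the re-sync command quoted in the table above."""
    if not association_id.strip():
        raise ValueError("association_id must not be empty")
    return ["dcmanager", "peer-group-association", "sync", association_id]


def run(command: List[str], dry_run: bool = True) -> str:
    """Return the shell-quoted command; execute it only when dry_run is False."""
    rendered = shlex.join(command)
    if not dry_run:
        # Assumption: dcmanager is installed and credentials are loaded.
        subprocess.run(command, check=True)
    return rendered


if __name__ == "__main__":
    # "1" is a hypothetical association ID used only for illustration.
    print(run(resync_command("1")))
```

The dry-run default only prints the command, so it can be checked against the association ID reported on the site before anything is executed.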
Migration
Assumption: Subclouds will be migrated to site B if site A goes down.
The following are the error scenarios that can occur during peer group migration.
Error scenarios | Recovery mechanism |
---|---|
What will be the status of the SPG if some subclouds failed to migrate? | After the migration, use dcmanager subcloud-peer-group list-subclouds to check the status of the subclouds under this SPG, and dcmanager subcloud-peer-group status to check the SPG status. After fixing the failure, re-run the dcmanager subcloud-peer-group migrate PEER_GROUP command. |
How to recover when the subcloud rehome fails because of incorrect bootstrap address or bootstrap values and site A cannot recover in a time period? | When site A goes down, migrate SPG to site B. The subcloud will go to the |
How to fix when the subcloud has incorrect bootstrap address or bootstrap values in the following situations of the SPG migration of site B? | Check the SPG migration status using the dcmanager subcloud-peer-group status command to confirm whether it has a rehome failed subcloud. Use the dcmanager subcloud update --bootstrap-address and dcmanager subcloud update --bootstrap-values commands to update the subcloud. You do not need to remove the rehome failed subcloud from the SPG. |
Site B goes down during SPG migration. | Re-execute the SPG migration if there is any subcloud with |
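The recovery steps for a failed migration follow a fixed order: check the SPG status, correct the bootstrap data, re-run the migration, and confirm the result. The sketch below only builds that command sequence from the commands named in the table. The function name `migration_recovery_plan` is hypothetical, and passing the peer group and subcloud names as trailing positional arguments is an assumption to verify against the installed dcmanager version.

```python
from typing import List, Optional


def migration_recovery_plan(
    peer_group: str,
    subcloud: str,
    bootstrap_address: Optional[str] = None,
    bootstrap_values: Optional[str] = None,
) -> List[List[str]]:
    """Return the dcmanager commands, in order, for retrying a failed migration."""
    # 1. Inspect the peer group to see which subclouds did not migrate.
    plan = [["dcmanager", "subcloud-peer-group", "status", peer_group]]

    # 2. Correct the bootstrap data only if a replacement value was supplied.
    update = ["dcmanager", "subcloud", "update"]
    if bootstrap_address:
        update += ["--bootstrap-address", bootstrap_address]
    if bootstrap_values:
        update += ["--bootstrap-values", bootstrap_values]
    if len(update) > 3:
        # Assumption: the subcloud name is the trailing positional argument.
        plan.append(update + [subcloud])

    # 3. Re-run the migration, then list the subclouds to confirm the result.
    plan.append(["dcmanager", "subcloud-peer-group", "migrate", peer_group])
    plan.append(["dcmanager", "subcloud-peer-group", "list-subclouds", peer_group])
    return plan


if __name__ == "__main__":
    # All names and addresses below are hypothetical placeholders.
    for command in migration_recovery_plan(
        "example-peer-group", "example-subcloud", bootstrap_address="192.0.2.10"
    ):
        print(" ".join(command))
```

Because the subcloud stays in the SPG while its bootstrap data is corrected, the plan contains no step that removes it from or re-adds it to the peer group.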
Post migration
Audit operations will be triggered when the network is restored or when the migration_status of the retrieved peer group changes to complete.
Error scenarios | Recovery mechanism |
---|---|
Site B goes down after the SPG has been migrated to its site. | Upon site A recovery, the administrator can trigger the migration of the SPG back to site A. |