Basic Verification – User Traffic Is Not Run against the Recovered VMs
To verify that the virtual machines can be recovered, use either a Failover Test operation or an uncommitted Move operation with the Rollback setting.
Using a Failover Test Operation
Use a Failover Test operation when it is sufficient to recover the virtual machines in a sandbox, isolated by the test network specified in the VPG definition. The Failover Test operation is described in “The Failover Test Operation”, on page 327 and in Starting and Stopping Failover Tests.
See the following sections:
Using a Failover Test Operation: Recommended Procedure for a Live DR Test
Using a Failover Test Operation: Failover Test Considerations
Using a Failover Test Operation: Recommended Procedure for a Live DR Test
1. Change the VPG Failover Test Network to the production network used at the recovery site.
2. Manually shut down the virtual machines in the VPG.
3. Insert a new checkpoint. Because the virtual machines are shut down, the new checkpoint is added after all I/Os have been written to disk, which avoids potential data loss.
4. Optionally simulate a disaster, for example by disconnecting the two sites.
5. Perform a test failover on the VPG, choosing the checkpoint you added in step 3.
6. Verify that the test machines are recovered as expected.
7. Run user traffic against the virtual machines.
8. Stop the failover test.
9. Reconnect the sites.
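The ordering of the nine steps above can be sketched as a small runner. This is a minimal illustration, not a Zerto tool: the step names and the `actions` mapping are hypothetical, and each callable stands in for whatever you use to perform that step (the user interface, a script, or a manual confirmation). The one design choice it highlights is that steps 8 and 9 are cleanup, so they should still run if verification or user traffic fails.

```python
from typing import Any, Callable, Dict, List


def run_live_dr_test(actions: Dict[str, Callable[..., Any]],
                     simulate_disaster: bool = True) -> List[str]:
    """Run the nine steps in order and return the names of the steps executed.

    `actions` maps hypothetical step names to callables supplied by the caller.
    """
    executed: List[str] = []

    def step(name: str, *args: Any) -> Any:
        result = actions[name](*args)
        executed.append(name)
        return result

    step("set_test_network_to_production")        # step 1
    step("shut_down_protected_vms")               # step 2
    checkpoint = step("insert_checkpoint")        # step 3
    if simulate_disaster:
        step("disconnect_sites")                  # step 4 (optional)
    try:
        step("start_failover_test", checkpoint)   # step 5, using the step 3 checkpoint
        try:
            step("verify_recovered_vms")          # step 6
            step("run_user_traffic")              # step 7
        finally:
            step("stop_failover_test")            # step 8, runs even if 6 or 7 fails
    finally:
        if simulate_disaster:
            step("reconnect_sites")               # step 9, runs even if the test fails
    return executed
```

Passing the checkpoint returned by step 3 directly into step 5 keeps the test tied to the checkpoint taken after the virtual machines were shut down.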
Using a Failover Test Operation: Failover Test Considerations
You do not have to shut down the protected virtual machines, and changes from the test phase are not kept or applied to the protected applications.
You can recover to a specific point-in-time.
You can use an isolated network to enable testing in a sandbox environment and not a live DR environment. This is the recommended practice.
During the testing period, every change is recorded in a scratch volume.
Because the scratch volumes and the virtual machines being tested are on the same site, the additional I/Os during the failover test can affect performance.
In addition, the longer the test period, the more scratch volumes are used, until the maximum size is reached, at which point no more testing can be done.
The maximum size of all the scratch volumes is determined by the journal size hard limit and cannot be changed.
The scratch volumes reside on the storage defined for the journal.
At the end of the test, if you powered off the virtual machines in the protected site, you can power them back on and continue to work without the need to save or replicate back any data changed during the test.
You can also use a Failover Test operation to simulate an actual disaster for about an hour or less when you do not want to save any changes on the recovery site.
Using an Uncommitted Move Operation
To test the recovery of virtual machines in the recovery site production environment, use a Move operation with the commit/rollback policy set to roll back after the test period. The Move operation is described in “Moving Protected Virtual Machines to a Remote Site”, on page 345.
Note: Committing the Move operation requires failing the migrated virtual machines back to the production site after a Delta Sync has been performed on the committed machines in the recovery site.
Recommended Procedure for a Live DR Test
1. In the Move wizard, in the EXECUTION PARAMETERS tab, for commit policy, select None.
2. Make sure that the virtual machines are shut down: either power off the relevant virtual machines or, if they cannot be powered off using VMware Tools, check the Force Shutdown checkbox in the EXECUTION PARAMETERS tab.
3. After testing the machines in the recovery site, roll back the Move operation, which will return the virtual machines to their pre-test state.
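The choices in steps 1 and 2 can be captured as a small settings check. This is a sketch under stated assumptions: `MoveTestSettings` and both functions are hypothetical names, not part of any Zerto API, and the policy labels other than None are assumed for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoveTestSettings:
    """Hypothetical mirror of the EXECUTION PARAMETERS choices for a test Move."""
    commit_policy: str   # "None" per step 1; "Auto-Commit" and "Auto-Rollback" are assumed labels
    force_shutdown: bool


def settings_for_live_dr_test(vms_already_powered_off: bool) -> MoveTestSettings:
    # Step 1: no automatic commit, so the Move stays uncommitted during the test.
    # Step 2: Force Shutdown is only needed when the VMs were not powered off beforehand.
    return MoveTestSettings(commit_policy="None",
                            force_shutdown=not vms_already_powered_off)


def validate_for_live_dr_test(settings: MoveTestSettings) -> None:
    # An automatic commit would keep the test changes, which this procedure must avoid.
    if settings.commit_policy == "Auto-Commit":
        raise ValueError("an automatic commit would make the test Move permanent")
```

For example, `settings_for_live_dr_test(vms_already_powered_off=False)` yields a commit policy of None with Force Shutdown enabled, which matches the second option in step 2.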
Changes from the pre-commit phase are not kept or applied to the protected applications.
The virtual machines are allocated disks and connected to the network for a full test of the environment.
The protected machines are turned off until the end of the test, ensuring that there are no conflicts between the protected site and recovery site.
During the testing period, every change is recorded in a scratch volume to enable rolling back.
Because the scratch volumes and the virtual machines being moved are on the same site, the additional I/Os during the testing period can affect performance.
In addition, the longer the test period, the more scratch volumes are used, until the maximum size is reached, at which point no more testing can be done.
The maximum size of all the scratch volumes is determined by the journal size hard limit and cannot be changed.
The scratch volumes reside on the storage defined for the journal.
You can only recover to the last checkpoint written to the journal, at the start of the Move operation.
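Both operations described here record test-period changes in scratch volumes capped by the journal size hard limit, so it can help to estimate how long a test can run before that cap is reached. The following is a rough sketch, not a Zerto formula: it assumes a constant write rate and that scratch usage grows by the full amount written, which is a pessimistic simplification. The function name and the example numbers are hypothetical.

```python
def hours_until_scratch_full(hard_limit_gb: float,
                             used_gb: float,
                             write_rate_mb_per_s: float) -> float:
    """Estimate the hours of testing left before the scratch volumes reach the hard limit.

    Assumes a constant write rate and that every written megabyte consumes
    a megabyte of scratch space.
    """
    if write_rate_mb_per_s <= 0:
        raise ValueError("write rate must be positive")
    remaining_mb = max(hard_limit_gb - used_gb, 0.0) * 1024
    return remaining_mb / write_rate_mb_per_s / 3600


# Example with made-up numbers: a 150 GB hard limit, empty scratch volumes,
# and a sustained 5 MB/s of writes during the test give roughly 8.5 hours.
print(round(hours_until_scratch_full(150, 0, 5), 1))
```

If the estimate is shorter than the planned test window, the test may stop before you finish verifying the recovered machines.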