Run User Traffic Against the Recovered VMs
You can test actual user traffic against recovered virtual machines using a Move, Failover, or Clone operation, as follows:
Move operation: When you can shut down the protected virtual machines but you do not want or need to simulate an actual disaster.
Failover operation: When you can shut down the protected virtual machines and you want to simulate an actual disaster.
Clone operation: When the protected application has to run throughout the test.
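The selection criteria above can be summarized as a small decision helper. This is only an illustrative sketch in Python: the `Operation` enum and the `choose_test_operation` function are hypothetical names introduced here and are not part of any Zerto tool or API.

```python
from enum import Enum


class Operation(Enum):
    """The three operations this section lists for a live DR test."""
    MOVE = "Move"
    FAILOVER = "Failover"
    CLONE = "Clone"


def choose_test_operation(can_shut_down_protected_vms: bool,
                          simulate_disaster: bool) -> Operation:
    """Pick an operation for a live DR test using the criteria listed above.

    Hypothetical helper for illustration only; it does not call Zerto.
    """
    if not can_shut_down_protected_vms:
        # The protected application has to keep running throughout the test.
        return Operation.CLONE
    if simulate_disaster:
        # The protected VMs can be shut down and a disaster is to be simulated.
        return Operation.FAILOVER
    # The protected VMs can be shut down, but no disaster is simulated.
    return Operation.MOVE
```

For example, `choose_test_operation(True, False)` returns `Operation.MOVE`, matching the first criterion in the list.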
Using a Move Operation
Use a Move operation when you can shut down the protected virtual machines but you do not want to simulate an actual disaster. After the virtual machines have been recovered in the target site, they are used as the protected machines for as long as the test lasts. The Move operation is described in “The Move Operation”, on page 328 and in “Moving Protected Virtual Machines to a Remote Site”, on page 345.
See the following sections:
Using a Move Operation - Recommended Procedure for a Live DR Test
Using a Move Operation - Move Considerations
Using a Move Operation - Recommended Procedure for a Live DR Test
1. In the Move wizard, in the EXECUTION PARAMETERS tab, set the commit policy to None. This setting enables you to use the Move operation for a DR test, because you can test the moved virtual machines before the move is committed.
2. Move the VPG back to the original protected site. A Delta Sync is performed to copy the new transactions made on the virtual machines in the recovery site back to the original protected site.
Using a Move Operation - Move Considerations
You can test the moved machines before they are committed.
You can test for as long as you want.
The virtual machines are allocated disks and connected to the network for a full test of the environment.
The originally protected disks are maintained for a faster failback when reverse replication is specified.
The protected machines are turned off until the move is committed, and then they are removed from the protected site. This ensures that there are no conflicts between the protected site and the recovery site.
You must test to the last checkpoint, taken after the protected virtual machines are shut down.
An actual disaster is not simulated.
During the testing period, if reverse replication is not specified, there is no protection for the recovered machines.
 
 
Using a Failover Operation
Use a Failover operation when you can shut down the protected virtual machines and you want to simulate an actual disaster. After the virtual machines have been recovered in the target site, they are used as the protected machines for as long as the test lasts.
Using a Failover operation to test DR requires specific steps to ensure that the virtual machines are migrated gracefully to the target site and that, as with a Move operation, they can be verified before the failover is committed. The Failover operation is described in “The Failover Operation”, on page 328.
See the following sections:
Using a Failover Operation - Recommended Procedure for a Live DR Test
Using a Failover Operation - Failover Considerations
Using a Failover Operation - Recommended Procedure for a Live DR Test
1. Manually shut down the virtual machines.
2. Insert a new checkpoint. This avoids potential data loss since the virtual machines are shut down and the new checkpoint is added after all I/Os have been written to disk.
3. Optionally simulate a disaster, for example by disconnecting the two sites.
4. Perform a live failover on the VPG, specifying the commit policy and choosing the checkpoint you added in step 2. Choose a commit policy that gives you enough time to check that the failed-over virtual machines have been recovered to the correct point-in-time and, if they have not, to roll back the failover.
5. Continue to use the recovered virtual machines.
6. The VPG is in a Needs configuration state, because there is no access to the protected site.
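The order of the first steps matters: the checkpoint must be inserted after the virtual machines are shut down, and the optional disaster simulation comes before the live failover. The following Python sketch checks a planned runbook against that order. The step names and the `check_failover_test_plan` function are hypothetical labels for the numbered steps above; they do not correspond to any Zerto API.

```python
# Hypothetical step names mirroring steps 1-4 above, in the documented order.
# The second element marks whether the step is required.
FAILOVER_TEST_STEPS = [
    ("shut_down_protected_vms", True),
    ("insert_checkpoint", True),
    ("simulate_disaster", False),  # optional, per step 3
    ("live_failover", True),
]


def check_failover_test_plan(plan):
    """Return a list of problems found in a planned step sequence.

    An empty list means the plan contains every required step and
    keeps the steps in the documented order.
    """
    order = {name: i for i, (name, _) in enumerate(FAILOVER_TEST_STEPS)}
    problems = []

    # Flag steps that are not part of the documented procedure.
    for name in plan:
        if name not in order:
            problems.append(f"unknown step: {name}")

    # Flag required steps that the plan leaves out.
    for name, required in FAILOVER_TEST_STEPS:
        if required and name not in plan:
            problems.append(f"missing required step: {name}")

    # Flag adjacent known steps that appear in the wrong relative order.
    known = [name for name in plan if name in order]
    for earlier, later in zip(known, known[1:]):
        if order[earlier] > order[later]:
            problems.append(f"{later} must come before {earlier}")

    return problems
```

A plan that inserts the checkpoint before shutting down the virtual machines is reported as a problem, which reflects the reason given in step 2: the checkpoint is only guaranteed to include all I/Os if it is added after the shutdown.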
After testing the recovered virtual machines, you can finalize the live DR test and fail the virtual machines back to the original protected site:
1. Reconnect the sites.
2. Enable protection for the virtual machines by editing the VPG and clicking DONE.
3. Zerto Virtual Replication uses the original disks to preseed the volumes and expedite the synchronization between the two sites, using a Delta Sync.
The time the Delta Sync takes to complete depends on the total size of the disks and on the storage performance at both sites.
After the synchronization completes, the VPG enters the Meeting SLA state.
4. Perform a Move operation to fail back the virtual machines to the original protected site.
5. In the Move wizard, in the EXECUTION PARAMETERS tab, set the commit policy to enable basic testing before the move is committed.
The virtual machines are recovered at the original protected site, and the VPG enters a Delta Sync phase before it enters a Meeting SLA state.
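For planning purposes, the dependence of the Delta Sync duration on disk size and storage performance can be approximated with a rough calculation. The Python sketch below is a back-of-the-envelope model only, not the way Zerto Virtual Replication computes or reports sync time. It assumes that the slower of the two sites is the bottleneck and that the whole disk capacity has to be processed at that rate; the function name and parameters are hypothetical.

```python
def estimate_delta_sync_seconds(total_disk_gib: float,
                                protected_site_mib_per_s: float,
                                recovery_site_mib_per_s: float) -> float:
    """Rough estimate of Delta Sync duration in seconds.

    Assumption (not from the product documentation): the sync is limited
    by the slower site's sustained storage throughput, applied to the
    full disk size. Real durations also depend on factors not modeled here.
    """
    if total_disk_gib < 0:
        raise ValueError("total_disk_gib must not be negative")
    if protected_site_mib_per_s <= 0 or recovery_site_mib_per_s <= 0:
        raise ValueError("throughput values must be positive")

    # The slower site bounds the overall rate.
    bottleneck_mib_per_s = min(protected_site_mib_per_s, recovery_site_mib_per_s)

    # Convert GiB to MiB, then divide by the rate in MiB per second.
    return total_disk_gib * 1024 / bottleneck_mib_per_s
```

Treat the result as an order-of-magnitude figure when scheduling the failback window, and measure an actual Delta Sync in your environment before relying on it.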
Using a Failover Operation - Failover Considerations
The originally protected disks are maintained for a faster failback.
Using the Failover operation for testing is non-intuitive.
Testing by using the Failover operation requires performing manual procedures, such as shutting down the protected virtual machines.
During the testing period, there is no protection for the recovered machines.
 
Using a Clone Operation
Use the Clone operation when the protected application must continue to run throughout the test. Create a clone of the virtual machines in a VPG on the recovery site to a specific point-in-time. The clone is a copy of the protected virtual machines on the recovery site, while the virtual machines on the protected site remain protected and live. The Clone operation is described in “The Clone Operation”, on page 329 and in “Cloning a VPG to the Recovery Site”, on page 366.
The cloned virtual machines are independent of Zerto Virtual Replication. At the end of the test you can remove these machines or leave them.
The Clone operation is described above, and in the Zerto Virtual Manager Administration Guide for the VMware vSphere Environment.
Using a Clone Operation - Clone Considerations
You can clone to a specific point-in-time.
There is no protection for the cloned machines.
After use of the clone ends, no changes made to the cloned virtual machines are applied to the protected virtual machines.
The original virtual machines on the source site are live and online throughout the test.