The Failover Live Process

Use the Failover operation following a disaster to recover protected virtual machines to the recovery site.

Note: You can also move virtual machines from the protected site to the recovery site in a planned migration. For details, see Migrating a VPG to a Recovery Site.

When you set up a failover, you always specify a checkpoint to which you want to recover the virtual machines. When you select a checkpoint (the last automatically generated checkpoint, an earlier checkpoint, or a tagged checkpoint), Zerto makes sure that the virtual machines at the remote site are recovered to this specified point in time. By setting a commit policy that enables checking the recovered machines before committing the failover, you can check the integrity of the recovered machines. If the machines are OK, you can commit the failover. Otherwise, you can roll back the operation and then repeat the procedure using a different checkpoint.
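The recover, check, then commit or roll back cycle described above can be sketched as a small loop. This is an illustrative sketch only, not Zerto's implementation or API: the function name failover_with_verification and the recover, verify, commit, and rollback callbacks are hypothetical placeholders for whatever actions you perform in your own environment.

```python
from typing import Callable, Iterable, Optional


def failover_with_verification(
    checkpoints: Iterable[str],
    recover: Callable[[str], None],
    verify: Callable[[], bool],
    commit: Callable[[], None],
    rollback: Callable[[], None],
) -> Optional[str]:
    """Try each checkpoint in order and return the one that was committed.

    All callbacks are hypothetical stand-ins supplied by the caller:
    recover(checkpoint) starts a failover to that checkpoint,
    verify() reports whether the recovered machines look healthy,
    commit() finalizes the failover, rollback() aborts it.
    Returns None if no checkpoint passed verification.
    """
    for checkpoint in checkpoints:
        recover(checkpoint)
        if verify():
            commit()
            return checkpoint
        # The machines failed the integrity check: undo and try the next one.
        rollback()
    return None
```

The caller passes the checkpoints in order of preference, for example the latest checkpoint first and earlier or tagged checkpoints after it.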

The Failover operation has the following basic steps:

If the protected site or Zerto Virtual Manager is down, the process continues with the next step.

If the protected site or Zerto Virtual Manager is still running, the failover requirements are determined:

If the default option, doing nothing to the protected virtual machines, is requested, the Failover operation continues with the next step.
If shutting down the protected virtual machines is requested and the protected virtual machines do not have VMware Tools available, the Failover operation fails.
If forcibly shutting down the protected virtual machines is requested, the protected virtual machines are shut down and the Failover operation continues with the next step.
Creating the virtual machines at the remote site in the production network and attaching each virtual machine to its relevant virtual disks, configured to the checkpoint specified for the recovery. The virtual machines are created without CD-ROM or DVD drives, even if the protected virtual machines had CD-ROM or DVD drives. Also, the operation is considered successful even if some of the virtual machines in a VPG fail to be created on the recovery site or are created without their complete settings, for example, when re-IP cannot be performed.

Note: If the virtual machines fail to be created on the recovery site in Public Cloud, the failover operation will not succeed.
Note: The original protected virtual machines are not touched since the assumption is that the original protected site is down.
Preventing the virtual machines from being moved automatically to other hosts: HA is set to prevent DRS. This prevents automatic vMotion of the affected virtual machines during the Failover operation.
Powering on the virtual machines, making them available to the user. If applicable, the boot order defined in the VPG settings is used to power on the machines.
Note: If the virtual machines do not power on, the process continues and the virtual machines must be manually powered on. The virtual machines cannot be powered on automatically in a number of situations, such as when there are not enough resources in the resource pool, when the required MAC address is part of a reserved range, or when there is a MAC address conflict or IP conflict (for example, if a clone was previously created with the MAC or IP address).
The default is to automatically commit the Failover operation without testing. However, you can also run basic tests on the machines to verify that they are valid at the specified checkpoint. Depending on the commit/rollback policy that you specified for the operation, after testing the operation is either committed, finalizing the failover, or rolled back, aborting the operation.
If the protected site is still available, for example, after a partial disaster, and reverse protection is possible and specified for the Failover operation, the protected virtual machines are powered off and removed from the inventory. The virtual disks used by the virtual machines in the protected site are used for the reverse protection. A Delta Sync is performed to make sure that the two copies, the new target site disks and the original site disks, are consistent. A Delta Sync is required since the recovered machines can be updated while data is being promoted.
If reverse protection is selected, and the virtual machines or vCD vApp are already protected in other VPGs, continuing with the operation will cause the virtual machines or vCD vApp to be deleted from the other VPGs that are protecting them and the journals of these VPGs to be reset. In the event of vCD vApp or if no other virtual machines are left to protect, the entire VPG will be removed.
If reverse protection is selected, and the virtual machines or vCD vApp are already protected in other VPGs, continuing with the operation will cause the other VPGs protecting the same virtual machines or vCD vApp to pause protection. In the event of vCD vApp or if no other virtual machines are left to protect, the entire VPG will be removed. To resume protection for these VPGs, you must edit the VPGs on the other sites and remove the virtual machine that was failed over from the protected site.
Note: If reverse protection is not possible, the original protected site virtual machines or vCD vApp are not powered off and removed.
Protecting virtual machines or a vCD vApp in several VPGs is enabled only if both the protected site and the recovery site, as well as the VRAs installed on these sites, are running version 5.0 or higher.
The data from the journal is promoted to the machines. The machines can be used during the promotion and Zerto ensures that the user sees the latest image, even if this image, in part, includes data from the journal.
Note: Virtual machines cannot be moved to another host during promotion. If the host is rebooted during promotion, make sure that the VRA on the host is running and communicating with the Zerto Virtual Manager before starting up the recovered virtual machines.
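The handling of the protected virtual machines at the start of the steps above can be summarized as a small decision function. This is a minimal sketch for illustration, not Zerto code: the names ShutdownOption, Decision, and pre_failover_decision are hypothetical. The case where a shutdown is requested and VMware Tools is available is not spelled out in the steps; the sketch assumes the machines are then shut down and the operation continues.

```python
from enum import Enum


class ShutdownOption(Enum):
    NONE = "none"          # default: do nothing to the protected VMs
    SHUTDOWN = "shutdown"  # shut down the protected VMs (needs VMware Tools)
    FORCE = "force"        # forcibly shut down the protected VMs


class Decision(Enum):
    CONTINUE = "continue"
    SHUTDOWN_THEN_CONTINUE = "shutdown_then_continue"
    FAIL = "fail"


def pre_failover_decision(
    site_running: bool,
    option: ShutdownOption,
    tools_available: bool,
) -> Decision:
    """Decide what happens to the protected VMs before recovery starts.

    site_running is True only when both the protected site and its
    Zerto Virtual Manager are still running.
    """
    if not site_running:
        # Nothing can be done on the protected side; go to the next step.
        return Decision.CONTINUE
    if option is ShutdownOption.NONE:
        return Decision.CONTINUE
    if option is ShutdownOption.SHUTDOWN:
        if not tools_available:
            return Decision.FAIL
        # Assumption: with VMware Tools available the shutdown succeeds.
        return Decision.SHUTDOWN_THEN_CONTINUE
    # ShutdownOption.FORCE: the VMs are shut down regardless of VMware Tools.
    return Decision.SHUTDOWN_THEN_CONTINUE
```

For example, pre_failover_decision(True, ShutdownOption.SHUTDOWN, False) returns Decision.FAIL, matching the step in which the Failover operation fails because VMware Tools is not available.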

Failback after the Original Site is Operational

To fail back to the original protected site, the VPG that is now protecting the virtual machines on the recovery site must be configured, after which a Delta Sync is performed with the disks in the original protected site. Once the VPG is in a protecting state, the virtual machines can be moved back to the original protected site, as described in Migrating a VPG to a Recovery Site.

See also:

Initiating Failover Live
Reverse Protection for a Failed Over VPG
What Happens When the Protected Site is Down
Initiating Failover Live During a Test