Initiating a Failover


You can initiate a failover, whereby the virtual machines in the virtual protection group (VPG) are replicated to a set checkpoint in the recovery site. As part of the process you can also set up reverse protection, whereby you create a VPG on the recovery machine for the virtual machines being replicated, pointing back to the protected site.

You can initiate a failover to the last checkpoint recorded in the journal, even if the protected site is no longer up. You can initiate a failover during a test, as described in Initiating a Failover During a Test.

If the protected site is still up and you have time, you can initiate the failover from the protected site. If the protected site is down, initiate the failover from the recovery site.

Note:

■ If a virtual machine is protected in several VPGs, and Reverse Protection is selected, the virtual machine is removed from all of the VPGs containing the virtual machine. The journals of these VPGs are reset.

If a vCD vApp is protected in several VPGs, and reverse protection is selected, all VPGs containing the vCD vApp are deleted.

If Reverse Protection is selected, and the virtual machines or vCD vApp are already protected in other VPGs, continuing with the operation causes the other VPGs protecting the same virtual machines or vCD vApp to pause protection. In the case of a vCD vApp, or if no other virtual machines are left to protect, the entire VPG is removed. To resume protection for these VPGs, you must edit the VPGs on the other sites and remove the virtual machine that was failed over from the protected site.

Protecting virtual machines or a vCD vApp in several VPGs is enabled only if both the protected site and the recovery site, as well as the VRAs installed on these sites, are of version 5.0 and higher.

■ Any VPGs that are in the process of being synchronized cannot be recovered, unless the synchronization is a bitmap synchronization.
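
The two eligibility rules in this note can be sketched as simple checks. This is an illustrative model only, not part of the Zerto product or its API: the VpgState type, the sync_type labels, and both function names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class VpgState:
    """Hypothetical view of a VPG; not a Zerto API type."""
    name: str
    # None means no synchronization is running. The strings are
    # illustrative labels, for example "bitmap" or "delta".
    sync_type: Optional[str] = None


def can_recover(vpg: VpgState) -> bool:
    """A synchronizing VPG can be recovered only during a bitmap sync."""
    return vpg.sync_type is None or vpg.sync_type == "bitmap"


def multi_vpg_protection_enabled(versions: Sequence[Tuple[int, int]]) -> bool:
    """True only if every site and VRA version is 5.0 or higher.

    Versions are (major, minor) tuples for the protected site, the
    recovery site, and the VRAs installed on them.
    """
    return all(version >= (5, 0) for version in versions)
```

For example, can_recover(VpgState("web", "delta")) is False, while the same VPG in a bitmap synchronization can be recovered.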

To initiate a failover:

1. In the Zerto User Interface set the operation to LIVE and click FAILOVER.

The Failover wizard appears.

2. Select the VPGs to fail over. By default, all VPGs are listed.

At the bottom, the selection details show the amount of data and the total number of virtual machines selected.

The Direction arrow shows the direction of the process: from the protected site to the peer (recovery) site.

3. Click NEXT.

The EXECUTION PARAMETERS window is displayed.

You can change the following values to use for the recovery:

■ Commit Policy

■ Checkpoint to use

■ Force Shutdown

■ Reverse Protection settings

In this window you can also see if a Boot Order and Scripts are defined for the VPG.

4. By default, the last checkpoint added to the journal is displayed in the Checkpoint column.

■ To use this checkpoint, proceed to the next step.

■ To change the checkpoint, click the link that appears as the checkpoint.

A window appears, displaying a list of the VPGs’ checkpoints.

Latest: Recovery is to the latest checkpoint. This ensures that the data is crash-consistent for the recovery.

When selecting the latest checkpoint, the checkpoint used is the latest at this point.

If a checkpoint is added between this point and starting the failover, this later checkpoint is not used.

Latest Tagged Checkpoint: The recovery operation is to the latest checkpoint added in one of the following situations:

■ By a user.

■ When a failover test was previously performed on the VPG that includes the virtual machine.

■ When the virtual machine was added to an existing VPG after the added virtual machine was synchronized.

5. To use a checkpoint which is not the latest checkpoint, or the latest tagged checkpoint, choose Select from all available checkpoints. By default, this option displays all checkpoints in the system. You can choose to display only automatic, or tagged checkpoints, or any combination of these types.

6. Click OK.
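
The three checkpoint choices described in steps 4 and 5 can be modeled as follows. This is a minimal sketch under stated assumptions: the Checkpoint type and the function names are hypothetical, and a checkpoint is treated as tagged whenever its tag field is set.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical journal checkpoint; tag is None for automatic ones."""
    timestamp: datetime
    tag: Optional[str] = None


def latest(checkpoints: Sequence[Checkpoint]) -> Checkpoint:
    """The Latest option: the newest checkpoint at the time of selection."""
    return max(checkpoints, key=lambda c: c.timestamp)


def latest_tagged(checkpoints: Sequence[Checkpoint]) -> Optional[Checkpoint]:
    """The Latest Tagged Checkpoint option, or None if nothing is tagged."""
    tagged = [c for c in checkpoints if c.tag is not None]
    return max(tagged, key=lambda c: c.timestamp, default=None)


def filter_checkpoints(checkpoints: Sequence[Checkpoint],
                       automatic: bool = True,
                       tagged: bool = True) -> List[Checkpoint]:
    """Select from all available checkpoints, filtered by type."""
    return [c for c in checkpoints
            if (tagged if c.tag is not None else automatic)]


journal = [
    Checkpoint(datetime(2024, 1, 1, 10, 0)),
    Checkpoint(datetime(2024, 1, 1, 10, 5), tag="before-upgrade"),
    Checkpoint(datetime(2024, 1, 1, 10, 10)),
]
chosen = latest(journal)
# A checkpoint added after the selection does not change what was chosen.
journal.append(Checkpoint(datetime(2024, 1, 1, 10, 15)))
```

Because latest returns a value rather than a live reference, the sketch mirrors the behavior described above: a checkpoint added between the selection and the start of the failover is not used.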

7. To change the commit policy, click the field or select the VPG and click EDIT SELECTED.

a) To commit the recovery operation automatically, with no testing, select Auto-Commit and 0 minutes.

b) Select None if you do not want an automatic commit or rollback. You must manually commit or roll back.

c) To test before committing or rolling back, specify an amount of time to test the recovered machines, in minutes.

This is the amount of time that the commit or rollback operation is delayed, before the automatic commit or rollback action is performed.

During this time period, check that the new virtual machines are OK and then commit the operation or roll it back.

The maximum amount of time you can delay the commit or rollback operation is 1440 minutes, which is 24 hours.

Testing that involves I/O is done on scratch volumes.

■ The more I/Os generated, the more scratch volumes are used, until the maximum size is reached, at which point no more testing can be done.

■ The maximum size of all the scratch volumes is determined by the journal size hard limit and cannot be changed.

■ The scratch volumes reside on the storage defined for the journal.

8. To specify the shutdown policy, click the VM Shutdown field and select the shutdown policy:

■ No (default): The protected virtual machines are not touched before starting the failover. This assumes that you do not know the state of the protected machines, or you know that they are not serviceable.

■ Yes: If the protected virtual machines have VMware Tools available, the virtual machines are gracefully shut down; otherwise, the Failover operation fails. This is similar to performing a Move operation to a specified checkpoint.

■ Force Shutdown: The protected virtual machines are forcibly shut down before starting the failover. This is similar to performing a Move operation to a specified checkpoint. If the protected virtual machines have VMware Tools available, the procedure waits five minutes for the virtual machines to be gracefully shut down before forcibly powering them off.
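
The three shutdown policies can be summarized as a decision sketch. The types and the plan_shutdown function are hypothetical and only restate the behavior described above; the step descriptions are plain text, not real commands.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple

GRACEFUL_WAIT_MINUTES = 5  # wait stated above for Force Shutdown


class ShutdownPolicy(Enum):
    NO = "No"
    YES = "Yes"
    FORCE = "Force Shutdown"


@dataclass(frozen=True)
class ShutdownPlan:
    """Hypothetical result: whether failover can proceed, and the steps."""
    failover_can_proceed: bool
    steps: Tuple[str, ...] = ()


def plan_shutdown(policy: ShutdownPolicy, has_vmware_tools: bool) -> ShutdownPlan:
    if policy is ShutdownPolicy.NO:
        # The protected virtual machines are not touched.
        return ShutdownPlan(True)
    if policy is ShutdownPolicy.YES:
        if not has_vmware_tools:
            # Without VMware Tools the Failover operation fails.
            return ShutdownPlan(False)
        return ShutdownPlan(True, ("graceful shutdown",))
    # Force Shutdown: try a graceful shutdown first when possible.
    if has_vmware_tools:
        return ShutdownPlan(True, (
            f"graceful shutdown, wait up to {GRACEFUL_WAIT_MINUTES} minutes",
            "forced power off if still running",
        ))
    return ShutdownPlan(True, ("forced power off",))
```

The only combination that blocks the failover in this model is Yes without VMware Tools, which matches the description of the Yes option.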

9. To specify reverse protection, whereby the virtual machines in the VPG are failed over to the recovery site and then protected in the recovery site, back to the original site, either:

■ Click REVERSE PROTECT ALL. This activates reverse protection on all the VPGs that you plan to fail over. The system default values for this procedure will be assigned to all the VPGs.

- Or -

■ Click the Reverse Protection field and select REVERSE.

a) To configure the VPG for reverse protection, click the REVERSE link.

The Edit Reverse VPG wizard is displayed.

You can edit the reverse protection configuration. The parameters are the same as those used when you create a VPG, described in “Protecting Virtual Machines from a vCenter Server”, with the following differences:

■ You cannot add virtual machines to, or remove virtual machines from, the reverse protection VPG.

■ By default, reverse protection is to the original protected disks. You can specify a different storage to be used for the reverse protection.

■ If VMware Tools is available, for each virtual machine in the VPG, the IP address of the originally protected virtual machine is used. Thus, during failback the original IP address of the virtual machine on the site where the machine was originally protected is reused. However, if the machine does not contain the utility, DHCP is used.

■ The host version must be 4.1 or higher for re-IP to be enabled.

IMPORTANT:

■ The virtual machines or vCD vApp will be removed from the other VPGs that are protecting them if the following conditions apply:

■ The virtual machines or vCD vApp are already protected in other VPGs

■ Reverse protection is specified

■ If your VPG has a vCD vApp, or if there are no other virtual machines left to protect, the entire VPG will be removed.

■ If Reverse Protection is selected, and the virtual machines or vCD vApp are already protected in other VPGs, continuing with the operation causes the other VPGs protecting the same virtual machines or vCD vApp to pause protection. In the case of a vCD vApp, or if no other virtual machines are left to protect, the entire VPG is removed. To resume protection for these VPGs, you must edit the VPGs on the other sites and remove the virtual machine that was failed over from the protected site.

Protecting virtual machines or vCD vApps in several VPGs is enabled only if both the protected site and the recovery site, as well as the VRAs installed on these sites, are of version 5.0 and higher.

When committing the failover, you can reconfigure reverse protection, regardless of the reverse protection settings specified here.

When reverse protection is specified for a VPG residing on a vCD site that is replicating to either a vSphere or Hyper-V site, the boot order settings will not reserve the start delay vCD vApp settings for virtual machines with the same order number.

10. Click NEXT.

■ A warning appears informing the user that the virtual machines or vCD vApp will be removed from the other VPGs that are protecting them.

■ If your VPG has a vCD vApp, or if there are no other virtual machines left to protect, the entire VPG will be removed.
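
The effect that this warning describes can be worked out ahead of time. The sketch below is a hypothetical model, not Zerto code: the Vpg type and reverse_protection_impact function are assumptions, and a VPG is assumed to protect the same vCD vApp when it is flagged as a vApp and shares virtual machines with the VPG being failed over.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

UNCHANGED = "unchanged"
VMS_REMOVED = "virtual machines removed, journal reset"
VPG_REMOVED = "entire VPG removed"


@dataclass
class Vpg:
    """Hypothetical VPG description used only for this example."""
    name: str
    vms: Set[str]
    is_vcd_vapp: bool = False


def reverse_protection_impact(failed_over: Vpg, others: List[Vpg]) -> Dict[str, str]:
    """Outcome for each other VPG when reverse protection is specified."""
    impact = {}
    for vpg in others:
        shared = vpg.vms & failed_over.vms
        if not shared:
            impact[vpg.name] = UNCHANGED
        elif vpg.is_vcd_vapp or not (vpg.vms - shared):
            # A vCD vApp, or no virtual machines left to protect.
            impact[vpg.name] = VPG_REMOVED
        else:
            impact[vpg.name] = VMS_REMOVED
    return impact
```

For example, failing over a VPG containing vm1 and vm2 leaves a VPG that protects only vm9 unchanged, removes vm1 from a VPG that also protects vm3, and removes a VPG that protects only vm2.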

11. Click OK. If a virtual machine is deleted from other VPGs, the journals of these VPGs are reset.

The FAILOVER step is displayed. The topology shows the number of VPGs and virtual machines being failed over to each recovery site. In the following example, 2 VPGs will be failed over to Site6-Ent2-R2, and they contain 5 virtual machines; and 1 VPG will be failed over to Site5-Ent2-P2-R2 and it contains 2 virtual machines.

12. Click START FAILOVER.

A warning message appears, presenting a summary of your Commit Policy.

13. Review the Commit Policy summary, and either click Change Settings, or click START FAILOVER to start the failover.

If a commit policy was set with a timeout greater than zero, you can check the failed over virtual machines on the recovery site before committing the failover operation.

The failover starts by creating the virtual machines in the recovery site to the point-in-time specified: either the last data transferred from the protected site or to one of the checkpoints written in the journal.

Note: If a virtual machine exists on the recovery site with the same name as a virtual machine being failed over, the machine is created and named in the peer site with a number added as a suffix to the name, starting with the number 1.
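
The naming rule in this note can be illustrated with a short sketch. The text above states only that a number is added as a suffix, starting with 1. The separator, and the choice to keep incrementing until an unused name is found, are assumptions made for this example, as is the function name.

```python
from typing import Set


def recovered_vm_name(name: str, existing: Set[str], separator: str = "-") -> str:
    """Pick a name for the recovered machine that does not clash.

    The separator and the increment-until-free behavior are assumptions;
    the source states only that the suffix starts with the number 1.
    """
    if name not in existing:
        return name
    suffix = 1
    while f"{name}{separator}{suffix}" in existing:
        suffix += 1
    return f"{name}{separator}{suffix}"
```

With these assumptions, recovering a machine named web to a site that already has a machine named web yields web-1.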

If the original protected site is still up and reverse protection is configured to use the virtual disks of the protected virtual machines, these virtual machines are powered off.

The status icon changes to orange and an alert is issued, to warn you that the procedure is waiting for either a commit or rollback.

All testing done during this period, before committing or rolling back the failover operation, is written to thin-provisioned scratch virtual disks. These virtual disks are automatically defined when the machines are created on the recovery site for testing. The longer the test period the more scratch volumes are used, until the maximum size is reached, at which point no more testing can be done. The maximum size of all the scratch volumes is determined by the journal size hard limit and cannot be changed. The scratch volumes reside on the same datastore defined for the journal. Using these scratch volumes makes committing or rolling back the failover operation more efficient.

Note: You cannot take a snapshot of a virtual machine before the failover operation is committed and the data from the journal promoted to the moved virtual machine disks, since the virtual machine volumes are still managed by the VRA and not directly by the virtual machine. Using a snapshot of a recovered machine before the failover operation has completed will result in a corrupted virtual machine being created.

14. Check the virtual machines on the recovery site, then either:

■ Wait for the specified Commit Policy time to elapse, and the specified operation, either Commit or Rollback, is performed automatically.

■ Or, in the specific VPG tab, click the Commit or Rollback icon.

a) If you clicked the Commit icon, the Commit window is displayed to confirm the commit and, if necessary, set or reset the reverse protection configuration.

■ If the protected site is still up and you can set up reverse protection, you can reconfigure reverse protection by selecting the Reverse Protection checkbox and then clicking the Reverse link.

■ Configuring reverse protection at this point overwrites any settings defined when initially configuring the failover.

b) If you clicked the Rollback icon, the operation is rolled back: the virtual machines that were created on the recovery site are removed and the machines on the protected site are rebooted.

The Rollback window is displayed to confirm the rollback.

You can also commit or roll back the operation via the TASKS popup window in the status bar, or by selecting MONITORING > TASKS.

If the original protected site is still up and reverse protection is configured to use the virtual disks of the protected virtual machines, these virtual machines are removed from this site, unless the original protected site does not have enough storage available to fail back the failed over virtual machines. Finally, data is promoted from the journal to the recovered virtual machines.

IMPORTANT:

■ If Reverse Protection is selected and the virtual machines or vCD vApp are already protected in other VPGs, the virtual machines or vCD vApp are deleted from the protected site and the journals of these VPGs are reset.

This will result in the removal of these virtual machines from other VPGs that are protecting them, or the removal of the entire VPG, in the event of vCD vApp or if no other virtual machines are left to protect.

In the case of a vCD vApp, or if no other virtual machines are left to protect, the entire VPG is removed. To resume protection for these VPGs, you must edit the VPGs on the other sites and remove the virtual machine that was failed over from the protected site.

Protecting virtual machines in several VPGs is enabled only if both the protected site and the recovery site, as well as the VRAs installed on these sites, are of version 5.0 and higher.

During promotion of data, you cannot move a host on the recovered virtual machines. If the host is rebooted during promotion, make sure that the VRA on the host is running and communicating with the Zerto Virtual Manager before starting up the recovered virtual machines.

By default, the virtual machines are started with the same IPs as the protected machines in the protected site. If you do not specify Reverse Protection, the original machines still exist in the protected site and this can create clashes. In this case, Zerto recommends assigning a different IP to the virtual machines when they start, by configuring the NIC properties of each virtual machine during the definition of the VPG. For details, refer to “Protecting Virtual Machines from a vCenter Server”. If you have defined the new virtual machines so that they will be assigned different IPs, the re-IP cannot be performed until the new machine is started. Zerto Virtual Replication changes the machine IPs and then reboots these machines with their new IPs.

Note: If the virtual machines do not power on, the process continues and the virtual machines must be manually powered on. The virtual machines cannot be powered on automatically in a number of situations, such as when there are not enough resources in the resource pool, when the required MAC address is part of a reserved range, or when there is a MAC address conflict or IP conflict (for example, if a clone was previously created with the MAC or IP address).

When a vCD vApp is failed over to a vCenter Server recovery site, a vCenter Server vCD vApp is created in the recovery site. If Reverse Protection was specified, the VPG state is Needs Configuration and the VPG must be recreated by protecting the virtual machines in the vCD vApp as separate machines and not as part of the vCD vApp.

Conversion considerations for a protected virtual machine in vSphere when it is recovered in Hyper-V:

■ A machine using BIOS is recovered in Hyper-V as a Generation 1 virtual machine.

■ A machine using UEFI is recovered in Hyper-V as a Generation 2 virtual machine.

■ A machine with a 32-bit operating system is recovered in Hyper-V as a Generation 1 virtual machine.

■ A machine with a 64-bit operating system is recovered in Hyper-V as either a Generation 1 or Generation 2 virtual machine, dependent on the operating system support in Hyper-V.

■ The boot disk is ported to a disk on an IDE controller. The boot location is 0:0.

■ A virtual machine using up to 4 SCSI controllers is recovered as a virtual machine with 1 SCSI controller.

■ The virtual machine NICs are recovered with Hyper-V network adapters except for protected Windows 2003 virtual machines which are recovered with Hyper-V legacy network adapters.

■ When VMware Tools is installed on the protected virtual machine running Windows Server 2012, Integration Services is installed on the recovered virtual machine automatically.

■ RDM disks are replicated to Hyper-V vhd or vhdx disks, and not to Pass-through disks.
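
Some of the conversion rules above can be captured in a lookup sketch. The HyperVShape type and convert_to_hyperv function are hypothetical names for illustration. The sketch covers only the firmware, SCSI controller, and NIC rules; it deliberately leaves out the 32-bit and 64-bit operating system rules, because the list above does not state how they combine with the firmware rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperVShape:
    """Hypothetical summary of the recovered Hyper-V virtual machine."""
    generation: int
    scsi_controllers: int
    nic_type: str


def convert_to_hyperv(firmware: str, scsi_controllers: int,
                      is_windows_2003: bool) -> HyperVShape:
    """Apply the firmware, SCSI controller, and NIC rules listed above."""
    generations = {"bios": 1, "uefi": 2}
    key = firmware.lower()
    if key not in generations:
        raise ValueError(f"unknown firmware: {firmware!r}")
    if not 1 <= scsi_controllers <= 4:
        # The list above covers machines using up to 4 SCSI controllers.
        raise ValueError("expected between 1 and 4 SCSI controllers")
    nic_type = "legacy network adapter" if is_windows_2003 else "network adapter"
    # Up to 4 protected SCSI controllers collapse into 1 on recovery.
    return HyperVShape(generations[key], 1, nic_type)
```

For example, a UEFI machine with three SCSI controllers that is not running Windows 2003 maps to a Generation 2 machine with one SCSI controller and a standard Hyper-V network adapter.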