Failover, Move, and Failback

To ensure a successful failover, move, or failback operation, the operation should always use a commit policy of None. This allows an indefinite period in which to verify the successful recovery of the Oracle Database virtual machines.

After a successful recovery has been confirmed, the VPG should be committed to the selected point-in-time. If verification is not successful, a different point-in-time should be selected from the journal and the verification repeated against it.

Zerto best practice: Before performing a move operation for Oracle Database virtual machines, take a database consistent checkpoint using the sample scripts in this document. This ensures that there is a guaranteed point-in-time in the journal if the database services do not shut down in a clean state.

In a failover operation, you can either fail over to the most recent point-in-time to minimize data loss, or fail over to the last database consistent point-in-time and accept the loss of data written in the interval between the two points.

Deciding which point-in-time to use when performing a failover operation is both a user and a business decision. Using the None commit policy allows you to verify a failover to the most recent crash-consistent point-in-time and, if that point is not valid, to perform a failover to the last database consistent point-in-time instead.
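The verify-then-commit decision described above can be sketched as a small selection routine. This is a minimal illustration, assuming a hypothetical `Checkpoint` model of journal entries and a caller-supplied `verify` function standing in for whatever manual or scripted checks confirm the database is working; neither is a Zerto API object. It tries the most recent crash-consistent point first and then falls back to the last database consistent point, returning `None` when neither verifies.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical model of a journal point-in-time (not a Zerto API object)."""
    timestamp: datetime
    database_consistent: bool  # True if tagged as a database consistent checkpoint


def choose_recovery_point(
    checkpoints: Sequence[Checkpoint],
    verify: Callable[[Checkpoint], bool],
) -> Optional[Checkpoint]:
    """Return the first point-in-time that passes verification.

    Order tried: the most recent (crash-consistent) point, then the last
    database consistent point. Returns None if neither verifies, in which
    case an earlier point-in-time has to be chosen manually.
    """
    if not checkpoints:
        return None
    newest_first = sorted(checkpoints, key=lambda c: c.timestamp, reverse=True)
    candidates = [newest_first[0]]
    last_db_consistent = next((c for c in newest_first if c.database_consistent), None)
    if last_db_consistent is not None and last_db_consistent != newest_first[0]:
        candidates.append(last_db_consistent)
    for candidate in candidates:
        # With a commit policy of None the failover stays uncommitted while
        # this check runs, so another point-in-time can still be selected.
        if verify(candidate):
            return candidate  # commit the VPG to this point-in-time
    return None
```

In practice `verify` would wrap the database checks your team already performs after recovery; the routine only encodes the order in which the two candidate points are tried.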

Zerto recommends building sufficient leeway into the business RTO SLA to allow the virtual machines to be powered on, the database services to be started, and the recovery to be verified as successful before business operations resume. For example, if the virtual machines and their services take 15 minutes to start, a recommended business RTO SLA would be 45 minutes. This allows time for verification and, if the first point-in-time does not recover to a working state, a subsequent recovery to an earlier point-in-time.
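The arithmetic in the example above can be expressed as a simple sizing helper. This is a sketch under one stated assumption: the 45-minute SLA for a 15-minute startup implies a leeway factor of 3, and the helper applies that same factor to other startup times. The factor is inferred from a single example and is not a published Zerto constant; adjust it to your own verification procedure.

```python
# Leeway factor inferred from the example in the text: 45 min SLA / 15 min startup.
# This is an assumption derived from one example, not a Zerto-published constant.
LEEWAY_FACTOR = 45 / 15  # 3.0


def recommended_business_rto(startup_minutes: float, factor: float = LEEWAY_FACTOR) -> float:
    """Business RTO SLA leaving room for verification and a second recovery."""
    if startup_minutes <= 0:
        raise ValueError("startup_minutes must be positive")
    return startup_minutes * factor


def sla_has_leeway(business_sla_minutes: float, startup_minutes: float) -> bool:
    """True if an existing business SLA covers the recommended leeway."""
    return business_sla_minutes >= recommended_business_rto(startup_minutes)
```

For instance, `recommended_business_rto(15)` returns 45.0, matching the text, and a 30-minute SLA for the same workload would fail `sla_has_leeway`.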

When planning a failback, sufficient time should be allowed for a delta sync to read both the source and target disks and then replicate the changes between them. The speed of a delta sync operation for Oracle Database VMs depends on the following factors:

Amount of data changed since the failover operation
Speed of the IP link and available bandwidth for replicating the changes
Speed of both the source and target storage
Size of the virtual disks and amount of free space
Latency to the source and target storage
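To get a feel for how these factors interact, a back-of-the-envelope lower bound can help when scheduling a failback. This is a rough sketch, not a Zerto formula: it assumes the full disks are read in parallel at both sites at the slower site's throughput, that changed data is sent over the link while the reads continue, and that 1 GB = 1024 MB for simplicity. Under those assumptions the sync cannot finish before the slower of the two activities. Latency and free space are not modeled, and because the delta sync throttles itself to protect the running VMs, real durations will be longer.

```python
def delta_sync_lower_bound_hours(
    disk_size_gb: float,
    changed_gb: float,
    storage_read_mb_s: float,
    link_mbit_s: float,
) -> float:
    """Rough lower bound on delta sync duration, in hours.

    Assumptions (illustrative only): both sites read the full disks in
    parallel at storage_read_mb_s (the slower site's rate), and changed data
    is transferred over the link concurrently with those reads.
    """
    if storage_read_mb_s <= 0 or link_mbit_s <= 0:
        raise ValueError("throughput values must be positive")
    read_seconds = disk_size_gb * 1024 / storage_read_mb_s  # GB -> MB
    transfer_seconds = changed_gb * 1024 * 8 / link_mbit_s  # GB -> Mbit
    # The sync can finish no sooner than the slower of the two activities.
    return max(read_seconds, transfer_seconds) / 3600
```

For example, a 500 GB disk read at 200 MB/s takes about 2,560 seconds, while sending 50 GB of changes over a 100 Mbit/s link takes about 4,096 seconds, so in that case the link, not the storage, sets the lower bound of roughly 1.1 hours.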

The delta sync process does not run as fast as the VRAs are capable of running; instead, it runs as fast as possible without impacting the performance of the Oracle Database virtual machine. It does this by dynamically scaling the read rate from the protected virtual machine disks up and down according to the storage latency, so that the delta sync process has no performance impact. As a result, the performance of Oracle Database virtual machines in a recovery scenario remains the highest priority, while the subsequent failback to production is a background task that can be left to run until the VPGs are in a protected state ready for failback.
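The latency-driven scaling described above can be illustrated with a generic additive-increase, multiplicative-decrease throttle. This is a hypothetical sketch of the general technique, not Zerto's actual algorithm: the target latency, step size, back-off factor, and rate limits are all assumed values chosen only to show the behavior. The read rate climbs gradually while storage latency stays under target and is cut sharply when latency exceeds it.

```python
from dataclasses import dataclass


@dataclass
class ReadRateThrottle:
    """Illustrative latency-based read throttle (hypothetical, not Zerto's algorithm).

    Additive increase while storage latency is at or below the target,
    multiplicative decrease when it is above, clamped to [min, max].
    """
    target_latency_ms: float = 20.0  # assumed latency threshold
    min_rate_mb_s: float = 5.0       # assumed floor so the sync always progresses
    max_rate_mb_s: float = 400.0     # assumed ceiling ("as fast as the VRAs can run")
    step_mb_s: float = 10.0          # assumed additive increase per sample
    backoff: float = 0.5             # assumed multiplicative decrease factor
    rate_mb_s: float = 50.0          # current read rate

    def observe(self, latency_ms: float) -> float:
        """Adjust the read rate from one storage latency sample and return it."""
        if latency_ms > self.target_latency_ms:
            self.rate_mb_s *= self.backoff   # storage under pressure: back off
        else:
            self.rate_mb_s += self.step_mb_s  # headroom available: speed up
        self.rate_mb_s = min(self.max_rate_mb_s, max(self.min_rate_mb_s, self.rate_mb_s))
        return self.rate_mb_s
```

Backing off multiplicatively but recovering additively means a latency spike caused by the production database quickly reduces the sync's load, while the sync only slowly reclaims bandwidth afterwards, which matches the priority the text gives to the recovered virtual machines.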

A failback operation is typically a planned migration back to the production site.

Zerto best practice: Perform a failback outside working hours. The failback involves shutting down the virtual machines now running in the disaster recovery site. A database consistent point-in-time should be added before initiating the failback procedure. The failback should also be tested beforehand with a test failover to validate both the procedure and the recovery time objective.
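The preconditions in this best practice can be captured as a simple pre-flight check before a failback is scheduled. This is a minimal sketch under stated assumptions: the working hours (08:00 to 18:00, Monday to Friday) and the function names are hypothetical, and the checkpoint and test-failover results are passed in as flags rather than queried from Zerto. It only checks the planned start time; a long failback could still run into working hours, so the expected end time may need the same check.

```python
from datetime import datetime, time

# Hypothetical business hours; substitute your organization's own window.
WORK_START = time(8, 0)
WORK_END = time(18, 0)
WORKDAYS = range(0, 5)  # Monday=0 .. Friday=4


def outside_working_hours(start: datetime) -> bool:
    """True if the planned failback start falls outside business hours."""
    if start.weekday() not in WORKDAYS:
        return True
    return not (WORK_START <= start.time() < WORK_END)


def failback_blockers(
    start: datetime, checkpoint_added: bool, test_failover_passed: bool
) -> list[str]:
    """List the best-practice preconditions that are not yet met."""
    blockers = []
    if not outside_working_hours(start):
        blockers.append("scheduled during working hours")
    if not checkpoint_added:
        blockers.append("no database consistent checkpoint added")
    if not test_failover_passed:
        blockers.append("failback not validated by a test failover")
    return blockers
```

An empty list means the planned failback satisfies all three recommendations; otherwise each entry names the step still outstanding.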