Performance Impact of Snapshot-Based Replication
By Chris Snell, Zerto Sales Engineer, EMEA
This is the third post in our three-part “blogging from the VMworld show floor” series, covering the most common questions we get at VMworld. The most popular question today is, “Does Zerto use snapshots?”
In a word, “No” — but an explanation is due:
There are many products on the market that protect virtual environments using snapshots. One reason is that VMware provides a technology, the APIs for Data Protection (VADP), whose primary aim is to provide a simple mechanism for protecting virtual machines (VMs) via snapshots. VM snapshots are certainly a better way to back up VMs than legacy technologies, such as backup agents designed for physical machines, but using snapshots has some negative impacts.
First of all, snapshots are not backups or replicas. VMware snapshots work by keeping a record of changing data in a delta file, while the original disk does not change. So unless you copy the snapshot data to secondary media, any media failure within the production storage is likely catastrophic. Moving the snapshot data to a secondary storage area takes time, and the snapshot stays open for the duration of the copy.
Snapshots can also consume a lot of storage, both in terms of space and IOPS. Because snapshots work by recording changing data, every change made to the disk while the snapshot is in use must be stored. The size of the delta file has a direct impact on how long it takes to delete the snapshot associated with the child disk.
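To make the delta-file mechanics concrete, here is a minimal sketch (a toy model, not VMware's actual on-disk format): once a snapshot is taken, the base disk is frozen, every new write lands in the delta, and deleting the snapshot means merging the whole delta back, so the cost grows with the write activity while the snapshot was open.

```python
# Illustrative redo-log style snapshot: the base disk is frozen and all
# writes accumulate in a delta until the snapshot is deleted (merged).

class SnapshottedDisk:
    def __init__(self, blocks):
        self.base = dict(blocks)   # frozen parent disk (block -> data)
        self.delta = {}            # redo log: blocks written since snapshot

    def write(self, block, data):
        self.delta[block] = data   # the base disk is never modified

    def read(self, block):
        # newest data wins: check the delta before the base disk
        return self.delta.get(block, self.base.get(block))

    def consolidate(self):
        # deleting the snapshot merges every delta block into the base;
        # the work is proportional to the delta size
        merged = len(self.delta)
        self.base.update(self.delta)
        self.delta.clear()
        return merged

disk = SnapshottedDisk({0: "a", 1: "b"})
disk.write(1, "b2")                # goes to the delta, not the base
assert disk.read(1) == "b2"        # reads see the newest copy
assert disk.base[1] == "b"         # original block untouched
print(disk.consolidate())          # blocks merged at deletion -> 1
```

The longer the snapshot stays open on a busy VM, the larger `delta` grows and the longer `consolidate()` takes, which is exactly the deletion cost described above.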
Deleting the snapshot at the end of a snapshot-based backup/replication cycle is an important consideration when evaluating new technologies. While a snapshot is in place, changes made to a VM are kept in temporary Consolidate Helper snapshot files. If we imagine a VM hosting SQL Server, there will be a lot of IO happening during the backup/replication. Once the snapshot has been fully copied, the changes held in the Consolidate Helper snapshot files must be merged back into the VM as the snapshot is deleted, a process that can “stun” (briefly pause) the VM. This VMware knowledge-base article helps to explain why the problem occurs.
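The shape of the problem can be sketched with a toy calculation (illustrative numbers and a simplified model, not VMware internals): each consolidation pass merges the current helper file into the base while new writes accumulate in a fresh helper, so a busy VM refills the helper almost as fast as it merges and needs many more passes before the final merge.

```python
# Toy model of iterative snapshot consolidation. Each pass merges the
# current helper delta while new writes land in a fresh helper; the VM
# is only paused ("stunned") for the final small remnant.

def consolidate(delta_mb, write_mb_per_s, merge_mb_per_s, stun_threshold_mb=10):
    passes = 0
    while delta_mb > stun_threshold_mb:
        merge_time = delta_mb / merge_mb_per_s   # time to merge this helper
        delta_mb = write_mb_per_s * merge_time   # writes arriving meanwhile
        passes += 1
        if passes > 50:
            break  # write rate ~ merge rate: the delta barely shrinks
    return passes, delta_mb

# quiet VM: converges in a few passes, tiny final remnant
print(consolidate(delta_mb=5000, write_mb_per_s=5, merge_mb_per_s=100))
# busy SQL Server: each pass only shrinks the helper by 20%
print(consolidate(delta_mb=5000, write_mb_per_s=80, merge_mb_per_s=100))
```

The quiet VM converges after 3 passes; the busy one needs dozens, which is why heavy-IO workloads feel snapshot deletion the most.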
A quick look at any vendor using snapshot-based backup/replication technology will confirm that this problem still happens, despite improvements from VMware. Of course, the severity of the stun depends on factors such as the type and configuration of the storage in use. It is possible, with a fair amount of forethought and planning, to iron out such issues.
Something else to consider when using traditional snapshot-based backup/replication is the Recovery Point Objective (RPO). As explained above, converting a snapshot into something that can be considered a backup/replica involves copying it to another location. Transferring a typically sized 100GB snapshot would take at least 15 minutes. Estimating when the full backup/replica process will finish also means allowing time for the snapshot to be created and committed, plus other tasks. This all leads to large intervals between recovery points, and it explains why snapshot-based technologies cannot achieve the near-zero RPO required for business-critical applications and VMs.
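The arithmetic behind that 15-minute figure is easy to check. Assuming a dedicated 1 Gbps replication link with roughly 100 MB/s of usable throughput (an assumption for illustration, not a figure from the post):

```python
# Back-of-the-envelope check on the snapshot-transfer claim.
snapshot_gb = 100
usable_mb_per_s = 100                 # ~1 Gbps minus protocol overhead (assumed)

transfer_s = snapshot_gb * 1000 / usable_mb_per_s
print(f"transfer alone: {transfer_s / 60:.1f} minutes")   # ~16.7 minutes

# The cycle that bounds the best achievable RPO also includes snapshot
# creation and consolidation; even 5 extra minutes (assumed) pushes the
# interval between recovery points past 20 minutes.
cycle_s = transfer_s + 5 * 60
print(f"best-case RPO: {cycle_s / 60:.1f} minutes")       # ~21.7 minutes
```

Even under these generous assumptions, recovery points land tens of minutes apart, nowhere near a seconds-level RPO.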
An additional consideration is that many snapshot-based technology vendors, including the household names, are moving away from VM snapshots and are instead pushing storage array integration to achieve their published RPO/RTO targets. There is a general, ecosystem-wide admission that, while VM snapshot technology was a good temporary solution, there is ultimately a better way of doing things: one with less impact on the virtual machines, better recovery points, better recovery times, and so on.
Here at Zerto, we have achieved an aggressive and automated approach with no impact on the production VMs whatsoever. By building a hypervisor-based solution that copies write data as it passes between the hypervisor and storage, Zerto puts no load on the production VMs being protected, and users can work as usual without suffering outages. By offering a near-synchronous, continuous stream of replicated data from the production site to the disaster recovery site, Zerto achieves RPOs of just seconds. The continuous stream of data is used to update the replica and is also stored in time order within the journal, providing write-order fidelity. Users can then select any point in time in the recent past as a failover point, unlike snapshot-based backup/replication, where you are limited to widely spaced recovery points.
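The journal idea above can be sketched in a few lines (a hypothetical data model for illustration, not Zerto's actual format): writes are appended in arrival order with a timestamp, and any recovery point is built by replaying the journal up to the chosen instant.

```python
# Minimal sketch of a replication journal with write-order fidelity:
# entries are kept in arrival order, and a point-in-time state is
# reconstructed by replaying writes up to the chosen timestamp.

class Journal:
    def __init__(self):
        self.entries = []          # (timestamp, block, data), time-ordered

    def record(self, ts, block, data):
        self.entries.append((ts, block, data))

    def state_at(self, ts):
        # replay all writes up to ts, preserving their original order
        state = {}
        for t, block, data in self.entries:
            if t > ts:
                break
            state[block] = data
        return state

j = Journal()
j.record(1, "A", "v1")
j.record(2, "B", "v1")
j.record(3, "A", "v2")

print(j.state_at(2))   # state before block A was overwritten
print(j.state_at(3))   # latest point in time
```

Because every write is retained in order, any timestamp in the journal's retention window is a valid failover point, rather than only the instants when a snapshot happened to complete.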
To see the benefits of Zerto’s hypervisor-based solution for yourself, click here for our free trial.