NetApp Tech OnTap

Tackle the Top Three VMware Backup Challenges

The rapid rate of VMware® deployment in modern data centers has created new challenges. Backup and recovery methods that worked well for physical servers may not be up to the job of data protection once you virtualize.

I want to discuss the three main VMware backup challenges:

  • Reducing the risk associated with consolidating tens of physical servers onto one ESX server
  • Improving backup performance
  • Eliminating the complexity of backing up tens or hundreds of virtual machines (VMs)

I’ll begin by discussing these problems in a general way, followed by details about specific NetApp® solutions that may help you enhance your VMware backup.

Are You Fully Protected?


When you consolidate servers, you also consolidate risk. If you take 10 or more physical servers and consolidate them onto a single VMware ESX server, you eliminate a lot of physical complexity, but you put a lot of eggs in one basket. Protection levels that seemed adequate for standalone servers may be exposing your operations to unnecessary risks.

Here are a few quick quiz questions to ask yourself:

  • Is my infrastructure as robust as it could be? Outside of the server itself, such things as RAID-protected storage, replicas, redundant HBAs, networks, SANs, and so on can improve resiliency.
  • Are my backups consistent at both the application and VM level? For performance, VMs and many common applications cache data in memory. This information has to be flushed to disk, writes to the file system must be temporarily suspended, and the application or VM has to be quiesced to create a consistent backup. More than one early adopter of VMware was surprised to discover that the backups they were creating were not consistent. A consistent backup allows operations to be restarted when a backup is restored.
  • Am I backing up my data frequently enough? With more applications tied to a single server, you may need to increase your backup frequency, providing multiple recovery points.
  • Is the backup workload on my ESX server affecting business operations? You can’t afford to have backups running outside of your designated backup window and slowing down your business.

A careful review of these questions may reveal some hidden gotchas in your backup plan.

Improving Performance

An ESX server hosting multiple virtual machines may not have the same aggregate I/O or CPU capability as the physical machines it replaces. Bandwidth-intensive backup and restore operations running inside VMs can create server bottlenecks that extend backup windows and degrade restore performance to unacceptable levels.

There are two approaches that you can take to improve backup performance:

  • Decrease the workload associated with backup by moving less data
  • Offload your backup workload from servers to storage
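A rough estimate shows why moving less data matters. The sketch below computes how long it takes to push a given amount of data through a shared path at a sustained throughput. The VM count, VM size, 5% change rate, and 100 MB/s figure are hypothetical inputs chosen for illustration, not measurements.

```python
def backup_window_hours(total_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to move total_gb at a sustained throughput in MB/s (1 GB = 1024 MB)."""
    return total_gb * 1024 / throughput_mb_s / 3600

# Hypothetical ESX server: 10 VMs x 100 GB each, sharing one 100 MB/s backup path.
vms, gb_per_vm, shared_mb_s = 10, 100, 100.0
full = backup_window_hours(vms * gb_per_vm, shared_mb_s)

# Approach 1: move less data, e.g. only the ~5% of blocks that changed (assumed rate).
incremental = backup_window_hours(vms * gb_per_vm * 0.05, shared_mb_s)

print(f"full copy:  {full:.2f} h")         # full copy:  2.84 h
print(f"changed 5%: {incremental:.2f} h")  # changed 5%: 0.14 h
```

The second approach, offloading, goes further in this simplified model: when the storage system performs the copy, the ESX server's shared path is no longer the bottleneck at all.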

Eliminating Complexity

Some IT shops run tens of VMs per physical server and may manage hundreds of VMs in total. That’s a lot of backup agents to install and a lot to track and manage, and it can be difficult to ensure that everything is properly protected. Some things that may help are:

  • Reduce complexity by working at the ESX server level rather than at the individual VM level
  • Allow your storage to process backups, retention policies, and schedules
  • Choose appropriate management tools

NetApp Backup Solutions for VMware

Traditional backup methods can put a significant load on your ESX servers, resulting in extended backup windows and reduced performance for critical applications when backups are running. NetApp offers a complete suite of solutions designed to reduce risk, improve performance, and eliminate the complexity of backup and restore in VMware (and other) environments while reducing or eliminating server load. These solutions can dramatically shrink your backup windows and accelerate restores.

This discussion focuses on five solutions:

  • SnapManager® for Virtual Infrastructure
  • Other SnapManager Suite solutions for application backup
  • SnapVault®
  • Open Systems SnapVault
  • Protection Manager

If you’re already using NetApp storage systems as primary storage for VMware (with either VMFS or NFS datastores), SnapManager for Virtual Infrastructure and SnapVault are good options for local and disk-to-disk backups, respectively. If not, you can use Open Systems SnapVault in combination with NetApp secondary storage to achieve the same benefits for non-NetApp primary storage. NetApp Protection Manager can simplify backup management in these environments, especially if you have a large number of replicas to maintain.

SnapManager for Virtual Infrastructure
This data management tool allows you to create fast, consistent Snapshot™ copies of VMs running on VMFS or using NFS datastores. (Raw Device Mapping [RDM] is not currently supported.) It eliminates backup processing from your servers and moves the processing requirement to underlying storage.

 

Using the SnapManager for Virtual Infrastructure GUI, you can create a backup and retention schedule to regularly protect your VMs. Backups can be performed at either the VM or the data store level. Once a backup is created, it can be mounted by ESX to verify the backup before it is needed.
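A retention schedule decides which backups to keep and which to expire. As a minimal sketch, assuming a simple keep-the-newest-N rule (the function name and the rule itself are hypothetical, not SnapManager’s own retention settings):

```python
from datetime import datetime, timedelta

def apply_retention(backups: list[datetime], keep_last: int) -> tuple[list[datetime], list[datetime]]:
    """Split backup timestamps into (kept, expired), keeping the newest `keep_last`."""
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    newest_first = sorted(backups, reverse=True)
    return newest_first[:keep_last], newest_first[keep_last:]

# Hypothetical schedule: one backup per hour for ten hours, retaining eight.
start = datetime(2024, 1, 1, 0, 0)
hourly = [start + timedelta(hours=h) for h in range(10)]
kept, expired = apply_retention(hourly, keep_last=8)
print(len(kept), len(expired))  # 8 2
```

Sorting newest first keeps the rule independent of the order in which backups are listed.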

When a backup is scheduled, SnapManager for Virtual Infrastructure communicates with VirtualCenter and NetApp storage as necessary to coordinate the backup, as illustrated in Figure 1. Because backups occur very quickly and put almost no load on ESX servers, you can create many more backups per day for greater data protection.

Figure 1) SnapManager for Virtual Infrastructure Snapshot creation. SnapManager for Virtual Infrastructure signals VirtualCenter (VC) to place selected VMs (VM1, VM3, and VM6) in hot backup mode. Next, it triggers NetApp primary storage to create appropriate Snapshot copies. Finally, it signals VC again to take the VMs out of hot backup mode.
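The three-step sequence in Figure 1 can be sketched as follows. The callables passed in are hypothetical stand-ins for VirtualCenter and storage-system calls, not a real SnapManager or VMware API. The point is the ordering, plus a `try`/`finally` that guarantees VMs leave hot backup mode even if the Snapshot step fails.

```python
from typing import Callable, Iterable

def consistent_snapshot(
    vms: Iterable[str],
    enter_hot_backup: Callable[[str], None],
    exit_hot_backup: Callable[[str], None],
    create_storage_snapshot: Callable[[], str],
) -> str:
    """Quiesce the VMs, take one storage Snapshot copy, then release the VMs.

    The three callables are hypothetical stand-ins, not a real SnapManager API.
    """
    quiesced: list[str] = []
    try:
        for vm in vms:
            enter_hot_backup(vm)          # step 1: VC puts the VM in hot backup mode
            quiesced.append(vm)
        return create_storage_snapshot()  # step 2: storage creates the Snapshot copy
    finally:
        # step 3: always take VMs back out of hot backup mode, even if a step failed
        for vm in reversed(quiesced):
            exit_hot_backup(vm)


# Usage with logging stubs in place of VirtualCenter and the storage system.
log: list[str] = []

def fake_snapshot() -> str:
    log.append("snapshot")
    return "smvi_backup_1"  # hypothetical Snapshot copy name

name = consistent_snapshot(
    ["VM1", "VM3", "VM6"],
    enter_hot_backup=lambda vm: log.append(f"enter {vm}"),
    exit_hot_backup=lambda vm: log.append(f"exit {vm}"),
    create_storage_snapshot=fake_snapshot,
)
print(log)
```

Only one Snapshot copy is taken for all selected VMs, which is why the VMs spend so little time in hot backup mode.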

Once a Snapshot copy exists on primary storage, it can be retained locally, backed up to tape, or replicated to secondary storage with NetApp SnapMirror® for disaster recovery. These features are controlled directly from SnapManager for Virtual Infrastructure. You can also manually coordinate your Snapshot schedule with SnapVault to back up your Snapshot copies to a local or remote secondary storage system, as described later.

SnapManager for Virtual Infrastructure lets you restore specific VMs or entire datastores, including VMs that have been deleted from VirtualCenter. A VM must be in the “powered-off” state to be restored, so it is powered off before the restore is performed.

SnapManager for Virtual Infrastructure is fully VMotion™ aware. It communicates with VirtualCenter, so it knows where a VM resides after VMotion has been run. A VM cannot be backed up until the VMotion activity is complete.
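Being VMotion aware comes down to two checks before a backup: where each VM currently runs, and whether a migration is still in progress. A minimal sketch, assuming a hypothetical `VmState` record populated from VirtualCenter:

```python
from dataclasses import dataclass

@dataclass
class VmState:
    name: str
    host: str        # ESX host that VirtualCenter currently reports for the VM
    migrating: bool  # True while a VMotion migration is still in progress

def plan_backup(vms: list[VmState]) -> tuple[dict[str, str], list[str]]:
    """Return ({vm name: current host} for VMs ready now, [names to retry later])."""
    ready = {vm.name: vm.host for vm in vms if not vm.migrating}
    deferred = [vm.name for vm in vms if vm.migrating]
    return ready, deferred

inventory = [
    VmState("VM1", "esx01", migrating=False),
    VmState("VM3", "esx02", migrating=True),   # VMotion still running
    VmState("VM6", "esx02", migrating=False),  # already moved to esx02
]
ready, deferred = plan_backup(inventory)
print(ready)     # {'VM1': 'esx01', 'VM6': 'esx02'}
print(deferred)  # ['VM3']
```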

SnapManager Suite
SnapManager for Virtual Infrastructure quiesces the underlying VMs so that they are consistent when backed up. However, it cannot ensure that the applications running inside a VM are also quiesced and consistent; that requires application-specific handling. For consistent application backups, you can install NetApp SnapDrive® along with one of the other SnapManager solutions (which provide tailored protection for applications including Microsoft® Exchange, SQL Server™, and Oracle®) to create application-consistent Snapshot copies. A recent Tech OnTap article provides more information about these products as applied in physical server environments. As with SnapManager for Virtual Infrastructure, these SnapManager products put minimal load on the server compared to traditional backup.

NetApp SnapVault
NetApp SnapVault enables you to vault Snapshot copies to secondary storage, either in the same data center or at a remote site, for longer-term retention. Currently, there is no explicit coordination between SnapVault and SnapManager for Virtual Infrastructure or the other SnapManager products, so you have to write a script that keeps the Snapshot schedule you create with these products in step with your SnapVault schedule.
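One simple way to coordinate the two schedules is to have each SnapVault transfer trail a SnapManager backup by a fixed margin, so the transfer always finds a completed Snapshot copy. The sketch below only computes those times. The helper name, the 30-minute margin, and the example schedule are assumptions, and applying the result to SnapVault is left to the storage system’s own scheduling.

```python
from datetime import date, datetime, time, timedelta

def snapvault_schedule(smvi_times: list[time], lag: timedelta) -> list[time]:
    """Return SnapVault transfer times that trail each SnapManager backup by `lag`.

    `lag` is an assumed safety margin: it should exceed the longest expected
    SnapManager backup so each transfer picks up a completed Snapshot copy.
    """
    day = date(2000, 1, 1)  # arbitrary date; only the time of day matters
    shifted = [(datetime.combine(day, t) + lag).time() for t in smvi_times]
    return sorted(shifted)  # times that wrap past midnight sort to the front

# Hypothetical SnapManager schedule: every six hours, the last one late at night.
smvi = [time(5, 0), time(11, 0), time(17, 0), time(23, 45)]
print([t.strftime("%H:%M") for t in snapvault_schedule(smvi, timedelta(minutes=30))])
# ['00:15', '05:30', '11:30', '17:30']
```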

Because all the work occurs on the storage system, SnapVault operation has no impact on your ESX servers. Each SnapVault backup is a read-only version of a file system at a particular point in time. These file systems can be shared or mounted and used for various purposes such as cloning, testing, recovery of VMDKs, and so on.

The first step in creating a SnapVault relationship is a baseline transfer that makes an exact replica of the data store (containing the VMs) to be protected. Subsequent SnapVault backups transfer only the data blocks that have changed since the last backup, which makes SnapVault highly efficient in terms of both network bandwidth and storage space. By contrast, most backup methods copy entire files, even when only a single block has changed.
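The idea behind block-level incremental transfer can be illustrated by comparing two versions of a virtual disk block by block. This is a sketch under stated assumptions: a 4 KB block size, and SHA-256 hashing as a stand-in for change detection. SnapVault tracks changes on the storage system itself; the hashing here only makes the idea runnable.

```python
import hashlib

BLOCK_SIZE = 4096  # bytes; assumed block size for this illustration

def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """SHA-256 digest of each fixed-size block (the last block may be short)."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]

def changed_blocks(previous: bytes, current: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indexes of blocks in `current` that are new or differ from `previous`."""
    old = block_digests(previous, block_size)
    new = block_digests(current, block_size)
    return [i for i, d in enumerate(new) if i >= len(old) or d != old[i]]

baseline = bytes(1024 * 1024)       # 1 MiB "virtual disk": 256 blocks of zeros
current = bytearray(baseline)
current[10 * BLOCK_SIZE + 7] = 0xFF  # modify one byte inside block 10
changed = changed_blocks(baseline, bytes(current))
print(changed)  # [10]
print(len(changed) * BLOCK_SIZE, "bytes vs", len(current), "bytes for a whole-file copy")
# 4096 bytes vs 1048576 bytes for a whole-file copy
```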

SnapVault backups avoid the high level of duplication that occurs with traditional backups that create copies of the same files day after day, even when they have changed very little. Only blocks that change are replicated and stored by SnapVault. This makes SnapVault backups highly efficient in terms of space consumption and changes the economics of disk-to-disk backup versus tape. (See the companion article in this issue for more details.)
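The space argument is simple arithmetic. Assuming a constant daily change rate and no additional compression or deduplication, keeping N nightly full copies costs N times the dataset, while a baseline plus changed blocks costs the dataset once plus the daily changes. The dataset size, change rate, and retention period below are hypothetical.

```python
def retained_storage_gb(dataset_gb: float, daily_change: float, days: int) -> tuple[float, float]:
    """Storage needed to keep `days` daily backups of one dataset.

    Returns (nightly full copies, baseline plus changed blocks), assuming a
    constant daily change rate and no compression or deduplication.
    """
    full_copies = dataset_gb * days
    block_incremental = dataset_gb + dataset_gb * daily_change * (days - 1)
    return full_copies, block_incremental

# Hypothetical datastore: 500 GB, 2% of blocks changing per day, 30 days retained.
full, incremental = retained_storage_gb(500, 0.02, 30)
print(f"{full:.0f} GB vs {incremental:.0f} GB")  # 15000 GB vs 790 GB
```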

Once your backups are vaulted on secondary storage, you have several options:

  • You can recover entire data stores (qtree restore). Use of Protection Manager (optional) simplifies this function.
  • You can recover individual VMs (with NFS datastores). Use of Protection Manager (optional) simplifies this function.
  • You can recover single files from within a virtual machine by using FlexClone®.
  • You can clone the replicated VMs on your secondary storage with NetApp FlexClone and use the clones for testing and development.
  • You can replicate your backups off site for regional disaster recovery.
For more information about using SnapVault with VMware, see TR 3610.

Figure 2) Using SnapVault to provide a centralized backup repository for VMware. Once backups are stored on secondary storage, you can apply additional NetApp technologies, such as NetApp deduplication for further data reduction or NetApp FlexClone to create copies for testing, development, and so on.

Open Systems SnapVault
Historically, Open Systems SnapVault has been a successful solution for backing up platforms such as Windows®, UNIX®, and Linux®. Its use of block-level incremental transfers, like those described earlier for SnapVault, makes it especially suitable for remote offices, where slow network connections typically impede centralized backups and limited IT staff makes local backup and data management problematic.

Open Systems SnapVault moves data management tasks from your remote sites to a centrally managed location. Like SnapVault, it uses network bandwidth and secondary disk storage in an extremely efficient manner, making disk-to-disk backup more economical.

With the release of Open Systems SnapVault version 2.6, you now have two options for backing up your virtual infrastructure with Open Systems SnapVault, making it a good option if your VMs are not stored on NetApp primary storage:

  • You can install the Open Systems SnapVault agent in each individual VM and back them up to secondary storage just as you would a physical server. The data retains the same format as the original virtual server.
  • With Open Systems SnapVault 2.6, the agent software can be installed in the Service Console of the ESX server.

The latter approach makes it possible to back up the individual files that make up each virtual machine. In other words, you can now back up .vmx, .vmdk, .nvram, and .log files, making bare-metal and entire-VM recovery possible.
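The file types named above can be picked out of a VM’s directory listing by extension. The sketch assumes a hypothetical listing under a VMFS datastore path; it illustrates which files describe a VM and is not how Open Systems SnapVault selects files.

```python
from pathlib import PurePosixPath

# File types named in the text; illustrative only, not OSSV's selection logic.
VM_FILE_SUFFIXES = {".vmx", ".vmdk", ".nvram", ".log"}

def vm_files(paths: list[str]) -> list[str]:
    """Return the paths whose extension matches one of VM_FILE_SUFFIXES."""
    return sorted(p for p in paths if PurePosixPath(p).suffix.lower() in VM_FILE_SUFFIXES)

# Hypothetical directory listing for one VM.
listing = [
    "/vmfs/volumes/datastore1/vm1/vm1.vmx",
    "/vmfs/volumes/datastore1/vm1/vm1.vmdk",
    "/vmfs/volumes/datastore1/vm1/vm1-flat.vmdk",
    "/vmfs/volumes/datastore1/vm1/vm1.nvram",
    "/vmfs/volumes/datastore1/vm1/vmware.log",
    "/vmfs/volumes/datastore1/vm1/vm1.vswp",  # a type not in the list above
]
for path in vm_files(listing):
    print(path)
```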

Because you only have to install and manage Open Systems SnapVault on each ESX server rather than on each individual VM, the complexity of your backup environment is significantly reduced.

Like SnapManager for Virtual Infrastructure, Open Systems SnapVault is VMotion aware. You can learn more about using Open Systems SnapVault with VMware in the OSSV Best Practices Guide for Virtual Infrastructure.

Protection Manager
The final NetApp tool I want to talk about is Protection Manager. Protection Manager simplifies the configuration, management, and monitoring of Snapshot copies, disk-based backup, and replication across a NetApp storage infrastructure; and it can also incorporate Windows, UNIX, Linux, and VMware server backup through Open Systems SnapVault. It is the preferred management interface for both SnapVault and Open Systems SnapVault.

Protection Manager provides policy-driven data management that can simplify your backup environment by eliminating many of the repetitive manual processes that are required in typical backup environments.
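Policy-driven management means defining a rule once and checking every dataset against it, instead of tracking each backup by hand. A minimal sketch, assuming a hypothetical `ProtectionPolicy` with a single maximum-backup-age rule; Protection Manager’s own policy model and interface are not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class ProtectionPolicy:
    name: str
    backup_every: timedelta  # maximum allowed age of the newest backup

def out_of_policy(last_backup: dict[str, datetime], policy: ProtectionPolicy,
                  now: datetime) -> list[str]:
    """Names of datasets whose newest backup is older than the policy allows."""
    return sorted(name for name, when in last_backup.items()
                  if now - when > policy.backup_every)

# Hypothetical policy and datasets; one rule is applied to every dataset at once.
every_4h = ProtectionPolicy("vmware-datastores", timedelta(hours=4))
now = datetime(2024, 1, 1, 12, 0)
last = {
    "datastore1": datetime(2024, 1, 1, 11, 0),
    "datastore2": datetime(2024, 1, 1, 9, 30),
    "datastore3": datetime(2024, 1, 1, 6, 0),
}
print(out_of_policy(last, every_4h, now))  # ['datastore3']
```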

Conclusion

To achieve optimum data protection from VMware backups, you need to minimize risk, improve performance, and reduce complexity. The first part of this article offers some general guidelines for achieving these goals. I’ve also described a set of unique NetApp solutions that can have significant benefits in VMware environments.

By implementing SnapManager for Virtual Infrastructure and/or SnapVault, you can perform VM-consistent backups faster and more frequently while moving the backup workload off your VMware servers, significantly reducing business risk. You don’t have to install software in each VM you want to back up, and you can back up at the data store level to further reduce complexity. You also have the option of installing software from the NetApp SnapManager Suite to back up applications running in VMs.

Open Systems SnapVault provides similar benefits for non-NetApp primary storage environments. Although the backup workload is not completely removed from your VMware servers, Open Systems SnapVault substantially reduces I/O requirements to better match server capabilities, and installation at the ESX server level substantially reduces the complexity of configuring backups that may need to include tens or hundreds of VMs.

Finally, Protection Manager works with both SnapVault and Open Systems SnapVault (plus NetApp SnapMirror software for disaster recovery) to significantly reduce the complexity of managing environments with a large number of backup and/or replication relationships.

Darrin Chapman

Data Protection Subject Matter Expert and Technical Marketing Manager
NetApp

Darrin Chapman is the person you turn to for just about any question involving disaster recovery or backup and recovery at NetApp. He's been involved with almost every NetApp best practices guide about data protection since 2002, and in his spare time he designs training courses for customers and NetApp technical staff.

Originally schooled as an electrical engineer, Darrin's background includes several years in systems architecture for AT&T, Nortel, and EMC.

 