NetApp Tech OnTap
     

Case Study: Using VDI for Telecommuting

In February 2010, an epic snowstorm struck Washington, D.C., and the surrounding area, virtually shutting down the U.S. capital for four days. The U.S. government lost nearly $70 million a day in productivity. Staggering as that number is, it could have been as high as $100 million per day. Fortunately, 30% of government workers were able to telecommute and kept working through the storm.

Starting in 2001, the U.S. Congress passed a series of laws mandating telework solutions for federal employees as a way to reduce gridlock on Washington, D.C., roadways. It quickly became apparent that teleworking was also a way to increase productivity, reduce requirements for office space, increase hiring flexibility, and decrease carbon footprint.

A 2005 statute requires the Department of Commerce to provide a telework solution for every worker who qualifies. My company, Project Performance Corporation (PPC), was awarded the contract by one large agency within the Department of Commerce to help meet the telework mandate. Today, the solution we put in place with help from our partners VMware and NetApp supports over 3,000 teleworkers. Based in part on our efforts, this agency has received a number of awards from The Telework Exchange (a public-private partnership), including a 2009 award for best use of innovation and technology.

In this article, I want to describe how we were able to achieve these results using a virtual desktop infrastructure (VDI) solution from VMware in conjunction with NetApp® storage and data management.

Challenges and Requirements

The agency we were working for presented some unique challenges and requirements. For example, the workers have their own union with specific service-level agreements (SLAs) in place. This meant that the solution for each user had to look and feel like the user’s existing desktop, and that any degradation in performance would violate the SLAs. It also meant that network performance was critical: we had to take extra care to ensure that bandwidth and port counts were adequate to support the expected number of simultaneous users.

Because of the requirement to look and feel like the existing environment, there were initially many desktop baselines. We could not simply pick a single baseline configuration to clone or use to build each desktop. Some existing baselines contained legacy applications with hard-coded references to the desktop C:\ drive, which meant that a significant amount of additional storage had to be allocated for each C:\ drive.

Finally, in addition to providing the same look and feel, we had to provide collaboration tools to make it easy for remote users to work together, as well as offer adequate support and training.

Solution Alternatives

We initially considered four possible solution alternatives:

  • Rack and Stack. Physical desktops are relocated to the machine room and made accessible via remote desktop protocol (RDP) and virtual private network (VPN). This had the advantage of being fast and simple, but had big disadvantages in terms of space, power, and logistics.
  • Blade Servers. Provision one blade to replace each desktop. This solution was also relatively fast to implement but, again, resource intensive. Initial experience with this approach within the agency had not been favorable.
  • Terminal Servers. This approach had also been previously tried within the agency. While more resource efficient, it had the disadvantage of changing the look and feel substantially and also created problems for legacy applications.
  • Virtual Desktop Infrastructure. Of the four options, VDI was best able to address the challenges and requirements. However, because it required a more involved deployment, it could not meet the aggressive mandated deadline.

Figure 1) Scorecard showing the relative ranking of various approaches versus requirements.
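
A scorecard of this kind is essentially a weighted decision matrix. The sketch below shows only the mechanics: the requirement names, weights, and 1-to-5 scores are placeholders loosely based on the pros and cons listed above, not the values behind Figure 1.

```python
# Placeholder weights and scores for illustration only; NOT the data behind Figure 1.
WEIGHTS = {
    "look_and_feel": 3,
    "legacy_apps": 2,
    "resource_efficiency": 2,
    "deployment_speed": 1,
}

SCORES = {
    "Rack and Stack":   {"look_and_feel": 5, "legacy_apps": 5, "resource_efficiency": 1, "deployment_speed": 5},
    "Blade Servers":    {"look_and_feel": 5, "legacy_apps": 5, "resource_efficiency": 2, "deployment_speed": 4},
    "Terminal Servers": {"look_and_feel": 2, "legacy_apps": 2, "resource_efficiency": 4, "deployment_speed": 3},
    "VDI":              {"look_and_feel": 5, "legacy_apps": 5, "resource_efficiency": 4, "deployment_speed": 2},
}


def rank(scores, weights):
    """Return (option, weighted total) pairs, best option first."""
    totals = {
        option: sum(weights[req] * marks[req] for req in weights)
        for option, marks in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for option, total in rank(SCORES, WEIGHTS):
        print(f"{option:<18}{total}")
```

With these placeholder numbers VDI ranks first. Raising the weight of `deployment_speed` to 5 moves rack and stack to the top, which mirrors how a tight deadline can favor a quick interim solution.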

Because of the aggressive mandate for getting a solution in place, we ultimately decided on a two-stage rollout. We started with rack and stack as a cheap and easy way to get the process started—despite the complications it created in terms of server room space, extra network infrastructure, and other similar requirements.

Ultimately, we converted to a full VDI implementation using VMware and NetApp. This required additional training as well as some infrastructure changes, but was much better able to address the full project requirements. Because PPC already had substantial experience with VDI deployments on VMware and NetApp, we were able to quickly create a complete lifecycle management plan that addressed all of the agency’s specific requirements.

NetApp was selected as the storage solution for a variety of reasons, including:

  • The NetApp Unified Storage Architecture increased our operational flexibility and simplified administration
  • Superior backup and recovery for virtual desktops and user data
  • Ability to incorporate third-party storage arrays
  • Robust CIFS implementation for home directory support

Solution Architecture

The current VDI solution architecture is illustrated in Figure 2. (Because this project began in 2005, the infrastructure has evolved over time from VMware ESX 2.x to the current ESX 3.x.)

Figure 2) VDI configuration. Two NetApp systems provide VMware virtual desktop storage via SAN and home directory access via NAS. VMotion™ allows individual virtual desktops to be transparently moved between ESX servers.

We currently use 16 VMware ESX servers in a “farm,” each supporting 14 desktops, for a total of 224 desktops per farm. We deploy multiple farms to support the required number of concurrent teleworkers. VMotion allows us to transparently move running desktops between ESX servers as needed.
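
The farm arithmetic above can be sketched in a few lines of Python. The host and desktop counts come from the article. The `spare_hosts` parameter and the case where all 3,000 teleworkers connect at once are assumptions for illustration only, since peak concurrency is not stated here.

```python
import math

HOSTS_PER_FARM = 16      # from the article
DESKTOPS_PER_HOST = 14   # from the article


def desktops_per_farm(hosts=HOSTS_PER_FARM, per_host=DESKTOPS_PER_HOST, spare_hosts=0):
    """Desktops one farm can run; spare_hosts (hypothetical) are kept empty as headroom."""
    usable_hosts = hosts - spare_hosts
    if usable_hosts <= 0:
        raise ValueError("a farm needs at least one usable host")
    return usable_hosts * per_host


def farms_needed(concurrent_users, **farm_kwargs):
    """Smallest number of farms that can host the given number of concurrent desktops."""
    return math.ceil(concurrent_users / desktops_per_farm(**farm_kwargs))


if __name__ == "__main__":
    print(desktops_per_farm())                 # 16 hosts * 14 desktops
    print(farms_needed(3000))                  # assumes all 3,000 users at once
    print(farms_needed(3000, spare_hosts=1))   # one idle host per farm
```

Under these assumptions a farm holds 224 desktops, and 3,000 simultaneous users would need 14 farms, or 15 if one host per farm is held in reserve.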

The two NetApp systems illustrated in Figure 2 are shared by all the active farms. In other words, two storage systems support the entire environment, providing SAN storage for VMware and its virtual desktops as well as CIFS storage for home directory access. We currently partition access so that one NetApp system serves Fibre Channel SAN and the other serves CIFS. This split is not required (a single system can do both, if desired), but even so there is a significant management advantage in supporting both types of storage access on a single platform rather than using two different platforms to meet the storage need.

Storage Evolution

We recently completed an upgrade in which we replaced NetApp FAS980 systems with NetApp FAS6080 clusters. This prepared the infrastructure to scale beyond 3,000 users in the future.
More details of the back-end storage configuration are shown in Figure 3.

Figure 3) Storage details. Disk-to-disk backups are performed to secondary storage at the same site. Deduplication is used to reduce capacity required on secondary storage. For DR, backups are replicated to a NetApp V-Series system front-ending an IBM DS4000 storage array.

Continuous operation is another mandate that the solution must meet. We currently perform disk-to-disk backups using NetApp SnapVault® software between our primary storage systems and secondary storage. We run NetApp deduplication on secondary storage, which reduces total backup storage requirements by 80%. We then replicate this secondary storage system to a NetApp V-Series system at our DR site that is front-ending an IBM DS4000. (NetApp V-Series makes the full suite of advanced NetApp data management capabilities available on your existing third-party storage.) Because the source storage for replication is deduplicated, the DR site sees the same level of storage savings, and the WAN bandwidth required for replication is substantially reduced.
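
To put the 80% figure in perspective, here is a rough sketch of the arithmetic. The 80% savings comes from the article. The 10 TB backup set, the 100 Mbit/s WAN link, and the simplification that replication traffic scales with the deduplicated size are assumptions made only for illustration.

```python
DEDUP_SAVINGS = 0.80  # from the article: 80% reduction on secondary storage


def deduplicated_tb(logical_tb, savings=DEDUP_SAVINGS):
    """Physical capacity (TB) left after deduplication removes `savings` of the data."""
    if not 0 <= savings < 1:
        raise ValueError("savings must be in [0, 1)")
    return logical_tb * (1 - savings)


def transfer_hours(size_tb, link_mbps):
    """Hours to ship size_tb (decimal terabytes) over a link of link_mbps megabits/s."""
    megabits = size_tb * 8_000_000  # 1 TB = 10**12 bytes = 8 * 10**6 megabits
    return megabits / link_mbps / 3600


if __name__ == "__main__":
    logical = 10.0   # hypothetical backup set, TB
    link = 100.0     # hypothetical WAN link, Mbit/s
    physical = deduplicated_tb(logical)
    print(f"stored after dedup:      {physical:.1f} TB")
    print(f"full copy without dedup: {transfer_hours(logical, link):.0f} h")
    print(f"full copy with dedup:    {transfer_hours(physical, link):.0f} h")
```

In this simplified model an 80% saving means one-fifth of the capacity at each site and one-fifth of the transfer time for a full copy, whatever the actual sizes are.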

We have also added NetApp Performance Acceleration Modules (PAMs). These intelligent caches improve the end-user experience, accelerate backup and antivirus scans, and make our infrastructure more resistant to boot storms. Learn more in TR-3705.

Future Directions

This solution has been extremely successful and is set to scale beyond the 3,000-user requirement that was initially established. Teleworkers work an average of four days per week from home. To encourage adoption, the agency initially offered incentives to eligible employees. Today approximately 80% of eligible workers in the agency choose telework.

Despite this success, we continue to look for ways to improve the resiliency, performance, and efficiency of the solution. One important initiative is to improve provisioning by using NetApp Rapid Cloning Utility (RCU) to efficiently clone new virtual desktops. This approach can dramatically reduce the storage required for thousands of copies of the same desktop operating system. Enabling NetApp deduplication on our primary storage systems will further boost overall storage efficiency and reduce the amount of primary storage needed.

We are also considering replacing our current Fibre Channel environment with NFS on NetApp. This would not only eliminate the need to maintain a separate Fibre Channel infrastructure but would also streamline management by making it easy to expand or shrink storage volumes, and it could possibly allow more virtual desktops per ESX server.

Our ultimate goal is to evolve the infrastructure to a full cloud model in which desktops are provided as a service and the user neither knows nor cares where his or her desktop is coming from.
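
The storage effect of the cloning initiative described above can be estimated with simple arithmetic. This sketch assumes that clones share the blocks of one base image and store only their own changes. The 20 GB image size and 1 GB of per-desktop changes are hypothetical figures, not measurements from this deployment.

```python
def full_copy_gb(desktops, image_gb):
    """Capacity if every desktop gets its own full copy of the OS image."""
    return desktops * image_gb


def cloned_gb(desktops, image_gb, unique_gb_per_desktop):
    """Capacity if desktops share one base image and store only their own changes."""
    return image_gb + desktops * unique_gb_per_desktop


if __name__ == "__main__":
    desktops = 3000   # user count from the article
    image_gb = 20     # hypothetical base image size
    unique_gb = 1     # hypothetical per-desktop changes
    full = full_copy_gb(desktops, image_gb)
    cloned = cloned_gb(desktops, image_gb, unique_gb)
    print(f"full copies: {full} GB, clones: {cloned} GB, saved: {1 - cloned / full:.0%}")
```

The saving depends almost entirely on how much each desktop diverges from the base image, so the per-desktop figure is the one worth measuring before sizing storage.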

Because of the success of this program and others like it, the U.S. Congress recently approved increased funding for the government’s telework initiative, so it is clear that all telework programs, including this one, will continue to expand.

NetApp Community
 Got opinions about supporting telecommuters with VDI?

Ask questions, exchange ideas, and share your thoughts
online in NetApp Communities.


Robert DeMay
Technical Lead
Project Performance Corporation

Bob has over fourteen years of experience in the IT industry. For the last five years at PPC, he has directly supported the U.S. Patent and Trademark Office, including three years supporting virtualization efforts. His certifications include VCP, MCSE, and MCSA.

 