The NetApp Kilo-Client 3G
Since 2006, Tech OnTap has been chronicling the evolution of the NetApp Kilo-Client, NetApp's large-scale engineering test environment. For this article, Tech OnTap asked Brad Flanary of the NetApp RTP Engineering Support Systems team to describe the goals and technologies behind the next planned iteration of this important and innovative facility. [Tech OnTap editor.]
The NetApp® Kilo-Client is a test environment that allows NetApp to quickly configure and boot a large number of physical and/or virtual clients to run tests against NetApp storage hardware and software. The first iteration of the Kilo-Client was deployed in 2005 (as described in an early TOT article). That iteration initially offered 1,120 physical clients that booted over iSCSI instead of from local disk.
By mid-2007, the Kilo-Client had evolved to include 1,700 physical clients that could boot over iSCSI, FC, or NFS and could be deployed as physical clients running Windows® or Linux® or in virtualized VMware® environments. A Tech OnTap article that appeared at that time focused on the techniques we used to rapidly provision physical servers and virtual environments using NetApp FlexClone® and other NetApp technologies.
This configuration has served NetApp well (a few more servers have been added since the last article was published to support heavy virtualization), but now, almost three years later, with the lease on the original server equipment due to expire, it's time to evolve the configuration once again to keep up with the latest technology and cloud computing developments.
This article focuses on the third-generation Kilo-Client design and the new capabilities it will deliver once built.
We'll begin by describing the new requirements we faced, talk about hardware evaluation, and then describe the design of Kilo-Client 3G, which will go live in the first half of this year. We'll also discuss the unique design of the NetApp data center facility where the Kilo-Client is housed.
Based on meetings with our internal customers as well as requests that the current configuration is unable to meet, we began to form an idea of what was needed in the next-generation Kilo-Client. However, to be certain, we started the refresh process with a detailed survey of our existing internal customers plus other potential Kilo-Client users within NetApp. You can see the survey we used by clicking through to the full document shown in Figure 1. (You'll notice that some questions are targeted toward virtualization because we specifically wanted to learn whether our customer needs could be met by virtual rather than physical clients.)
The survey yielded several major findings.
This survey process was extremely valuable. It confirmed our suspicion that most of our customers could be serviced with virtual rather than physical hardware. This is obviously consistent with the current move in the IT industry toward increased virtualization and cloud computing. It's also consistent with a recent drive toward more server virtualization within NetApp. (A Tech OnTap article from April 2009 described the physical-to-virtual migration at the NetApp engineering lab in Bangalore, India.)
With a sense of our requirements for the new Kilo-Client, our next step was to start evaluating server hardware. We sent out an RFP to a number of server vendors to get products for evaluation. Our testing process focused on several factors: the performance each server could deliver from a CNA (converged network adapter), how well it supported virtual machines at large scale, and how well it ran a battery of standard benchmarks.
We quickly discovered that for our needs, servers based on Intel® Nehalem-microarchitecture processors dramatically outperformed servers based on the older Intel Core™ microarchitecture (Dunnington). The two server models we chose both use Nehalem processors.
On the network side, we recently deployed a Cisco Nexus infrastructure in our new Global Dynamic Laboratory (GDL). That network infrastructure will continue to be used to meet the FCoE and IP needs of the Kilo-Client. Brocade switching will be used for Fibre Channel.
The Planned Kilo-Client 3G Deployment
In total, this will deliver 628 clients with 5,024 cores, replacing three pods of the original Kilo-Client (728 physical clients with 1,456 cores). These clients will primarily run as virtualization hosts, but any of them can also be deployed as physical clients. At a possible density of 120 VMs per physical server, we will be able to deliver up to 75,360 VMs from the Kilo-Client.
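The capacity figures above are simple to sanity-check. This quick arithmetic uses only the numbers quoted in the article:

```python
# Quick check of the capacity figures quoted above.
servers = 628         # new physical clients
cores = 5024          # total cores across those clients
vms_per_server = 120  # possible VM density per physical server

cores_per_server = cores // servers   # works out to 8 cores per server
max_vms = servers * vms_per_server    # maximum virtual clients
print(cores_per_server, max_vms)      # 8 75360
```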
The remaining approximately 1,000 clients from the previous-generation Kilo-Client will remain in place and continue to be used for testing. They will be phased out and returned as they come off lease.
We typically boot 500 VMs per NFS datastore. We use SnapMirror® to replicate golden images from a central repository to each boot storage system as needed.
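Given the 500-VMs-per-datastore figure above, a rough sizing sketch follows directly. The helper name here is illustrative, not part of any real tooling:

```python
import math

# Sizing sketch based on the figure above: 500 VMs per NFS datastore.
VMS_PER_DATASTORE = 500

def datastores_needed(total_vms: int) -> int:
    """Minimum number of NFS datastores to hold total_vms boot images."""
    return math.ceil(total_vms / VMS_PER_DATASTORE)

# At the full 75,360-VM scale described later in the article:
print(datastores_needed(75360))  # 151
```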
Booting Physical Hardware and Virtual Machines
The real key to the Kilo-Client is its ability to perform fast, flexible, and space-efficient booting. As in any cloud infrastructure, we have to be able to quickly repurpose any number of clients for any task, physical or virtual. The Kilo-Client uses a combination of FC and FCoE boot to boot each physical server and NFS boot to support virtual machines booting on servers configured to run virtualization.
We chose FC boot for physical booting because it has proven very reliable in the existing Kilo-Client infrastructure. In most large server installations, a physical server boots the same boot image every time. It might boot Linux or Windows in a physical environment or VMware ESX in a virtual one, but it's always the same. That's not the case for the Kilo-Client. One of our servers might boot Linux one day, VMware the next day, and Windows the day after that. We use FC boot in combination with our dynamic LUN cloning capability to rapidly and efficiently boot our physical and virtual servers.
As described in previous articles, we maintain a set of "golden" boot images (as Fibre Channel LUNs) for each operating system and application stack we use. Using NetApp SnapMirror® and FlexClone, we can quickly reproduce hundreds of clones for each physical server being configured for a test. Only host-specific "personalization" needs to be added to the core image for each provisioned server. This unique approach gives us near-instantaneous image provisioning with a near-zero footprint.
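As a rough illustration of this cloning approach, the following Python sketch models golden images and near-zero-footprint boot clones. All class and function names here are hypothetical; the real workflow uses SnapMirror and FlexClone operations on FC LUNs, not Python objects:

```python
from dataclasses import dataclass, field

# Illustrative model only: names are hypothetical. The real workflow clones
# "golden" boot LUNs with FlexClone, so each clone shares blocks with its
# base image and only host-specific writes consume new space.

@dataclass
class GoldenImage:
    name: str       # e.g. "linux-boot"
    size_gb: int    # logical size of the boot LUN

@dataclass
class BootClone:
    base: GoldenImage
    host: str
    personalization: dict = field(default_factory=dict)
    delta_gb: float = 0.0   # unique (personalization) data only

def provision(base: GoldenImage, hosts: list[str]) -> list[BootClone]:
    """Create one near-zero-footprint boot clone per physical server."""
    clones = []
    for host in hosts:
        clones.append(BootClone(
            base=base,
            host=host,
            # Host-specific "personalization": hostname, IP, initiator, etc.
            personalization={"hostname": host},
            delta_gb=0.01,  # the personalization writes are tiny
        ))
    return clones

clones = provision(GoldenImage("linux-boot", 20), [f"kc{i:03d}" for i in range(100)])
print(len(clones))  # 100 clones, each sharing the one 20 GB base image
```

The key property mirrored here is that provisioning cost scales with the per-host delta, not with the image size.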
The process of booting virtual machines builds on the same steps.
Complete Automation. Over the past several years we've created Perl scripts that work in conjunction with NetApp and VMware tools to automate the steps above, so that we can routinely deploy 500 to 1,000 virtual machines in 2 to 3 hours. (This includes both the physical booting process and the VM booting process. This is different from some of the other deployments described in Tech OnTap, in which time to deployment is based on servers already running VMware.)
Maximum Space Efficiency. The other unique piece of the process is that because we use FlexClone to clone "golden images" rather than make copies, very little storage is required. We routinely deploy 500 virtual machines using just 500 GB of storage space (1 GB per client) and can use even less space if necessary.
With the new infrastructure, we'll be able to configure up to 75,000 virtual machines for very large tests. Once we have all the new hardware in place, we'll be able to report how quickly this can be done. We should note that, in general, the clients that make up the Kilo-Client are carved up into multiple smaller pieces, all doing testing in parallel.
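The automation and space-efficiency points above can be sketched together. The production tooling is a set of Perl scripts driving NetApp and VMware tools; this hypothetical Python sketch only shows the fan-out pattern, with the 1 GB-per-client footprint taken from the article:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of the deployment fan-out. Function names are
# illustrative; the real automation is Perl driving NetApp and VMware tools.
GB_PER_CLONE = 1  # per-client storage footprint with FlexClone, per the text

def deploy_vm(vm_id: int) -> int:
    # Placeholder for: clone golden image -> personalize -> register -> boot.
    # Returns the storage consumed by this clone, in GB.
    return GB_PER_CLONE

def deploy_batch(count: int, workers: int = 32) -> int:
    """Deploy `count` VMs in parallel; return total storage consumed (GB)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(deploy_vm, range(count)))

print(deploy_batch(500))  # 500 GB for 500 clients
```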
Physical Layout. The previous-generation Kilo-Client design was based on "pods" that colocated servers, networking, and boot storage. This approach made sense in a design in which hardware was in close proximity and manual setup and teardown might be required.
We've rethought and reengineered the pod approach for the new Kilo-Client. The new design concentrates all boot infrastructure in one location. Servers and storage systems will now be grouped into pods that include just the switching (IP and FC) needed to meet the needs of the pod. This will make the pods easy to replicate, and it will be easy to grow and scale the Kilo-Client in any dimension by adding another pod of the appropriate type. (In other words, we can add a pod of servers, a pod of storage, and so on.) Since manual setup and teardown are no longer required (or desired), new pods can and will be deployed anywhere in the data center as more space is needed, so that the data center itself operates with maximum efficiency.
Our Global Dynamic Laboratory
The Kilo-Client is physically located in the NetApp Global Dynamic Laboratory, an innovative new data center located at the NetApp facility in Research Triangle Park, North Carolina. The Kilo-Client will be part of NetApp Engineering's Shared Test Initiative (STI), which will provide multiple test beds and will focus heavily on automation for deployment, test execution, and results gathering. STI will help bridge these resources so that we can do dynamic sharing between all resources in our labs.
The GDL was designed with efficiency and automation in mind. It includes 36 cold rooms, each with approximately 60 cabinets, for a total of 2,136 racks.
A modern data center such as GDL depends on several critical design elements, starting with power and cooling.
For GDL, power and cooling distribution is based on an average of 12 kW per rack, for a total of 720 kW per cold room. Power distribution within a rack supports up to 42 kW. Using our proprietary pressure-control technology, we can cool up to 42 kW in a single cabinet, or any combination of loads, as long as the total cooling load in a cold room does not exceed 720 kW.
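Those two constraints (a per-cabinet ceiling and a per-room budget) can be expressed as a simple check. The function below is illustrative, using only the limits stated above:

```python
# Check a proposed rack layout against the GDL cooling constraints described
# above: at most 42 kW in any one cabinet, at most 720 kW total per cold room.
RACK_MAX_KW = 42
ROOM_MAX_KW = 720  # 60 racks x 12 kW average

def room_ok(rack_loads_kw: list[float]) -> bool:
    """True if every cabinet and the room total fit within the cooling budget."""
    return (all(load <= RACK_MAX_KW for load in rack_loads_kw)
            and sum(rack_loads_kw) <= ROOM_MAX_KW)

# Ten dense 42 kW cabinets plus fifty idle ones: 420 kW total -> fits.
print(room_ok([42] * 10 + [0] * 50))   # True
# Sixty cabinets at 13 kW each: 780 kW total -> exceeds the room budget.
print(room_ok([13] * 60))              # False
```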
GDL uses a combination of technologies to run at maximum power efficiency, including:
These and other techniques allow the GDL to achieve an annualized PUE estimated at about 1.2. This translates into an operating savings for the GDL of over $7 million per year versus operating at a PUE of 2.0 and a corresponding avoidance of 93,000 tons of CO2. You can learn more about the NetApp approach to data center efficiency in a recent white paper.
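PUE is the ratio of total facility energy to IT equipment energy, so the savings claim can be roughly reconstructed. In this sketch, only the PUE figures come from the article; the IT load and electricity rate are assumed values chosen for illustration:

```python
# PUE = total facility energy / IT equipment energy.
# Rough reconstruction of the savings claim; IT load and electricity rate
# are assumptions, only the 1.2 and 2.0 PUE figures come from the article.
it_load_mw = 10.0     # assumed average IT load
rate_per_kwh = 0.10   # assumed electricity cost, USD
hours = 8760          # one year

def annual_cost(pue: float) -> float:
    """Annual facility electricity cost in USD at the given PUE."""
    return pue * it_load_mw * 1000 * hours * rate_per_kwh

savings = annual_cost(2.0) - annual_cost(1.2)
print(round(savings / 1e6, 1))  # ~7.0 (million USD)
```

Under these assumed inputs the result lands near the article's $7 million figure; different load and rate assumptions would shift it proportionally.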
The next-generation NetApp Kilo-Client will take full advantage of the latest server hardware, networking technology, and NetApp storage hardware and software to create a flexible, automated test bed for tests that require a large number of virtual or physical clients. When completed, the Kilo-Client will be able to deliver 75,000+ virtual clients and take advantage of Gigabit Ethernet, 10-Gigabit Ethernet, Fibre Channel, or FCoE, all end to end.
While the next-generation Kilo-Client will greatly expand the capabilities of the existing version, ultimately it will reduce the physical server count.
Got opinions about the NetApp Kilo-Client?
Ask questions, exchange ideas, and share your thoughts online in NetApp Communities.