New VST Levels Let You Further Optimize Flash Use for Performance and Cost
Virtual Storage Tiering (VST) is the NetApp approach to automated storage tiering (AST). AST technologies help data centers benefit from the improved performance of Flash-based media while minimizing the cost and complexity. Flash-based devices such as solid-state disks (SSDs) and controller-based Flash can complete 25 to 100 times more random read operations per second than the fastest hard disk drives (HDDs), but that performance comes at a premium of 15 to 20 times higher cost per gigabyte.
Rather than permanently placing an entire dataset on expensive Flash media, VST automatically identifies and stores hot data blocks in Flash while storing cold data on slower, lower-cost media. NetApp has invested substantial effort in understanding the challenges that AST must address in order to architect an optimal solution.
With two recent product additions to VST, NetApp now offers end-to-end Flash options spanning from the client application through the disk subsystem.
All three levels continue to offer the full advantages of VST, including:
This article describes the disk subsystem–level and server-level VST options using NetApp Flash Pool and Flash Accel technology and provides general guidelines on when and where to deploy each of the three levels. If you’re not already familiar with Flash Cache, check out the original Flash Cache article for details.
Figure 1) NetApp Virtual Storage Tier now operates at different levels within your infrastructure, allowing you to better optimize your use of Flash.
A NetApp Flash Pool works at the level of the NetApp aggregate. (An aggregate is a collection of RAID groups.) A Flash Pool is created simply by adding a RAID group composed of solid-state disks (SSDs) to an existing 64-bit aggregate, creating a hybrid disk array that gets the best from both technologies. The SSDs are used to store random reads and repetitive random writes (overwrites) for the volumes within the aggregate, off-loading this work from hard disk drives (HDDs). As a result you can achieve the same level of performance (with better overall latency) using fewer disk spindles or using capacity-oriented disks rather than performance-oriented disks. Flash Pool gives you the latency and throughput advantages of SSD and the mass storage capacity of HDD.
The disk subsystem–level Flash Pool approach offers a number of advantages.
How Flash Pool works
To understand how Flash Pool technology works, you need to understand the processes for identifying and delivering random reads and random overwrites to SSD. The first time a block is read it is read into storage controller memory from disk, and the read event is categorized as random or sequential. As blocks that are categorized as random age out of controller memory they are written to SSD. Subsequent reads of the same block are then satisfied from SSD.
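The read path described above can be sketched as a toy model in Python. This is an illustration only, not NetApp code: the class name, the fixed-size memory buffer, and the "next block number means sequential" heuristic are all assumptions made for the example, since the article does not say how Data ONTAP classifies reads.

```python
from collections import OrderedDict


class ReadPathSketch:
    """Toy model of the Flash Pool read path (hypothetical, not NetApp code)."""

    def __init__(self, memory_blocks=2):
        # block -> was the read random? Oldest entries come first.
        self.memory = OrderedDict()
        self.memory_blocks = memory_blocks
        self.ssd = set()
        self.last_block = None

    def read(self, block):
        """Return the tier that served the read: 'memory', 'ssd', or 'hdd'."""
        # Assumed heuristic: a read is sequential if it directly follows
        # the previously requested block number.
        is_random = self.last_block is None or block != self.last_block + 1
        self.last_block = block

        if block in self.memory:
            self.memory.move_to_end(block)
            return "memory"

        source = "ssd" if block in self.ssd else "hdd"
        self.memory[block] = is_random

        # When a block ages out of controller memory, keep it on SSD
        # only if its read was categorized as random.
        if len(self.memory) > self.memory_blocks:
            aged_block, aged_random = self.memory.popitem(last=False)
            if aged_random:
                self.ssd.add(aged_block)
        return source
```

In this sketch, the first read of a block always comes from HDD; a later read of the same block is served from SSD only if the block was read randomly and has since aged out of memory.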
For writes, Data ONTAP is write optimized by design. It uses an efficient NVRAM to journal incoming write requests so that they can be acknowledged to the writer without delay. Writes are collected and written to disk in full stripes whenever possible, driving optimal performance from the underlying RAID implementation and HDDs by turning a collection of writes into sequential write activity.
The goal with Flash Pool is to off-load I/Os from HDD while enabling blocks that are likely to be reread or rewritten to end up on SSDs. Large sequential writes are handled efficiently by HDDs. Keeping them on SSDs would be a suboptimal use of resources. Random writes, and particularly blocks that are being repeatedly overwritten, turn out to be the ideal candidates to target to Flash Pool SSDs. Flash Pool populates SSDs with blocks that are likely to be read and blocks that are written repeatedly.
When a write request is received, Data ONTAP verifies that the write is random rather than sequential and that the previous write to the same block location was also random. If so, that write goes to SSD.
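A minimal sketch of that write-placement rule follows. It assumes the random/sequential classification is supplied by the caller, and it treats a block with no write history as not eligible for SSD; the class and method names are hypothetical.

```python
class WritePlacementSketch:
    """Toy model of the Flash Pool write-placement rule (hypothetical)."""

    def __init__(self):
        # block -> whether the previous write to that block was random
        self.previous_write_was_random = {}

    def place(self, block, is_random):
        """Return 'ssd' or 'hdd' for a write to the given block."""
        # Assumption: a block that has never been written has no
        # "previous random write", so its first write goes to HDD.
        prev_random = self.previous_write_was_random.get(block, False)
        self.previous_write_was_random[block] = is_random
        return "ssd" if (is_random and prev_random) else "hdd"
```

Under these assumptions, only the second and later random writes to the same block location land on SSD, which matches the intent of capturing repeated random overwrites rather than one-time or sequential writes.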
How blocks are evicted from a Flash Pool
Data ONTAP® technology maintains a heat map (stored on SSD for persistence) that keeps track of how “hot” each block is. Reads enter the Flash Pool at “neutral.” A subsequent read elevates the temperature of the block to “warm” and then to “hot.” Writes also enter the Flash Pool at “neutral.” Subsequent overwrites don’t elevate the temperature of the block, however.
When available SSD space runs low, Data ONTAP begins running an eviction scanner that decrements the temperature of each block on each pass. For example, “hot” blocks become “warm,” “warm” blocks become “neutral,” and “neutral” blocks become “cold.” If a block is read or overwritten between scanner passes, its temperature is again incremented—“hot” remains the maximum for reads and “neutral” the maximum for overwrites. If a “cold” block is not read or overwritten, it is decremented to a temperature of “evict” on the next scanner pass. At this point “read” blocks are evicted while overwrite blocks are scheduled to be written to HDD.
This mechanism ensures that only hot data remains in a Flash Pool once it becomes full. Flash Pool adjusts dynamically to retain hot data, and the amount of a Flash Pool dedicated to reads versus overwrites depends solely on the particulars of the workloads using the pool.
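The heat map and eviction scanner can be modeled in a few lines of Python. This is a sketch of the behavior described in this section, not the Data ONTAP implementation; the class, method names, and in-memory dictionary are assumptions (the article states that the real heat map is stored on SSD).

```python
LEVELS = ["evict", "cold", "neutral", "warm", "hot"]
NEUTRAL = LEVELS.index("neutral")
# Reads can climb to "hot"; overwrites never rise above "neutral".
CEILING = {"read": LEVELS.index("hot"), "write": NEUTRAL}


class HeatMapSketch:
    """Toy model of Flash Pool temperature tracking (hypothetical)."""

    def __init__(self):
        self.blocks = {}  # block -> [kind, level index]

    def insert(self, block, kind):
        """A block enters the pool at 'neutral'; kind is 'read' or 'write'."""
        self.blocks[block] = [kind, NEUTRAL]

    def touch(self, block):
        """A reread or overwrite raises the temperature, up to the ceiling."""
        kind, level = self.blocks[block]
        self.blocks[block][1] = min(level + 1, CEILING[kind])

    def temperature(self, block):
        return LEVELS[self.blocks[block][1]]

    def scan(self):
        """One scanner pass: cool every block by one level.

        Returns (dropped, destaged): read blocks that were evicted, and
        overwrite blocks that would be scheduled for writing to HDD.
        """
        dropped, destaged = [], []
        for block in list(self.blocks):
            entry = self.blocks[block]
            entry[1] -= 1
            if LEVELS[entry[1]] == "evict":
                del self.blocks[block]
                (dropped if entry[0] == "read" else destaged).append(block)
        return dropped, destaged
```

In this model a "hot" read block survives three scanner passes without being touched and leaves on the fourth, while an untouched overwrite block leaves on the second pass.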
Figure 2) Blocks are evicted from a Flash Pool based on a heat map. Once the pool is full, an eviction scanner decrements the “temperature” of each block on each pass. Blocks are evicted when they reach a temperature of “evict.” Accesses between scanner passes increment the temperature of a block, so “hot” data remains in the Flash Pool.
Flash Pool performance
Although we haven’t published any benchmarks yet using Flash Pool technology, NetApp has undertaken some comparative before-and-after studies using an OLTP workload to illustrate the potential impact. Starting from the same FAS6210 base configuration, we implemented Flash Pool, optimized in one case for cost per IOPS and in the second for cost per GB of storage. Results are shown in Figure 3. Note that both cases result in a significant improvement in overall latency, which can have a bigger impact on perceived performance than total IOPS in many cases.
Figure 3) Impact of Flash Pool on cost/efficiency and performance.
Table 1) Flash Pool requirements and options.
To learn more about deploying and using NetApp Flash Pool technology, check out NetApp TR-4070: Flash Pool Design and Implementation Guide.
NetApp Flash Accel software was announced in August 2012 and will be available in late 2012. Flash Accel is designed to extend the benefits of NetApp VST across the network to encompass the server itself. Having local Flash devices on a server means that you’ve got direct-attached storage that you have to manage. This creates potential problems with data protection and isolates silos of data. Server caching with Flash Accel eliminates these problems, and offers a number of advantages.
The first release of Flash Accel works with VMware® vSphere® 5.0 or higher and Windows® VMs only. Future releases will expand support to include additional VMs, other hypervisors, and bare metal.
How Flash Accel works
Flash Accel consists of three components:
NetApp Flash Accel Management Console (FMC). Configuration and management of Flash Accel is accomplished using a virtual appliance, which runs on vSphere. This management console allows you to:
Flash Accel hypervisor plug-in (installed on the ESX host). The hypervisor plug-in is installed on an ESX host and establishes control over locally attached devices (such as SSDs) and storage array paths according to the configuration you define using the FMC. The plug-in creates logical devices and presents them to the ESX storage stack as SCSI devices. Logical devices created on multiple ESX hosts with the same WWN allow ESX to treat a device as a shared device so that VMs using these devices can participate in vMotion® and VMware HA operations. In addition to being able to migrate the VMs, the hypervisor plug-in provides management of the Flash device and can enable dynamic resource sharing and cache block deduplication.
Flash Accel agent in Windows VM. A user-level agent is implemented for Windows guest VMs. This agent:
The service agent exports a Web service to FMC and communicates with the drive via Windows PowerShell™ cmdlets.
Figure 4) Flash Accel includes agents that run in each VM and a plug-in for VMware vSphere, and it is controlled from Flash Accel Management Console. It can use any PCI-e Flash card or SSD available on an ESX host.
As Figure 4 illustrates, Flash Accel uses local Flash resources on an ESX server to provide a caching layer for Windows virtual machines. The Flash device can be shared among multiple VMs on an ESX host, giving each VM its own local cache.
All reads from the VM are cached locally for reuse, off-loading future reads from back-end storage. Writes are written through to back-end storage but available for rereading from cache.
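The write-through behavior described here can be illustrated with a small Python sketch. It is a generic write-through cache, not the Flash Accel implementation: a dictionary stands in for back-end storage, the cache is unbounded, and all names are hypothetical.

```python
class WriteThroughCacheSketch:
    """Generic write-through cache in front of a back-end store (hypothetical)."""

    def __init__(self, backend):
        self.backend = backend   # dict standing in for back-end storage
        self.cache = {}          # local Flash cache (unbounded in this sketch)
        self.backend_reads = 0   # counts reads that reached the back end

    def read(self, block):
        # Serve from the local cache when possible; otherwise fetch from
        # the back end and keep a copy for future reads.
        if block not in self.cache:
            self.backend_reads += 1
            self.cache[block] = self.backend[block]
        return self.cache[block]

    def write(self, block, data):
        # Write through: the back end always receives the write, so the
        # cache never holds the only copy of the data.
        self.backend[block] = data
        self.cache[block] = data
```

Because every write reaches back-end storage, losing the local cache in this model costs performance but not data, which is why a write-through design avoids the data protection concerns of direct-attached Flash.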
The Flash Accel cache has two key areas: cache operations and storage manager.
Data coherency is the most important feature of Flash Accel. If back-end data is changed without notifying Flash Accel, it is possible to have the cache data and the back-end storage data out of sync. This would result in incorrect data being returned to the application/end user from the cache, which would cause data corruption. There are two situations in which data coherency is an issue.
The advantage of Flash Accel in this type of situation is that it invalidates only the blocks that have changed while retaining all blocks that haven’t. When situations like this arise, other available solutions drop all cached data and rewarm the entire cache. Rewarming can take anywhere from a few hours to several days depending on the data, during which time performance is degraded.
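The difference between selective invalidation and a full cache drop can be shown with a short sketch. The article does not describe how Flash Accel determines which blocks changed, so this example simply assumes a set of changed block IDs is available; the function name is hypothetical.

```python
def invalidate_changed(cache, changed_blocks):
    """Remove only cached blocks whose back-end copy changed.

    cache is a dict of block -> data; changed_blocks is a set of block
    IDs assumed to be known. Returns the number of blocks invalidated.
    """
    stale = [block for block in cache if block in changed_blocks]
    for block in stale:
        del cache[block]
    return len(stale)
```

With a cache holding blocks 1, 2, and 3 and a change set containing only block 2, this approach leaves blocks 1 and 3 warm, whereas dropping the whole cache would force all three to be fetched from back-end storage again.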
Flash Accel performance
We compared the performance of the same configuration with and without Flash Accel using JetStress, which simulates the disk I/O load created by Microsoft® Exchange. The addition of Flash Accel resulted in approximately a 77% improvement in I/O performance for both reads and writes. Since application reads were primarily satisfied by Flash Accel, the back-end storage was less occupied by reads and could therefore deliver better write performance, resulting in significant application performance improvements overall. Results are shown in Figure 5.
Figure 5) Flash Accel increases read and write I/O by approximately 77% using JetStress to simulate an Exchange workload.
Choosing VST Options
Choosing the best VST level or levels comes down to getting the most return on your investment in Flash: accelerating all the workloads that need to be sped up at the lowest cost.
In other words, within a shared storage infrastructure you get the most workload specificity at the server level and the least at the controller level. If you need to accelerate one workload, server-level VST is a good choice. If you need to accelerate all your workloads (and possibly switch from performance-oriented to capacity-oriented disks), choose disk subsystem level or controller level.
For new deployments, we suggest starting with either Flash Cache or Flash Pool technology and then adding Flash Accel if needed to provide further performance enhancement for the most latency-sensitive applications.
When it comes to choosing between Flash Cache and Flash Pool, the following bullets summarize the similarities and differences.
In general, Flash Pool is a good choice for mission-critical applications because the benefit persists after takeover events. It’s also preferred for applications that have high overwrite rates and is the only option available on the FAS2200 series. Because of its proximity to main memory, Flash Cache may offer advantages for high-performance file services.
While you can install both Flash Pool and Flash Cache on the same storage system, in general there isn’t a big advantage to doing so. Data blocks from an aggregate that has Flash Pool enabled are never cached in Flash Cache.
With the introduction of Flash Pool and Flash Accel to VST, NetApp gives you two new methods to optimize I/O performance using Flash. As a general guideline, it helps to remember that:
You can combine levels to optimize overall performance while minimizing your investment. Whichever options you choose, once VST is installed there’s virtually nothing to manage. You can fine-tune your deployment if needed, but the defaults work well in most cases and the benefits are significant and measurable.
Got Opinions About VST?
Ask questions, exchange ideas, and share your thoughts online in NetApp Communities.