| September 2008 (PDF) |
Boost Performance Without Adding Disk Drives
The NetApp Performance Acceleration Module

Most Tech OnTap readers are probably aware that the random read performance of storage systems heavily depends on both drive count (total number of drives in the storage system) and drive rotational speed (RPM). Unfortunately, adding more drives to boost storage performance also means using more power, more cooling, and more space. And, with disk drive capacity growing much faster than performance, many applications may require extra disk spindles to achieve optimum performance even when the capacity is not needed.
What Is PAM?

In the simplest terms, the Performance Acceleration Module is a second-layer cache: a cache used to hold blocks evicted from the WAFL® buffer cache. (WAFL is the NetApp® Write Anywhere File Layout, which defines how NetApp lays out data on disk. The WAFL buffer cache is a read cache maintained by WAFL in system memory.)

In a system without PAM, any attempt to read data that is not in system memory results in a disk read. With PAM, the storage system first checks to see whether a requested read has been cached in one of its installed modules before issuing a disk read. Data ONTAP® maintains a set of cache tags in system memory and can determine whether or not a block resides in PAM without accessing the card. This reduces access latency because only one DMA operation is required on a cache hit. As with any cache, the key to success lies in the algorithms used to decide what goes into the cache. We’ll have more to say about that in the following section.

Figure 1) Random reads with and without PAM.

PAM is a combination of both hardware and software (the PAM software is known as FlexScale). A license is required to enable the hardware. The PAM hardware module is implemented as a ¾-length PCIe card offering dual-channel DMA access to 16GB of DDR2 memory per card and a custom-coded field-programmable gate array (FPGA) that provides the onboard intelligence necessary to accelerate caching tasks. The maximum number of modules supported per storage system is shown in Table 1.
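The read path described under "What Is PAM?" can be sketched as a toy two-level cache. This is a minimal illustration only, not the Data ONTAP implementation: the class name, the LRU eviction policy, and the choice to move a block out of the second-level cache on a hit are all assumptions made to keep the example short. What it does reflect from the text is that blocks evicted from the first-level buffer cache go into the second-level cache, and that membership is decided from tags held in memory, so the card is touched only on a hit.

```python
from collections import OrderedDict


class TwoLevelReadCache:
    """Toy model of a buffer cache backed by a second-level victim cache."""

    def __init__(self, buffer_blocks, pam_blocks, disk):
        self.buffer_blocks = buffer_blocks
        self.pam_blocks = pam_blocks
        self.disk = disk                 # dict: block id -> data
        self.buffer = OrderedDict()      # first-level cache (assumed LRU)
        self.pam_tags = OrderedDict()    # tags kept in system memory (assumed LRU)
        self.pam_card = {}               # block contents held on the card
        self.stats = {"buffer_hits": 0, "pam_hits": 0,
                      "card_reads": 0, "disk_reads": 0}

    def read(self, block_id):
        if block_id in self.buffer:
            self.buffer.move_to_end(block_id)
            self.stats["buffer_hits"] += 1
            return self.buffer[block_id]
        if block_id in self.pam_tags:
            # The tag lookup happens in memory; the card is read once, on a hit.
            self.stats["pam_hits"] += 1
            self.stats["card_reads"] += 1
            data = self.pam_card.pop(block_id)
            del self.pam_tags[block_id]
        else:
            self.stats["disk_reads"] += 1
            data = self.disk[block_id]
        self._insert_into_buffer(block_id, data)
        return data

    def _insert_into_buffer(self, block_id, data):
        self.buffer[block_id] = data
        if len(self.buffer) > self.buffer_blocks:
            victim_id, victim = self.buffer.popitem(last=False)
            self._insert_into_pam(victim_id, victim)

    def _insert_into_pam(self, block_id, data):
        # Blocks evicted from the buffer cache land in the second-level cache.
        self.pam_tags[block_id] = True
        self.pam_card[block_id] = data
        if len(self.pam_tags) > self.pam_blocks:
            old_id, _ = self.pam_tags.popitem(last=False)
            del self.pam_card[old_id]


if __name__ == "__main__":
    disk = {i: f"block-{i}" for i in range(10)}
    cache = TwoLevelReadCache(buffer_blocks=2, pam_blocks=4, disk=disk)
    for block in (0, 1, 2, 0, 0, 1):
        cache.read(block)
    print(cache.stats)
```

In this sketch a block that has fallen out of the small first-level cache is served from the second level instead of from disk, which is the effect the module is meant to have on random reads.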
Table 1) Maximum number of PAM modules supported per controller by system type.

PAM is designed to be highly resilient. Since the module acts as a cache, data affected by an uncorrectable error is simply discarded and read from disk instead. If the rate of uncorrectable errors from a card exceeds a set threshold, the card is automatically disabled and the system reverts to noncached operation, with no interruption of service or reboot required. ECC is used to detect bit errors, while a data CRC protects the end-to-end delivery of data from CPU to card memory and back to CPU.

Intelligent Caching

The caching policies implemented in PAM are intended to optimize small-block, random read access to a storage system. Random reads are reads from noncontiguous locations on a storage system’s disks. Because the reads are not located logically near one another, they are harder to satisfy than a workload with more localized reads, require more disk seek operations, and increase the average latency of reads. Since these reads are, by definition, random, there is no way to predict which block will be required next and prefetch it.

Note that the PAM cache is implemented behind WAFL. This is because at this point we have a lot more information about the data and can make more intelligent decisions about what to cache versus what to let go.
Default Mode

This mode is best used when the working set size is equal to or less than the size of the PAM cache. It also helps when there are hot spots of frequently accessed data and ensures that the data will reside in cache.

Metadata Mode

Low-Priority Mode

The low-priority mode may be useful in applications that write data and read the same data after a time lag such that upstream caches evict the data. For example, this mode can avoid disk reads for a Web-based application that creates new data and distributes links that get accessed some time later by Web users. In some Web applications, we’ve found that the time lag for the first read is long enough that the data has to come from disk (even though subsequent data references are frequent enough to be handled by upstream caches). PAM in low-priority mode could accelerate these applications by turning such disk reads into cache hits.

PCS: Determining If PAM Will Improve Performance

To determine whether your storage systems can benefit from added cache, NetApp has developed its Predictive Cache Statistics (PCS) software, which is currently available in Data ONTAP 7.3 and later releases. PCS allows you to predict the effects of adding the cache equivalent of two, four, and eight times system memory.

To enable PCS:
options flexscale.enable pcs

Don’t enable PCS if your storage system is consistently above 80% CPU utilization. Once PCS is enabled, you have to let the simulated cache “warm up,” or gather data blocks. Once the cache is warmed up, you can review and analyze the data using the NetApp perfstat tool.

This procedure simulates caching using the default caching mode, which includes both metadata and normal user data. You can also test the other operating modes.
To enable metadata mode:
options flexscale.normal_data_blocks off

To enable low-priority mode:
options flexscale.normal_data_blocks on
options flexscale.enable off

With PCS enabled, you can find out what's happening using the following command:
> stats show -p flexscale-pcs

Sample output is shown in Figure 2.
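The three caching modes can be pictured as admission policies over categories of blocks. The sketch below is an illustration under assumptions, not FlexScale's actual logic: the category names and the `admits` helper are hypothetical. The default set (metadata plus normal user data) comes from the text; the metadata-only set follows from turning normal data blocks off; and the low-priority set, which additionally admits recently written or otherwise low-priority blocks, is an assumption inferred from the description of that mode.

```python
from enum import Enum


class BlockKind(Enum):
    """Hypothetical block categories used only for this illustration."""
    METADATA = "metadata"
    NORMAL_DATA = "normal user data"
    LOW_PRIORITY = "low-priority data"


# Which block kinds each mode is assumed to admit into the cache.
MODE_POLICY = {
    "default": {BlockKind.METADATA, BlockKind.NORMAL_DATA},
    "metadata": {BlockKind.METADATA},
    # Assumption: low-priority mode admits everything default mode does,
    # plus blocks that would otherwise be skipped.
    "low-priority": {BlockKind.METADATA, BlockKind.NORMAL_DATA,
                     BlockKind.LOW_PRIORITY},
}


def admits(mode, kind):
    """Return True if a block of the given kind would be cached in this mode."""
    return kind in MODE_POLICY[mode]


if __name__ == "__main__":
    for mode in MODE_POLICY:
        cached = sorted(k.value for k in BlockKind if admits(mode, k))
        print(f"{mode}: {', '.join(cached)}")
```

Seen this way, choosing a mode is a trade between how much of the cache is reserved for the most valuable blocks and how many kinds of reads can be turned into cache hits.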
Figure 2) Example PCS output.

Use the following guidelines to help you interpret the data:
Note that the three caches simulated in PCS are cascading caches. In the example above, ec0 represents the first cache of size 8GB, ec1 represents the second cache of size 8GB, and ec2 represents the third cache of size 16GB. The hits per second for a 32GB cache is the sum of the hits per second for all three caches. The key advantage of cascading caches is that in the process of measuring an accurate hit rate for a 32GB cache, we also obtain hit rate estimates for both 8GB and 16GB caches. This gives us three points on the hit rate curve and the ability to estimate hit rates for intermediate cache sizes.

PAM and FlexShare

FlexShare™ is a Data ONTAP option that allows you to set priorities for system resources (processors, memory, and I/O) at a volume level, thereby allocating more resources to workloads on particular volumes when the controller is under significant load. FlexShare is fully compatible with PAM, and settings made in FlexShare apply to the data kept in the PAM cache. With FlexShare, finer-grained control can be applied on top of the global policies you implement with PAM. For example, if an individual volume is given a higher priority with FlexShare, data from that volume will receive a higher priority in the cache.

Conclusion

With today’s tight IT budgets, it’s more critical than ever to get the most performance from every investment while keeping power, cooling, and space requirements down. PAM does just that. It gives you the flexibility to tune the caching mode to accommodate the needs of your particular workloads. Before you make a purchase, PCS allows you to determine whether you can benefit from PAM, how many modules you will need, and which settings to use.

Got opinions about PAM?
Ask questions, exchange ideas, and share your thoughts online in NetApp communities.
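As a worked illustration of the cascading-cache arithmetic described in the PCS discussion, the sketch below sums per-stage hits into cumulative figures for 8GB, 16GB, and 32GB caches and then estimates an intermediate size. The hits-per-second values are made up for the example, the function names are hypothetical, and linear interpolation between the measured points is an assumption; the article says only that intermediate sizes can be estimated.

```python
def cumulative_hits(stage_sizes_gb, stage_hits_per_sec):
    """Turn per-stage hits of cascading caches into (total size, total hits) points."""
    points = []
    total_size = 0
    total_hits = 0.0
    for size, hits in zip(stage_sizes_gb, stage_hits_per_sec):
        total_size += size
        total_hits += hits
        points.append((total_size, total_hits))
    return points


def estimate_hits(points, size_gb):
    """Estimate hits/s for an intermediate size by linear interpolation (an assumption)."""
    pts = [(0, 0.0)] + list(points)
    for (s0, h0), (s1, h1) in zip(pts, pts[1:]):
        if s0 <= size_gb <= s1:
            return h0 + (h1 - h0) * (size_gb - s0) / (s1 - s0)
    raise ValueError("size is outside the simulated range")


if __name__ == "__main__":
    # Three cascading stages of 8GB, 8GB, and 16GB; hit rates are hypothetical.
    points = cumulative_hits([8, 8, 16], [200, 100, 50])
    print(points)                     # points for 8GB, 16GB, and 32GB caches
    print(estimate_hits(points, 24))  # estimate for a 24GB cache
```

With these sample numbers, the 32GB figure is simply the sum of all three stages, and the flattening of the curve between 16GB and 32GB suggests diminishing returns from additional cache.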