NetApp Tech OnTap

Data Compression for NetApp Storage

Efficiency is the key to flexible IT. As a leader in storage efficiency innovation, NetApp has worked hard to bring you the latest efficiency innovations, including Snapshot® and related technologies, thin provisioning, FlexClone®, deduplication for primary storage, and many others.

Data compression technologies have obviously been around for a long time, but they have posed significant challenges for large-scale storage systems, especially in terms of performance impact. Until recently, compression for devices such as tape drives and VTLs has almost always been provided by dedicated hardware that adds expense and complexity. Now NetApp has developed a way to provide transparent inline data compression in software while mitigating the impact on computing resources. This allows us to make the benefits of compression available in Data ONTAP® at no extra charge on existing NetApp® storage systems that upgrade to Data ONTAP 8.0.1 or later.

In this article I discuss what NetApp data compression is and how it works, and I review some of the common use cases along with the space savings we’ve measured for each. I also discuss how data compression can be used in conjunction with other NetApp technologies and review how NetApp is rolling out this new capability to enable success.

What Is NetApp Data Compression?


NetApp data compression is being introduced as a free option integrated into Data ONTAP 8.0.1 as a software-based solution for transparent inline data compression. No application changes are required to use NetApp data compression.

NetApp data compression can reduce the physical capacity required to store data on storage systems by compressing data within a flexible volume (FlexVol®) on primary, secondary, and archive storage. It compresses regular files, virtual local disks, and LUNs. In the rest of this article references to files also apply to virtual local disks and LUNs.

NetApp data compression does not compress an entire file as a single contiguous stream of bytes. This would be prohibitively expensive when it comes to servicing small reads from part of a file, since it would require the entire file to be read from disk and uncompressed before the read request could be serviced. This would be especially difficult for large files. To avoid this, NetApp data compression works by compressing a small group of consecutive blocks at one time. This is a key design element that allows NetApp data compression to be more efficient. When a read request comes in, only a small group of blocks needs to be read and decompressed, not the entire file. This optimizes reads and allows greater scalability in the size of the files being compressed.

The NetApp compression algorithm divides a file into 32KB chunks of data called “compression groups.” Each compression group contains data from one file only.

Writing Data. Write requests are handled at the compression group level. Once a group is formed, a test is performed while the data is still in memory to decide whether the data is compressible. If it isn’t, the group is simply passed through to disk. Only if the test indicates the data is compressible is the full group compressed. This maximizes the savings while minimizing the resource overhead.
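The write path described above can be sketched in a few lines of Python. This is purely illustrative: zlib stands in for the actual (undocumented here) compression algorithm, and the savings threshold is a hypothetical parameter, not a value stated in the article.

```python
import zlib

GROUP_SIZE = 32 * 1024   # 32KB compression groups, per the article
MIN_SAVINGS = 0.25       # hypothetical threshold; the real value is not given here

def write_groups(data: bytes):
    """Split a file into compression groups and compress each group
    only if an in-memory test shows the data is compressible."""
    out = []
    for off in range(0, len(data), GROUP_SIZE):
        group = data[off:off + GROUP_SIZE]
        compressed = zlib.compress(group)               # test while still in memory
        if len(compressed) <= len(group) * (1 - MIN_SAVINGS):
            out.append(("compressed", compressed))       # worth storing compressed
        else:
            out.append(("raw", group))                   # pass through uncompressed
    return out

# Highly repetitive data compresses well; incompressible data is passed through.
groups = write_groups(b"A" * 64 * 1024)
print([kind for kind, _ in groups])   # ['compressed', 'compressed']
```

The key point the sketch captures is that the compressibility test happens before any data is flushed, so incompressible groups cost almost nothing.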

Since compressed data contains fewer blocks to be written to disk, it reduces the number of write I/Os required for each compressed write operation. This not only lowers the data footprint on disk but can also decrease the time to complete disk write requests and can significantly reduce the time needed to perform backups.


Figure 1) Compression groups are tested for compressibility before any compression takes place. They are then flushed to disk, compressed or uncompressed, depending on the results of the test.

Reading Data. When a read comes in for compressed data, Data ONTAP reads only the compression groups that contain the requested data, not the entire file. This minimizes the amount of I/O needed to service the request and results in very little overhead.
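Because compression groups are fixed-size 32KB chunks, mapping a read request to the groups it touches is simple arithmetic. The following sketch (illustrative only, not Data ONTAP code) shows why a small read never forces the whole file to be decompressed:

```python
GROUP_SIZE = 32 * 1024   # 32KB compression groups

def groups_for_read(offset: int, length: int):
    """Return the indices of the compression groups that must be
    read and decompressed to service a read request."""
    first = offset // GROUP_SIZE
    last = (offset + length - 1) // GROUP_SIZE
    return list(range(first, last + 1))

# A 4KB read at offset 100,000 touches a single 32KB group,
# regardless of how large the file is.
print(groups_for_read(100_000, 4096))   # [3]
```

A read spanning a group boundary simply touches two groups; the cost scales with the request size, not the file size.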

Performance of Compression


NetApp data compression is designed to work independently or with NetApp deduplication to achieve optimal savings. NetApp deduplication can be scheduled to run when it is most convenient, while NetApp data compression runs as an inline process as data is written to disk. When the two are enabled on the same volume, the data is first compressed and then deduplicated. Deduplication does not need to uncompress data in order to operate; it simply removes duplicate compressed or uncompressed blocks from a data volume.
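The ordering described above (inline compression first, deduplication later as a postprocess on the stored blocks) can be sketched as follows. This is a conceptual illustration: SHA-256 fingerprinting and zlib are stand-ins, not the actual Data ONTAP mechanisms.

```python
import hashlib
import zlib

def inline_compress(groups):
    """Inline step: compress each group as it is written
    (the compressibility test is omitted here for brevity)."""
    return [zlib.compress(g) for g in groups]

def postprocess_dedupe(stored_blocks):
    """Postprocess step: collapse duplicate blocks by fingerprint.
    Note that it operates on the stored (already compressed) bytes
    and never needs to decompress anything."""
    unique = {}
    refs = []
    for block in stored_blocks:
        fp = hashlib.sha256(block).hexdigest()
        if fp not in unique:
            unique[fp] = block    # first copy is kept on disk
        refs.append(fp)           # later copies become references
    return unique, refs

groups = [b"hello" * 1000, b"hello" * 1000, b"world" * 1000]
unique, refs = postprocess_dedupe(inline_compress(groups))
print(len(unique))   # 2 unique blocks stored for 3 logical groups
```

Because identical input groups compress to identical output, deduplication still finds the duplicates after compression, which is why the two technologies stack without either having to undo the other's work.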

Data compression leverages the internal characteristics of Data ONTAP to perform with high efficiency. While NetApp data compression minimizes performance impact, it does not eliminate it. Your workloads should be evaluated for their tolerance of the resources needed to perform data compression. The actual impact depends on a number of factors, including:

  • The type of application
  • The compressibility of the dataset
  • The data access pattern (for example, sequential versus random access, the size and pattern of the I/O)
  • The average file size
  • The rate of change
  • The number of volumes that have compression enabled on the system
  • The hardware platform—the amount of CPU/memory in the system
  • The load on the system
  • Disk type and speed
  • The number of spindles in the aggregate

We have developed best practices to help you through sizing and other activities to optimize your implementation. Because of the many factors that can play a part, testing in your environment is the best way to determine the applicability of data compression for your desired use. The following sections discuss the savings measured with various application datasets as well as some typical use cases.

Space Savings with Data Compression and Deduplication


NetApp data compression provides immediate space savings via inline compression. NetApp deduplication runs periodically (postprocessing) to provide cumulative space savings. While compression and deduplication work together, it should be noted that the savings you achieve will not necessarily be the sum of the savings you see when each technology is run individually on a dataset.

For some types of data, compression will not increase the savings over deduplication alone, while in other cases the reverse will hold true. In still other cases, the greatest storage savings result from a combination of compression and deduplication running together. The following table includes examples to illustrate these points.

Table 1) Best space savings combination for various data types.

Dataset Type                  Application           Best Savings Combination             Typical Space Savings
Home Directories              —                     Both compression and deduplication   65%
Virtual Servers and Desktops  —                     Dedupe only                          70%
Database                      —                     Compression only                     65%
E-mail                        Exchange 2003/2007    Compression only                     35%
E-mail                        Exchange 2010         Both compression and deduplication   40%
Engineering Data              Software Development  Both compression and deduplication   75%
Engineering Data              Geoseismic            Compression only                     75%

While these savings examples are typical, not all datasets are equal. Testing should be done on your data to evaluate the savings you’ll experience. NetApp is always available to help with the evaluation process.

Typical Use Cases


As I’ve already discussed, compression can provide substantial storage savings at the cost of some performance. It is important to weigh the two together in order to determine where compression makes the best sense in your storage environment.

Database backups (and backups in general) are a potential sweet spot for data compression. Databases are often extremely large, and there are many users who will trade a slight performance impact on backup storage in return for 65%+ capacity savings.

Another possible use case is file services. In testing using a file services workload on a system that was ~50% busy with a dataset that was 50% compressible, we measured only a 10% decrease in throughput. In a file services environment that has a 2-millisecond response time for files, this would translate to an increase of only 0.2 ms, raising the response time to 2.2 milliseconds. For a space savings of 65%, this small decrease in performance might be acceptable to you. Such savings can be extended even further by replicating the data using NetApp volume SnapMirror® technology, which saves you network bandwidth and space on secondary storage. (Secondary storage inherits compression from primary storage in this case, so no additional processing is needed.) In this scenario you would have:

  • 65% storage capacity savings on primary storage
  • 65% less data sent over the network for replication
  • 65% faster replication
  • 65% storage capacity savings on secondary storage

There are many other use cases in which compression makes sense, and we have a number of tools and guides that can help you decide which use cases are best for your environment.

Using Data Compression with Other NetApp Technologies


As you’ve already seen, NetApp data compression works in a complementary fashion with NetApp deduplication. In this section, I discuss the use of data compression in conjunction with a few other popular NetApp technologies.

Volume SnapMirror. Volume SnapMirror operates at the physical block level; when deduplication and/or compression is enabled on the source volume, both the deduplication and compression space savings are maintained over the wire as well as on the destination. This can significantly reduce the amount of network bandwidth required during replication as well as the time it takes to complete the SnapMirror transfer. Here are a few general guidelines to keep in mind:

  • Both source and destination systems should use an identical release of Data ONTAP.
  • Compression and deduplication are managed only on the source system—the flexible volume at the destination system inherits the efficiency attributes and storage savings.
  • Shared blocks are transferred only once, so deduplication also reduces network bandwidth.
  • Compression is maintained throughout the transfer, so the amount of data being transferred is reduced, thus reducing network bandwidth usage.
  • SnapMirror link compression is not necessary, since the data has already been compressed with NetApp data compression.

The amount of reduction in network bandwidth and SnapMirror transfer time is directly proportional to the amount of space savings. As an example, if you were able to save 50% in disk capacity, then the SnapMirror transfer time would decrease by 50% and the amount of data you would have to send over the wire would be 50% less.
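That proportionality is easy to verify with back-of-the-envelope arithmetic. The sketch below is illustrative only; the 100 MB/s link speed is an assumed example value, not a figure from the article.

```python
def snapmirror_transfer(logical_gb: float, savings: float, link_mb_s: float = 100.0):
    """Estimate the physical data sent over the wire and the transfer
    time for a volume SnapMirror update, given the fraction of space
    saved by compression/deduplication on the source."""
    physical_gb = logical_gb * (1 - savings)       # only physical blocks are sent
    seconds = physical_gb * 1024 / link_mb_s       # time scales with data sent
    return physical_gb, seconds

# A 1TB (1,024GB) volume at 50% savings sends 512GB instead of 1,024GB,
# halving both bandwidth usage and transfer time.
sent, secs = snapmirror_transfer(1024, 0.50)
print(sent)   # 512.0
```

The same arithmetic underlies the 65% figures in the file services example earlier: at 65% savings, only 35% of the logical data crosses the wire.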

Qtree SnapMirror and SnapVault®. Both qtree SnapMirror and SnapVault operate at the logical block level; source and destination storage systems run deduplication and data compression independently. This allows you to compress and/or deduplicate your qtree SnapMirror and/or SnapVault backups even when the source data is not compressed or deduplicated.

Cloning. NetApp FlexClone technology instantly creates virtual copies of files or data volumes—copies that don’t consume additional storage space until changes are made to the clones. FlexClone supports both deduplication and compression.

Getting Started with NetApp Data Compression


NetApp data compression works on all NetApp FAS and V-Series systems running Data ONTAP 8.0.1 and above. Data compression is enabled at the volume level. This means that you choose which volumes to enable it on. If you know a volume contains data that is not compressible, you don’t have to (and shouldn’t) enable compression on that volume. A volume can be up to 16TB in size and must be contained within a 64-bit aggregate—a feature that was introduced in Data ONTAP 8. (You can learn more about Data ONTAP 8 in a companion article in this issue of Tech ONTAP.)

To start using data compression you simply install the free license on your storage system and then enable it on the volumes you choose. That’s all there is to it.

Data Compression Program for Early Adopters. Similar to the release of deduplication several years ago, NetApp is managing early access to our data compression technology. If you request the compression license, we’ll evaluate your environment and provide our best-practice recommendations.

Conclusion

NetApp data compression continues the NetApp tradition of adding significant value to Data ONTAP in terms of storage efficiency and bringing it to you at no additional charge. The technology significantly reduces storage requirements for compressible target datasets, and it can work in conjunction with NetApp deduplication and other NetApp technologies.

The power and potential of compression really pay off when it is used in combination with other NetApp storage efficiency technologies. Selecting efficiency technologies and features from an integrated portfolio enables you to flexibly and effectively manage the right balance between technology and business needs.

NetApp Community
Got opinions about Data Compression?

Ask questions, exchange ideas, and share your thoughts online in NetApp Communities.

Sandra Moulton
Technical Marketing Engineer
NetApp

Since joining NetApp just over a year ago, Sandra has focused almost exclusively on storage efficiency, specializing in deduplication and data compression; she has been responsible for developing white papers, best-practice guides, and reference architectures for these critical technologies. Sandra has over 20 years of industry experience, including performing similar functions at other leading Silicon Valley companies.

© 2010 NetApp