Accelerate Replication on Low-Bandwidth Links:
SnapMirror Network Compression
For many companies, available bandwidth is a major factor limiting the use of replication. In a recent survey of companies with more than 1,000 employees, 44% reported that bandwidth limitations were the number one issue affecting their replication and data protection plans for branch and remote offices.
NetApp® SnapMirror® thin replication software has become popular with NetApp customers in large part because of its efficient use of network bandwidth. Unlike other replication products that copy entire files across the network any time a single block is changed, SnapMirror replicates only changed blocks, significantly reducing bandwidth requirements. A 50% reduction in management overhead versus competing solutions and the ability to use replicated data sets for dev/test, data mining, or other purposes add to the appeal of SnapMirror.
Despite the proven efficiency of SnapMirror, however, situations still arise where available network bandwidth is not sufficient for replication. To better address cases where bandwidth is the limiting factor, NetApp recently announced SnapMirror network compression, which can lower volume SnapMirror bandwidth utilization by 70% or more for highly compressible data sets (the savings depend on the compressibility of your data set).
SnapMirror network compression has been added to Data ONTAP® 7.3.2 and is available to SnapMirror users at no additional cost. This article explores how network compression works, talks about when you should (and shouldn’t) use it, and describes some observed results for common data sets.
How SnapMirror Network Compression Works
With SnapMirror network compression, data is compressed only while it traverses the network; data on source and destination systems remains uncompressed. Enabling compression results in two additional steps:
- Compression on the source system
- Decompression on the destination system
On the source system, data blocks that need to be replicated are handed off to a compression engine, which compresses them. The compression engine creates multiple threads corresponding to the number of CPUs on the storage system. The multiple compression threads compress data in parallel. Compressed blocks are then transmitted over the network. On the destination system, compressed blocks are received and decompressed using a similar multithreaded approach. Decompressed data is then written to the appropriate volume.
Figure 1) Functional diagram of SnapMirror network compression.
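The flow described above can be sketched in a few lines of Python. This is only an illustration of the pattern (split into blocks, compress in parallel with one worker per CPU, send, decompress in parallel, reassemble), not the Data ONTAP implementation. The use of `zlib`, the 4KB block size, and all function names are assumptions made for the sketch; the article does not say which algorithm or block size the compression engine uses.

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple

BLOCK_SIZE = 4096  # hypothetical block size chosen for this sketch


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut the changed data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_blocks(blocks: List[bytes], workers: Optional[int] = None) -> List[bytes]:
    """Source side: compress blocks in parallel, one worker per CPU by default."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so block order is preserved.
        return list(pool.map(zlib.compress, blocks))


def decompress_blocks(wire_blocks: List[bytes], workers: Optional[int] = None) -> List[bytes]:
    """Destination side: decompress received blocks with the same approach."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.decompress, wire_blocks))


def replicate(changed: bytes) -> Tuple[bytes, int]:
    """Return the data as written on the destination and the bytes sent on the wire."""
    wire_blocks = compress_blocks(split_blocks(changed))
    wire_bytes = sum(len(block) for block in wire_blocks)
    restored = b"".join(decompress_blocks(wire_blocks))
    return restored, wire_bytes


if __name__ == "__main__":
    payload = b"SnapMirror network compression " * 4000
    restored, wire_bytes = replicate(payload)
    print(restored == payload, len(payload), wire_bytes)
```

Note that the data is compressed only between the two function calls in `replicate`; what the source holds and what the destination writes are both uncompressed, which mirrors the behavior described in the article.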
The compression and decompression engines can be configured either to conserve network bandwidth or to complete a transfer in the shortest time possible, depending on user preference.
SnapMirror network compression is supported on all NetApp storage platforms (including V-Series virtualization systems and the IBM N-series) in the asynchronous mode of operation only. The semi-synchronous and synchronous modes of SnapMirror operation are not currently supported with network compression enabled.
When to Use Network Compression
SnapMirror network compression is useful in the following scenarios:
When network bandwidth would otherwise be a limitation. Depending on the compressibility of your data set (see following section for information on data sets), enabling network compression can make it possible to use replication on links where it would otherwise not be possible to meet (or continue to meet) your goals.
In general, the suitability of a replication solution for disaster recovery is dictated by your recovery point objective (RPO), which defines the point in time (relative to the time a failure occurs) to which you want to be able to recover. You must be able to replicate data quickly enough to meet this objective. For example, if your RPO is one hour, you must be able to replicate all changed data every hour, even during peak periods of activity. If your network bandwidth is limited, network compression can help you:
- Maintain the same RPO level in the face of data growth and/or increased change rates
- Improve your RPO without buying additional bandwidth
Similarly, if you are using replication for data distribution, network compression can help you continue to meet your existing time objectives or to meet more stringent time requirements without adding bandwidth.
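The RPO reasoning above comes down to simple arithmetic, sketched here. The function name and the sample figures (45GB of changed data per hour) are hypothetical and chosen only for illustration; the calculation also ignores protocol overhead and assumes the link is fully available to replication.

```python
def required_mbps(changed_gb: float, rpo_hours: float, compression_ratio: float = 1.0) -> float:
    """Average link rate in Mb/sec needed to replicate changed_gb within the RPO window.

    compression_ratio is the N in N:1 (1.0 means no compression).
    """
    if rpo_hours <= 0 or compression_ratio <= 0:
        raise ValueError("rpo_hours and compression_ratio must be positive")
    megabits = changed_gb * 8 * 1000  # decimal GB to megabits
    return megabits / compression_ratio / (rpo_hours * 3600)


if __name__ == "__main__":
    # Hypothetical site: 45GB of changed data per hour, one-hour RPO.
    print(required_mbps(45, 1))        # without compression
    print(required_mbps(45, 1, 2.0))   # with 2:1 compression
    print(required_mbps(90, 1, 2.0))   # change rate doubles, 2:1 compression
```

In this example, 2:1 compression halves the bandwidth needed for the same RPO, or equivalently lets the same link absorb twice the change rate.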
To save precious network bandwidth. Network compression can help you continue to perform the same replication schedule while conserving network bandwidth for other functions.
To perform baseline transfers without saturating network links. To use SnapMirror (or any replication product) you must first perform a baseline transfer in which all data is replicated from source to destination. Once the baseline is in place, subsequent transfers only require that changed or new data be replicated.
Creating a baseline can be very bandwidth intensive. Enabling compression can allow baseline transfers to complete more quickly while conserving network bandwidth. (SnapMirror provides a throttling feature that can prevent the software from saturating the network during baseline transfers or other operations.)
When Not to Use Network Compression
The major factor in determining when to enable SnapMirror network compression is available CPU capacity. The compression and decompression engines create additional load on CPU cores on both source and destination systems, and you should factor this load into your decision to enable network compression. You might want to avoid using network compression under the following conditions:
- When CPUs are already heavily loaded on either the source or destination storage system. Because compression and decompression result in increased CPU utilization, it is generally not recommended to enable compression under conditions where available CPU capacity is likely to become a bottleneck and affect other workloads.
- When network bandwidth is not a limitation. Network compression necessarily creates processing overhead and might limit throughput relative to SnapMirror without compression enabled. You should not turn on compression in situations where the network is not a bottleneck and you don’t need to conserve bandwidth for other traffic.
- When using “fan-in” configurations. Decompression creates only about 60% of the overhead of compression, so in most situations, performance of the destination system is unlikely to be an issue. However, if you have a configuration where you have multiple source systems all replicating to a single destination system at the same time, enabling compression for all transfers could overwhelm the processing capabilities of the destination system.
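The fan-in concern can be made concrete with a rough estimate. The sketch below uses the article's figure that decompression costs about 60% of compression, and it makes two simplifying assumptions that are not from the article: that all systems have comparable CPU capacity, and that the overhead of concurrent transfers adds linearly. The function name and the sample overheads are hypothetical.

```python
from typing import Iterable

DECOMPRESSION_COST_FACTOR = 0.6  # from the article: about 60% of compression overhead


def destination_overhead(source_overheads: Iterable[float]) -> float:
    """Estimate destination CPU overhead from per-source compression overheads.

    Each value is the fraction of CPU a source spends on compression (0.30 = 30%).
    Assumes comparable systems and linear addition, which is a simplification.
    """
    return sum(DECOMPRESSION_COST_FACTOR * overhead for overhead in source_overheads)


if __name__ == "__main__":
    # Hypothetical: four sources, each spending 30% CPU on compression,
    # all replicating to one destination at the same time.
    estimate = destination_overhead([0.30] * 4)
    print(f"{estimate:.0%}")
```

Under these assumptions a single transfer is cheap for the destination, but several simultaneous transfers add up quickly, which is why staggering schedules or limiting compression to selected transfers may be worth considering in fan-in configurations.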
The following section provides more details on use of network compression with real workloads and effects on CPU resources.
How Well Does Network Compression Work with Typical Data Sets?
We chose three different data sets to measure the performance of SnapMirror network compression under laboratory conditions: Exchange database, Oracle® Database, and home directory data. We performed baseline transfers of eight 50GB volumes in each case and looked at both compression ratio and CPU utilization.
Oracle data was the most compressible, achieving a ratio of 3.5:1, followed by home directories at 2.7:1 and Exchange at 1.5:1. The reduction in bandwidth required (holding transfer time constant) grows with the compression ratio, as shown in Table 1.
Table 1) Compression ratios and bandwidth savings for common workloads.
| Data set | Compression ratio | Bandwidth savings |
|---|---|---|
| Exchange | 1.5:1 | 34% |
| Home directory | 2.7:1 | 60% |
| Oracle | 3.5:1 | 70% |
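As a cross-check on Table 1, the idealized bandwidth saving for a compression ratio of r:1 at constant transfer time is 1 - 1/r. The short sketch below applies that formula to the three ratios; the function name is an assumption made for illustration. The table reports measured values, which need not match the idealized figures exactly.

```python
def bandwidth_savings(ratio: float) -> float:
    """Idealized fraction of bandwidth saved for a ratio:1 compression ratio."""
    if ratio < 1:
        raise ValueError("compression ratio must be at least 1")
    return 1 - 1 / ratio


if __name__ == "__main__":
    for name, ratio in [("Exchange", 1.5), ("Home directory", 2.7), ("Oracle", 3.5)]:
        print(f"{name}: {ratio}:1 -> {bandwidth_savings(ratio):.0%}")
```

The formula gives roughly 33%, 63%, and 71% for the three data sets, in line with the measured savings in the table.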
The effect on CPU overhead for each data set depends on whether you hold transfer time constant or accelerate transfer time.
- Transfer time held constant. If transfer time is kept the same as for the transfer without compression, overhead from compression processing is typically not significant. For example, assume a data set that yields 2:1 compression and that, without network compression, the SnapMirror update takes one hour using a bandwidth of 100Mb/sec. With network compression enabled, only 50Mb/sec of bandwidth is needed to achieve the same transfer time. Because over-the-wire bandwidth is lower, CPU utilization due to network processing is decreased, compensating somewhat for the increased CPU used by compression. In the case of Oracle Database, CPU overhead of just 14% reduces the needed bandwidth by over 70%.
- Reduced transfer time. If a transfer is allowed to use all available bandwidth with SnapMirror compression enabled, CPU overhead will be higher. For example, consider the same data set as above with 2:1 compression in which an update without compression takes one hour using a bandwidth of 100Mb/sec. With network compression enabled, the transfer completes in 30 minutes. Because the work is completed faster by using the entire bandwidth, network processing overhead is higher, and the compression processing must also be completed in half the time. If CPU utilization becomes too high, you can use SnapMirror throttling (either per transfer or global) to reduce throughput and bring CPU utilization back down. The following figure summarizes the effects of compression on total transfer time for our three data sets.
Figure 2) Reduction in transfer time for different data sets. CPU overhead ranges from 38% for Exchange to 55% for Oracle. Total amount of data transferred was the same in all cases. Storage system was a FAS3070 using a single 155Mb/sec network connection.
- Another factor that affects CPU overhead is the compression ratio achievable for a data set. If a data set yields a high compression ratio, the total effort to compress it and send it over the wire is less than for a data set that is hard to compress. In contrast, if you don’t throttle bandwidth for a transfer, a data set with a high compression ratio could require a lot of compression work to fill the pipe, raising CPU utilization. For example, to fill a 100Mb/sec pipe with Oracle data that compresses at 3.5:1, your storage system would have to compress data at a rate of 350Mb/sec, while filling the same pipe with Exchange data that compresses at 1.5:1 would only require a compression/decompression rate of 150Mb/sec.
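The worked examples in the points above all follow from the compression ratio, as this small sketch shows. The function names are hypothetical, and the calculations are idealized: they ignore protocol overhead and assume the compression engine can keep up with the link.

```python
def wire_rate_for_same_time(uncompressed_mbps: float, ratio: float) -> float:
    """Link rate needed to keep the original transfer time with ratio:1 compression."""
    return uncompressed_mbps / ratio


def minutes_at_full_link(baseline_minutes: float, ratio: float) -> float:
    """Transfer time when the compressed stream uses the same link rate as before."""
    return baseline_minutes / ratio


def source_rate_to_fill_link(link_mbps: float, ratio: float) -> float:
    """Rate at which uncompressed data must be compressed to keep the link full."""
    return link_mbps * ratio


if __name__ == "__main__":
    print(wire_rate_for_same_time(100, 2.0))    # 2:1 data, transfer time held constant
    print(minutes_at_full_link(60, 2.0))        # 2:1 data, full 100Mb/sec link
    print(source_rate_to_fill_link(100, 3.5))   # Oracle data on a 100Mb/sec link
    print(source_rate_to_fill_link(100, 1.5))   # Exchange data on a 100Mb/sec link
```

These reproduce the figures quoted in the text: 50Mb/sec for the constant-time case, 30 minutes for the accelerated case, and 350Mb/sec versus 150Mb/sec of compression throughput to fill a 100Mb/sec pipe.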
Conclusion
In situations where network bandwidth is limited, SnapMirror network compression can make a big difference, making replication possible in situations where bandwidth would previously have been the limiting factor. If you’re already a SnapMirror user, you can take advantage of network compression simply by upgrading to Data ONTAP 7.3.2 (or later).
With SnapMirror network compression, there is a strong interdependence between data transfer time, network bandwidth consumption, and CPU overhead. Decreasing transfer time necessarily consumes more network bandwidth and system CPU resources, as illustrated in Figure 3.
Figure 3) Relationship between transfer time, bandwidth utilization, and storage system CPU overhead.
To find the sweet spot for your environment, pay attention to the guidelines outlined in this article. Plan to do a few trial transfers to assess the compressibility of your data sets and the resulting CPU overhead on source and destination systems before enabling network compression in production.
Srinath Alapati
Technical Marketing Engineer
NetApp
Srinath joined NetApp in 2004 and has been a member of the Data Protection group for over two years. He has 10+ years of experience in IT, managing servers and storage infrastructure. Srinath has authored or coauthored multiple technical reports on SnapMirror, MetroCluster, VMware®, and Exchange and speaks at various technical conferences. He is also a core team member involved in NetApp IT’s disaster recovery implementation.