Date: March 01, 2010
Author: Michael Condict
A Tutorial Presented at the USENIX Conference on File and Storage Technologies 2010 (FAST ’10)
Abstract
Economic and environmental concerns are currently motivating a push across the computing industry to do more with less: less energy and less money. Deduplication of data is one of the most effective tools to accomplish this. Removing redundant copies of stored data reduces hardware requirements, lowering capital expenses and using less power. Avoiding sending the same data repeatedly across a network increases the effective bandwidth of the link, reducing networking expenses.
This tutorial provided a detailed look at the multitude of ways deduplication can be used to improve the efficiency of storage and networking devices. It consisted of two parts.
The first part introduced the basic concepts of deduplication and compared it to the related technique of file compression. A taxonomy of basic deduplication techniques was covered, including the unit of deduplication (file, block, or variable-length segment), the deduplication scope (file system, storage system, or cluster), in-line vs. background deduplication, trusted fingerprints, and several other design choices. The relative merits of each were analyzed.
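To make the basic taxonomy concrete, here is a minimal sketch (not taken from the tutorial materials) of block-level deduplication using a content hash as a trusted fingerprint. The class name, fixed block size, and in-memory dictionaries are illustrative assumptions only; a real system would also handle variable-length segmentation, persistence, and reference counting.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size for the sketch


class BlockStore:
    """Toy block-level deduplicating store keyed by content-hash fingerprints."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block data (each unique block stored once)
        self.files = {}   # file name -> ordered list of fingerprints ("recipe")

    def write(self, name, data):
        recipe = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()  # trusted fingerprint of the block
            if fp not in self.blocks:               # store the block only if it is new
                self.blocks[fp] = block
            recipe.append(fp)
        self.files[name] = recipe

    def read(self, name):
        # Reassemble the file from its recipe of fingerprints.
        return b"".join(self.blocks[fp] for fp in self.files[name])


store = BlockStore()
store.write("a.txt", b"hello world" * 1000)
store.write("b.txt", b"hello world" * 1000)  # duplicate content adds no new blocks
assert store.read("b.txt") == b"hello world" * 1000
print(len(store.blocks), "unique blocks stored for 2 files")
```

Because deduplication happens at write time here, this sketch corresponds to the in-line approach; a background deduplicator would instead scan already-stored blocks and merge duplicates later.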
The second part discussed advanced techniques, such as the use of fingerprints other than a content hash to uniquely identify data, techniques for deduplicating across a storage cluster, and the use of deduplication within a client-side cache.
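As a rough illustration of the last point (again, not code from the tutorial), a client-side cache can index entries by content fingerprint so that identical blocks read through different file addresses occupy a single cache slot, raising the effective cache capacity. The class name, LRU policy, and addressing scheme below are assumptions made for the sketch.

```python
import hashlib
from collections import OrderedDict


class DedupCache:
    """Toy deduplicated read cache: each unique block is cached once,
    no matter how many (file, offset) addresses map to it."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.addr_to_fp = {}              # (file, offset) -> fingerprint
        self.fp_to_block = OrderedDict()  # fingerprint -> block, in LRU order

    def insert(self, file, offset, block):
        fp = hashlib.sha256(block).hexdigest()
        self.addr_to_fp[(file, offset)] = fp
        if fp in self.fp_to_block:
            self.fp_to_block.move_to_end(fp)      # duplicate block: just refresh LRU
            return
        if len(self.fp_to_block) >= self.capacity:
            self.fp_to_block.popitem(last=False)  # evict the least recently used block
        self.fp_to_block[fp] = block

    def lookup(self, file, offset):
        fp = self.addr_to_fp.get((file, offset))
        if fp is None or fp not in self.fp_to_block:
            return None                           # cache miss
        self.fp_to_block.move_to_end(fp)
        return self.fp_to_block[fp]
```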