A Deduplication Study for Host-side Caches in Virtualized Data Center Environments


May 11, 2013


Jingxin Feng and Jiri Schindler

This paper explores the effectiveness of content deduplication in large (typically hundreds of GB) flash memory-based caches inside VM hypervisors. 

Flash memory-based caches inside VM hypervisors can reduce I/O latencies and offload much of the I/O traffic from network-attached storage systems deployed in virtualized data centers. This paper explores the effectiveness of content deduplication in these large (typically hundreds of GB) host-side caches. Previous deduplication studies focused on data mostly at rest in backup and archive applications. This study focuses on cached data and dynamic workloads within the shared VM infrastructure. We analyze I/O traces from six virtual desktop infrastructure (VDI) I/O storms and two long-term CIFS studies and show that deduplication can reduce the data footprint inside host-side caches by as much as 67%. This in turn allows a larger portion of the data set to be cached and improves the effective cache hit rate. More importantly, such increased caching efficiency can relieve load on networked storage systems during I/O storms, when most VM instances perform the same operation, such as a virus scan, an OS patch install, or a reboot.
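
As a rough illustration of what a footprint reduction means for a content-deduplicated cache, the sketch below fingerprints each cached block and counts identical content only once. It is not the method used in the paper: the 4 KiB block size, the SHA-256 fingerprint, the function name dedup_footprint, and the sample data are all assumptions made for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, not taken from the paper


def dedup_footprint(blocks):
    """Return (logical_bytes, physical_bytes, savings) for cached blocks.

    logical_bytes counts every cached block; physical_bytes counts each
    distinct block content once, as a content-deduplicated cache would.
    """
    logical = 0
    unique = {}  # fingerprint -> size of the block stored once
    for data in blocks:
        logical += len(data)
        fingerprint = hashlib.sha256(data).digest()
        unique.setdefault(fingerprint, len(data))
    physical = sum(unique.values())
    savings = 1 - physical / logical if logical else 0.0
    return logical, physical, savings


# Hypothetical workload: three VMs each cache the same two OS blocks
# plus one block of private data.
os_a = b"A" * BLOCK_SIZE
os_b = b"B" * BLOCK_SIZE
blocks = []
for vm in range(3):
    private = bytes([vm]) * BLOCK_SIZE
    blocks.extend([os_a, os_b, private])

logical, physical, savings = dedup_footprint(blocks)
```

In this made-up workload, nine logical blocks collapse to five stored blocks, so the footprint shrinks by 4/9. The savings on a real cache depend entirely on how much content the VM instances actually share.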

In Proceedings of the IEEE Symposium on Massive Storage Systems and Technologies 2013 (MSST ’13).


The author's version of this paper is attached to this posting. Please observe the following copyright: © 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. 

The definitive version of the paper can be found at:

