February 05, 2016
Chris Dragga, Douglas J. Santry
ACM Transactions on Storage (TOS) Volume 12 Issue 1, January 2016
File-system snapshots have been a key component of enterprise storage management since their inception. Creating and managing them efficiently, while maintaining flexibility and low overhead, has been a constant struggle. Although the current state-of-the-art mechanism, hierarchical reference counting, performs reasonably well for traditional small-file workloads, these workloads are steadily vanishing from the enterprise data center, replaced by virtual machine and database workloads. The newer workloads center around a few very large files, violating the assumptions that allow hierarchical reference counting to operate efficiently. To better cope with them, we introduce Generational Chain Trees (GCTrees), a novel method of space management that uses concepts of block lineage across snapshots rather than explicit reference counting. As a proof of concept, we create a prototype file system, gcext4, a modified version of ext4 that uses GCTrees as a basis for snapshots and copy-on-write. In evaluating this prototype empirically, we find that although GCTrees have somewhat higher overhead for traditional workloads, they have dramatically lower overhead than hierarchical reference counting for large-file workloads, improving by a factor of 34 or more in some cases. Furthermore, gcext4 performs comparably to ext4 across all workloads, showing that GCTrees impose only a minor cost for their benefits.
The definitive version of the paper can be found at: http://dl.acm.org/citation.cfm?id=2857056.