In this paper, we present Backlog, an efficient implementation of explicit back references, to address the problem of complications introduced by advanced file system features.
Many file systems reorganize data on disk, for example to defragment storage, shrink volumes, or migrate data between different classes of storage. Advanced file system features such as snapshots, writable clones, and deduplication make these tasks complicated, as moving a single block may require finding and updating dozens, or even hundreds, of pointers to it.
We present Backlog, an efficient implementation of explicit back references, to address this problem. Back references are file system meta-data that map physical block numbers to the data objects that use them. We show that by using LSM-Trees and exploiting the write-anywhere behavior of modern file systems such as NetApp® WAFL® or btrfs, we can maintain back reference meta-data with minimal overhead (one extra disk I/O per 102 block operations) and provide excellent query performance for the common case of queries covering ranges of physically adjacent blocks.
In Proceedings of the USENIX Conference on File and Storage Technologies (FAST ’10)
A copy of the paper is attached to this posting. tracking-fast10.pdf