Date: November 18, 2016
Author: Jason Flinn
Professor Jason Flinn and his students at the University of Michigan have built a prototype file system for archival data that selectively replaces file data with logs from which that data can be reproduced. This substantially reduces the bytes written and stored for cold file data, even compared with aggressive storage-efficiency mechanisms such as delta compression and chunk-based deduplication.

In this project, Professor Flinn's team will extend these results in several ways. First, they will investigate how to structure logs to maximize compression, since the logs of non-determinism can themselves be deduplicated. Second, they will explore how minimal a log can be while still reproducing data faithfully: it is only necessary to store some computation that generates the needed data, not the precise computation that originally generated it. Third, they will study whether semi-determinism can improve results: by encouraging applications to behave in more predictable ways, they hope to reduce log sizes and increase storage savings. Finally, they will evaluate the system on a wider variety of realistic workloads to determine which domains benefit most from this class of archival storage.
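To make the idea concrete, the sketch below illustrates the general approach of storing a "recipe" (a deterministic generator plus a log of its non-deterministic inputs) in place of a file's bytes and replaying it on read, with the logs themselves deduplicated by content hash. This is only an illustrative assumption, not code from the prototype; the archive, restore, GENERATORS, and LOG_STORE names are hypothetical.

```python
import hashlib
import zlib

# Hypothetical sketch: instead of storing a file's bytes, store a recipe
# (a deterministic generator name plus a log of non-deterministic inputs)
# and regenerate the bytes on demand. A real system would record and replay
# actual application executions; a toy generator stands in here.
GENERATORS = {
    # Output depends only on the logged inputs, so replay is deterministic.
    "repeat": lambda log: log["seed"].encode() * log["count"],
}

# Deduplicated store of logs of non-determinism, keyed by content hash.
LOG_STORE = {}


def archive(data: bytes, generator: str, log: dict) -> dict:
    """Replace file data with a recipe, keeping a hash to verify fidelity."""
    replayed = GENERATORS[generator](log)
    if replayed != data:
        # Fall back to storing the (compressed) raw bytes if replay would
        # not reproduce the data faithfully.
        return {"kind": "raw", "bytes": zlib.compress(data)}
    log_bytes = repr(sorted(log.items())).encode()
    log_id = hashlib.sha256(log_bytes).hexdigest()
    LOG_STORE.setdefault(log_id, log)  # identical logs deduplicate
    return {
        "kind": "recipe",
        "generator": generator,
        "log_id": log_id,
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def restore(entry: dict) -> bytes:
    """Regenerate archived data by replay, verifying it matches the original."""
    if entry["kind"] == "raw":
        return zlib.decompress(entry["bytes"])
    data = GENERATORS[entry["generator"]](LOG_STORE[entry["log_id"]])
    assert hashlib.sha256(data).hexdigest() == entry["sha256"]
    return data


if __name__ == "__main__":
    original = b"abc" * 1000
    entry = archive(original, "repeat", {"seed": "abc", "count": 1000})
    assert restore(entry) == original
    print(entry["kind"], "entry stored instead of", len(original), "bytes")
```

In this toy version the recipe plus its small log replaces thousands of data bytes, which mirrors the storage savings the project targets; the research questions above concern how to structure, minimize, and deduplicate such logs for real applications rather than a contrived generator.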