June 15, 2012
Madalin Mihailescu, Gokul Soundararajan, and Cristiana Amza.
Data analytics and enterprise applications have very different storage requirements. For this reason, enterprise deployments of data analytics run on a separate storage silo. This separation can add cost and make data management less efficient, e.g., whenever data needs to be archived, copied, or migrated across silos. We introduce MixApart, a scalable data processing framework for shared enterprise storage systems. With MixApart, a single consolidated storage back-end manages enterprise data and services all types of workloads, thereby lowering hardware costs and simplifying data management. In addition, MixApart provides the local storage performance that analytics requires through an integrated data caching and scheduling solution. Our preliminary evaluation shows that MixApart can be 45% faster than the traditional ingest-then-compute workflow used in enterprise IT analytics, while requiring one third of the storage capacity of HDFS.
In Proceedings of the USENIX Workshop on Hot Topics in Storage and File Systems 2012 (HotStorage ’12)
A copy of the paper is attached to this posting, along with a link to the presentation audio and slides.