Date
April 04, 2011
Author
Raju Rangaswami
There is substantial interest in both industry and academia in how best to integrate flash-based storage into existing disk-based storage systems, because flash and disk have complementary cost, performance, and power characteristics. There are two primary schools of thought on flash storage integration.
1) The caching camp argues for managing flash-based storage as a large caching layer in the storage hierarchy that captures the working set of the data stored in the disk layer.
2) The tiering (or multi-tiering) camp argues for managing flash-based storage in a manner equivalent to disk-drive-based storage, in other words, as a primary data store, in conjunction with one or more classes of disk drives (managed as separate tiers).
Both camps are well represented in industry solutions, with almost every storage vendor today incorporating flash storage into its product portfolio using very distinct approaches. The plethora of solutions in both classes (caching and tiering) has led to debates about the superiority of each solution class for enterprise workloads.
The goal of this proposed project is to bring some clarity to this current, high-spirited debate in the storage research and industry communities. Comprehensively characterizing each solution class is important in order to address and compare the sometimes strikingly different implementations even within the same class of solutions. While the high-level assumptions under which one solution is better than the other may be obvious, what is less clear is the scope of applicability of each to real-world enterprise workloads. A complementary and equally important analysis therefore involves workload characterization to determine the specific parameters under which caching and/or tiering would be optimally effective.
We intend to focus on formalizing the rationale and concepts that underlie caching and tiering, categorizing solutions within the same class based on architectural assumptions (local and shared SSD deployment), analyzing the impact of tunable parameters in each solution class, characterizing workloads with the goal of determining their suitability for either or both classes of solutions, and determining rules of thumb for choosing the superior storage solution given a workload description. Further, when answering these questions, it is important to carefully define the metrics that should be used to compare the various solutions.