July 07, 2011
Angela Demke Brown
The complexity of modern storage systems continues to grow, making their management a first-class concern. A storage system today may need to exploit the widely varying characteristics of heterogeneous storage units to meet the simultaneous demands of many customers with differing requirements. In addition, desirable properties such as cost-effectiveness, scalability, reliability, and power efficiency may conflict with each other. Virtualization and other layers of indirection between applications and storage hardware pose a further challenge: application-level optimizations that target hardware features may not have the desired effect. Performance may be lost when the underlying physical layout does not match the assumptions made at the higher level. Worse, reliability may be reduced if an underlying deduplication system removes extra copies of data blocks that were deliberately replicated, such as critical file system metadata. Finally, existing management interfaces are not extensible, which makes novel policies difficult to express. As a result, significant time and effort are spent designing, customizing, and maintaining storage solutions.
We argue that a scalable storage system must expose a more flexible mechanism through which researchers or storage administrators can express the desired properties. We propose a policy-driven architecture that introduces extensibility and dynamism into the control plane of a data center's storage system. The proposed system consists of two parts: (1) a domain-specific policy language that allows sophisticated policies to be built from both static and dynamic properties of the available storage devices and of the storage requests; and (2) an extension of the storage system's control plane that interprets and enforces these policies by monitoring the stream of requests and the dynamic characteristics of the storage devices.
Existing work on policy-based storage management is mainly concerned with storage allocation or configuration and focuses on static properties of the storage devices (e.g., capacity, cost, throughput, reliability). Our goal is to manage the day-to-day operation of the storage system automatically, adapting to changes in workload requests, power consumption, and load hotspots according to high-level policies.
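To make the distinction between static and dynamic properties concrete, the following is a minimal sketch of how a control plane might evaluate placement policies per request. It is not the proposed policy language, whose syntax is not described here; it assumes, purely for illustration, an embedded rule format in Python. All names (`Device`, `Request`, `Rule`, `place`, the `HOT` threshold, and the device names) are hypothetical. Each rule filters candidate devices using a static property (whether a device deduplicates) or dynamic ones (current utilization, power state) that a monitor would keep up to date.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Device:
    name: str
    dedup_enabled: bool         # static property, fixed at configuration time
    utilization: float = 0.0    # dynamic: fraction of bandwidth in use (0.0-1.0)
    powered_on: bool = True     # dynamic: current power state

@dataclass
class Request:
    is_metadata: bool = False
    latency_sensitive: bool = False

@dataclass
class Rule:
    # When `matches` holds for a request, `allow` narrows the candidate devices
    # and `prefer` (lower is better) optionally orders the survivors.
    name: str
    matches: Callable[[Request], bool]
    allow: Callable[[Device], bool]
    prefer: Optional[Callable[[Device], float]] = None

def place(request: Request, devices: List[Device], policy: List[Rule]) -> Optional[Device]:
    """Apply every matching rule in order; the last matching `prefer` decides ordering."""
    candidates = list(devices)
    key = None
    for rule in policy:
        if rule.matches(request):
            candidates = [d for d in candidates if rule.allow(d)]
            if rule.prefer is not None:
                key = rule.prefer
    if not candidates:
        return None  # no device satisfies the policy for this request
    return min(candidates, key=key) if key is not None else candidates[0]

HOT = 0.8  # hypothetical hotspot threshold on utilization

policy = [
    # Keep deliberately replicated metadata off deduplicating devices.
    Rule("no-dedup-metadata", lambda r: r.is_metadata, lambda d: not d.dedup_enabled),
    # Steer latency-sensitive requests away from hotspots, least-loaded first.
    Rule("avoid-hotspots", lambda r: r.latency_sensitive,
         lambda d: d.utilization < HOT, prefer=lambda d: d.utilization),
    # For every request, avoid waking powered-down devices.
    Rule("save-power", lambda r: True, lambda d: d.powered_on),
]

if __name__ == "__main__":
    devices = [
        Device("ssd0", dedup_enabled=True, utilization=0.9),
        Device("ssd1", dedup_enabled=False, utilization=0.3),
        Device("hdd0", dedup_enabled=False, utilization=0.1, powered_on=False),
    ]
    print(place(Request(is_metadata=True, latency_sensitive=True), devices, policy).name)  # -> ssd1
    devices[0].utilization = 0.2  # the monitor observes that ssd0 has cooled down
    print(place(Request(latency_sensitive=True), devices, policy).name)  # -> ssd0
```

The second call shows the dynamic aspect: the same policy yields a different placement once the monitored utilization of `ssd0` changes, without any reconfiguration. A real control plane would also need to resolve conflicting preferences and handle the case where no device qualifies; here the last matching preference simply wins and `None` is returned.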