Workload-aware Efficient Storage Systems

TR Number
Journal Title
Journal ISSN
Volume Title
Virginia Tech

The growing disparity in data storage and retrieval needs of modern applications is driving the proliferation of a wide variety of storage systems (e.g., key-value stores, cloud storage services, distributed filesystems, and flash cache, etc.). While extant storage systems are designed and tuned for a specific set of applications targeting a range of workload characteristics, they lack the flexibility in adapting to the ever-changing workload behaviors. Moreover, the complexities in implementing modern storage systems and adapting ever-changing storage requirements present unique opportunities and engineering challenges.

In this dissertation, we design and develop a series of novel data management and storage systems solutions by applying a simple yet effective rule---workload awareness. We find that simple workload-aware data management strategies are effective in improving the efficiency of modern storage systems, sometimes by an order of magnitude. The first two works tackle the data management and storage space allocation issues at distributed and cloud storage level, while the third work focuses on low-level data management problems in the local storage system, which many high-level storage/data-intensive applications rely on.

In the first part of this dissertation (Chapter 3), we propose and develop MBal, a high-performance in-memory object caching framework with adaptive multi-phase load balancing, which supports not only horizontal (scale-out) but vertical (scale-up) scalability as well. MBal is able to make efficient use of available resources in the cloud through its fine-grained, partitioned, lockless design. In the second part of this dissertation (Chapter 4 and Chapter5), we design and build CAST (Chapter 4), a Cloud Analytics Storage Tiering solution that cloud tenants can use to reduce monetary cost and improve performance of analytics workloads. The approach takes the first step towards providing storage tiering support for data analytics in the cloud. Furthermore, we propose a hybrid cloud object storage system (Chapter 5) that could effectively engage both the cloud service providers and cloud tenants via a novel dynamic pricing mechanism. In the third part of this dissertation (Chapter 6), targeting local storage, we explore offline algorithms for flash caching in terms of both hit ratio and flash lifespan. We design and implement a multi-stage heuristic by synthesizing several techniques that manage data at the granularity of a flash erasure unit (which we call a container) to approximate the offline optimal algorithm. In the fourth part of this dissertation (Chapter 7), we are focused on how to enable fast prototyping of efficient distributed key-value stores targeting a proxy-based layered architecture. In this work, we design and build {con}, a framework that significantly reduce the engineering effort required to build a full-fledged distributed key-value store.

Our dissertation shows that simple workload-aware data management strategies can bring huge benefit in terms of both efficiency (i.e., performance, monetary cost, etc.) and flexibility (i.e., ease-of-use, ease-of-deployment, programmability, etc.). The principles of leveraging workload dynamicity and storage heterogeneity can be used to guide next-generation storage system software design, especially when being faced with new storage hardware technologies.

Storage Systems, Cloud Computing, Data Management, Key-Value Stores, Object Stores, Flash SSDs, Efficiency, Flexibility