Workload-aware Efficient Storage Systems

dc.contributor.author: Cheng, Yue (en)
dc.contributor.committeechair: Butt, Ali R. (en)
dc.contributor.committeemember: Gupta, Aayush (en)
dc.contributor.committeemember: Cameron, Kirk W. (en)
dc.contributor.committeemember: Ribbens, Calvin J. (en)
dc.contributor.committeemember: Tilevich, Eli (en)
dc.contributor.department: Computer Science (en)
dc.date.accessioned: 2017-08-08T08:00:16Z (en)
dc.date.available: 2017-08-08T08:00:16Z (en)
dc.date.issued: 2017-08-07 (en)
dc.description.abstract: The growing disparity in data storage and retrieval needs of modern applications is driving the proliferation of a wide variety of storage systems (e.g., key-value stores, cloud storage services, distributed filesystems, and flash caches). While extant storage systems are designed and tuned for a specific set of applications targeting a range of workload characteristics, they lack the flexibility to adapt to ever-changing workload behaviors. Moreover, the complexities of implementing modern storage systems and adapting to ever-changing storage requirements present unique opportunities and engineering challenges. In this dissertation, we design and develop a series of novel data management and storage systems solutions by applying a simple yet effective rule: workload awareness. We find that simple workload-aware data management strategies are effective in improving the efficiency of modern storage systems, sometimes by an order of magnitude. The first two works tackle data management and storage space allocation issues at the distributed and cloud storage level, while the third work focuses on low-level data management problems in the local storage system, on which many high-level storage and data-intensive applications rely. In the first part of this dissertation (Chapter 3), we propose and develop MBal, a high-performance in-memory object caching framework with adaptive multi-phase load balancing, which supports not only horizontal (scale-out) but also vertical (scale-up) scalability. MBal makes efficient use of available resources in the cloud through its fine-grained, partitioned, lockless design. In the second part of this dissertation (Chapters 4 and 5), we design and build CAST (Chapter 4), a Cloud Analytics Storage Tiering solution that cloud tenants can use to reduce monetary cost and improve the performance of analytics workloads.
The approach takes the first step towards providing storage tiering support for data analytics in the cloud. Furthermore, we propose a hybrid cloud object storage system (Chapter 5) that could effectively engage both cloud service providers and cloud tenants via a novel dynamic pricing mechanism. In the third part of this dissertation (Chapter 6), targeting local storage, we explore offline algorithms for flash caching in terms of both hit ratio and flash lifespan. We design and implement a multi-stage heuristic by synthesizing several techniques that manage data at the granularity of a flash erasure unit (which we call a container) to approximate the offline optimal algorithm. In the fourth part of this dissertation (Chapter 7), we focus on enabling fast prototyping of efficient distributed key-value stores targeting a proxy-based layered architecture. In this work, we design and build {con}, a framework that significantly reduces the engineering effort required to build a full-fledged distributed key-value store. Our dissertation shows that simple workload-aware data management strategies can bring large benefits in terms of both efficiency (e.g., performance and monetary cost) and flexibility (e.g., ease of use, ease of deployment, and programmability). The principles of leveraging workload dynamicity and storage heterogeneity can be used to guide next-generation storage system software design, especially when faced with new storage hardware technologies. (en)
dc.description.abstractgeneral: Modern storage systems often manage data without considering the dynamicity of user behaviors. This design approach does not consider the unique features of the underlying storage medium either. To this end, this dissertation first studies how the combined factors of random user workload dynamicity and inherent storage hardware heterogeneity affect data management efficiency. This dissertation then presents a series of practical and efficient techniques, algorithms, and optimizations to make storage systems workload-aware. The experimental evaluation demonstrates the effectiveness of our workload-aware design choices and strategies. (en)
dc.description.degree: Ph. D. (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:12318 (en)
dc.identifier.uri: http://hdl.handle.net/10919/78677 (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: Storage Systems (en)
dc.subject: Cloud Computing (en)
dc.subject: Data Management (en)
dc.subject: Key-Value Stores (en)
dc.subject: Object Stores (en)
dc.subject: Flash SSDs (en)
dc.subject: Efficiency (en)
dc.subject: Flexibility (en)
dc.title: Workload-aware Efficient Storage Systems (en)
dc.type: Dissertation (en)
thesis.degree.discipline: Computer Science and Applications (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: doctoral (en)
thesis.degree.name: Ph. D. (en)

Files

Original bundle
Name: Cheng_Y_D_2017.pdf
Size: 2.06 MB
Format: Adobe Portable Document Format