dc.description.abstract | The growing disparity in data storage and retrieval needs of modern
applications is driving the proliferation of a wide variety of
storage systems (e.g., key-value stores, cloud storage services,
distributed filesystems, and flash caches). While extant storage
systems are designed and tuned for a specific set of applications
targeting a range of workload characteristics, they lack the
flexibility to adapt to ever-changing workload behaviors.
Moreover, the complexities of implementing modern storage systems and
adapting to ever-changing storage requirements present unique
opportunities and engineering challenges.
In this dissertation, we design and develop a series of novel data
management and storage systems solutions by applying a simple yet
effective rule: workload awareness. We find that simple
workload-aware data management strategies are effective in improving
the efficiency of modern storage systems, sometimes by an order of
magnitude. The first two works tackle the data management and
storage space allocation issues at distributed and cloud storage
level, while the third work focuses on low-level data management
problems in the local storage system, which many high-level
storage/data-intensive applications rely on.
In the first part of this dissertation (Chapter~\ref{ch:mbal}), we
propose and develop MBal, a high-performance in-memory object caching
framework with adaptive multi-phase load balancing, which supports
not only horizontal (scale-out) but also vertical (scale-up)
scalability. MBal makes efficient use of available resources in
the cloud through its fine-grained, partitioned, lockless design.
In the second part of this dissertation (Chapter~\ref{ch:cast} and
Chapter~\ref{ch:pricing}), we design and build CAST
(Chapter~\ref{ch:cast}), a Cloud Analytics Storage Tiering solution
that cloud tenants can use to reduce monetary cost and improve
performance of analytics workloads. The approach takes the first step
towards providing storage tiering support for data analytics in the
cloud. Furthermore, we propose a hybrid cloud object storage system
(Chapter~\ref{ch:pricing}) that could effectively engage both the
cloud service providers and cloud tenants via a novel dynamic pricing
mechanism.
In the third part of this dissertation (Chapter~\ref{ch:offline}),
targeting local storage, we explore offline algorithms for flash
caching in terms of both hit ratio and flash lifespan. We design and
implement a multi-stage heuristic by synthesizing several techniques
that manage data at the granularity of a flash erasure unit (which we
call a container) to approximate the offline optimal algorithm.
In the fourth part of this dissertation (Chapter~\ref{ch:turnkey}), we
focus on enabling fast prototyping of efficient
distributed key-value stores targeting a proxy-based layered
architecture. In this work, we design and build {con}, a framework
that significantly reduces the engineering effort required to build a
full-fledged distributed key-value store.
Our dissertation shows that simple workload-aware data management
strategies can bring substantial benefits in terms of both efficiency (e.g.,
performance and monetary cost) and flexibility (e.g., ease-of-use,
ease-of-deployment, and programmability). The principles of
leveraging workload dynamicity and storage heterogeneity can be used
to guide next-generation storage system software design, especially
when faced with new storage hardware technologies. | en |