Browsing by Author "Zhao, Nannan"
Now showing 1 - 3 of 3
Results Per Page
Sort Options
- Building A Fast and Efficient LSM-tree Store by Integrating Local Storage with Cloud StorageXu, Peng; Zhao, Nannan; Wan, Jiguang; Liu, Wei; Chen, Shuning; Zhou, Yuanhui; Albahar, Hadeel; Liu, Hanyang; Tang, Liu; Tan, Zhihu (ACM, 2022-05-25)The explosive growth of modern web-scale applications has made cost-effectiveness a primary design goal for their underlying databases. As a backbone of modern databases, LSM-tree based key-value stores (LSM store) face limited storage options. They are either designed for local storage that is relatively small, expensive, and fast or for cloud storage that offers larger capacities at reduced costs but slower. Designing an LSM store by integrating local storage with cloud storage services is a promising way to balance the cost and performance. However, such design faces challenges such as data reorganization, metadata overhead, and reliability issues. In this paper, we propose RocksMash, a fast and efficient LSM store that uses local storage to store frequently accessed data and metadata while using cloud to hold the rest of the data to achieve cost-effectiveness. To improve metadata space-efficiency and read performance, RocksMash uses an LSM-aware persistent cache that stores metadata in a space-efficient way and stores popular data blocks by using compaction-aware layouts. Moreover, RocksMash uses an extended write-ahead log for fast parallel data recovery. We implemented RocksMash by embedding these designs into RocksDB. The evaluation results show that RocksMash improves the performance by up to 1.7x compared to the state-of-the-art schemes and delivers high reliability, cost-effectiveness, and fast recovery.
- An End-to-End High-performance Deduplication Scheme for Docker Registries and Docker Container Storage SystemsZhao, Nannan; Lin, Muhui; Albahar, Hadeel; Paul, Arnab K.; Huan, Zhijie; Abraham, Subil; Chen, Keren; Tarasov, Vasily; Skourtis, Dimitrios; Anwar, Ali; Butt, Ali R. (ACM, 2024)The wide adoption of Docker containers for supporting agile and elastic enterprise applications has led to a broad proliferation of container images. The associated storage performance and capacity requirements place high pressure on the infrastructure of container registries that store and distribute images and container storage systems on the Docker client side that manage image layers and store ephemeral data generated at container runtime. The storage demand is worsened by the large amount of duplicate data in images. Moreover, container storage systems that use Copy-on-Write (CoW) file systems as storage drivers exacerbate the redundancy. Exploiting the high file redundancy in real-world images is a promising approach to drastically reduce the growing storage requirements of container registries and improve the space efficiency of container storage systems. However, existing deduplication techniques significantly degrade the performance of both registries and container storage systems because of data reconstruction overhead as well as the deduplication cost. We propose DupHunter, an end-to-end deduplication that deduplicates layers for both Docker registries and container storage systems while maintaining a high image distribution speed and container I/O performance. DupHunter is divided into 3 tiers: Docker registry tier, middle tier, and client tier. Specifically, we first build a high-performance deduplication engine at the Docker registry tier that not only natively deduplicates layers for space savings but also reduces layer restore overhead. Then, we use deduplication offloading at the middle tier that utilizes the deduplication engine to eliminate the redundant files from the client tier, which avoids introducing deduplication overhead to the Docker client side. To further reduce the data duplicates caused by CoW and improve the container I/O performance, we use a container-aware backing file system at the client tier that preallocates space for each container and ensures that files in a container and its modifications are placed and redirected closer on the disk to maintain locality. Under real workloads, DupHunter reduces storage space by up to 6.9× and reduces the GET layer latency by up to 2.8× compared to the state-of-the-art. Moreover, DupHunter can improve the container I/O performance by up to 93% for reads and 64% for writes.
- Towards a Flexible High-efficiency Storage System for Containerized ApplicationsZhao, Nannan (Virginia Tech, 2020-10-08)Due to their tight isolation, low overhead, and efficient packaging of the execution environment, Docker containers have become a prominent solution for deploying modern applications. Consequently, a large amount of Docker images are created and this massive image dataset presents challenges to the registry and container storage infrastructure and so far has remained a largely unexplored area. Hence, there is a need of docker image characterization that can help optimize and improve the storage systems for containerized applications. Moreover, existing deduplication techniques significantly degrade the performance of registries, which will slow down the container startup time. Therefore, there is growing demand for high storage efficiency and high-performance registry storage systems. Last but not least, different storage systems can be integrated with containers as backend storage systems and provide persistent storage for containerized applications. So, it is important to analyze the performance of different backend storage systems and storage drivers and draw out the implications for container storage system design. These above observations and challenges motivate my dissertation. In this dissertation, we aim to improve the flexibility, performance, and efficiency of the storage systems for containerized applications. To this end, we focus on the following three important aspects: Docker images, Docker registry storage system, and Docker container storage drivers with their backend storage systems. Specifically, this dissertation adopts three steps: (1) analyzing the Docker image dataset; (2) deriving the design implications; (3) designing a new storage framework for Docker registries and propose different optimizations for container storage systems. In the first part of this dissertation (Chapter 3), we analyze over 167TB of uncompressed Docker Hub images, characterize them using multiple metrics and evaluate the potential of le level deduplication in Docker Hub. In the second part of this dissertation (Chapter 4), we conduct a comprehensive performance analysis of container storage systems based on the key insights from our image characterizations, and derive several design implications. In the third part of this dissertation (Chapter 5), we propose DupHunter, a new Docker registry architecture, which not only natively deduplicates layers for space savings but also reduces layer restore overhead. DupHunter supports several configurable deduplication modes, which provide different levels of storage efficiency, durability, and performance, to support a range of uses. In the fourth part of this dissertation (Chapter 6), we explore an innovative holistic approach, Chameleon, that employs data redundancy techniques such as replication and erasure-coding, coupled with endurance-aware write offloading, to mitigate wear level imbalance in distributed SSD-based storage systems. This high-performance fash cluster can be used for registries to speedup performance.