Towards a Flexible High-efficiency Storage System for Containerized Applications

TR Number
Journal Title
Journal ISSN
Volume Title
Virginia Tech

Due to their tight isolation, low overhead, and efficient packaging of the execution environment, Docker containers have become a prominent solution for deploying modern applications. Consequently, a large amount of Docker images are created and this massive image dataset presents challenges to the registry and container storage infrastructure and so far has remained a largely unexplored area. Hence, there is a need of docker image characterization that can help optimize and improve the storage systems for containerized applications. Moreover, existing deduplication techniques significantly degrade the performance of registries, which will slow down the container startup time. Therefore, there is growing demand for high storage efficiency and high-performance registry storage systems. Last but not least, different storage systems can be integrated with containers as backend storage systems and provide persistent storage for containerized applications. So, it is important to analyze the performance of different backend storage systems and storage drivers and draw out the implications for container storage system design. These above observations and challenges motivate my dissertation.

In this dissertation, we aim to improve the flexibility, performance, and efficiency of the storage systems for containerized applications. To this end, we focus on the following three important aspects: Docker images, Docker registry storage system, and Docker container storage drivers with their backend storage systems. Specifically, this dissertation adopts three steps: (1) analyzing the Docker image dataset; (2) deriving the design implications; (3) designing a new storage framework for Docker registries and propose different optimizations for container storage systems.

In the first part of this dissertation (Chapter 3), we analyze over 167TB of uncompressed Docker Hub images, characterize them using multiple metrics and evaluate the potential of le level deduplication in Docker Hub. In the second part of this dissertation (Chapter 4), we conduct a comprehensive performance analysis of container storage systems based on the key insights from our image characterizations, and derive several design implications. In the third part of this dissertation (Chapter 5), we propose DupHunter, a new Docker registry architecture, which not only natively deduplicates layers for space savings but also reduces layer restore overhead. DupHunter supports several configurable deduplication modes, which provide different levels of storage efficiency, durability, and performance, to support a range of uses. In the fourth part of this dissertation (Chapter 6), we explore an innovative holistic approach, Chameleon, that employs data redundancy techniques such as replication and erasure-coding, coupled with endurance-aware write offloading, to mitigate wear level imbalance in distributed SSD-based storage systems. This high-performance fash cluster can be used for registries to speedup performance.

Containers, Distributed storage systems, Deduplication, Wear Balancing, Flash memory, Docker registry, Docker images