An End-to-End High-performance Deduplication Scheme for Docker Registries and Docker Container Storage Systems

Zhao, Nannan; Lin, Muhui; Albahar, Hadeel; Paul, Arnab K.; Huan, Zhijie; Abraham, Subil; Chen, Keren; Tarasov, Vasily; Skourtis, Dimitrios; Anwar, Ali; Butt, Ali R.

dc.contributor.author	Zhao, Nannan	en
dc.contributor.author	Lin, Muhui	en
dc.contributor.author	Albahar, Hadeel	en
dc.contributor.author	Paul, Arnab K.	en
dc.contributor.author	Huan, Zhijie	en
dc.contributor.author	Abraham, Subil	en
dc.contributor.author	Chen, Keren	en
dc.contributor.author	Tarasov, Vasily	en
dc.contributor.author	Skourtis, Dimitrios	en
dc.contributor.author	Anwar, Ali	en
dc.contributor.author	Butt, Ali R.	en
dc.date.accessioned	2024-03-01T13:17:36Z	en
dc.date.available	2024-03-01T13:17:36Z	en
dc.date.issued	2024	en
dc.date.updated	2024-02-01T08:47:23Z	en
dc.description.abstract	The wide adoption of Docker containers for supporting agile and elastic enterprise applications has led to a broad proliferation of container images. The associated storage performance and capacity requirements place high pressure on the infrastructure of container registries, which store and distribute images, and of container storage systems on the Docker client side, which manage image layers and store ephemeral data generated at container runtime. The storage demand is worsened by the large amount of duplicate data in images. Moreover, container storage systems that use Copy-on-Write (CoW) file systems as storage drivers exacerbate the redundancy. Exploiting the high file redundancy in real-world images is a promising approach to drastically reduce the growing storage requirements of container registries and improve the space efficiency of container storage systems. However, existing deduplication techniques significantly degrade the performance of both registries and container storage systems because of data reconstruction overhead as well as the deduplication cost. We propose DupHunter, an end-to-end deduplication scheme that deduplicates layers for both Docker registries and container storage systems while maintaining a high image distribution speed and container I/O performance. DupHunter is divided into three tiers: the Docker registry tier, the middle tier, and the client tier. Specifically, we first build a high-performance deduplication engine at the Docker registry tier that not only natively deduplicates layers for space savings but also reduces layer restore overhead. Then, we use deduplication offloading at the middle tier, which utilizes the deduplication engine to eliminate redundant files from the client tier and thereby avoids introducing deduplication overhead on the Docker client side. To further reduce the data duplicates caused by CoW and improve container I/O performance, we use a container-aware backing file system at the client tier that preallocates space for each container and ensures that a container's files and their modifications are placed and redirected close together on disk to maintain locality. Under real workloads, DupHunter reduces storage space by up to 6.9× and reduces the GET layer latency by up to 2.8× compared to the state-of-the-art. Moreover, DupHunter can improve container I/O performance by up to 93% for reads and 64% for writes.	en
dc.description.version	Accepted version	en
dc.format.mimetype	application/pdf	en
dc.identifier.doi	https://doi.org/10.1145/3643819	en
dc.identifier.uri	https://hdl.handle.net/10919/118222	en
dc.language.iso	en	en
dc.publisher	ACM	en
dc.rights	In Copyright	en
dc.rights.holder	The author(s)	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.title	An End-to-End High-performance Deduplication Scheme for Docker Registries and Docker Container Storage Systems	en
dc.type	Article - Refereed	en
dc.type.dcmitype	Text	en

Files

Original bundle

Name: 3643819.pdf
Size: 1.28 MB
Format: Adobe Portable Document Format
Description: Accepted version

Collections

Journal Articles, Association for Computing Machinery (ACM)
Scholarly Works, Computer Science