SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training

dc.contributor.author: Khan, Redwan
dc.contributor.author: Yazdani, Ahmad
dc.contributor.author: Fu, Yuqi
dc.contributor.author: Paul, Arnab
dc.contributor.author: Ji, Bo
dc.contributor.author: Jian, Xun
dc.contributor.author: Cheng, Yue
dc.contributor.author: Butt, Ali
dc.date.accessioned: 2024-02-19T14:22:29Z
dc.date.available: 2024-02-19T14:22:29Z
dc.date.issued: 2023
dc.description.abstract: Deep learning training (DLT) applications exhibit unique I/O workload behaviors that pose new challenges for storage system design. DLT is I/O intensive since data samples need to be fetched continuously from remote storage. Accelerators such as GPUs have been extensively used to support these applications. As accelerators become more powerful and more data-hungry, I/O performance lags further behind, creating a crucial performance bottleneck, especially in distributed DLT. At the same time, exponentially growing dataset sizes make it impossible to store these datasets entirely in memory. While today's DLT frameworks typically use a random sampling policy that treats all samples as equally important, recent findings indicate that not all samples are equally important: different data samples contribute differently towards improving the accuracy of a model. This observation creates an opportunity for DLT I/O optimizations that exploit the data locality enabled by importance sampling. To this end, we design and implement SHADE, a new DLT-aware caching system that detects fine-grained importance variations at the per-sample level and leverages this variance to make informed caching decisions for a distributed DLT job. SHADE adopts a novel, rank-based approach that captures the relative importance of data samples across different minibatches, and it dynamically updates the importance scores of all samples during training. With these techniques, SHADE significantly improves the cache hit ratio of the DLT job, and thus improves the job's training performance. Evaluation with representative computer vision (CV) models shows that SHADE, with a small cache, improves the cache hit ratio by up to 4.5× compared to the LRU caching policy.
dc.description.version: Published version
dc.format.extent: Pages 135-151
dc.format.extent: 17 page(s)
dc.format.mimetype: application/pdf
dc.identifier.orcid: Ji, Bo [0000-0003-0149-7509]
dc.identifier.orcid: Jian, Xun [0000-0002-7120-7426]
dc.identifier.orcid: Butt, Ali [0000-0002-0871-7263]
dc.identifier.uri: https://hdl.handle.net/10919/118017
dc.language.iso: en
dc.publisher: USENIX Association
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.title: SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training
dc.title.serial: USENIX FAST 2023
dc.type: Article - Refereed
dc.type.dcmitype: Text
dc.type.other: Article
dcterms.dateAccepted: 2022-12-09
pubs.organisational-group: /Virginia Tech
pubs.organisational-group: /Virginia Tech/Engineering
pubs.organisational-group: /Virginia Tech/Engineering/Computer Science
pubs.organisational-group: /Virginia Tech/All T&R Faculty
pubs.organisational-group: /Virginia Tech/Engineering/COE T&R Faculty
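
The abstract above describes a rank-based, importance-aware caching policy: per-sample importance scores are derived from training feedback and used to decide which samples stay cached. The following Python sketch is purely illustrative and is not SHADE's actual implementation; the ImportanceAwareCache class, its method names, and the rank normalization are assumptions made for this example. It shows one way per-minibatch loss ranks could drive cache admission and eviction.

    class ImportanceAwareCache:
        """Illustrative sketch (not SHADE's code): a fixed-capacity sample
        cache that evicts the least-important sample, with importance
        scores derived from per-sample loss ranks within each minibatch."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.data = {}   # sample_id -> cached sample
            self.score = {}  # sample_id -> importance score in [0, 1]

        def update_scores(self, batch_ids, batch_losses):
            # Rank-based scoring: a sample's importance is its loss rank
            # within its minibatch, which makes scores comparable across
            # minibatches regardless of absolute loss magnitudes.
            ranked = sorted(zip(batch_ids, batch_losses), key=lambda p: p[1])
            denom = max(len(ranked) - 1, 1)
            for rank, (sid, _) in enumerate(ranked):
                self.score[sid] = rank / denom  # highest loss -> 1.0

        def get(self, sample_id):
            # Returns the cached sample, or None on a miss.
            return self.data.get(sample_id)

        def put(self, sample_id, sample):
            if sample_id in self.data:
                return
            if len(self.data) >= self.capacity:
                # Candidate victim: the least-important cached sample.
                victim = min(self.data, key=lambda s: self.score.get(s, 0.0))
                if self.score.get(sample_id, 0.0) <= self.score.get(victim, 0.0):
                    return  # incoming sample is no more important; skip it
                del self.data[victim]
            self.data[sample_id] = sample

    # Usage: after each training step, refresh scores from per-sample
    # losses, then admit the fetched samples.
    cache = ImportanceAwareCache(capacity=2)
    cache.update_scores([101, 102, 103], [0.9, 0.1, 0.5])
    for sid in (101, 102, 103):
        cache.put(sid, f"sample-{sid}")
    # Samples 101 and 103 remain cached; the lowest-ranked sample (102)
    # is evicted when the more important sample 103 arrives.

In a distributed setting, each worker would apply such a policy to its own cache partition; the rank normalization above is one simple way to keep importance scores comparable across minibatches as they are updated during training.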

Files

Original bundle
Name: fast23-khan.pdf
Size: 856.85 KB
Format: Adobe Portable Document Format
Description: Published version
License bundle
Name: license.txt
Size: 1.5 KB
Format: Plain Text