BLP: Block-Level Pipelining for GPUs

dc.contributor.author: Feng, Wu-Chun
dc.contributor.author: Cui, Xuewen
dc.contributor.author: Scogland, Thomas
dc.contributor.author: De Supinski, Bronis
dc.date.accessioned: 2024-08-07T12:10:18Z
dc.date.available: 2024-08-07T12:10:18Z
dc.date.issued: 2024-05-07
dc.date.updated: 2024-08-01T07:51:17Z
dc.description.abstract: Programming models like OpenMP offer expressive interfaces for programming graphics processing units (GPUs) via directive-based offload. By default, these models copy data to or from the device without overlapping the transfers with computation, which degrades performance. Rather than leave the onerous task of manually pipelining and tuning data communication and computation to the end user, we propose an OpenMP extension that supports block-level pipelining and present our block-level pipelining (BLP) approach, which overlaps data communication and computation in a single kernel. BLP uses persistent thread blocks with cooperative thread groups to process sub-tasks on different streaming multiprocessors and uses GPU flag arrays to enforce task dependencies without CPU involvement. To demonstrate the efficacy of BLP, we evaluate its performance using multiple benchmarks on NVIDIA V100 GPUs. Our experimental results show that BLP achieves 95% to 114% of the performance of hand-tuned kernel-level pipelining. In addition, using BLP with buffer mapping can reduce memory usage enough to support GPU memory oversubscription. We also show that BLP can reduce memory usage by 75% to 86% for data sets that exceed GPU memory while providing significantly better performance than CUDA Unified Memory (UM) with prefetching optimizations.
dc.description.version: Published version
dc.format.mimetype: application/pdf
dc.identifier.doi: https://doi.org/10.1145/3649153.3649214
dc.identifier.uri: https://hdl.handle.net/10919/120877
dc.language.iso: en
dc.publisher: ACM
dc.rights: Creative Commons Attribution 4.0 International
dc.rights.holder: The author(s)
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: BLP: Block-Level Pipelining for GPUs
dc.type: Article - Refereed
dc.type.dcmitype: Text
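The abstract describes persistent thread blocks that wait on GPU flag arrays, so that computation on each sub-task starts only once its data has arrived, without handing control back to the CPU. As a rough illustration only, the sketch below reproduces that kind of flag handshake on the CPU with C++ threads and `std::atomic` flags. It is not the authors' CUDA implementation, and the names `pipelined_sum`, `kChunk`, and `kSlots` are hypothetical. A "copy" thread stages chunks into a small ring of buffers while a "compute" thread consumes them. Because the ring holds only `kSlots` chunks at a time, it loosely mirrors how buffer mapping lets device memory hold less than the whole data set.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical parameters for illustration only; not taken from the paper.
constexpr std::size_t kChunk = 4;  // elements per sub-task
constexpr std::size_t kSlots = 2;  // staging buffers, fewer than the number of chunks

// CPU-thread analogy of flag-based pipelining (assumption: threads stand in
// for persistent thread blocks, a vector stands in for device memory).
// Per-slot flags enforce two dependencies without a central coordinator:
// copy -> compute (data is staged) and compute -> reuse (slot is free again).
long long pipelined_sum(const std::vector<int>& host) {
    const std::size_t n_chunks = (host.size() + kChunk - 1) / kChunk;
    std::vector<int> staging(kSlots * kChunk);

    // ready[s] == c + 1 means chunk c is staged in slot s; 0 means the slot is free.
    std::atomic<std::size_t> ready[kSlots];
    for (auto& f : ready) f.store(0, std::memory_order_relaxed);

    // Number of valid elements in chunk c (the last chunk may be short).
    auto chunk_len = [&](std::size_t c) {
        return std::min(kChunk, host.size() - c * kChunk);
    };

    // "Copy" stage: fills slots in round-robin order.
    std::thread copier([&] {
        for (std::size_t c = 0; c < n_chunks; ++c) {
            const std::size_t s = c % kSlots;
            // Wait until the compute stage has released this slot.
            while (ready[s].load(std::memory_order_acquire) != 0) std::this_thread::yield();
            std::copy_n(host.begin() + c * kChunk, chunk_len(c), staging.begin() + s * kChunk);
            ready[s].store(c + 1, std::memory_order_release);  // publish chunk c
        }
    });

    // "Compute" stage runs on the calling thread.
    long long total = 0;
    for (std::size_t c = 0; c < n_chunks; ++c) {
        const std::size_t s = c % kSlots;
        // Wait until chunk c has been staged in slot s.
        while (ready[s].load(std::memory_order_acquire) != c + 1) std::this_thread::yield();
        for (std::size_t i = 0; i < chunk_len(c); ++i) total += staging[s * kChunk + i];
        ready[s].store(0, std::memory_order_release);  // hand the slot back for reuse
    }

    copier.join();
    return total;
}
```

For example, calling `pipelined_sum` on a vector holding 1 through 10 returns 55. The data is split into three chunks that pass through only two staging slots. Each flag stores which chunk is staged, so every wait states exactly which sub-task it depends on. The acquire/release pairs ensure that data written by one thread is visible to the other before it is read or overwritten.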

Files

Original bundle
Name: 3649153.3649214.pdf
Size: 678.48 KB
Format: Adobe Portable Document Format
Description: Published version
License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission