Performance Portability of CUDA Across NVIDIA GPU Architectures

dc.contributor.author: Coyne, Timothy Patrick [en]
dc.contributor.committeechair: Nikolopoulos, Dimitrios S. [en]
dc.contributor.committeemember: Sandu, Adrian [en]
dc.contributor.committeemember: Feng, Wu-Chun [en]
dc.contributor.department: Computer Science & Applications [en]
dc.date.accessioned: 2025-06-04T08:05:24Z [en]
dc.date.available: 2025-06-04T08:05:24Z [en]
dc.date.issued: 2025-06-03 [en]
dc.description.abstract: Graphics Processing Units (GPUs) provide impressive parallel performance that makes them invaluable to a number of computational workloads, such as machine learning and simulations. NVIDIA GPUs currently outperform all of their competitors and thus make up the lion's share of today's market. Importantly, they are natively programmed using the proprietary framework Compute Unified Device Architecture (CUDA), which only compiles to machine code for NVIDIA hardware. Moreover, NVIDIA releases a new GPU with an updated architecture roughly every two to three years. Since CUDA is commonly forward compatible with the next generation of GPUs, it is natural to reuse CUDA code built for a previous architecture on a newer one. Unfortunately, CUDA applications do not necessarily perform better on the newer generation of GPUs. This work investigates a variety of CUDA workloads that fail to show a performance uplift when moving from the V100 to the A100 GPU. While some kernels perform as expected, others exhibit up to a 700% performance drop when running on the newer architecture. For each benchmark, an analysis is provided, and where possible, a direct solution for improving performance portability is highlighted. These issues are also cross-examined to identify a few holistic portability concerns. Finally, a set of programmer recommendations is made to help developers more easily maintain performance portability between architectures. [en]
dc.description.abstractgeneral: Graphics Processing Units (GPUs) provide high parallel performance by executing instructions across many smaller internal compute units. This high parallel performance greatly benefits numerous workloads, including machine learning and simulations. NVIDIA currently has the largest market share of GPUs, which are natively programmed using Compute Unified Device Architecture (CUDA). The company typically releases a new family of GPUs with updated architectures every two to three years. Given that CUDA is the standard language for programming these GPUs and that NVIDIA releases new architectures relatively frequently, it is essential for CUDA applications to exhibit strong performance portability. In other words, a new GPU should provide an uplift for pre-existing kernels proportional to its generational improvements. Unfortunately, this is not always the case, which means developers must sometimes retrofit their code to obtain optimal performance. This research investigates the performance portability of a number of different workloads and provides a set of programmer recommendations to help developers maximize performance portability. [en]
dc.description.degree: Master of Science [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:43653 [en]
dc.identifier.uri: https://hdl.handle.net/10919/135038 [en]
dc.language.iso: en [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: GPU [en]
dc.subject: CUDA [en]
dc.subject: Performance Portability [en]
dc.title: Performance Portability of CUDA Across NVIDIA GPU Architectures [en]
dc.type: Thesis [en]
thesis.degree.discipline: Computer Science & Applications [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: masters [en]
thesis.degree.name: Master of Science [en]

Files

Original bundle
Name: Coyne_TP_T_2025.pdf
Size: 6.34 MB
Format: Adobe Portable Document Format
Name: Coyne_TP_T_2025_support_1.pdf
Size: 33.94 KB
Format: Adobe Portable Document Format
Description: Supporting documents
