Performance Portability of CUDA Across NVIDIA GPU Architectures
dc.contributor.author | Coyne, Timothy Patrick | en |
dc.contributor.committeechair | Nikolopoulos, Dimitrios S. | en |
dc.contributor.committeemember | Sandu, Adrian | en |
dc.contributor.committeemember | Feng, Wu-Chun | en |
dc.contributor.department | Computer Science & Applications | en |
dc.date.accessioned | 2025-06-04T08:05:24Z | en |
dc.date.available | 2025-06-04T08:05:24Z | en |
dc.date.issued | 2025-06-03 | en |
dc.description.abstract | Graphics Processing Units (GPUs) provide impressive parallel performance that makes them invaluable to a number of computational workloads, such as machine learning and simulations. NVIDIA GPUs currently outperform all of their competitors and thus make up the lion's share of today's market. Importantly, they are natively programmed using the proprietary framework Compute Unified Device Architecture (CUDA), which only compiles to machine code for NVIDIA hardware. Moreover, NVIDIA releases a new GPU with an updated architecture roughly every two to three years. Since CUDA is commonly forward compatible with the next generation of GPUs, it is natural to reuse CUDA code built for a previous architecture on a newer one. Unfortunately, the performance of CUDA applications does not necessarily benefit from the newer generation of GPUs. This work investigates a variety of CUDA workloads that fail to show a performance uplift moving from the V100 to the A100 GPU. While some kernels perform as expected, others exhibit up to a 700% performance drop when running on the newer architecture. For each, an analysis of the benchmark is provided, and where possible, a direct solution for improving performance portability is highlighted. These issues are also cross-examined to identify several holistic portability concerns. Finally, a set of programmer recommendations is made to assist developers in more easily maintaining performance portability between architectures. | en |
dc.description.abstractgeneral | Graphics Processing Units (GPUs) provide high parallel performance by executing instructions across many smaller internal compute units. This high parallel performance greatly benefits numerous workloads, including machine learning and simulations. NVIDIA currently has the largest market share of GPUs, which are natively programmed using Compute Unified Device Architecture (CUDA). The company typically releases a new family of GPUs with updated architectures every two to three years. Given that CUDA is the standard language for programming these GPUs and that NVIDIA releases new architectures relatively frequently, it is essential for CUDA applications to exhibit strong performance portability. In other words, a new GPU should provide uplift for pre-existing kernels proportional to its generational improvements. Unfortunately, this is not always the case, which means developers must sometimes retrofit their code in order to obtain optimal performance. This research investigates the performance portability of a number of different workloads and provides a set of programmer recommendations to assist developers in maximizing performance portability. | en |
dc.description.degree | Master of Science | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:43653 | en |
dc.identifier.uri | https://hdl.handle.net/10919/135038 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | GPU | en |
dc.subject | CUDA | en |
dc.subject | Performance Portability | en |
dc.title | Performance Portability of CUDA Across NVIDIA GPU Architectures | en |
dc.type | Thesis | en |
thesis.degree.discipline | Computer Science & Applications | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | masters | en |
thesis.degree.name | Master of Science | en |