Performance Portability of CUDA Across NVIDIA GPU Architectures
Abstract
Graphics Processing Units (GPUs) provide impressive parallel performance that makes them invaluable to computational workloads such as machine learning and scientific simulation. NVIDIA GPUs currently outperform their competitors and thus make up the lion's share of today's market. Importantly, they are natively programmed using the proprietary framework Compute Unified Device Architecture (CUDA), which compiles to machine code only for NVIDIA hardware. Moreover, NVIDIA releases a new GPU with an updated architecture roughly every two to three years. Since CUDA is generally forward compatible with the next generation of GPUs, it is natural to reuse CUDA code built for a previous architecture on a newer one. Unfortunately, a CUDA application's performance does not necessarily improve on a newer generation of GPU. This work investigates a variety of CUDA workloads that fail to show a performance uplift when moving from the V100 to the A100 GPU. While some kernels perform as expected, others slow down by as much as 700% on the newer architecture. Each benchmark is analyzed, and where possible, a direct solution for improving its performance portability is highlighted. These issues are also examined collectively to identify broader, cross-cutting portability concerns. Finally, a set of recommendations is offered to help developers more easily maintain performance portability across architectures.
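
To make the forward-compatibility mechanism described above concrete, the following is a minimal sketch, not drawn from the report itself: a SAXPY kernel whose binary is built for the V100 (compute capability 7.0, sm_70) with embedded PTX, which the CUDA driver can JIT-compile at load time to run on the A100 (compute capability 8.0, sm_80). The kernel and file name (saxpy.cu) are illustrative placeholders.

// saxpy.cu -- minimal forward-compatibility sketch (illustrative only).
#include <cstdio>
#include <cuda_runtime.h>

// A simple SAXPY kernel: y[i] = a * x[i] + y[i].
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Managed memory is supported on both V100 and A100.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // expect 4.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}

// Compile V100 machine code plus compute_70 PTX; the embedded PTX is what
// lets the driver JIT-compile the kernel for the A100's sm_80 at load time:
//   nvcc -gencode arch=compute_70,code=sm_70 \
//        -gencode arch=compute_70,code=compute_70 saxpy.cu -o saxpy

This convenience is exactly what motivates the study: the binary runs unmodified on the newer architecture, but, as the benchmarks show, running is not the same as running well.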