Applying the Midas Touch of Reproducibility to High-Performance Computing
Abstract
With the exponential growth in serial CPU performance seen in the 1980s and 1990s slowing to a standstill by the 2010s, parallel computing has become ubiquitous in the high-performance computing (HPC) community. This, in turn, has led to a proliferation of parallel programming models, including CUDA, OpenACC, OpenCL, OpenMP, and SYCL. This diversity of hardware platforms and programming models has forced application users to port their codes from one hardware platform to another (e.g., from CUDA on an NVIDIA GPU to HIP or OpenCL on an AMD GPU) and to demonstrate reproducibility through ad hoc testing. To ensure reproducibility between codes more rigorously, we propose Midas, a system that checks that the results of the ported code match those of the original code by using snapshots to capture the state of a system before and after the execution of a kernel.
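The abstract does not say how Midas represents or compares snapshots, so the following is only a minimal sketch of the general before/after-snapshot idea, under stated assumptions. Kernel-visible state is modeled as a dictionary of named float buffers. The names `take_snapshot`, `run_with_snapshots`, and `compare_snapshots`, along with the two SAXPY stand-in kernels, are hypothetical. The pre-kernel snapshot captured while running the original code is replayed through the ported kernel, and the two post-kernel snapshots are then compared element by element within a tolerance.

```python
import copy
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical snapshot representation: a deep copy of every named buffer a
# kernel can read or write. Midas's actual snapshot format may differ.
State = Dict[str, List[float]]
Kernel = Callable[[State], None]


def take_snapshot(state: State) -> State:
    """Capture an independent copy of the kernel-visible state."""
    return copy.deepcopy(state)


def run_with_snapshots(kernel: Kernel, state: State) -> Tuple[State, State]:
    """Snapshot the state, run the kernel (mutating state), snapshot again."""
    before = take_snapshot(state)
    kernel(state)
    after = take_snapshot(state)
    return before, after


def compare_snapshots(reference: State, candidate: State,
                      rel_tol: float = 1e-9, abs_tol: float = 0.0) -> List[str]:
    """Return readable mismatches between two post-kernel snapshots."""
    problems = []
    for name in sorted(set(reference) | set(candidate)):
        if name not in reference or name not in candidate:
            problems.append(f"{name}: present in only one snapshot")
            continue
        ref, cand = reference[name], candidate[name]
        if len(ref) != len(cand):
            problems.append(f"{name}: length {len(ref)} != {len(cand)}")
            continue
        for i, (r, c) in enumerate(zip(ref, cand)):
            if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
                problems.append(f"{name}[{i}]: {r!r} != {c!r}")
    return problems


# Hypothetical stand-ins for an original kernel and its port (e.g. CUDA -> HIP).
def saxpy_original(s: State) -> None:
    a = s["a"][0]
    s["y"] = [a * x + y for x, y in zip(s["x"], s["y"])]


def saxpy_ported(s: State) -> None:
    a = s["a"][0]
    # Same computation with reordered operands, as a rewritten port might have.
    s["y"] = [y + x * a for x, y in zip(s["x"], s["y"])]


if __name__ == "__main__":
    initial = {"a": [2.0], "x": [1.0, 2.0, 3.0], "y": [0.5, 0.25, 0.125]}
    # Record the original kernel's before/after snapshots...
    before, ref_after = run_with_snapshots(saxpy_original, initial)
    # ...then replay the captured pre-kernel state through the ported kernel.
    _, port_after = run_with_snapshots(saxpy_ported, take_snapshot(before))
    mismatches = compare_snapshots(ref_after, port_after)
    print("match" if not mismatches else "\n".join(mismatches))
```

Comparing within a tolerance rather than demanding bitwise equality is itself an assumption of this sketch: a kernel ported to different hardware can legitimately differ in floating-point rounding, so a suitable tolerance would likely be application-specific.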