User-based I/O Profiling for Leadership Scale HPC Workloads

Date

2025-01-04

Publisher

ACM

Abstract

I/O constitutes a significant portion of the runtime of most applications. Running many such applications concurrently on an HPC system leads to severe I/O contention. Thus, understanding and subsequently reducing the I/O contention induced by such multi-tenancy is critical for the efficient and reliable performance of the HPC system. In this study, we demonstrate that an application’s performance is influenced by the command-line arguments passed at job submission. We model an application’s I/O behavior based on two factors: past I/O behavior within a time window and user-configured I/O settings passed via command-line arguments. We conclude that I/O patterns for well-known HPC applications like E3SM and LAMMPS are predictable, with an average uncertainty below 0.25 (a probability of 80%) and near zero (a probability of 100%) within a day. However, I/O pattern variance increases as the study time window lengthens. Additionally, we show that for 38 users and at least 50 applications, constituting approximately 93,000 job submissions, there is a high correlation between a submitted command line and the command lines the same user submitted within the preceding 1 to 10 days. We claim the length of this time window is unique per user.
