Statistical Methods for Variability Management in High-Performance Computing
Files
TR Number
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
High-performance computing (HPC) variability management is an important topic in computer science. Research topics include experimental designs for efficient data collection, surrogate models for predicting the performance variability, and system configuration optimization. Due to the complex architecture of HPC systems, a comprehensive study of HPC variability needs large-scale datasets, and experimental design techniques are useful for improved data collection. Surrogate models are essential to understand the variability as a function of system parameters, which can be obtained by mathematical and statistical models. After predicting the variability, optimization tools are needed for future system designs.
This dissertation focuses on HPC input/output (I/O) variability through three main chapters. After the general introduction in Chapter 1, Chapter 2 focuses on the prediction models for the scalar description of I/O variability. A comprehensive comparison study is conducted, and major surrogate models for computer experiments are investigated. In addition, a tool is developed for system configuration optimization based on the chosen surrogate model.
Chapter 3 conducts a detailed study for the multimodal phenomena in I/O throughput distribution and proposes an uncertainty estimation method for the optimal number of runs for future experiments. Mixture models are used to identify the number of modes for throughput distributions at different configurations. This chapter also addresses the uncertainty in parameter estimation and derives a formula for sample size calculation. The developed method is then applied to HPC variability data.
Chapter 4 focuses on the prediction of functional outcomes with both qualitative and quantitative factors. Instead of a scalar description of I/O variability, the distribution of I/O throughput provides a comprehensive description of I/O variability. We develop a modified Gaussian process for functional prediction and apply the developed method to the large-scale HPC I/O variability data.
Chapter 5 contains some general conclusions and areas for future work.