Statistical Methods for Variability Management in High-Performance Computing

Xu, Li

dc.contributor.author	Xu, Li	en
dc.contributor.committeechair	Hong, Yili	en
dc.contributor.committeechair	Watson, Layne T.	en
dc.contributor.committeemember	Smith, Eric P.	en
dc.contributor.committeemember	Gramacy, Robert B.	en
dc.contributor.committeemember	Deng, Xinwei	en
dc.contributor.department	Statistics	en
dc.date.accessioned	2021-07-16T08:00:08Z	en
dc.date.available	2021-07-16T08:00:08Z	en
dc.date.issued	2021-07-15	en
dc.description.abstract	High-performance computing (HPC) variability management is an important topic in computer science. Research topics include experimental designs for efficient data collection, surrogate models for predicting performance variability, and system configuration optimization. Because HPC systems have complex architectures, a comprehensive study of HPC variability requires large-scale datasets, and experimental design techniques help improve data collection. Surrogate models, which can be built from mathematical and statistical models, are essential for understanding the variability as a function of system parameters. Once the variability can be predicted, optimization tools are needed for future system designs. This dissertation focuses on HPC input/output (I/O) variability through three main chapters. After the general introduction in Chapter 1, Chapter 2 focuses on prediction models for the scalar description of I/O variability. A comprehensive comparison study is conducted, and major surrogate models for computer experiments are investigated. In addition, a tool is developed for system configuration optimization based on the chosen surrogate model. Chapter 3 conducts a detailed study of the multimodal phenomena in I/O throughput distributions and proposes an uncertainty estimation method for the optimal number of runs for future experiments. Mixture models are used to identify the number of modes of the throughput distributions at different configurations. This chapter also addresses the uncertainty in parameter estimation and derives a formula for sample size calculation. The developed method is then applied to HPC variability data. Chapter 4 focuses on the prediction of functional outcomes with both qualitative and quantitative factors. Rather than a scalar summary, the full distribution of I/O throughput provides a comprehensive description of I/O variability. We develop a modified Gaussian process for functional prediction and apply the developed method to the large-scale HPC I/O variability data. Chapter 5 contains some general conclusions and areas for future work.	en
dc.description.abstractgeneral	This dissertation focuses on three projects, all related to statistical methods for performance variability management in high-performance computing (HPC). HPC systems are computer systems that achieve high performance by aggregating a large number of computing units. The performance of HPC is measured by the throughput of a benchmark called the IOZone Filesystem Benchmark. The performance variability is the variation among throughputs when the system configuration is fixed. Variability management involves studying the relationship between performance variability and the system configuration. In Chapter 2, we use several existing prediction models to predict the standard deviation of throughputs given different system configurations and compare the accuracy of the predictions. We also conduct HPC system optimization using the chosen prediction model as the objective function. In Chapter 3, we use the mixture model to determine the number of modes in the distribution of throughput under different system configurations. In addition, we develop a model to determine the number of additional runs for future benchmark experiments. In Chapter 4, we develop a statistical model that can predict the throughput distributions given the system configurations. We also compare the prediction of summary statistics of the throughput distributions with existing prediction models.	en
dc.description.degree	Doctor of Philosophy	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:31861	en
dc.identifier.uri	http://hdl.handle.net/10919/104184	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	computer experiments	en
dc.subject	functional prediction	en
dc.subject	Gaussian process	en
dc.subject	Machine learning	en
dc.subject	prediction model	en
dc.subject	performance variability	en
dc.subject	mixture model	en
dc.subject	quantile regression	en
dc.title	Statistical Methods for Variability Management in High-Performance Computing	en
dc.type	Dissertation	en
thesis.degree.discipline	Statistics	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Doctor of Philosophy	en

Files

Original bundle

Name: Xu_L_D_2021.pdf
Size: 3.21 MB
Format: Adobe Portable Document Format

Name: Xu_L_D_2021_support_1.pages
Size: 169.59 KB
Format: Unknown data format
Description: Supporting documents

Collections

Doctoral Dissertations