Bridging Machine Learning and Experimental Design for Enhanced Data Analysis and Optimization

Guo, Qing

dc.contributor.author	Guo, Qing	en
dc.contributor.committeechair	Deng, Xinwei	en
dc.contributor.committeemember	Xing, Xin	en
dc.contributor.committeemember	Zhu, Hongxiao	en
dc.contributor.committeemember	Hong, Yili	en
dc.contributor.department	Statistics	en
dc.date.accessioned	2024-07-20T08:00:11Z	en
dc.date.available	2024-07-20T08:00:11Z	en
dc.date.issued	2024-07-19	en
dc.description.abstract	Experimental design is a powerful tool for gathering highly informative observations using a small number of experiments. The demand for smart data collection strategies is increasing due to the need to save time and budget, especially in online experiments and machine learning. However, traditional experimental design methods fall short in systematically assessing the effects of changing variables. Specifically within Artificial Intelligence (AI), the challenge lies in assessing the impacts of model structures and training strategies on task performance with a limited number of trials. This shortfall underscores the need for novel approaches. On the other hand, the optimal design criterion has typically been model-based in the classic design literature, which restricts the flexibility of experimental design strategies. Machine learning's inherent flexibility, by contrast, makes it possible to estimate metrics efficiently using nonparametric and optimization techniques, thereby broadening the range of possible experimental designs. This dissertation aims to develop a set of novel methods that bridge these two domains: 1) applying ideas from statistical experimental design to enhance data efficiency in machine learning, and 2) leveraging powerful deep neural networks to optimize experimental design strategies. This dissertation consists of five chapters. Chapter 1 provides a general introduction to mutual information, fractional factorial design, hyper-parameter tuning, multi-modality, etc. In Chapter 2, I propose a new mutual information estimator, FLO, by integrating techniques from variational inference (VAE), contrastive learning, and convex optimization. I apply FLO to broad data science applications, such as efficient data collection, transfer learning, fair learning, etc.
Chapter 3 introduces a new design strategy called multi-layer sliced design (MLSD), with an application to AI assurance. It focuses on exploring the effects of hyper-parameters under different models and optimization strategies. Chapter 4 investigates classic vision challenges via multimodal large language models by implicitly optimizing mutual information and thoroughly exploring training strategies. Chapter 5 concludes the dissertation and discusses several future research topics.	en
dc.description.abstractgeneral	In the digital age, artificial intelligence (AI) is reshaping our interactions with technology through advanced machine learning models. These models are complex, often opaque mechanisms that present challenges in understanding their inner workings. This complexity necessitates numerous experiments with different settings to optimize performance, which can be costly. Consequently, it is crucial to strategically evaluate the effects of various strategies on task performance using a limited number of trials. The Design of Experiments (DoE) offers invaluable techniques for investigating and understanding these complex systems efficiently. Moreover, integrating machine learning models can further enhance the DoE. Traditionally, experimental designs pre-specify a model and focus on finding the best strategies for experimentation. This assumption can restrict the adaptability and applicability of experimental designs. However, the inherent flexibility of machine learning models can enhance the capabilities of DoE, unlocking new possibilities for efficiently optimizing experimental strategies through an information-centric approach. Moreover, the information-based method can also be beneficial in other AI applications, including self-supervised learning, fair learning, transfer learning, etc. The research presented in this dissertation aims to bridge machine learning and experimental design, offering new insights and methodologies that benefit both AI techniques and DoE.	en
dc.description.degree	Doctor of Philosophy	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:41113	en
dc.identifier.uri	https://hdl.handle.net/10919/120681	en
dc.language.iso	en	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Mutual Information	en
dc.subject	Sliced Design	en
dc.subject	Bayesian Optimal Design	en
dc.subject	Induced Lasso	en
dc.subject	Few-shot Learning	en
dc.subject	Variational Inference	en
dc.subject	Contrastive Learning	en
dc.title	Bridging Machine Learning and Experimental Design for Enhanced Data Analysis and Optimization	en
dc.type	Dissertation	en
thesis.degree.discipline	Statistics	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Doctor of Philosophy	en

Files

Original bundle

Name: Guo_Q_D_2024.pdf
Size: 11.79 MB
Format: Adobe Portable Document Format

Name: Guo_Q_D_2024_support_1.pdf
Size: 49.14 KB
Format: Adobe Portable Document Format
Description: Supporting documents

Collections

Doctoral Dissertations