Advanced Robust Statistical Learning Methods with Application in Healthcare and Manufacturing
| dc.contributor.author | Chen, Yixin | en |
| dc.contributor.committeechair | Liu, Meimei | en |
| dc.contributor.committeemember | Kim, Inyoung | en |
| dc.contributor.committeemember | Deng, Xinwei | en |
| dc.contributor.committeemember | Xing, Xin | en |
| dc.contributor.department | Statistics | en |
| dc.date.accessioned | 2025-09-09T08:00:53Z | en |
| dc.date.available | 2025-09-09T08:00:53Z | en |
| dc.date.issued | 2025-09-08 | en |
| dc.description.abstract | This dissertation presents the development and validation of advanced robust statistical methods tailored for applications in healthcare and manufacturing. This work consists of three main parts, each addressing unique challenges and demonstrating the necessity of robust algorithms in statistical learning. In the first part, motivated by the need to understand the relationship between brain networks and phenotypes of interest in small-scale neuroimaging studies with limited sample size, I developed a flow-based generative model termed Disentangled Adversarial Flow (DAF), which leverages large-scale multi-source datasets to improve prediction accuracy in neuroimaging studies with smaller sample sizes. A bidirectional generative architecture and a kernel-based dependence measure are utilized to generate domain-invariant brain connectomes. An ensemble-based DAF regression framework is proposed to integrate information from multiple source datasets to improve prediction on the target dataset. This framework ensures reliable predictions with limited sample sizes by borrowing information from other data sources despite the heterogeneity across different sources, exemplifying robustness in statistical learning. Similar challenges arise in the manufacturing context, where variations in product designs, process parameters, and sensor configurations generate diverse data distributions. This poses challenges for developing machine learning pipelines that can consistently achieve high performance under varying conditions. Motivated by this, the second part of the dissertation introduces a weighted ensemble mechanism based on the Bayesian Latent Space Model recommender system that optimizes sparse ensemble weights while incorporating uncertainty quantification. This method automatically selects and adapts optimal pipelines, which supports data-driven decision-making in industrial settings. 
By automating the selection and adaptation of optimal machine learning pipelines, this method demonstrates robustness by maintaining high performance in the face of changing industrial data conditions. Distribution shifts are also common in medical records, where heterogeneity across different individuals hinders automated diagnosis for patients. A robust algorithm could generalize across different patients and lead to more accurate personalized patient care. Inspired by this, the third part proposes a latent factor model based on an Interleaved-window Transformer to characterize the inter-subject heterogeneity, focusing on heterogeneous physiological time series data derived from Electronic Health Records, electrocardiograms, electroencephalograms, and other sources. Different factors in the latent factor model represent different characteristics of the time series. These latent factors are linked to the response through subject-specific weights, which capture varying contributions to the response in different subjects. Contrastive learning is utilized to estimate the weights for new subjects not seen in the training phase. This part underlines the theme of robustness by developing a model that adapts to individual differences, ensuring that the statistical learning methods are effective across diverse patient data. This dissertation demonstrates the value of robustness as a unifying theme in advancing statistical learning methodologies and their applications. | en |
| dc.description.abstractgeneral | This dissertation presents the development and validation of advanced robust statistical methods tailored for applications in healthcare and manufacturing. These methods are robust, meaning they perform reliably under varying conditions. This work consists of three main parts, each addressing unique challenges and demonstrating the necessity of robust algorithms in statistical learning. In the first part, motivated by the need to understand the relationship between brain networks and human cognitive ability in small-scale neuroimaging studies with limited sample size, I developed a model termed Disentangled Adversarial Flow (DAF) to improve the prediction of human cognitive ability in brain imaging studies, especially when the amount of data is limited. DAF effectively utilizes information from different large-scale datasets to improve prediction and enhance understanding in smaller-scale datasets by learning essential information that remains consistent across different datasets. This framework ensures reliable predictions with limited sample sizes by borrowing information from other data sources despite the heterogeneity across different sources, exemplifying robustness in statistical learning. Similar challenges arise in the manufacturing context, where variations in product designs, machine settings, and measurements generate data distribution shifts. This poses challenges for developing machine learning pipelines that consistently achieve high performance under varying conditions, where a pipeline refers to the integration of different methods across multiple steps of the machine learning process. Motivated by this, the second part of the dissertation proposes an automated workflow to construct new machine learning pipelines that consistently achieve high performance for various datasets collected from varying manufacturing settings. 
The new pipelines are constructed via a weighted average of predictions from existing pipelines. This method automatically selects and adapts optimal pipelines, which supports data-driven decision-making in industrial settings. By automating the selection and adaptation of optimal machine learning pipelines, this method demonstrates robustness by maintaining high performance in the face of changing industrial data conditions. Distribution shifts are also common in medical records, where heterogeneity across different individuals hinders automated diagnosis for patients. A robust algorithm could generalize across different patients and lead to more accurate personalized patient care. Inspired by this, the third part proposes a latent factor model to characterize the heterogeneity in complex time-series data from health records and various monitoring devices. Heterogeneity here refers to variation among individuals arising from genetic, environmental, and lifestyle differences. The model identifies different influencing factors in the data, which are then linked to patient outcomes. Each factor contributes differently depending on the patient, allowing for tailored predictions. This part underlines the theme of robustness by developing a model that adapts to individual differences, ensuring that the statistical learning methods are effective across diverse patient data. This dissertation demonstrates the value of robustness as a unifying theme in advancing statistical learning methodologies and their applications. | en |
| dc.description.degree | Doctor of Philosophy | en |
| dc.format.medium | ETD | en |
| dc.identifier.other | vt_gsexam:44485 | en |
| dc.identifier.uri | https://hdl.handle.net/10919/137643 | en |
| dc.language.iso | en | en |
| dc.publisher | Virginia Tech | en |
| dc.rights | In Copyright | en |
| dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
| dc.subject | Robust statistical methods | en |
| dc.subject | Uncertainty quantification | en |
| dc.subject | Deep learning | en |
| dc.subject | Latent Factor Model | en |
| dc.subject | Ensemble learning | en |
| dc.title | Advanced Robust Statistical Learning Methods with Application in Healthcare and Manufacturing | en |
| dc.type | Dissertation | en |
| thesis.degree.discipline | Statistics | en |
| thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
| thesis.degree.level | doctoral | en |
| thesis.degree.name | Doctor of Philosophy | en |