High-dimensional Multimodal Bayesian Learning
dc.contributor.author | Salem, Mohamed Mahmoud | en |
dc.contributor.committeechair | Kim, Inyoung | en |
dc.contributor.committeemember | Franck, Christopher Thomas | en |
dc.contributor.committeemember | Van Mullekom, Jennifer Huffman | en |
dc.contributor.committeemember | Gramacy, Robert B. | en |
dc.contributor.department | Statistics | en |
dc.date.accessioned | 2024-12-13T09:00:19Z | en |
dc.date.available | 2024-12-13T09:00:19Z | en |
dc.date.issued | 2024-12-12 | en |
dc.description.abstract | High-dimensional datasets are fast becoming a cornerstone across diverse domains, fueled by advancements in data-capturing technology such as DNA sequencing, medical imaging techniques, and social media. This dissertation examines the opportunities and challenges posed by these types of datasets. We develop three Bayesian methods: (1) Multilevel Network Recovery for Genomics, (2) Network Recovery for Functional Data, and (3) Bayesian Inference in Transformer-based Models. Chapter 2 examines a two-tiered data structure. To simultaneously perform variable selection and identify dependency structures among both higher- and lower-level variables, we propose a multi-level nonparametric kernel machine approach that uses variational inference to jointly identify multi-level variables and build the network. Chapter 3 develops a method for simultaneously selecting functional domain subsets, selecting functional graphical nodes, and modeling a continuous response given both scalar and functional covariates under semiparametric, nonadditive models, which allow us to capture unknown, possibly nonlinear, interaction terms among high-dimensional functional variables. In Chapter 4, we extend our investigation of leveraging structure in high-dimensional datasets to the relatively new transformer architecture. We introduce a new penalty structure for the Bayesian classification transformer that leverages the multi-tiered structure of the transformer-based model. This allows for increased, likelihood-based regularization, which is needed given the high-dimensional nature of our motivating dataset. This new regularization approach allows us to integrate Bayesian inference via variational approximations into our transformer-based model and improves the calibration of probability estimates. | en |
dc.description.abstractgeneral | In today's data-driven landscape, high-dimensional datasets have emerged as a cornerstone across diverse domains, fueled by advancements in technology such as sensor networks, genomics, and social media platforms. This dissertation examines the opportunities and challenges posed by these datasets, emphasizing their potential for uncovering hidden patterns and correlations amidst their complexity. As high-dimensional datasets proliferate, researchers face significant challenges in effectively analyzing and interpreting them. This research focuses on leveraging Bayesian methods as a robust approach to address these challenges. Bayesian approaches offer unique advantages, particularly in handling small sample sizes and complex models. By providing robust uncertainty quantification and regularization techniques, Bayesian methods support reliable inference and model generalization, even in the face of sparse or noisy data. Furthermore, this work examines the strategic integration of structured information as a regularization technique. By exploiting patterns and dependencies within the data, structured regularization enhances the interpretability and resilience of statistical models across various domains. Whether the structure arises from spatial correlations, temporal dependencies, or coordinated actions among covariates, incorporating this information enriches the modeling process and improves the reliability of the results. By exploring these themes, this research contributes to advancing the understanding and application of high-dimensional data analysis. Through a thorough examination of Bayesian methods and structured regularization techniques, this dissertation aims to support researchers in effectively navigating and extracting meaningful insights from the complex landscape of high-dimensional datasets. | en |
dc.description.degree | Doctor of Philosophy | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:42169 | en |
dc.identifier.uri | https://hdl.handle.net/10919/123788 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Gaussian Process | en |
dc.subject | High Dimensional Data | en |
dc.subject | Variable Selection | en |
dc.subject | Variational Inference | en |
dc.subject | Uncertainty Quantification | en |
dc.title | High-dimensional Multimodal Bayesian Learning | en |
dc.type | Dissertation | en |
thesis.degree.discipline | Statistics | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | doctoral | en |
thesis.degree.name | Doctor of Philosophy | en |