Semiparametric Bayesian Kernel Survival Model for Highly Correlated High-Dimensional Data
Files
TR Number
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
We are living in an era in which many mysteries related to science, technologies and design can be answered by "learning" the huge amount of data accumulated over the past few decades. In the processes of those endeavors, highly-correlated high-dimensional data are frequently observed in many areas including predicting shelf life, controlling manufacturing processes, and identifying important pathways related with diseases. We define a "set" as a group of highly-correlated high-dimensional (HCHD) variables that possess a certain practical meaning or control a certain process, and define an "element" as one of the HCHD variables within a certain set. Such an elements-within-a-set structure is very complicated because: (i) the dimensions of elements in different sets can vary dramatically, ranging from two to hundreds or even thousands; (ii) the true relationships, include element-wise associations, set-wise interactions, and element-set interactions, are unknown; (iii) and the sample size (n) is usually much smaller than the dimension of the elements (p). The goal of this dissertation is to provide a systematic way to identify both the set effects and the element effects associated with survival outcomes from heterogeneous populations using Bayesian survival kernel models. By connecting kernel machines with semiparametric Bayesian hierarchical models, the proposed unified model frameworks can identify significant elements as well as sets regardless of mis-specifications of distributions or kernels. The proposed methods can potentially be applied to a vast range of fields to solve real-world problems.