Semiparametric Bayesian Kernel Survival Model for Highly Correlated High-Dimensional Data

dc.contributor.author: Zhang, Lin [en]
dc.contributor.committeechair: Kim, Inyoung [en]
dc.contributor.committeemember: House, Leanna L. [en]
dc.contributor.committeemember: Hong, Yili [en]
dc.contributor.committeemember: Du, Pang [en]
dc.contributor.department: Statistics [en]
dc.date.accessioned: 2019-10-24T06:00:21Z [en]
dc.date.available: 2019-10-24T06:00:21Z [en]
dc.date.issued: 2018-05-01 [en]
dc.description.abstract: We are living in an era in which many mysteries related to science, technology, and design can be answered by "learning" from the huge amount of data accumulated over the past few decades. In the course of these endeavors, highly correlated high-dimensional data are frequently observed in many areas, including predicting shelf life, controlling manufacturing processes, and identifying important pathways related to diseases. We define a "set" as a group of highly correlated high-dimensional (HCHD) variables that possess a certain practical meaning or control a certain process, and define an "element" as one of the HCHD variables within a certain set. Such an elements-within-a-set structure is very complicated because: (i) the dimensions of elements in different sets can vary dramatically, ranging from two to hundreds or even thousands; (ii) the true relationships, including element-wise associations, set-wise interactions, and element-set interactions, are unknown; and (iii) the sample size (n) is usually much smaller than the dimension of the elements (p). The goal of this dissertation is to provide a systematic way to identify both the set effects and the element effects associated with survival outcomes from heterogeneous populations using Bayesian survival kernel models. By connecting kernel machines with semiparametric Bayesian hierarchical models, the proposed unified model frameworks can identify significant elements as well as sets regardless of misspecification of distributions or kernels. The proposed methods can potentially be applied to a vast range of fields to solve real-world problems. [en]
dc.description.abstractgeneral: We are living in an era in which many mysteries related to science, technology, and design can be answered by “learning” from the huge amount of data accumulated over the past few decades. In the course of these endeavors, highly correlated high-dimensional data are frequently observed in many areas, including predicting shelf life, controlling manufacturing processes, and identifying important pathways related to diseases. For example, in a medical study of 30 patients, values of a very large number of variables, such as gender, age, height, weight, and blood pressure, are recorded for each patient. High-dimensional means that the number of variables (p) can be very large (e.g. p > 500), while the number of subjects, or sample size (n), is small (n = 30). We define a “set” as a group of highly correlated high-dimensional (HCHD) variables that possess a certain practical meaning or control a certain process, and define an “element” as one of the HCHD variables within a certain set. Such an elements-within-a-set structure is very complicated because: (i) the dimensions of elements in different sets can vary dramatically, ranging from two to hundreds or even thousands; (ii) the true relationships, including element-wise associations, set-wise interactions, and element-set interactions, are unknown; and (iii) the sample size (n) is usually much smaller than the dimension of the elements (p). The goal of this dissertation is to provide a systematic way to identify both the set effects and the element effects associated with survival outcomes from heterogeneous populations using several proposed statistical models. The proposed models can incorporate prior knowledge to improve model performance. The proposed methods can potentially be applied to a vast range of fields to solve real-world problems. [en]
dc.description.degree: PHD [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:15247 [en]
dc.identifier.uri: http://hdl.handle.net/10919/95040 [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Gaussian Process [en]
dc.subject: Kernel Machine [en]
dc.subject: Mixture Model [en]
dc.subject: Pathway-Based Analysis [en]
dc.subject: Semiparametric Bayesian Hierarchical Survival Model [en]
dc.title: Semiparametric Bayesian Kernel Survival Model for Highly Correlated High-Dimensional Data [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Statistics [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: PHD [en]

Files

Original bundle
Name: Zhang_L_D_2018.pdf
Size: 6.79 MB
Format: Adobe Portable Document Format