Bayesian variable selection for linear mixed models when p is much larger than n with applications in genome wide association studies

dc.contributor.author: Williams, Jacob Robert Michael
dc.contributor.committeechair: Ferreira, Marco A. R.
dc.contributor.committeemember: Franck, Christopher Thomas
dc.contributor.committeemember: Tegge, Allison
dc.contributor.committeemember: Kim, Inyoung
dc.contributor.department: Statistics
dc.date.accessioned: 2023-06-06T08:01:59Z
dc.date.available: 2023-06-06T08:01:59Z
dc.date.issued: 2023-06-05
dc.description.abstract: Genome-wide association studies (GWAS) seek to identify single nucleotide polymorphisms (SNPs) that cause phenotypic responses in individuals. GWAS analyses commonly use single marker association testing (SMA), which investigates the effect of one SNP at a time and selects a candidate set of SNPs using a strict multiple-testing correction penalty. Because SNPs are not independent but strongly correlated, SMA methods lead to false discovery rates (FDR) so high that the results are difficult for wet lab scientists to use. To address this, this dissertation proposes three novel Bayesian methods: BICOSS, BGWAS, and IEB. From a Bayesian modeling point of view, SNP search can be seen as a variable selection problem in linear mixed models (LMMs) where $p$ is much larger than $n$. To deal with the $p \gg n$ issue, the three proposed methods use novel Bayesian approaches based on two steps: a screening step and a model selection step. To control false discoveries, we link the screening and model selection steps through a common probability of a null SNP. For model selection, we propose novel priors that extend nonlocal priors, the Zellner g-prior, the unit information prior, and the Zellner-Siow prior to LMMs. For each method, extensive simulation studies and case studies show that these methods improve the recall of true causal SNPs and, more importantly, drastically decrease FDR. Because our Bayesian methods provide more focused and precise results, they may speed up discovery of important SNPs and significantly contribute to scientific progress in the areas of biology, agricultural productivity, and human health.
dc.description.abstractgeneral: Genome-wide association studies (GWAS) seek to identify locations in DNA known as single nucleotide polymorphisms (SNPs) that are the underlying cause of observable traits such as height or breast cancer. GWAS analyses are commonly performed by investigating each SNP individually and identifying which SNPs are highly correlated with the response. However, because the SNPs themselves are highly correlated, investigating each one individually leads to a high number of false positives. To address this, this dissertation proposes three advanced statistical methods: BICOSS, BGWAS, and IEB. Through extensive simulations, our methods are shown not only to drastically reduce the number of falsely detected SNPs but also to increase the detection rate of true causal SNPs. Because our novel methods provide more focused and precise results, they may speed up discovery of important SNPs and significantly contribute to scientific progress in the areas of biology, agricultural productivity, and human health.
dc.description.degree: Doctor of Philosophy
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:38100
dc.identifier.uri: http://hdl.handle.net/10919/115344
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Bayesian methods
dc.subject: GWAS
dc.subject: Linear Mixed Models
dc.subject: Model Selection
dc.title: Bayesian variable selection for linear mixed models when p is much larger than n with applications in genome wide association studies
dc.type: Dissertation
thesis.degree.discipline: Statistics
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle
Name: Williams_JR_D_2023.pdf
Size: 1.47 MB
Format: Adobe Portable Document Format