Bayesian variable selection for linear mixed models when p is much larger than n with applications in genome wide association studies

TR Number

Date

2023-06-05

Journal Title

Journal ISSN

Volume Title

Publisher

Virginia Tech

Abstract

Genome-wide association studies (GWAS) seek to identify single nucleotide polymorphisms (SNP) causing phenotypic responses in individuals. Commonly, GWAS analyses are done by using single marker association testing (SMA) which investigates the effect of a single SNP at a time and selects a candidate set of SNPs using a strict multiple correction penalty. As SNPs are not independent but instead strongly correlated, SMA methods lead to such high false discovery rates (FDR) that the results are difficult to use by wet lab scientists. To address this, this dissertation proposes three different novel Bayesian methods: BICOSS, BGWAS, and IEB. From a Bayesian modeling point of view, SNP search can be seen as a variable selection problem in linear mixed models (LMMs) where p is much larger than n. To deal with the p>>n issue, our three proposed methods use novel Bayesian approaches based on two steps: a screening step and a model selection step. To control false discoveries, we link the screening and model selection steps through a common probability of a null SNP. To deal with model selection, we propose novel priors that are extensions for LMMs of nonlocal priors, Zellner-g prior, unit Information prior, and Zellner-Siow prior. For each method, extensive simulation studies and case studies show that these methods improve the recall of true causal SNPs and, more importantly, drastically decrease FDR. Because our Bayesian methods provide more focused and precise results, they may speed up discovery of important SNPs and significantly contribute to scientific progress in the areas of biology, agricultural productivity, and human health.

Description

Keywords

Bayesian methods, GWAS, Linear Mixed Models, Model Selection

Citation