Novel Statistical Methods for Multiple-variant Genetic Association Studies with Related Individuals


TR Number



Journal Title

Journal ISSN

Volume Title


Virginia Tech


Genetic association studies usually include related individuals. Meanwhile, high-throughput sequencing technologies produce data of multiple genetic variants. Due to linkage disequilibrium (LD) and familial relatedness, the genotype data from such studies often carries complex correlations. Moreover, missing values in genotype usually lead to loss of power in genetic association tests. Also, repeated measurements of phenotype and dynamic covariates from longitudinal studies bring in more opportunities but also challenges in the discovery of disease-related genetic factors. This dissertation focuses on developing novel statistical methods to address some challenging questions remaining in genetic association studies due to the aforementioned reasons.

So far, a lot of methods have been proposed to detect disease-related genetic regions (e.g., genes, pathways). However, with multiple-variant data from a sample with relatedness, it is critical to account for the complex genotypic correlations when assessing genetic contribution. Recognizing the limitations of existing methods, in the first work of this dissertation, the Adaptive-weight Burden Test (ABT) --- a score test between a quantitative trait and the genotype data with complex correlations --- is proposed. ABT achieves higher power by adopting data-driven weights, which make good use of the LD and relatedness. Because the null distribution has been successfully derived, the computational simplicity of ABT makes it a good fit for genome-wide association studies.

Genotype missingness commonly arises due to limitations in genotyping technologies. Imputation of the missing values in genotype usually improves quality of the data used in the subsequent association test and thus increases power. Complex correlations, though troublesome, provide the opportunity to proper handling of genotypic missingness. In the second part of this dissertation, a genotype imputation method is developed, which can impute the missingness in multiple genetic variants via the LD and the relatedness.

The popularity of longitudinal studies in genetics and genomics calls for methods deliberately designed for repeated measurements. Therefore, a multiple-variant genetic association test for a longitudinal trait on samples with relatedness is developed, which treats the longitudinal measurements as observations of functions and thus takes into account the time factor properly.



genetic association test, related Individuals, burden test, genotype imputation, longitudinal study