Browsing by Author "Franck, Christopher T."
Now showing 1 - 20 of 31
- Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles
  Highnam, Gareth; Franck, Christopher T.; Martin, Andy; Stephens, Calvin; Puthige, Ashwin; Mittelman, David (Oxford University Press, 2013-01)
  Repetitive sequences are biologically and clinically important because they can influence traits and disease, but repeats are challenging to analyse using short-read sequencing technology. We present a tool for genotyping microsatellite repeats called RepeatSeq, which uses Bayesian model selection guided by an empirically derived error model that incorporates sequence and read properties. Next, we apply RepeatSeq to high-coverage genomes from the 1000 Genomes Project to evaluate performance and accuracy. The software uses common formats, such as VCF, for compatibility with existing genome analysis pipelines. Source code and binaries are available at http://github.com/adaptivegenome/repeatseq.
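A schematic illustration of the Bayesian genotype selection this abstract describes: a minimal Python sketch with a toy flat error rate. RepeatSeq's actual error model is empirically derived from sequence and read properties, so everything here (function names, the error probabilities) is an illustrative stand-in, not the tool's implementation.

```python
from itertools import combinations_with_replacement
from math import log, exp

def read_log_lik(observed_len, allele_len, p_err=0.05):
    """Toy error model: a read reports the true allele repeat length
    with probability 1 - p_err and slips by one unit otherwise.
    RepeatSeq's real error model is empirically derived; this flat
    rate is a placeholder."""
    if observed_len == allele_len:
        return log(1.0 - p_err)
    if abs(observed_len - allele_len) == 1:
        return log(p_err / 2.0)
    return log(1e-4)  # rare larger slips

def genotype_posterior(read_lengths, candidate_alleles):
    """Score each diploid genotype by summing, per read, the average
    likelihood over its two alleles (reads sample alleles equally),
    then normalize under a uniform genotype prior."""
    scores = {}
    for g in combinations_with_replacement(candidate_alleles, 2):
        ll = 0.0
        for r in read_lengths:
            ll += log(0.5 * exp(read_log_lik(r, g[0]))
                      + 0.5 * exp(read_log_lik(r, g[1])))
        scores[g] = ll
    m = max(scores.values())
    z = sum(exp(v - m) for v in scores.values())
    return {g: exp(v - m) / z for g, v in scores.items()}

# Six reads supporting a 10/11 heterozygote.
print(genotype_posterior([10, 10, 11, 11, 10, 11], [9, 10, 11, 12]))
```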
- Adoption of High-Performance Housing Technologies Among U.S. Homebuilding Firms, 2000 Through 2010
  McCoy, Andrew P.; Koebel, C. Theodore; Sanderford, Andrew R.; Franck, Christopher T.; Keefe, Matthew J. (HUD, 2015)
  This article describes foundational processes of a larger project examining U.S. home builders' choices to adopt innovative housing technologies that improve the environmental performance of new single-family homes. Home builders sit at a critical juncture in the housing creation decision chain: they can influence how the energy consumption of new housing units changes, and the units they produce can also reflect shifting technology, demography, and policy landscapes. With some exceptions, U.S. home builders have been characterized as slow to adopt, or resistant to the adoption of, product and process innovations, largely because of path-dependent and risk-averse behavior. This article focuses on home builder choices by analyzing a summary of the innovation adoption literature and that literature's relationship to homebuilding. The researchers then describe analytical approaches for studying home builders' choices and markets at the Core Based Statistical Area level, the data and statistical methodologies used in the study, and the policy implications for promoting energy efficiency in housing. Future work will draw on the foundation presented in this article to specify versions of this generic model and report results using improved quantitative analyses.
- Applying an Intrinsic Conditional Autoregressive Reference Prior for Areal Data
  Porter, Erica May (Virginia Tech, 2019-07-09)
  Bayesian hierarchical models are useful for modeling spatial data because they have flexibility to accommodate complicated dependencies that are common to spatial data. In particular, intrinsic conditional autoregressive (ICAR) models are commonly assigned as priors for spatial random effects in hierarchical models for areal data corresponding to spatial partitions of a region. However, selection of prior distributions for these spatial parameters presents a challenge to researchers. We present and describe ref.ICAR, an R package that implements an objective Bayes intrinsic conditional autoregressive prior on a vector of spatial random effects. This model provides an objective Bayesian approach for modeling spatially correlated areal data. ref.ICAR enables analysis of spatial areal data for a specified region, given user-provided data and information about the structure of the study region. The ref.ICAR package performs Markov Chain Monte Carlo (MCMC) sampling and outputs posterior medians, intervals, and trace plots for fixed effect and spatial parameters. Finally, the functions provide regional summaries, including medians and credible intervals for fitted values by subregion.
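To make the ICAR structure concrete, here is a minimal Python sketch (not ref.ICAR's implementation; it assumes a symmetric binary adjacency matrix W) of the ICAR precision matrix and its improper log-density:

```python
import numpy as np

def icar_precision(W):
    """ICAR precision Q = D - W, where W is a symmetric binary adjacency
    matrix and D is diagonal with each region's neighbor count. Q is
    rank-deficient (rank n - 1 for a connected region), which is why
    the ICAR prior is improper."""
    D = np.diag(W.sum(axis=1))
    return D - W

def icar_logdensity(phi, Q, tau):
    """Log ICAR density up to a constant, for a connected region:
    (n - 1)/2 * log(tau) - tau/2 * phi' Q phi."""
    n = Q.shape[0]
    return 0.5 * (n - 1) * np.log(tau) - 0.5 * tau * phi @ Q @ phi

# Four regions on a line: 1-2-3-4.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Q = icar_precision(W)
print(np.linalg.matrix_rank(Q))  # n - 1 = 3, confirming rank deficiency
print(icar_logdensity(np.array([0.1, -0.2, 0.3, -0.2]), Q, tau=2.0))
```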
- Bayesian Model Selection for Spatial Data and Cost-constrained Applications
  Porter, Erica May (Virginia Tech, 2023-07-03)
  Bayesian model selection is a useful tool for identifying an appropriate model class, dependence structure, and valuable predictors for a wide variety of applications. In this work we consider objective Bayesian model selection where no subjective information is available to inform priors on model parameters a priori, specifically in the case of hierarchical models for spatial data, which can have complex dependence structures. We develop an approach using trained priors via fractional Bayes factors where standard Bayesian model selection methods fail to produce valid probabilities under improper reference priors. This enables researchers to concurrently determine whether spatial dependence between observations is apparent and identify important predictors for modeling the response. In addition to model selection with objective priors on model parameters, we also consider the case where the priors on the model space are used to penalize individual predictors a priori based on their costs. We propose a flexible approach that introduces a tuning parameter to cost-penalizing model priors that allows researchers to control the level of cost penalization to meet budget constraints and accommodate increasing sample sizes.
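For reference, the fractional Bayes factor (O'Hagan, 1995) that underlies such trained priors has the standard form below; the cost-penalizing model prior is shown in one plausible form with tuning parameter lambda (the thesis's exact parameterization may differ):

```latex
% Fractional Bayes factor between models M_i and M_j: a fraction b of
% the likelihood "trains" the (possibly improper) prior.
\[
  m_i^{b}(\mathbf{y}) = \int f_i(\mathbf{y}\mid\theta_i)^{\,b}\,
                        \pi_i(\theta_i)\,d\theta_i,
  \qquad
  \mathrm{FBF}_{ij}(b) =
  \frac{m_i^{1}(\mathbf{y})\,/\,m_i^{b}(\mathbf{y})}
       {m_j^{1}(\mathbf{y})\,/\,m_j^{b}(\mathbf{y})}.
\]
% One plausible cost-penalizing prior on inclusion indicators gamma_k,
% with predictor costs c_k and tuning parameter lambda (illustrative):
\[
  \pi(\boldsymbol{\gamma}) \propto
  \exp\!\Big(-\lambda \sum_{k} c_k\,\gamma_k\Big).
\]
```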
- Bayesian Uncertainty Quantification while Leveraging Multiple Computer Model Runs
  Walsh, Stephen A. (Virginia Tech, 2023-06-22)
  In the face of spatially correlated data, Gaussian process regression is a very common modeling approach. Given observational data, the kriging equations provide the best linear unbiased predictor for the mean at unobserved locations. However, when a computer model provides a complete grid of forecasted values, kriging does not apply. To develop an approach to quantify uncertainty of computer model output in this setting, we leverage information from a collection of computer model runs (e.g., historical forecast and observation pairs for tropical cyclone precipitation totals) through a Bayesian hierarchical framework. This framework allows us to combine information and account for the spatial correlation within and across computer model output. Maximum likelihood estimates and the corresponding Hessian matrices for the Gaussian process parameters are input to a Gibbs sampler, which provides posterior distributions for parameters of interest. These samples are used to generate predictions which provide uncertainty quantification for a given computer model run (e.g., a tropical cyclone precipitation forecast). We then extend this framework using deep Gaussian processes to allow for nonstationary covariance structure, applied to multiple computer model runs from a cosmology application. We also perform sensitivity analyses to understand which parameter inputs most greatly impact cosmological computer model output.
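The kriging equations mentioned above take the standard universal-kriging form:

```latex
% Universal kriging BLUP at an unobserved location s_0, with X the
% covariate matrix, C = Cov(y), and c_0 = Cov(y, y(s_0)):
\[
  \hat{y}(s_0) = \mathbf{x}(s_0)^{\top}\hat{\boldsymbol{\beta}}
               + \mathbf{c}_0^{\top} C^{-1}
                 \big(\mathbf{y} - X\hat{\boldsymbol{\beta}}\big),
  \qquad
  \hat{\boldsymbol{\beta}} =
  \big(X^{\top}C^{-1}X\big)^{-1}X^{\top}C^{-1}\mathbf{y}.
\]
```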
- Contributions to Structured Variable Selection Towards Enhancing Model Interpretation and Computation Efficiency
  Shen, Sumin (Virginia Tech, 2020-02-07)
  The advances in data-collecting technologies provide great opportunities to access large sample-size data sets with high dimensionality. Variable selection is an important procedure for extracting useful knowledge from such complex data. In many real-data applications, appropriate selection of variables facilitates model interpretation and computation efficiency. It is thus important to incorporate domain knowledge of the underlying data generation mechanism when selecting key variables to improve model performance. However, general variable selection techniques, such as best subset selection and the lasso, often do not take the underlying data generation mechanism into consideration. This thesis develops statistical modeling methodologies with a focus on structured variable selection towards better model interpretation and computation efficiency. Specifically, it consists of three parts: an additive heredity model with coefficients incorporating multi-level data, a regularized dynamic generalized linear model with piecewise constant functional coefficients, and a structured variable selection method within the best subset selection framework. In Chapter 2, an additive heredity model is proposed for analyzing mixture-of-mixtures (MoM) experiments. The MoM experiment differs from the classical mixture experiment in that a mixture component in MoM experiments, known as the major component, is made up of sub-components, known as the minor components. The proposed model considers an additive structure to inherently connect the major components with the minor components. To enable a meaningful interpretation of the estimated model, we apply the hierarchical and heredity principles by using the nonnegative garrote technique for model selection. The performance of the additive heredity model was compared to several conventional methods in both unconstrained and constrained MoM experiments. The additive heredity model was then successfully applied to a real problem of optimizing the Pringles® potato crisp studied previously in the literature. In Chapter 3, we consider the dynamic effects of variables in generalized linear models such as logistic regression. This work is motivated by an engineering problem in which the effects of process variables on product quality vary over time because of equipment degradation. To address this challenge, we propose a penalized dynamic regression model that flexibly estimates the dynamic coefficient structure. The proposed method models the functional coefficients as piecewise constant functions. Specifically, under the penalized regression framework, the fused lasso penalty is adopted for detecting changes in the dynamic coefficients, and the group lasso penalty is applied to enable a sparse selection of variables. Moreover, an efficient parameter estimation algorithm is developed based on the alternating direction method of multipliers (ADMM). The performance of the dynamic coefficient model is evaluated in numerical studies and three real-data examples. In Chapter 4, we develop a structured variable selection method within the best subset selection framework. In the literature, many techniques within the LASSO framework have been developed to address structured variable selection issues.
  However, less attention has been paid to structured best subset selection problems. In this work, we propose a sparse ridge regression method to address structured variable selection issues. The key idea of the proposed method is to reconstruct the regression matrix from the perspective of experimental design. We employ an estimation-maximization algorithm to formulate the best subset selection problem as an iterative linear integer optimization (LIO) problem, with a mixed integer optimization algorithm serving as the selection step. We demonstrate the power of the proposed method on various structured variable selection problems. Moreover, the proposed method can be extended to ridge-penalized best subset selection problems. The performance of the proposed method is evaluated in numerical studies.
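One plausible form of the Chapter 3 objective, assuming J predictors observed over T time periods with piecewise constant coefficient paths; the notation is illustrative rather than the thesis's own:

```latex
% Penalized dynamic GLM (illustrative form): the fused lasso term
% detects change points in each coefficient path over time, while the
% group lasso term zeroes out entire predictors.
\[
  \min_{\boldsymbol{\beta}}\;
  -\ell(\boldsymbol{\beta})
  \;+\; \lambda_1 \sum_{j=1}^{J}\sum_{t=2}^{T}
        \big|\beta_{j,t}-\beta_{j,t-1}\big|
  \;+\; \lambda_2 \sum_{j=1}^{J}
        \big\lVert \boldsymbol{\beta}_{j}\big\rVert_2 .
\]
```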
- Detection of Latent Heteroscedasticity and Group-Based Regression Effects in Linear Models via Bayesian Model Selection
  Metzger, Thomas Anthony (Virginia Tech, 2019-08-22)
  Standard linear modeling approaches make potentially simplistic assumptions regarding the structure of categorical effects that may obfuscate more complex relationships governing data. For example, recent work focused on the two-way unreplicated layout has shown that hidden groupings among the levels of one categorical predictor frequently interact with the ungrouped factor. We extend the notion of a "latent grouping factor" to linear models in general. The proposed work allows researchers to determine whether an apparent grouping of the levels of a categorical predictor reveals a plausible hidden structure given the observed data. Specifically, we offer Bayesian model selection-based approaches to reveal latent group-based heteroscedasticity, regression effects, and/or interactions. Failure to account for such structures can produce misleading conclusions. Since the presence of latent group structures is frequently unknown a priori to the researcher, we use fractional Bayes factor methods and mixture g-priors to overcome the lack of prior information. We provide an R package, slgf, that implements our methodology and demonstrate its usage in practice.
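A minimal statement of the latent-grouping idea in assumed notation: a candidate grouping g splits the levels of a categorical predictor into two groups, and the compared model allows group-specific regression effects and/or error variances:

```latex
% Latent grouping factor, illustrative notation: observation i has
% categorical level k(i), assigned to latent group g(k(i)) in {1, 2}.
\[
  y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}_{g(k(i))} + \varepsilon_i,
  \qquad
  \varepsilon_i \sim N\!\left(0,\ \sigma^2_{g(k(i))}\right),
\]
% so a grouping may shift regression effects, error variances, or both;
% candidate groupings are compared via fractional Bayes factors.
```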
- Dynamic Probability Control Limits for Risk-Adjusted Bernoulli Cumulative Sum Charts
  Zhang, Xiang (Virginia Tech, 2015-12-12)
  The risk-adjusted Bernoulli cumulative sum (CUSUM) chart developed by Steiner et al. (2000) is an increasingly popular tool for monitoring clinical and surgical performance. In practice, however, use of a fixed control limit for the chart leads to quite variable in-control average run length (ARL) performance for patient populations with different risk score distributions. To overcome this problem, this study determines simulation-based dynamic probability control limits (DPCLs) patient by patient for the risk-adjusted Bernoulli CUSUM chart. By maintaining the probability of a false alarm at a constant level conditional on no false alarm for previous observations, risk-adjusted CUSUM charts with DPCLs have consistent in-control performance at the desired level, with approximately geometrically distributed run lengths. Simulation results demonstrate that the proposed method does not rely on any information or assumptions about the patients' risk distributions. The use of DPCLs allows each chart to be designed for the particular sequence of patients seen by a surgeon or hospital. The effect of estimation error on the performance of the risk-adjusted Bernoulli CUSUM chart with DPCLs is also examined. Our simulation results show that the in-control performance of the chart with DPCLs is affected by estimation error; the most influential factors are the specified desired in-control average run length, the Phase I sample size, and the overall adverse event rate. However, the effect of estimation error is uniformly smaller for the risk-adjusted Bernoulli CUSUM chart with DPCLs than for the corresponding chart with a constant control limit under various realistic scenarios. In addition, there is a substantial reduction in the standard deviation of the in-control run length when DPCLs are used. Therefore, use of DPCLs has yet another advantage when designing a risk-adjusted Bernoulli CUSUM chart. This work was done jointly with Dr. William H. Woodall (Department of Statistics, Virginia Tech). Moreover, DPCLs are adapted to the design of the risk-adjusted CUSUM charts for multiresponses developed by Tang et al. (2015). It is shown that the in-control performance of the charts with DPCLs can be controlled for different patient populations because these limits are determined for each specific sequence of patients. Thus, the risk-adjusted CUSUM chart for multiresponses with DPCLs is more practical and should be applied to effectively monitor surgical performance by hospitals and healthcare practitioners. This portion is joint work with Dr. William H. Woodall and Mr. Justin Loda (Department of Statistics, Virginia Tech).
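A sketch of the chart and limits described above: the likelihood-ratio weights follow the standard Steiner et al. (2000) form, while the DPCL step below is a simplified Monte Carlo illustration, not the dissertation's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def cusum_weight(y, p, odds_ratio=2.0):
    """Steiner et al. (2000) risk-adjusted score for binary outcome y
    with predicted risk p, testing odds ratio 1 vs. odds_ratio."""
    denom = 1.0 - p + odds_ratio * p
    return np.log(odds_ratio / denom) if y == 1 else np.log(1.0 / denom)

def dpcl(risks, alpha=0.001, n_sim=5000):
    """Simplified DPCL sketch: simulate in-control chart paths for the
    given patient risk sequence and set each limit h_t as the
    (1 - alpha) quantile of chart values among paths that have not yet
    signaled, keeping the conditional false-alarm probability near
    alpha at every step."""
    S = np.zeros(n_sim)
    alive = np.ones(n_sim, dtype=bool)
    limits = np.empty(len(risks))
    for t, p in enumerate(risks):
        y = rng.random(n_sim) < p  # in-control adverse events
        # vectorized cusum_weight with odds_ratio = 2 (denom = 1 + p)
        w = np.where(y, np.log(2.0 / (1 + p)), np.log(1.0 / (1 + p)))
        S = np.maximum(0.0, S + w)
        limits[t] = np.quantile(S[alive], 1.0 - alpha)
        alive &= S <= limits[t]
    return limits

print(dpcl(np.full(20, 0.1))[:5])  # limits for 20 patients at 10% risk
```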
- Evidence of Executive Dysfunction in Co-occurring Substance Use Disorder and Major Depressive Disorder or Antisocial Personality Disorder
  Moody, Lara (Virginia Tech, 2014-09-12)
  Background and Aims: Executive dysfunction is pervasive in substance-dependent individuals (Verdejo-García, Bechara, Recknor, & Perez-Garcia, 2006). As many as four-fifths of individuals in treatment for substance use disorders (SUDs) have co-existing lifetime psychopathology. Executive function deficits are tied to markers of decreased quality of life, including increases in negative life events (Green, Kern, Braff, & Mintz, 2000), maladaptive social functioning (Kurtz, Moberg, Ragland, Gur, & Gur, 2005), and worsened treatment outcomes (Czuchry & Dansereau, 2003). Despite evidence of executive dysfunction across several mental disorders, few studies investigate how the co-occurrence of psychopathologies in SUDs impacts executive functioning. Methods: Here, we compare measures of executive function (i.e., the Iowa Gambling Test, Letter Number Sequencing Test, Stroop Test, Wisconsin Card Sorting Test, Continuous Performance Test, Towers Test, and Delay Discounting Test) in individuals with (a) substance use disorder, (b) substance use disorder and co-occurring major depressive disorder, (c) substance use disorder and co-occurring antisocial personality disorder, (d) substance use disorder and co-occurring major depressive disorder and antisocial personality disorder, and (e) no substance use disorder or co-occurring psychopathology. Results: Regression models of each executive function outcome as a function of education, income, age, and group membership indicated that only the Delay Discounting Test and Continuous Performance Test yielded significant overall models (F(4, 313) = 12.699, p < 0.001 and F(4, 307) = 2.659, p = 0.033, respectively). Conclusions: Overall, the Delay Discounting Test and Continuous Performance Test were the most sensitive to differences among the substance use and psychopathology profiles assessed.
- An Exploratory Study Investigating the Time Duration of Slip-Induced Changes in Gait
  Beringer, Danielle Nicole (Virginia Tech, 2013-05-23)
  The biomechanics of slips are commonly studied in laboratory settings in an effort to improve the understanding of slip mechanisms for the advancement of slip and fall prevention strategies and risk assessment methods. Prior studies have shown changes in gait after slipping, and these changes can reduce the external validity of experimental results. As such, most researchers only slip participants one time. The ability to slip participants more than once, after allowing gait to return to a natural baseline, would improve the experimental efficiency of these studies. Therefore, the goal of this study was to determine the time duration of slip-induced changes in gait. The required coefficient of friction (RCOF), a parameter highly predictive of risk of slipping, was measured on thirty-one young male adults during level gait on three separate days before slipping, immediately (<10 minutes) after slipping, and either one, two, four, or six weeks later. On average, the RCOF decreased 12% from its baseline value (0.20) after slipping, indicating the adoption of a protective gait with a decreased risk of slipping. The RCOF data trended toward baseline values 4-6 weeks after the slip experience, but remained statistically different from baseline. This indicates that the slip-induced gait alterations have long-lasting effects, enduring up to six weeks after the slip experience.
- Exploring Construction Safety and Control Measures through Electrical Fatalities
  Zhao, Dong (Virginia Tech, 2015-01-09)
  Globally, construction is considered a hazardous industry with a disproportionate amount of fatal and non-fatal injuries as compared to other industries. Electrocution is named as one of the "fatal four" causes of construction injuries by the Occupational Safety and Health Administration (OSHA). In the United States, an average of 47.9% of electrical fatalities occurred in the construction industry from 2003 to 2012, according to the U.S. Department of Labor. These fatalities include both electrical workers and non-electrical workers. Such a disproportionate rate suggests a need for research to improve construction safety and reduce injuries due to electrocution. However, there is a lack of understanding of the causation mechanisms surrounding fatal accidents by electrocution using a systems approach, and there is a disconnection between the mechanism of fatal electrocution accidents and the associated control measures, which may lead to less effective prevention in construction. This dissertation has three objectives: (a) establishing a sociotechnical system model that reflects electrocution occurrence in the U.S. construction industry and identifying the associations among its internal subsystems; (b) determining specific electrocution patterns and associated mechanism constraints; and (c) examining hierarchy of control (HOC) measures and determining their appropriateness. Findings from this research include: (a) the identification of three system patterns of electrocution in construction work systems and, for each pattern, the associations between the personnel, technological, and organizational/managerial subsystems and the internal and external environment, using a macroergonomics framework; (b) the identification of five features of work and the mapping of their decision-making chains, critical decision-making points, and constraints, as an interpretation of electrocution mechanisms in the workplace; and (c) the finding that behavioral controls remain prevalent in electrical hazard mitigation even though knowledge of construction safety and health has increased in the past decades, and that the effectiveness of controls does not differ statistically by construction type or occupation. Based on these findings, the research recommends that construction managers strictly follow HOC rules by giving priority to higher levels of control, and that the industry upgrade its prevention strategy by introducing more technological innovations and encouraging prevention through design (PtD) strategies.
- Identification and Modification of Risk Factors Contributing to Slip- and Trip-Induced Falls
  Allin, Leigh Jouett (Virginia Tech, 2020-01-20)
  Slips, trips, and falls are a serious public health concern, particularly among older adults and within occupational settings, given that falls contribute to a large number of injuries and are associated with high medical costs. To reduce the number of falls, there is a need to better understand risk factors contributing to falls, and to develop and evaluate improved balance training interventions to prevent falls. To address these needs, this work has two primary goals: first, to better understand risk factors contributing to falls, including fatigue and balance reactions after a large postural perturbation; and second, to develop and evaluate improved reactive balance training (RBT) interventions to reduce the risk of falls due to slipping and tripping. The first study investigated the effects of performing occupationally relevant, fatigue-inducing physical work on trip and fall risk. Healthy young adults performed a simulated manual material handling (MMH) task, using either heavy or light boxes, for two hours. Gait measures related to risk of tripping and slipping were assessed before and after the task. Reactive balance during one laboratory-induced trip was also assessed after the task. Results showed that performing the heavy MMH task did not affect risk of tripping or slipping, or reactive balance after tripping. These results may reflect insufficient fatigue induced by the MMH task. The second study investigated the relationship between feet kinematics upon slipping while walking and the outcome of the slip. Seventy-one laboratory-induced slips were analyzed, which included recoveries, feet-split falls, feet-forward falls, and lateral falls. Feet kinematics differed between these four slip outcomes, and a discriminant model including six measures of feet kinematics correctly predicted 87% of slip outcomes. Two potentially modifiable characteristics of feet kinematics upon slipping that can improve the likelihood of successfully averting a fall were identified: (1) quickly arresting the motion of the slipping foot; and (2) a recovery step that places the trailing toe approximately 0-10% of body height anterior to the sacrum. This information may be used to guide the development of improved RBT interventions to reduce the risk of slip-induced falls. The third study evaluated the efficacy of two low-cost, low-tech RBT methods for improving reactive balance after slipping. The two methods were: unexpected slip training (UST), which involved repeated unexpected slips while walking, and volitional slip-recovery training (VST), which involved practicing balance reactions after volitionally inducing a slip-like perturbation. Young adults completed one session of an assigned intervention (UST, VST, or control), followed by one unexpected, laboratory-induced slip while walking. Compared to controls, UST and VST resulted in a higher proportion of successful balance recoveries from the laboratory-induced slips. UST improved both proactive control and reactive stepping after slipping, while VST primarily improved the ability to arrest slipping foot motion. These results support the use of UST and VST as practical, low-tech methods of slip training. The fourth study evaluated the efficacy of RBT that targets both slipping and tripping. Community-dwelling, healthy older adults (61-75 years) completed four sessions of either RBT (treadmill-based trip-recovery training and VST) or control training (general strength and balance exercises). Reactive balance during unexpected laboratory-induced slips and trips was assessed before and after RBT, and compared between subjects at baseline (before the intervention), after control training, and after RBT. The incidence of slip-induced falls differed between groups in that 80% fell at baseline, 60% fell after control training, and 18% fell after RBT. Post-RBT subjects also exhibited less severe slips, compared to baseline and post-control subjects. The incidence of trip-induced falls did not differ between groups, but the margin of stability after tripping was greater for post-RBT subjects, compared to post-control subjects. These results show promise for the use of RBT applied to both slipping and tripping to reduce fall risk among older adults.
- The Impact of Energy Efficient Design and Construction on LIHTC Housing in Virginia
  McCoy, Andrew P. (Housing Virginia, 2015)
  The purpose of this report is to identify and verify possible benefits of the shift in housing policy by the Virginia Housing Development Authority (VHDA) to encourage energy efficiency (EE) in the affordable rental stock in Virginia through the LIHTC program. The research addresses key issues related to energy efficiency and affordable housing through a rigorous measurement of economic impacts for low-income residents, distinguishing the effects of design, construction, technologies, and behavior per unit. In addition, the research addresses how the policy to use EE might impact developers and owners in terms of property capital and operating costs. Data, analysis, and findings focus specifically on facilities constructed to the EarthCraft MultiFamily standard in Virginia, one of the only datasets currently available that allows for this type of inquiry.
- An Interdisciplinary Approach: Computational Sequence Motif Search and Prediction of Protein Function with Experimental Validation
  Choi, Hyunjin (Virginia Tech, 2013-10-29)
  Pathogens colonize their hosts by releasing molecules that can enter host cells. The biotrophic oomycete plant pathogen Phytophthora sojae harbors a superfamily of effector genes whose protein products enter the cells of its host, soybean. Many of the effectors contain an RXLR-dEER motif in their N-terminus. More than 400 members of this family have previously been identified using a Hidden Markov Model. Amino acids flanking the RXLR motif have been utilized to identify effector proteins from the P. sojae secretome, despite the high level of sequence divergence among the members of this protein family. I present here machine learning methods to identify protein candidates that belong to a particular class, such as the effector superfamily. Converting the flanking amino acid sequences of RXLR motifs (or other candidate motifs) into numeric values that reflect their physical properties enables the protein sequences to be analyzed through these methods. The methods evaluated include Support Vector Machines and a related spherical classification method that I have developed. I also approached the effector prediction problem by building functional linkage networks, producing lists of predicted P. sojae effector proteins. I tested the best candidate through gene gun bombardment assays using the beta-glucuronidase reporter system, which revealed a high likelihood that the candidate can enter soybean cells.
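As an illustration of the feature encoding described above (the thesis's exact property scales and classifier settings are not specified here; the Kyte-Doolittle hydropathy scale stands in for one physical property, and the sequences are toy data):

```python
import numpy as np
from sklearn.svm import SVC

# Kyte-Doolittle hydropathy: one numeric physical property per residue.
KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5,
      'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9,
      'M': 1.9, 'F': 2.8, 'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9,
      'Y': -1.3, 'V': 4.2}

def encode(flank):
    """Map the amino acids flanking a motif to numeric property values,
    so equal-length sequences become fixed-length feature vectors."""
    return [KD[a] for a in flank]

# Hypothetical flanking sequences, labeled effector (1) or not (0).
X = np.array([encode(s) for s in ['LIVSA', 'VALIG', 'DENQR', 'RNDEQ']])
y = np.array([1, 1, 0, 0])

clf = SVC(kernel='rbf', gamma='scale').fit(X, y)
print(clf.predict(np.array([encode('LLVAG'), encode('EDQNR')])))
```

In practice, multiple property scales would be concatenated per position, giving the classifier a richer geometric representation of each flanking region.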
- Measurement and modeling of transcriptional noise in the cell cycle regulatory network
  Ball, David A.; Adames, Neil R.; Reischmann, Nadine; Barik, Debashis; Franck, Christopher T.; Tyson, John J.; Peccoud, Jean (Landes Bioscience, 2013-10-01)
  Fifty years of genetic and molecular experiments have revealed a wealth of molecular interactions involved in the control of cell division. In light of the complexity of this control system, mathematical modeling has proved useful in analyzing biochemical hypotheses that can be tested experimentally. Stochastic modeling has been especially useful in understanding the intrinsic variability of cell cycle events, but stochastic modeling has been hampered by a lack of reliable data on the absolute numbers of mRNA molecules per cell for cell cycle control genes. To fill this void, we used fluorescence in situ hybridization (FISH) to collect single molecule mRNA data for 16 cell cycle regulators in budding yeast, Saccharomyces cerevisiae. From statistical distributions of single-cell mRNA counts, we are able to extract the periodicity, timing, and magnitude of transcript abundance during the cell cycle. We used these parameters to improve a stochastic model of the cell cycle to better reflect the variability of molecular and phenotypic data on cell cycle progression in budding yeast.
- Novel Prognostic Markers and Therapeutic Targets for Glioblastoma
  Varghese, Robin (Virginia Tech, 2016-06-23)
  Glioblastoma is the most common and lethal malignant brain tumor, with a median survival of 14.6 months and a tumor recurrence rate of ninety percent. Two key causes of glioblastoma's grim outcome are the lack of applicable prognostic markers and of effective therapeutic targets. By employing a loss-of-function RNAi screen in glioblastoma cells, we found a list of 20 kinases that can be considered glioblastoma survival kinases. These survival kinases, which we term survival kinase genes (SKGs), were investigated to find prognostic markers as well as therapeutic targets for glioblastoma. Analyzing these survival kinases in The Cancer Genome Atlas patient database, we found that the CDCP1, CDKL5, CSNK1ε, IRAK3, LATS2, PRKAA1, STK3, TBRG4, and ULK4 genes could be used as prognostic markers for glioblastoma, with or without temozolomide chemotherapeutic treatment as a covariate. For the first time, we found that patients with increased levels of NEK9 and PIK3CB mRNA expression had a higher probability of recurrent tumors. We also discovered that expression of CDCP1, IGF2R, IRAK3, LATS2, PIK3CB, ULK4, or VRK1 in primary glioblastoma tumors was associated with tumor recurrence prognosis. Of these recurrence-prognostic candidates, PIK3CB had much higher expression in recurrent tumor tissue than in primary tissue. Further investigation of the PI3K pathway showed a strong correlation with recurrence rate, days to recurrence, and survival, emphasizing the role of PIK3CB in tumor recurrence in glioblastoma. In an effort to find effective therapeutic targets for glioblastoma, we used SKGs as potential candidates. We chose the serine/threonine kinase casein kinase 1 epsilon (CSNK1ε) as a target for glioblastoma because multiple shRNAs targeted this gene in our loss-of-function screen and because multiple commercially available inhibitors of this gene exist. Computational analysis of CSNK1ε protein and mRNA expression revealed higher expression in glioblastoma than in normal tissue. To further examine this gene, we knocked down (KD) or inhibited CSNK1ε in glioblastoma cell lines and observed a significant increase in cell death without any significant effect on normal cell lines. KD and inhibition of CSNK1ε in cancer stem cells, a culprit of tumor recurrence, also limited self-renewal and proliferation and significantly decreased cell survival without affecting normal stem cells. Further analysis of the downstream effects of CSNK1ε knockdown and inhibition indicated a significant increase in the protein expression of β-catenin (CTNNB1). We found that CSNK1ε KD activated β-catenin, which increased GBM cell death, an effect that could be rescued using CTNNB1 shRNA. Our survival kinase screen, computational analyses, patient database analyses, and experimental methods contributed to the discovery of novel prognostic markers and therapeutic targets for glioblastoma.
- Novel Statistical Methods for Multiple-variant Genetic Association Studies with Related Individuals
  Guan, Ting (Virginia Tech, 2018-07-09)
  Genetic association studies usually include related individuals. Meanwhile, high-throughput sequencing technologies produce data on multiple genetic variants. Due to linkage disequilibrium (LD) and familial relatedness, the genotype data from such studies often carry complex correlations. Moreover, missing genotype values usually lead to loss of power in genetic association tests. Repeated measurements of phenotype and dynamic covariates from longitudinal studies bring further opportunities, but also challenges, in the discovery of disease-related genetic factors. This dissertation focuses on developing novel statistical methods to address some of the challenging questions remaining in genetic association studies for these reasons. Many methods have been proposed to detect disease-related genetic regions (e.g., genes, pathways). However, with multiple-variant data from a sample with relatedness, it is critical to account for the complex genotypic correlations when assessing genetic contribution. Recognizing the limitations of existing methods, the first part of this dissertation proposes the Adaptive-weight Burden Test (ABT), a score test between a quantitative trait and genotype data with complex correlations. ABT achieves higher power by adopting data-driven weights, which make good use of the LD and relatedness. Because the null distribution is derived analytically, ABT is computationally simple and well suited to genome-wide association studies. Genotype missingness commonly arises due to limitations in genotyping technologies. Imputation of missing genotype values usually improves the quality of the data used in the subsequent association test and thus increases power. Complex correlations, though troublesome, provide an opportunity for proper handling of genotype missingness. In the second part of this dissertation, a genotype imputation method is developed, which imputes missing values across multiple genetic variants by exploiting the LD and the relatedness. The popularity of longitudinal studies in genetics and genomics calls for methods deliberately designed for repeated measurements. Therefore, a multiple-variant genetic association test for a longitudinal trait on samples with relatedness is developed, which treats the longitudinal measurements as observations of functions and thus properly takes the time factor into account.
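For context, a generic weighted burden score statistic takes the form below; ABT's data-driven weights and its null distribution under relatedness are derived in the dissertation, so the notation here is deliberately generic:

```latex
% Generic weighted burden score statistic: G is the n x q genotype
% matrix for a region, w a weight vector, y the trait with fitted
% null mean \hat{\mu}.
\[
  U = \mathbf{w}^{\top} G^{\top}
      \big(\mathbf{y} - \hat{\boldsymbol{\mu}}\big),
  \qquad
  T = \frac{U^{2}}{\widehat{\operatorname{Var}}(U)}
      \;\sim\; \chi^{2}_{1} \ \text{under } H_0 .
\]
% Under relatedness, Var(U) must account for the kinship structure,
% e.g., Cov(y) = sigma_g^2 * 2*Phi + sigma_e^2 * I with kinship
% matrix Phi.
```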
- Objective Bayesian Analysis for Gaussian Hierarchical Models with Intrinsic Conditional Autoregressive Priors
  Keefe, Matthew J.; Ferreira, Marco A. R.; Franck, Christopher T. (2019-03)
  Bayesian hierarchical models are commonly used for modeling spatially correlated areal data. However, choosing appropriate prior distributions for the parameters in these models is necessary and sometimes challenging. In particular, an intrinsic conditional autoregressive (CAR) hierarchical component is often used to account for spatial association. Vague proper prior distributions have frequently been used for this type of model, but this requires the careful selection of suitable hyperparameters. In this paper, we derive several objective priors for the Gaussian hierarchical model with an intrinsic CAR component and discuss their properties. We show that the independence Jeffreys and Jeffreys-rule priors result in improper posterior distributions, while the reference prior results in a proper posterior distribution. We present results from a simulation study that compares frequentist properties of Bayesian procedures that use several competing priors, including the derived reference prior. We demonstrate that using the reference prior results in favorable coverage, interval length, and mean squared error. Finally, we illustrate our methodology with an application to 2012 housing foreclosure rates in the 88 counties of Ohio.
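The intrinsic CAR prior in question has the standard pairwise-difference form; its singular precision matrix is what makes posterior propriety a nontrivial question:

```latex
% Intrinsic CAR prior on spatial effects phi over n areal units, with
% i ~ j denoting neighboring regions and precision parameter tau_c:
\[
  p(\boldsymbol{\phi}\mid\tau_c) \;\propto\;
  \tau_c^{(n-1)/2}
  \exp\!\Big(-\frac{\tau_c}{2}\sum_{i \sim j}(\phi_i-\phi_j)^2\Big),
\]
% i.e., a Gaussian with singular precision tau_c (D - W), improper
% because D - W has rank n - 1 for a connected region (the exponent
% (n-1)/2 assumes connectedness).
```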
- Optimizing analysis pipelines for improved variant discovery
  Highnam, Gareth Wei An (Virginia Tech, 2014-04-17)
  In modern genomics, all experiments begin data collection with sequencing and downstream alignment or assembly processing. As such, the development of reliable sequencing pipelines is hugely important as a foundation for any future analysis on that data. While much existing work has been done on enhancing the throughput and computational performance of such pipelines, there is still the question of accuracy. The rift in knowledge between speed and accuracy can be attributed to the more conceptually complex nature of what constitutes the measurement of accuracy. Unlike simply parsing logs of memory usage and CPU hours, accuracy requires experimental validation. Subsets of accuracy are also created when assessing alignment or variation around particular genomic features such as indels, Copy Number Variants (CNVs), or microsatellite repeats. Here, accuracy measurements for read alignment and variant calls are developed, allowing the optimization of sequencing pipelines at all stages. The underlying hypothesis is that different sequencing platforms and analysis software can be distinguished from each other in accuracy by both sample and genomic variation of interest. As the term accuracy suggests, measurements of alignment and variation recall require comparison against a truth set, provided here by read library simulations and high-quality data from the Genome in a Bottle Consortium and the Illumina Omni array. In exploring the hypothesis, the measurements are built into a community resource to crowdsource the creation of a benchmarking repository for pipeline comparison. Results from pipelines promoted by this resource are then validated in the wet lab, supporting a hierarchy of pipeline performance. In particular, an accurate pipeline for genotyping microsatellite repeats is constructed and used to create a database of human microsatellites. Progress in this area is vital for the growth of sequencing in both clinical and research settings. For genomics research to fully translate to the bedside, the boom of new technology must be controlled by rational metrics and industry standardization. This project addresses both of these issues, as well as contributing to the understanding of human microsatellite variation.
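Every such accuracy measurement ultimately reduces to comparing a call set against a truth set. A minimal Python sketch of that step, with variants keyed by site and allele (real benchmarking additionally requires normalizing variant representation, e.g., left-aligning indels, which this skips):

```python
def precision_recall(called, truth):
    """Compare a call set against a truth set, both given as sets of
    (chrom, pos, ref, alt) tuples. True positives are exact matches;
    precision is TP / calls, recall is TP / truth."""
    tp = len(called & truth)
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

truth = {('chr1', 101, 'A', 'T'), ('chr1', 250, 'CAG', 'C')}
calls = {('chr1', 101, 'A', 'T'), ('chr1', 400, 'G', 'C')}
print(precision_recall(calls, truth))  # (0.5, 0.5)
```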
- Quantifying the Effects of a Constricted Temporal Window in Reinforcer Pathology
  Mellis, Alexandra Michelle (Virginia Tech, 2019-03-18)
  Health behaviors, positive and negative, can support or reduce risk for multiple chronic diseases, such as substance use disorder and obesity. These diseases are marked by overconsumption of commodities that offer predictable short-term benefits and neglect of other behaviors with variable long-term benefits (e.g., fast food is enjoyable in the moment; exercise may have delayed benefits, but moment-to-moment may not be as reinforcing as fast food). An individual's valuation of fast food or exercise may depend on how far into the future their benefits are considered: the temporal window. The first study shows that the temporal window is more constricted among high-risk substance users than among people without substance problems, especially when considering higher-value choices. The second study shows that the temporal window can change depending on the environment; specifically, engaging with narratives of job loss can constrict the temporal window. The third study shows that engaging with job-loss narratives can constrict the temporal window and increase the valuation of fast food among obese individuals. The final study shows that a similar hardship scenario, natural disasters, can constrict the temporal window, increase demand for alcohol and cigarettes, and decrease the valuation of more temporally extended reinforcers (e.g., employment, savings, and seatbelt wearing) among smoking drinkers. Together, these studies support a model of reinforcer pathology, wherein the temporal window, which can differ both between individuals and across environments, drives the valuation of reinforcers that impact health.
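The temporal window is typically quantified through delay discounting, most commonly with Mazur's (1987) hyperbolic model, where a larger discount rate k corresponds to a more constricted window:

```latex
% Hyperbolic delay discounting (Mazur, 1987): subjective value V of an
% amount A delayed by D, with individual discount rate k.
\[
  V = \frac{A}{1 + kD}
\]
% A steep k devalues delayed outcomes rapidly, i.e., a constricted
% temporal window; manipulations such as narrative job-loss scenarios
% are hypothesized to shift k upward.
```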