Department of Statistics
Browsing Department of Statistics by Department "Biological Sciences"
Now showing 1 - 17 of 17
- Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles
  Highnam, Gareth; Franck, Christopher T.; Martin, Andy; Stephens, Calvin; Puthige, Ashwin; Mittelman, David (Oxford University Press, 2013-01)
  Repetitive sequences are biologically and clinically important because they can influence traits and disease, but repeats are challenging to analyse using short-read sequencing technology. We present a tool for genotyping microsatellite repeats called RepeatSeq, which uses Bayesian model selection guided by an empirically derived error model that incorporates sequence and read properties. Next, we apply RepeatSeq to high-coverage genomes from the 1000 Genomes Project to evaluate performance and accuracy. The software uses common formats, such as VCF, for compatibility with existing genome analysis pipelines. Source code and binaries are available at http://github.com/adaptivegenome/repeatseq.
- Cell Cycle Model System for Advancing Cancer Biomarker Research
  Lazar, Iuliana M.; Hoeschele, Ina; de Morais, Juliana; Tenga, Milagros J. (Springer Nature, 2017-12-21)
  Progress in understanding the complexity of a devastating disease such as cancer has underscored the need for developing comprehensive panels of molecular markers for early disease detection and precision medicine applications. The present study was conducted to assess whether a cohesive biological context can be assigned to protein markers derived from public data mining, and whether mass spectrometry can be utilized to screen for the co-expression of functionally related biomarkers to be recommended for further exploration in clinical context. Cell cycle arrest/release experiments of MCF7/SKBR3 breast cancer and MCF10 non-tumorigenic cells were used as a surrogate to support the production of proteins relevant to aberrant cell proliferation. Information downloaded from the scientific public domain was queried with bioinformatics tools to generate an initial list of 1038 cancer-associated proteins. Mass spectrometric analysis of cell extracts identified 352 proteins that could be matched to the public list. Differential expression, enrichment, and protein-protein interaction analysis of the proteomic data revealed several functionally-related clusters of relevance to cancer. The results demonstrate that public data derived from independent experiments can be used to inform biological research and support the development of molecular assays for probing the characteristics of a disease.
- Development and implementation of a scalable and versatile test for COVID-19 diagnostics in rural communities
  Ceci, Alessandro; Muñoz-Ballester, Carmen; Tegge, Allison N.; Brown, Katherine L.; Umans, Robyn A.; Michel, F. Marc; Patel, Dipankumar; Tewari, Bhanu P.; Martin, James E.; Alcoreza, Oscar Jr.; Maynard, Thomas M.; Martinez-Martinez, Daniel; Bordwine, Paige; Bissell, Noelle; Friedlander, Michael J.; Sontheimer, Harald; Finkielstein, Carla V. (Nature Publishing Group, 2021-07-20)
  Rapid and widespread testing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is essential for an effective public health response aimed at containing and mitigating the coronavirus disease 2019 (COVID-19) pandemic. Successful health policy implementation relies on early identification of infected individuals and extensive contact tracing. However, rural communities, where resources for testing are sparse or simply absent, face distinctive challenges to achieving this success. Accordingly, we report the development of an academic, public land grant University laboratory-based detection assay for the identification of SARS-CoV-2 in samples from various clinical specimens that can be readily deployed in areas where access to testing is limited. The test, which is a quantitative reverse transcription polymerase chain reaction (RT-qPCR)-based procedure, was validated on samples provided by the state laboratory and submitted for FDA Emergency Use Authorization. Our test exhibits comparable sensitivity and exceeds the specificity and inclusivity values of other molecular assays. Additionally, this test can be re-configured to meet supply chain shortages, modified for scale-up demands, and is amenable to several clinical specimens. Test development also involved 3D engineering critical supplies and formulating a stable collection media that allowed samples to be transported for hours over a dispersed rural region without the need for a cold-chain. These two elements were critical when shortages impacted testing and when personnel needed to reach areas that were geographically isolated from the testing center. Overall, using a robust, easy-to-adapt methodology, we show that an academic laboratory can supplement COVID-19 testing needs and help local health departments assess and manage outbreaks. This additional testing capacity is particularly germane for smaller cities and rural regions that would otherwise be unable to meet the testing demand.
- Divergent age-dependent peripheral immune transcriptomic profile following traumatic brain injury
  Hazy, Amanda; Bochicchio, Lauren; Oliver, Andrea; Xie, Eric; Geng, Shuo; Brickler, Thomas; Xie, Hehuang David; Li, Liwu; Allen, Irving C.; Theus, Michelle H. (Springer Nature, 2019-06-12)
  The peripheral immune system is a major regulator of the pathophysiology associated with traumatic brain injury (TBI). While age-at-injury influences recovery from TBI, the differential effects on the peripheral immune response remain unknown. Here, we investigated the effects of TBI on gene expression changes in murine whole blood using RNAseq analysis, gene ontology and network topology-based key driver analysis. Genome-wide comparison of CCI-injured peripheral whole blood showed a significant increase in genes involved in proteolysis and oxidative-reduction processes in juvenile compared to adult. Conversely, a greater number of genes, involved in migration, cytokine-mediated signaling and adhesion, were found reduced in CCI-injured juvenile compared to CCI-injured adult immune cells. Key driver analysis also identified G-protein coupled and novel pattern recognition receptor (PRR), P2RY10, as a central regulator of these genes. Lastly, we found Dectin-1, a C-type lectin PRR, to be reduced at the protein level in both naive neutrophils and on infiltrating immune cells in the CCI-injured juvenile cortex. These findings demonstrate a distinct peripheral inflammatory profile in juvenile mice, which may impact the injury and repair response to brain trauma.
- Identifying Transcriptional Regulatory Modules Among Different Chromatin States in Mouse Neural Stem Cells
  Banerjee, Sharmi; Zhu, Hongxiao; Tang, Man; Feng, Wu-chun; Wu, Xiaowei; Xie, Hehuang David (Frontiers, 2019-01-15)
  Gene expression regulation is a complex process involving the interplay between transcription factors and chromatin states. Significant progress has been made toward understanding the impact of chromatin states on gene expression. Nevertheless, the mechanism of transcription factors binding combinatorially in different chromatin states to enable selective regulation of gene expression remains an interesting research area. We introduce a nonparametric Bayesian clustering method for inhomogeneous Poisson processes to detect heterogeneous binding patterns of multiple proteins including transcription factors to form regulatory modules in different chromatin states. We applied this approach on ChIP-seq data for mouse neural stem cells containing 21 proteins and observed different groups or modules of proteins clustered within different chromatin states. These chromatin-state-specific regulatory modules were found to have significant influence on gene expression. We also observed different motif preferences for certain TFs between different chromatin states. Our results reveal a degree of interdependency between chromatin states and combinatorial binding of proteins in the complex transcriptional regulatory process. The software package is available on GitHub at https://github.com/BSharmi/DPM-LGCP.
- Individual Variability of Nosema ceranae Infections in Apis mellifera Colonies
  Mulholland, Grace E.; Traver, Brenna E.; Johnson, Nels G.; Fell, Richard D. (MDPI, 2012-11-01)
  Since 2006, beekeepers have reported increased losses of Apis mellifera colonies, and one factor that has been potentially implicated in these losses is the microsporidian Nosema ceranae. Since N. ceranae is a fairly recently discovered parasite, there is little knowledge of the variation in infection levels among individual workers within a colony. In this study we examined the levels of infection in individual bees from five colonies over three seasons using both spore counting and quantitative real-time PCR. The results show considerable intra-colony variation in infection intensity among individual workers with a higher percentage of low-level infections detected by PCR than by spore counting. Colonies generally had the highest percentage of infected bees in early summer (June) and the lowest levels in the fall (September). Nosema apis was detected in only 16/705 bees (2.3%) and always as a low-level co-infection with N. ceranae. The results also indicate that intra-colony variation in infection levels could influence the accuracy of Nosema diagnosis.
- Integrative single-cell omics analyses reveal epigenetic heterogeneity in mouse embryonic stem cells
  Luo, Yanting; He, Jianlin; Xu, Xiguang; Sun, Ming-an; Wu, Xiaowei; Lu, Xuemei; Xie, Hehuang David (PLOS, 2018-03)
  Embryonic stem cells (ESCs) consist of a population of self-renewing cells displaying extensive phenotypic and functional heterogeneity. Research towards the understanding of the epigenetic mechanisms underlying the heterogeneity among ESCs is still in its initial stage. Key issues, such as how to identify cell-subset specifically methylated loci and how to interpret the biological meanings of methylation variations remain largely unexplored. To fill in the research gap, we implemented a computational pipeline to analyze single-cell methylome and to perform an integrative analysis with single-cell transcriptome data. According to the origins of variation in DNA methylation, we determined the genomic loci associated with allelic-specific methylation or asymmetric DNA methylation, and explored a beta mixture model to infer the genomic loci exhibiting cell-subset specific methylation (CSM). We observed that the putative CSM loci in ESCs are significantly enriched in CpG island (CGI) shelves and regions with histone marks for promoter and enhancer, and the genes hosting putative CSM loci show wide-ranging expression among ESCs. More interestingly, the putative CSM loci may be clustered into co-methylated modules enriching the binding motifs of distinct sets of transcription factors. Taken together, our study provided a novel tool to explore single-cell methylome and transcriptome to reveal the underlying transcriptional regulatory networks associated with epigenetic heterogeneity of ESCs.
- Linked within-host and between-host models and data for infectious diseases: a systematic review
  Childs, Lauren M.; El Moustaid, Fadoua; Gajewski, Zachary J.; Kadelka, Sarah; Nikin-Beers, Ryan; Smith, John W. Jr.; Walker, Melody; Johnson, Leah R. (PeerJ, 2019-06-19)
  The observed dynamics of infectious diseases are driven by processes across multiple scales. Here we focus on two: within-host, that is, how an infection progresses inside a single individual (for instance viral and immune dynamics), and between-host, that is, how the infection is transmitted between multiple individuals of a host population. The dynamics of each of these may be influenced by the other, particularly across evolutionary time. Thus understanding each of these scales, and the links between them, is necessary for a holistic understanding of the spread of infectious diseases. One approach to combining these scales is through mathematical modeling. We conducted a systematic review of the published literature on multi-scale mathematical models of disease transmission (as defined by combining within-host and between-host scales) to determine the extent to which mathematical models are being used to understand across-scale transmission, and the extent to which these models are being confronted with data. Following the PRISMA guidelines for systematic reviews, we identified 24 of 197 qualifying papers across 30 years that include both linked models at the within and between host scales and that used data to parameterize/calibrate models. We find that the approach that incorporates both modeling with data is under-utilized, if increasing. This highlights the need for better communication and collaboration between modelers and empiricists to build well-calibrated models that both improve understanding and may be used for prediction.
- Measurement and modeling of transcriptional noise in the cell cycle regulatory network
  Ball, David A.; Adames, Neil R.; Reischmann, Nadine; Barik, Debashis; Franck, Christopher T.; Tyson, John J.; Peccoud, Jean (Landes Bioscience, 2013-10-01)
  Fifty years of genetic and molecular experiments have revealed a wealth of molecular interactions involved in the control of cell division. In light of the complexity of this control system, mathematical modeling has proved useful in analyzing biochemical hypotheses that can be tested experimentally. Stochastic modeling has been especially useful in understanding the intrinsic variability of cell cycle events, but stochastic modeling has been hampered by a lack of reliable data on the absolute numbers of mRNA molecules per cell for cell cycle control genes. To fill this void, we used fluorescence in situ hybridization (FISH) to collect single molecule mRNA data for 16 cell cycle regulators in budding yeast, Saccharomyces cerevisiae. From statistical distributions of single-cell mRNA counts, we are able to extract the periodicity, timing, and magnitude of transcript abundance during the cell cycle. We used these parameters to improve a stochastic model of the cell cycle to better reflect the variability of molecular and phenotypic data on cell cycle progression in budding yeast.
- Modeling Temperature Effects on Population Density of the Dengue Mosquito Aedes aegypti
  El Moustaid, Fadoua; Johnson, Leah R. (MDPI, 2019-11-07)
  Mosquito density plays an important role in the spread of mosquito-borne diseases such as dengue and Zika. While it remains very challenging to estimate the density of mosquitoes, modelers have tried different methods to represent it in mathematical models. The goal of this paper is to investigate the various ways mosquito density has been quantified, as well as to propose a dynamical system model that includes the details of mosquito life stages leading to the adult population. We first discuss the mosquito traits involved in determining mosquito density, focusing on those that are temperature dependent. We evaluate different forms of models for mosquito densities based on these traits and explore their dynamics as temperature varies. Finally, we compare the predictions of the models to observations of Aedes aegypti abundances over time in Vitória, Brazil. Our results indicate that the four models exhibit qualitatively and quantitatively different behaviors when forced by temperature, but that all seem reasonably consistent with observed abundance data.
- Panamanian frog species host unique skin bacterial communities
  Belden, Lisa K.; Hughey, Myra C.; Rebollar, Eria A.; Umile, Thomas P.; Loftus, Stephen C.; Burzynski, Elizabeth A.; Minbiole, Kevin P. C.; House, Leanna L.; Jensen, Roderick V.; Becker, Matthew H.; Walke, Jenifer B.; Medina, Daniel; Ibanez, Roberto; Harris, Reid N. (Frontiers, 2015-10-27)
  Vertebrates, including amphibians, host diverse symbiotic microbes that contribute to host disease resistance. Globally, and especially in montane tropical systems, many amphibian species are threatened by a chytrid fungus, Batrachochytrium dendrobatidis (Bd), that causes a lethal skin disease. Bd therefore may be a strong selective agent on the diversity and function of the microbial communities inhabiting amphibian skin. In Panama, amphibian population declines and the spread of Bd have been tracked. In 2012, we completed a field survey in Panama to examine frog skin microbiota in the context of Bd infection. We focused on three frog species and collected two skin swabs per frog from a total of 136 frogs across four sites that varied from west to east in the time since Bd arrival. One swab was used to assess bacterial community structure using 16S rRNA amplicon sequencing and to determine Bd infection status, and one was used to assess metabolite diversity, as the bacterial production of antifungal metabolites is an important disease resistance function. The skin microbiota of the three Panamanian frog species differed in OTU (operational taxonomic unit, bacterial species) community composition and metabolite profiles, although the pattern was less strong for the metabolites. Comparisons between frog skin bacterial communities from Panama and the US suggest broad similarities at the phylum level, but key differences at lower taxonomic levels. In our field survey in Panama, across all four sites, only 35 individuals (~26%) were Bd infected. There was no clustering of OTUs or metabolite profiles based on Bd infection status and no clear pattern of west-east changes in OTUs or metabolite profiles across the four sites. Overall, our field survey data suggest that different bacterial communities might be producing broadly similar sets of metabolites across frog hosts and sites. Community structure and function may not be as tightly coupled in these skin symbiont microbial systems as it is in many macro systems.
- Predicting temperature-dependent transmission suitability of bluetongue virus in livestock
  El Moustaid, Fadoua; Thornton, Zorian; Slamani, Hani; Ryan, Sadie J.; Johnson, Leah R. (2021-07-30)
  The transmission of vector-borne diseases is governed by complex factors including pathogen characteristics, vector–host interactions, and environmental conditions. Temperature is a major driver for many vector-borne diseases including Bluetongue viral (BTV) disease, a midge-borne febrile disease of ruminants, notably livestock, whose etiology ranges from mild or asymptomatic to rapidly fatal, thus threatening animal agriculture and the economy of affected countries. Using modeling tools, we seek to predict where the transmission can occur based on suitable temperatures for BTV. We fit thermal performance curves to temperature-sensitive midge life-history traits, using a Bayesian approach. We incorporate these curves into S(T), a transmission suitability metric derived from the disease’s basic reproductive number, R0. This suitability metric encompasses all components that are known to be temperature-dependent. We use trait responses for two species of key midge vectors, Culicoides sonorensis and Culicoides variipennis present in North America. Our results show that outbreaks of BTV are more likely between 15 °C and 34 °C, with predicted peak transmission risk at 26 °C. The greatest uncertainty in S(T) is associated with the following: the uncertainty in mortality and fecundity of midges near optimal temperature for transmission; midges’ probability of becoming infectious post-infection at the lower edge of the thermal range; and the biting rate together with vector competence at the higher edge of the thermal range. We compare three model formulations and show that incorporating thermal curves into all three leads to similar BTV risk predictions. To demonstrate the utility of this modeling approach, we created global suitability maps indicating the areas at high and long-term risk of BTV transmission, to assess risk and to anticipate potential locations of disease establishment.
- The Role of Vector Trait Variation in Vector-Borne Disease Dynamics
  Cator, Lauren J.; Johnson, Leah R.; Mordecai, Erin A.; El Moustaid, Fadoua; Smallwood, Thomas R. C.; LaDeau, Shannon L.; Johansson, Michael A.; Hudson, Peter J.; Boots, Michael; Thomas, Matthew B.; Power, Alison G.; Pawar, Samraat (2020-07-10)
  Many important endemic and emerging diseases are transmitted by vectors that are biting arthropods. The functional traits of vectors can affect pathogen transmission rates directly and also through their effect on vector population dynamics. Increasing empirical evidence shows that vector traits vary significantly across individuals, populations, and environmental conditions, and at time scales relevant to disease transmission dynamics. Here, we review empirical evidence for variation in vector traits and how this trait variation is currently incorporated into mathematical models of vector-borne disease transmission. We argue that mechanistically incorporating trait variation into these models, by explicitly capturing its effects on vector fitness and abundance, can improve the reliability of their predictions in a changing world. We provide a conceptual framework for incorporating trait variation into vector-borne disease transmission models, and highlight key empirical and theoretical challenges. This framework provides a means to conceptualize how traits can be incorporated in vector-borne disease systems, and identifies key areas in which trait variation can be explored. Determining when and to what extent it is important to incorporate trait variation into vector-borne disease models remains an important, outstanding question.
- System and method for genotyping using informed error profiles
  (United States Patent and Trademark Office, 2018-03-13)
  A system and method for genotyping tandem repeats in sequencing data. The invention uses Bayesian model selection guided by an empirically-derived error model that incorporates properties of sequence reads and reference sequences to which they map.
- Transmission of West Nile and five other temperate mosquito-borne viruses peaks at temperatures between 23 °C and 26 °C
  Shocket, Marta S.; Verwillow, Anna B.; Numazu, Mailo G.; Slamani, Hani; Cohen, Jeremy M.; El Moustaid, Fadoua; Rohr, Jason R.; Johnson, Leah R.; Mordecai, Erin A. (2020-09-15)
  The temperature-dependence of many important mosquito-borne diseases has never been quantified. These relationships are critical for understanding current distributions and predicting future shifts from climate change. We used trait-based models to characterize temperature-dependent transmission of 10 vector-pathogen pairs of mosquitoes (Culex pipiens, Cx. quinquefasciatus, Cx. tarsalis, and others) and viruses (West Nile, Eastern and Western Equine Encephalitis, St. Louis Encephalitis, Sindbis, and Rift Valley Fever viruses), most with substantial transmission in temperate regions. Transmission is optimized at intermediate temperatures (23-26 °C) and often has wider thermal breadths (due to cooler lower thermal limits) compared to pathogens with predominately tropical distributions (in previous studies). The incidence of human West Nile virus cases across US counties responded unimodally to average summer temperature and peaked at 24 °C, matching model-predicted optima (24-25 °C). Climate warming will likely shift transmission of these diseases, increasing it in cooler locations while decreasing it in warmer locations.
- Virtual methylome dissection facilitated by single-cell analyses
  Yin, Liduo; Luo, Yanting; Xu, Xiguang; Wen, Shiyu; Wu, Xiaowei; Lu, Xuemei; Xie, Hehuang David (2019-11-11)
  Background: Numerous cell types can be identified within plant tissues and animal organs, and the epigenetic modifications underlying such enormous cellular heterogeneity are just beginning to be understood. It remains a challenge to infer cellular composition using DNA methylomes generated for mixed cell populations. Here, we propose a semi-reference-free procedure to perform virtual methylome dissection using the nonnegative matrix factorization (NMF) algorithm. Results: In the pipeline that we implemented to predict cell-subtype percentages, putative cell-type-specific methylated (pCSM) loci were first determined according to their DNA methylation patterns in bulk methylomes and clustered into groups based on their correlations in methylation profiles. A representative set of pCSM loci was then chosen to decompose target methylomes into multiple latent DNA methylation components (LMCs). To test the performance of this pipeline, we made use of single-cell brain methylomes to create synthetic methylomes of known cell composition. Compared with highly variable CpG sites, pCSM loci achieved a higher prediction accuracy in the virtual methylome dissection of synthetic methylomes. In addition, pCSM loci were shown to be good predictors of the cell type of the sorted brain cells. The software package developed in this study is available in the GitHub repository (https://github.com/Gavin-Yinld). Conclusions: We anticipate that the pipeline implemented in this study will be an innovative and valuable tool for the decoding of cellular heterogeneity.
- XTALKDB: a database of signaling pathway crosstalk
  Sam, Sarah A.; Teel, Joelle; Tegge, Allison N.; Bharadwaj, Aditya; Murali, T. M. (2017-01-04)
  Analysis of signaling pathways and their crosstalk is a cornerstone of systems biology. Thousands of papers have been published on these topics. Surprisingly, there is no database that carefully and explicitly documents crosstalk between specific pairs of signaling pathways. We have developed XTALKDB (http://www.xtalkdb.org) to fill this very important gap. XTALKDB contains curated information for 650 pairs of pathways from over 1600 publications. In addition, the database reports the molecular components (e.g. proteins, hormones, microRNAs) that mediate crosstalk between a pair of pathways and the species and tissue in which the crosstalk was observed. The XTALKDB website provides an easy-to-use interface for scientists to browse crosstalk information by querying one or more pathways or molecules of interest.