Department of Statistics
Browsing Department of Statistics by Department "Computer Science"
- Identifying Transcriptional Regulatory Modules Among Different Chromatin States in Mouse Neural Stem Cells. Banerjee, Sharmi; Zhu, Hongxiao; Tang, Man; Feng, Wu-chun; Wu, Xiaowei; Xie, Hehuang David (Frontiers, 2019-01-15). Gene expression regulation is a complex process involving the interplay between transcription factors (TFs) and chromatin states. Significant progress has been made toward understanding the impact of chromatin states on gene expression. Nevertheless, how transcription factors bind combinatorially in different chromatin states to enable selective regulation of gene expression remains an active area of research. We introduce a nonparametric Bayesian clustering method for inhomogeneous Poisson processes to detect heterogeneous binding patterns of multiple proteins, including transcription factors, that form regulatory modules in different chromatin states. We applied this approach to ChIP-seq data for 21 proteins in mouse neural stem cells and observed different groups, or modules, of proteins clustered within different chromatin states. These chromatin-state-specific regulatory modules were found to have a significant influence on gene expression. We also observed different motif preferences for certain TFs between chromatin states. Our results reveal a degree of interdependency between chromatin states and the combinatorial binding of proteins in the complex transcriptional regulatory process. The software package is available on GitHub at https://github.com/BSharmi/DPM-LGCP.
- Performance evaluation of indel calling tools using real short-read data. Hasan, Mohammad Shabbir; Wu, Xiaowei; Zhang, Liqing (Biomed Central, 2015-08-19). Background: Insertions and deletions (indels), a common form of genetic variation, have been shown to cause or contribute to human genetic diseases and cancer. With the advance of next-generation sequencing technology, many indel calling tools have been developed; however, evaluations and comparisons of these tools using large-scale real data are still scant. Here we evaluated seven popular, publicly available indel calling tools (GATK Unified Genotyper, VarScan, Pindel, SAMtools, Dindel, GATK HaplotypeCaller, and Platypus) using low-coverage data from 78 human genomes from the 1000 Genomes Project. Results: Comparing the indels called by these tools with a known set of indels, we found that Platypus outperforms the other tools. In addition, a high percentage of known indels still remain undetected, and the number of common indels called by all seven tools is very low. Conclusion: These findings indicate the need to improve existing tools or develop new algorithms to achieve reliable and consistent indel calling results.
- Transcriptomic Analysis of Hepatic Cells in Multicellular Organotypic Liver Models. Tegge, Allison N.; Rodrigues, Richard R.; Larkin, Adam L.; Vu, Lucas T.; Murali, T. M.; Rajagopalan, Padmavathy (Springer Nature, 2018-07-27). Liver homeostasis requires the presence of both parenchymal and non-parenchymal cells (NPCs). However, systems biology studies of the liver have primarily focused on hepatocytes. Using an organotypic three-dimensional (3D) hepatic culture, we report the first transcriptomic study of liver sinusoidal endothelial cells (LSECs) and Kupffer cells (KCs) cultured with hepatocytes. Through computational pathway and interaction network analyses, we demonstrate that hepatocytes, LSECs and KCs have distinct expression profiles and functional characteristics. Our results show that LSECs in the presence of KCs exhibit decreased expression of focal adhesion kinase (FAK) signaling, a pathway linked to LSEC dedifferentiation. We report the novel result that peroxisome proliferator-activated receptor alpha (PPAR alpha) is transcribed in LSECs. The expression of downstream processes corroborates active PPAR alpha signaling in LSECs. We uncover transcriptional evidence in LSECs for a feedback mechanism between PPAR alpha and farnesoid X-activated receptor (FXR) that maintains bile acid homeostasis; previously, this feedback was known to occur only in HepG2 cells. We demonstrate that KCs in 3D liver models display expression patterns consistent with an anti-inflammatory phenotype when compared to monocultures. These results highlight the distinct roles of LSECs and KCs in maintaining liver function and emphasize the need for further mechanistic studies of NPCs, not only hepatocytes, in liver-mimetic microenvironments.
- Uncovering missed indels by leveraging unmapped reads. Hasan, Mohammad Shabbir; Wu, Xiaowei; Zhang, Liqing (Springer Nature, 2019-07-31). In current practice, next-generation sequencing (NGS) applications start by mapping (aligning) short reads to the reference genome, with the aim of identifying genetic variants. Although existing alignment tools map short reads to the reference genome with great accuracy, a significant number of short reads still remain unmapped. These reads are often excluded from downstream analyses, causing non-negligible information loss in the subsequent variant calling procedure. This paper describes Genesis-indel, a computational pipeline that explores the unmapped reads to identify novel indels that were missed in the original procedure. Genesis-indel is applied to the unmapped reads of 30 breast cancer patients from TCGA. Results show that the unmapped reads are conserved between the two subtypes of breast cancer investigated in this study and might contribute to the divergence between the subtypes. Genesis-indel identifies 72,997 novel high-quality indels not previously found, of which 16,141 have not been annotated in the widely used mutation database. Statistical analysis of these indels shows significant enrichment of indels residing in oncogenes and tumour suppressor genes. Functional annotation further reveals that these indels are strongly correlated with cancer pathways and can have a high to moderate impact on protein function. Additionally, some of the indels overlap with genes that have no indel mutations called from the originally mapped reads but that have been shown to contribute to tumorigenesis in multiple carcinomas. This further emphasizes the importance of rescuing indels hidden in the unmapped reads in cancer and disease studies.
- Using data-driven agent-based models for forecasting emerging infectious diseases. Venkatramanan, Srinivasan; Lewis, Bryan L.; Chen, Jiangzhuo; Higdon, Dave; Vullikanti, Anil Kumar S.; Marathe, Madhav V. (Elsevier, 2017-02-22). Producing timely, well-informed and reliable forecasts for an ongoing epidemic of an emerging infectious disease is a huge challenge. Epidemiologists and policy makers have to deal with poor data quality, limited understanding of the disease dynamics, a rapidly changing social environment, and uncertainty about the effects of the various interventions in place. In this setting, detailed computational models provide a comprehensive framework for integrating diverse data sources into a well-defined model of disease dynamics and social behavior, potentially leading to better understanding and actions. In this paper, we describe one such agent-based modeling framework, developed for forecasting the 2014–2015 Ebola epidemic in Liberia and subsequently used during the Ebola forecasting challenge. We describe the various components of the model and the calibration process, and we summarize forecast performance across the scenarios of the challenge. We conclude by highlighting how such a data-driven approach can be refined and adapted for future epidemics, and we share the lessons learned over the course of the challenge.
- vi-HMM: a novel HMM-based method for sequence variant identification in short-read data. Tang, Man; Hasan, Mohammad Shabbir; Zhu, Hongxiao; Zhang, Liqing; Wu, Xiaowei (2019-02-13). Background: Accurate and reliable identification of sequence variants, including single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), plays a fundamental role in next-generation sequencing (NGS) applications. Existing methods for calling these variants often make the simplifying assumption of positional independence. As a result, they fail to leverage the dependence between genotypes at nearby loci that is caused by linkage disequilibrium (LD). Results and conclusion: We propose vi-HMM, a hidden Markov model (HMM)-based method for calling SNPs and INDELs in mapped short-read data. This method allows transitions between the hidden states (defined as “SNP,” “Ins,” “Del,” and “Match”) of adjacent genomic bases. It determines an optimal hidden state path using the Viterbi algorithm, and the inferred path directly identifies SNPs and INDELs. Simulation studies show that, under various sequencing depths, vi-HMM outperforms commonly used variant calling methods in terms of sensitivity and F1 score. When applied to real data, vi-HMM demonstrates higher accuracy in calling SNPs and INDELs.
- Vindel: a simple pipeline for checking indel redundancy. Li, Zhiyi; Wu, Xiaowei; He, Bin; Zhang, Liqing (Biomed Central, 2014-11-19). Background: With the advance of next-generation sequencing (NGS) technologies, a large number of insertion and deletion (indel) variants have been identified in human populations. Despite much research into variant calling, a non-negligible proportion of the identified indel variants have been found to be possible false positives. These arise from sequencing errors, artifacts caused by ambiguous alignments, and annotation errors. Results: In this paper, we examine indel redundancy in dbSNP, one of the central databases for indel variants, and develop a standalone computational pipeline, dubbed Vindel, to detect redundant indels. The pipeline first uses indel position information to form candidate redundant groups, then applies the indel mutations to the reference genome to generate the corresponding indel variant substrings. Finally, the indel variant substrings in the same candidate redundant group are compared pairwise to identify redundant indels. We applied our pipeline to check for redundancy in the human indels in dbSNP. It identified approximately 8% redundancy in insertion-type indels, 12% in deletion-type indels, and 10% overall for insertions and deletions combined. These numbers are largely consistent across all human autosomes. We also investigated the indel size distribution and the distance distribution between adjacent indels to better understand the mechanisms that generate indel variants. Conclusions: Vindel, a simple yet effective computational pipeline, can be used to check whether a set of indels is redundant with respect to those already in a database of interest, such as NCBI’s dbSNP. Of the approximately 5.9 million indels we examined, nearly 0.6 million are redundant, revealing a serious limitation in current indel annotation. Statistical results confirm the consistency of the pipeline's indel redundancy detection across all 22 chromosomes. Apart from the standalone Vindel pipeline, the indel redundancy check algorithm is also implemented in the web server http://bioinformatics.cs.vt.edu/zhanglab/indelRedundant.php.
- Visual to Parametric Interaction (V2PI). Leman, Scotland C.; House, Leanna L.; Maiti, Dipayan; Endert, Alex; North, Christopher L. (PLOS, 2013-03-20). Typical data visualizations result from linear pipelines that start by characterizing data using a model or algorithm to reduce the dimension and summarize structure, and end by displaying the data in a reduced-dimensional form. Sensemaking may take place at the end of the pipeline, when users have an opportunity to observe, digest, and internalize any information displayed. However, some visualizations mask meaningful data structures when model or algorithm constraints (e.g., parameter specifications) contradict information in the data. Moreover, because the pipeline is linear, users have no natural means to adjust the displays. In this paper, we present a framework for creating dynamic data displays that rely on both mechanistic data summaries and expert judgement. Specifically, we develop the theory and methods of a new form of human-data interaction, which we call “Visual to Parametric Interaction” (V2PI). With V2PI, the pipeline becomes bidirectional in that users are embedded in it: users learn from visualizations, and the visualizations adjust to expert judgement. We demonstrate the utility of V2PI and a bidirectional pipeline with two examples.
- XTALKDB: a database of signaling pathway crosstalk. Sam, Sarah A.; Teel, Joelle; Tegge, Allison N.; Bharadwaj, Aditya; Murali, T. M. (2017-01-04). Analysis of signaling pathways and their crosstalk is a cornerstone of systems biology, and thousands of papers have been published on these topics. Surprisingly, there is no database that carefully and explicitly documents crosstalk between specific pairs of signaling pathways. We have developed XTALKDB (http://www.xtalkdb.org) to fill this important gap. XTALKDB contains curated information for 650 pairs of pathways from over 1600 publications. In addition, the database reports the molecular components (e.g., proteins, hormones, microRNAs) that mediate crosstalk between a pair of pathways, as well as the species and tissue in which the crosstalk was observed. The XTALKDB website provides an easy-to-use interface for scientists to browse crosstalk information by querying one or more pathways or molecules of interest.
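The DPM-LGCP entry (Banerjee et al.) describes a nonparametric Bayesian clustering method. In such methods the number of protein clusters is inferred rather than fixed in advance. A Dirichlet process prior is one common way to achieve this; whether DPM-LGCP uses exactly this construction is an assumption here. The sketch below samples cluster assignments from a Chinese restaurant process, the sequential form of a Dirichlet process prior, to show how new clusters open as items arrive. It is not the authors' method. It covers only the prior over partitions and omits the inhomogeneous Poisson process likelihood entirely. The function name and the concentration value `alpha` are hypothetical.

```python
import random

def chinese_restaurant_process(n_items, alpha, seed=0):
    """Sample a random partition of n_items from a CRP with concentration alpha.

    Returns (labels, counts): labels[i] is the cluster of item i and
    counts[k] is the number of items assigned to cluster k.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    labels, counts = [], []
    for _ in range(n_items):
        # An existing cluster k is chosen with probability proportional to
        # counts[k]; a brand-new cluster with probability proportional to alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

# 21 items mirrors the number of proteins in the ChIP-seq data described above.
labels, counts = chinese_restaurant_process(n_items=21, alpha=1.0)
print(len(counts), "clusters with sizes", counts)
```

Larger `alpha` values tend to produce more, smaller clusters. In a full mixture model, each cluster would also carry its own binding-intensity parameters, and the assignments would be updated by posterior sampling rather than drawn from the prior alone.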
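The indel-calling evaluation entry (Hasan et al., 2015) compares each tool's calls with a known indel set. It also counts the indels that all seven tools call in common. The sketch below shows that bookkeeping with Python sets under a simplifying assumption: indels are matched exactly on chromosome, position, and alleles. Real comparisons usually normalize (left-align) indels first, because the same indel can be written at different positions. The function name, the tuple layout, and the toy records are hypothetical and not taken from the study.

```python
from typing import Dict, Set, Tuple

# (chromosome, 1-based position, reference allele, alternate allele)
Indel = Tuple[str, int, str, str]

def evaluate_calls(calls: Dict[str, Set[Indel]], known: Set[Indel]):
    """Score each tool's calls against a known indel set by exact matching."""
    report = {}
    for tool, called in calls.items():
        tp = len(called & known)  # calls that exactly match a known indel
        report[tool] = {
            "true_positives": tp,
            "sensitivity": tp / len(known) if known else 0.0,
            "fraction_known": tp / len(called) if called else 0.0,
        }
    all_calls = list(calls.values())
    common = set.intersection(*all_calls) if all_calls else set()  # called by every tool
    missed = known - set().union(*all_calls)  # known indels that no tool found
    return report, common, missed

# Hypothetical toy data, not from the 1000 Genomes evaluation.
known = {("20", 100, "A", "AT"), ("20", 250, "GC", "G"), ("20", 400, "T", "TTA")}
calls = {
    "tool_a": {("20", 100, "A", "AT"), ("20", 250, "GC", "G"), ("20", 999, "C", "CA")},
    "tool_b": {("20", 100, "A", "AT")},
}
report, common, missed = evaluate_calls(calls, known)
print(report)
print("called by all tools:", common)
print("missed by all tools:", missed)
```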
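The Genesis-indel entry reports significant enrichment of indels in oncogenes and tumour suppressor genes, but the abstract does not name the statistical test. A one-sided hypergeometric test is one standard way to ask whether a gene category is over-represented among genes carrying indels, so it is used here as an assumed stand-in. The sketch computes the upper-tail probability exactly with `math.comb`. The function name and the example counts are hypothetical.

```python
from math import comb

def enrichment_pvalue(k, N, K, n):
    """One-sided hypergeometric P(X >= k).

    N: genes in the background set
    K: background genes in the category of interest (e.g. oncogenes)
    n: genes carrying at least one indel
    k: indel-carrying genes that fall in the category
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("invalid counts")
    upper = min(K, n)
    # Sum the exact hypergeometric probabilities from k up to the maximum;
    # comb() returns 0 for impossible terms, so no extra bounds are needed.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)

# Hypothetical counts, for illustration only.
p = enrichment_pvalue(k=40, N=20000, K=700, n=800)
print(f"P(X >= 40) = {p:.3g}")
```

Under these made-up numbers about 28 category genes would be expected by chance (800 × 700 / 20000), so observing 40 yields a small upper-tail probability. With many categories tested, a multiple-testing correction would also be needed.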
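The agent-based forecasting entry (Venkatramanan et al.) models disease spread through individual agents and their contacts. The sketch below is a deliberately tiny stochastic agent-based SIR (susceptible, infectious, recovered) simulation that conveys the mechanics: each day, infectious agents contact random others and may transmit. It is not the authors' Ebola framework, which integrates far richer data on disease dynamics, social behavior, and interventions. Every parameter name and value below is a hypothetical placeholder, and no calibration to case data is performed.

```python
import random

def simulate_sir(n_agents=500, contacts_per_day=8, p_transmit=0.03,
                 infectious_days=7, initial_infected=5, days=120, seed=1):
    """Toy agent-based SIR model; returns (daily new infections, final states)."""
    rng = random.Random(seed)
    state = ["S"] * n_agents          # S = susceptible, I = infectious, R = recovered
    days_left = [0] * n_agents        # remaining infectious days per agent
    for i in rng.sample(range(n_agents), initial_infected):
        state[i], days_left[i] = "I", infectious_days

    daily_new = []
    for _ in range(days):
        infectious = [i for i in range(n_agents) if state[i] == "I"]
        newly = set()
        # Each infectious agent meets random agents and may infect susceptibles.
        for i in infectious:
            for j in rng.sample(range(n_agents), contacts_per_day):
                if state[j] == "S" and rng.random() < p_transmit:
                    newly.add(j)
        # Agents that were infectious today move one day closer to recovery.
        for i in infectious:
            days_left[i] -= 1
            if days_left[i] == 0:
                state[i] = "R"
        # New infections become infectious starting tomorrow.
        for j in newly:
            state[j], days_left[j] = "I", infectious_days
        daily_new.append(len(newly))
    return daily_new, state

daily_new, state = simulate_sir()
print("peak daily infections:", max(daily_new), "total ever infected:", 500 - state.count("S"))
```

In a data-driven setting, parameters such as `p_transmit` would be calibrated so that simulated incidence matches reported case counts before the model is run forward to produce forecasts.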
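The vi-HMM entry defines four hidden states ("Match," "SNP," "Ins," "Del") for adjacent genomic bases. It recovers the optimal state path with the Viterbi algorithm. The sketch below is a generic log-space Viterbi decoder over those four states. The start, transition, and emission probabilities, and the one-letter observation alphabet summarizing per-base read evidence, are hypothetical toy values. vi-HMM derives its own quantities from mapped reads.

```python
import math

STATES = ["Match", "SNP", "Ins", "Del"]

# Hypothetical toy parameters; vi-HMM estimates its own from read data.
START = {"Match": 0.94, "SNP": 0.02, "Ins": 0.02, "Del": 0.02}
TRANS = {
    "Match": {"Match": 0.90, "SNP": 0.04, "Ins": 0.03, "Del": 0.03},
    "SNP":   {"Match": 0.70, "SNP": 0.20, "Ins": 0.05, "Del": 0.05},
    "Ins":   {"Match": 0.70, "SNP": 0.05, "Ins": 0.20, "Del": 0.05},
    "Del":   {"Match": 0.70, "SNP": 0.05, "Ins": 0.05, "Del": 0.20},
}
# Observations summarize per-base read evidence (a simplification):
# R = reference base, A = alternate base, I = insertion, D = deletion.
EMIT = {
    "Match": {"R": 0.97, "A": 0.01, "I": 0.01, "D": 0.01},
    "SNP":   {"R": 0.01, "A": 0.97, "I": 0.01, "D": 0.01},
    "Ins":   {"R": 0.01, "A": 0.01, "I": 0.97, "D": 0.01},
    "Del":   {"R": 0.01, "A": 0.01, "I": 0.01, "D": 0.97},
}

def viterbi(obs, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable hidden state path for obs, computed in log space."""
    score = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in STATES}]
    back = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in STATES:
            # Best predecessor state r for landing in state s at position t.
            r_best = max(STATES, key=lambda r: score[t - 1][r] + math.log(trans[r][s]))
            score[t][s] = (score[t - 1][r_best] + math.log(trans[r_best][s])
                           + math.log(emit[s][obs[t]]))
            back[t][s] = r_best
    # Trace back from the highest-scoring final state.
    path = [max(STATES, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

print(viterbi(list("RRARDDR")))
# → ['Match', 'Match', 'SNP', 'Match', 'Del', 'Del', 'Match']
```

Because transitions link neighboring positions, the decoded path for each base depends on its neighbors. This is the positional dependence that the abstract contrasts with methods that treat loci independently.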
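The Vindel entry outlines three steps:

1. Group indels by position.
2. Apply each indel to the reference to obtain a variant substring.
3. Compare substrings pairwise within a group.

The sketch below follows those steps on a toy sequence. Positions are 0-based, and the group-splitting rule (a new group whenever the gap exceeds `window`) and the `flank` padding are hypothetical choices, not Vindel's actual parameters. The example shows the classic source of redundancy: deleting one T anywhere in a run of Ts produces the same sequence.

```python
from itertools import combinations

def variant_substring(ref, indel, lo, hi):
    """Apply one indel (0-based pos, ref allele, alt allele) within ref[lo:hi]."""
    pos, ref_allele, alt_allele = indel
    if ref[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError(f"reference allele mismatch at {pos}")
    return ref[lo:pos] + alt_allele + ref[pos + len(ref_allele):hi]

def find_redundant(ref, indels, window=10, flank=5):
    """Return pairs of indels that yield identical variant sequences."""
    # Step 1: candidate groups. Sort by position and start a new group
    # whenever the gap to the previous indel exceeds `window` (hypothetical rule).
    groups, current = [], []
    for indel in sorted(indels):
        if current and indel[0] - current[-1][0] > window:
            groups.append(current)
            current = []
        current.append(indel)
    if current:
        groups.append(current)
    # Steps 2-3: build every variant substring over one shared span per group,
    # then compare the substrings pairwise.
    redundant = []
    for group in groups:
        lo = max(0, min(p for p, _, _ in group) - flank)
        hi = min(len(ref), max(p + len(r) for p, r, _ in group) + flank)
        seqs = {ind: variant_substring(ref, ind, lo, hi) for ind in group}
        for a, b in combinations(group, 2):
            if seqs[a] == seqs[b]:
                redundant.append((a, b))
    return redundant

ref = "ACGTTTTGCA"
indels = [(2, "GT", "G"), (3, "TT", "T"), (5, "TT", "T"), (8, "C", "CA")]
pairs = find_redundant(ref, indels)
print(pairs)
```

Here the three deletions all yield "ACGTTTGCA", so they form three redundant pairs, while the insertion at position 8 is distinct. Using one shared span per group matters: substrings cut over different windows could not be compared directly.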
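The V2PI entry describes a bidirectional pipeline in which visual feedback from users is translated back into model parameters. As a minimal, hypothetical illustration of that "visual to parametric" step, the sketch below treats the parameters as per-dimension weights in a weighted Euclidean distance. A user's gesture that two points belong closer together is mapped to new weights that down-weight the dimensions on which the points differ. The update rule and function names are assumptions for illustration and are not the paper's method.

```python
def weighted_sq_dist(x, y, w):
    """Squared Euclidean distance with per-dimension weights w."""
    return sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, y))

def pull_together(x, y, w):
    """Hypothetical feedback rule: the user drags points x and y closer.

    Dimensions on which x and y differ strongly are down-weighted, and the
    weights are renormalized to sum to 1.
    """
    raw = [wk / (1.0 + (a - b) ** 2) for wk, a, b in zip(w, x, y)]
    total = sum(raw)
    return [r / total for r in raw]

x, y = [0.0, 0.0, 0.0], [3.0, 1.0, 0.0]
w = [1 / 3] * 3                      # start with equal weights
w_new = pull_together(x, y, w)
print(weighted_sq_dist(x, y, w), "->", weighted_sq_dist(x, y, w_new))
```

For non-negative weights that sum to 1, this rule cannot increase the chosen pair's weighted distance (a consequence of Chebyshev's sum inequality), although other pairs may move apart. In a real display, the updated weights would then feed back into the dimension-reduction step that draws the next view.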
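The XTALKDB entry lists the fields each crosstalk record carries: a pair of pathways, the mediating molecules, and the species and tissue of observation. It also says users query by one or more pathways or molecules. The database's actual schema is not described, so the sketch below is a hypothetical in-memory model of those fields with two lookup indexes. The class names and the placeholder records are illustrative only and are not XTALKDB content.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Crosstalk:
    pathway_a: str
    pathway_b: str
    mediators: frozenset  # e.g. proteins, hormones, microRNAs
    species: str
    tissue: str

class CrosstalkIndex:
    """Hypothetical in-memory index supporting lookup by pathway(s) or by molecule."""

    def __init__(self, records):
        self.by_pathway = defaultdict(list)
        self.by_molecule = defaultdict(list)
        for rec in records:
            for p in {rec.pathway_a, rec.pathway_b}:
                self.by_pathway[p].append(rec)
            for m in rec.mediators:
                self.by_molecule[m].append(rec)

    def query_pathways(self, *pathways):
        """Records whose pathway pair includes every pathway given."""
        if not pathways:
            return []
        candidates = self.by_pathway.get(pathways[0], [])
        return [r for r in candidates
                if all(p in (r.pathway_a, r.pathway_b) for p in pathways)]

    def query_molecule(self, molecule):
        """Records in which the given molecule mediates crosstalk."""
        return list(self.by_molecule.get(molecule, []))

# Placeholder records; names are illustrative, not XTALKDB content.
records = [
    Crosstalk("PathwayA", "PathwayB", frozenset({"protein_x"}), "species_1", "tissue_1"),
    Crosstalk("PathwayA", "PathwayC", frozenset({"mirna_y"}), "species_1", "tissue_2"),
    Crosstalk("PathwayB", "PathwayC", frozenset({"protein_x", "hormone_z"}), "species_2", "tissue_1"),
]
index = CrosstalkIndex(records)
print(index.query_pathways("PathwayA", "PathwayB"))
```

Indexing each record under both of its pathways keeps single-pathway queries to one dictionary lookup, and multi-pathway queries then filter that short candidate list.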