Browsing by Author "Pati, Amrita"
Now showing 1 - 5 of 5
- ClaMS: A Classifier for Metagenomic Sequences
  Pati, Amrita; Heath, Lenwood S.; Kyrpides, Nikos C.; Ivanova, Natalia (Springer, 2011)
  ClaMS – “Classifier for Metagenomic Sequences” – is a Java application for binning assembled contigs in metagenomes using user-specified training sets and initial parameters. Since ClaMS trains on sequence composition-based genomic signatures, it is much faster than binning tools that rely on alignments to homologs; ClaMS can bin ~20,000 sequences in 3 minutes on a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB RAM. ClaMS is meant to be a desktop application for biologists and can be run on any machine under any Operating System on which the Java Runtime Environment can be installed.
- CMGSDB: integrating heterogeneous Caenorhabditis elegans data sources using compositional data mining
  Pati, Amrita; Jin, Ying; Klage, Karsten; Helm, Richard F.; Heath, Lenwood S.; Ramakrishnan, Naren (Oxford University Press, 2008-01-01)
  CMGSDB (Database for Computational Modeling of Gene Silencing) is an integration of heterogeneous data sources about Caenorhabditis elegans with capabilities for compositional data mining (CDM) across diverse domains. Besides gene, protein and functional annotations, CMGSDB currently unifies information about 531 RNAi phenotypes obtained from heterogeneous databases using a hierarchical scheme. A phenotype browser at the CMGSDB website serves this hierarchy and relates phenotypes to other biological entities. The application of CDM to CMGSDB produces ‘chains’ of relationships in the data by finding two-way connections between sets of biological entities. Chains can, for example, relate the knock down of a set of genes during an RNAi experiment to the disruption of a pathway or specific gene expression through another set of genes not directly related to the former set. The web interface for CMGSDB is available at https://bioinformatics.cs.vt.edu/cmgs/CMGSDB/, and serves individual biological entity information as well as details of all chains computed by CDM.
- Graph-based genomic signatures
  Pati, Amrita (Virginia Tech, 2008-04-14)
  Genomes have both deterministic and random aspects, with the underlying DNA sequences exhibiting features at numerous scales, from codons to regions of conserved or divergent gene order. Genomic signatures work by capturing one or more such features efficiently in a compact mathematical structure. This work examines, within a graph-theoretic setting, the unique manner in which oligonucleotides fit together to make up a genome. A de Bruijn chain (DBC) combines a de Bruijn graph with a finite Markov chain. By representing a DNA sequence as a walk over a DBC and retaining specific information at nodes and edges, we obtain the de Bruijn chain genomic signature (DBCGS), based on both the graph structure and the stationary distribution of the DBC. We demonstrate that DBCGS is information-rich, efficient, sufficiently representative of the sequence from which it is derived, and superior to existing genomic signatures such as the dinucleotide odds ratio and word-frequency-based signatures. We develop a mathematical framework to elucidate the power of the DBCGS to distinguish between sequences hypothesized to be generated by DBCs with distinct parameters. We study the effect of the order of the DBCGS on accuracy and present its relationships with genome size and genome variety. We illustrate its practical value in distinguishing genomic sequences and predicting the origin of short DNA sequences of unknown origin. Additionally, we describe details of the CMGS database, a centralized repository for raw and value-added data particular to C. elegans.
- Modeling and Analysis of Regulatory Elements in Arabidopsis thaliana from Annotated Genomes and Gene Expression Data
  Pati, Amrita (Virginia Tech, 2005-07-13)
  Modeling of cis-elements in the upstream regions of genes is a challenging computational problem. A set of regulatory motifs present in the promoters of a set of genes can be modeled by a biclique. Combinations of cis-elements play a vital role in ensuring that the correct co-action of transcription factors binding to the gene promoter results in appropriate gene expression in response to various stimuli. Geometrical and spatial constraints in transcription factor binding also impose restrictions on the order and separation of cis-elements. Not all regulatory elements that coexist are biologically significant. If the set of genes in which a set of regulatory elements co-occurs is tightly correlated with respect to gene expression data over a set of treatments, the regulatory element combination can be biologically directed. The system developed in this work, XcisClique, consists of a comprehensive infrastructure for annotated genome and gene expression data for Arabidopsis thaliana. XcisClique models cis-regulatory elements as regular expressions and detects maximal bicliques of genes and motifs, called itemsets. An itemset consists of a set of genes (called a geneset) and a set of motifs (called a motifset) such that every motif in the motifset occurs in the promoter of every gene in the geneset. XcisClique differs from existing tools of the same kind in that it offers a common platform for the integration of sequence and gene expression data. Itemsets identified by XcisClique are not only evaluated for statistical over-representation in sequence data, but are also examined with respect to the expression patterns of the corresponding geneset. Thus, the results produced are biologically directed.
  XcisClique is also the only tool of its kind for Arabidopsis thaliana, and can be used for other organisms given appropriate sequence, expression, and regulatory element data. The web interface to a subset of functionalities, source code, and supplemental material are available online at http://bioinformatics.cs.vt.edu/xcisclique.
- XcisClique: analysis of regulatory bicliques
  Pati, Amrita; Vasquez-Robinet, Cecilia; Heath, Lenwood S.; Grene, Ruth; Murali, T. M. (2006-04-21)
  Background: Modeling of cis-elements or regulatory motifs in promoter (upstream) regions of genes is a challenging computational problem. In this work, a set of regulatory motifs simultaneously present in the promoters of a set of genes is modeled as a biclique in a suitably defined bipartite graph. A biologically meaningful co-occurrence of multiple cis-elements in a gene promoter is assessed by the combined analysis of genomic and gene expression data. Greater statistical significance is associated with a set of genes that shares a common set of regulatory motifs, while simultaneously exhibiting highly correlated gene expression under given experimental conditions.
  Methods: XcisClique, the system developed in this work, is a comprehensive infrastructure that associates annotated genome and gene expression data, models known cis-elements as regular expressions, identifies maximal bicliques in a bipartite gene-motif graph, and ranks bicliques based on their computed statistical significance. Significance is a function of the probability of occurrence of those motifs in a biclique (a hypergeometric distribution) and of the new sum of absolute values statistic (SAV), which uses Spearman correlations of gene expression vectors. SAV is a statistic well-suited for this purpose, as described in the discussion.
  Results: XcisClique identifies new motif and gene combinations that might indicate as yet unidentified involvement of sets of genes in biological functions and processes. It currently supports Arabidopsis thaliana and can be adapted to other organisms, assuming the existence of annotated genomic sequences, suitable gene expression data, and identified regulatory motifs.
  A subset of XcisClique functionalities, including the motif visualization component MotifSee, source code, and supplementary material are available at https://bioinformatics.cs.vt.edu/xcisclique/.
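
The itemset definition used in the two XcisClique entries (motifs modeled as regular expressions; every motif in the motifset occurs in the promoter of every gene in the geneset) can be illustrated with a small Python sketch. This is not XcisClique's code: the function names, gene names, promoter strings, and motif patterns below are all hypothetical toy values chosen for illustration, and the sketch only checks whether a given gene/motif pair of sets is a (maximal) biclique; it does not enumerate bicliques or compute any significance score.

```python
import re


def motif_hits(promoters, motifs):
    """Map each gene to the set of motif names whose regex matches its promoter."""
    compiled = {name: re.compile(pattern) for name, pattern in motifs.items()}
    return {
        gene: {name for name, rx in compiled.items() if rx.search(seq)}
        for gene, seq in promoters.items()
    }


def is_biclique(geneset, motifset, hits):
    """True if every motif in motifset occurs in the promoter of every gene in geneset."""
    return all(motifset <= hits[gene] for gene in geneset)


def is_maximal(geneset, motifset, hits):
    """True if the biclique cannot be extended by another gene or another motif."""
    if not geneset or not is_biclique(geneset, motifset, hits):
        return False
    # Genes outside the geneset that also contain every motif in the motifset.
    extra_genes = {g for g in hits if g not in geneset and motifset <= hits[g]}
    # Motifs shared by all genes in the geneset.
    shared_motifs = set.intersection(*(hits[g] for g in geneset))
    return not extra_genes and shared_motifs == motifset


# Toy data (hypothetical): three short "promoters" and three regex motifs.
promoters = {
    "geneA": "TTCACGTGGCAACCGACTT",
    "geneB": "GGACGTGTCTTGCCGACAA",
    "geneC": "AAAAACACGTGAAAA",
}
motifs = {"m1": "ACGTG[GT]C", "m2": "[AG]CCGAC", "m3": "CACGTG"}

hits = motif_hits(promoters, motifs)
print(is_biclique({"geneA", "geneB"}, {"m1", "m2"}, hits))
print(is_maximal({"geneA", "geneB"}, {"m1"}, hits))
```

In this toy data, the pair ({geneA, geneB}, {m1}) is a biclique but not a maximal one, because m2 is also shared by both genes; only maximal pairs would count as itemsets in the sense described above.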