Browsing by Author "Jin, Ying"
- CMGSDB: integrating heterogeneous Caenorhabditis elegans data sources using compositional data mining
  Pati, Amrita; Jin, Ying; Klage, Karsten; Helm, Richard F.; Heath, Lenwood S.; Ramakrishnan, Naren (Oxford University Press, 2008-01-01)
  CMGSDB (Database for Computational Modeling of Gene Silencing) is an integration of heterogeneous data sources about Caenorhabditis elegans with capabilities for compositional data mining (CDM) across diverse domains. Besides gene, protein and functional annotations, CMGSDB currently unifies information about 531 RNAi phenotypes obtained from heterogeneous databases using a hierarchical scheme. A phenotype browser at the CMGSDB website serves this hierarchy and relates phenotypes to other biological entities. The application of CDM to CMGSDB produces ‘chains’ of relationships in the data by finding two-way connections between sets of biological entities. Chains can, for example, relate the knockdown of a set of genes during an RNAi experiment to the disruption of a pathway or specific gene expression through another set of genes not directly related to the former set. The web interface for CMGSDB is available at https://bioinformatics.cs.vt.edu/cmgs/CMGSDB/, and serves individual biological entity information as well as details of all chains computed by CDM.
- Compositional Mining of Multi-Relational Biological Datasets
  Jin, Ying; Murali, T. M.; Ramakrishnan, Naren (Department of Computer Science, Virginia Polytechnic Institute & State University, 2007-08-01)
  High-throughput biological screens are yielding ever-growing streams of information about multiple aspects of cellular activity. As more and more categories of datasets come online, there is a corresponding multitude of ways in which inferences can be chained across them, motivating the need for compositional data mining algorithms. In this paper, we argue that such compositional data mining can be effectively realized by functionally cascading redescription mining and biclustering algorithms as primitives. Both these primitives mirror shifts of vocabulary that can be composed in arbitrary ways to create rich chains of inferences. Given a relational database and its schema, we show how the schema can be automatically compiled into a compositional data mining program, and how different domains in the schema can be related through logical sequences of biclustering and redescription invocations. This feature allows us to rapidly prototype new data mining applications, yielding greater understanding of scientific datasets. We describe two applications of compositional data mining: (i) matching terms across categories of the Gene Ontology and (ii) understanding the molecular mechanisms underlying stress response in human cells.
- Microwave-based Pretreatment, Pathogen Fate and Microbial Population in a Dairy Manure Treatment System
  Jin, Ying (Virginia Tech, 2010-11-29)
  Anaerobic digestion and struvite precipitation are two effective ways of treating dairy manure for recovering biogas and phosphorus. Anaerobic digestion of dairy manure is commonly limited by slow fiber degradation, while one of the limitations to struvite precipitation is the availability of orthophosphate. The aim of this work was to study the use of microwave-based thermochemical pretreatment to simultaneously enhance manure anaerobic digestibility (through fiber degradation) and struvite precipitation (through phosphorus solubilization). Microwave heating combined with different chemicals (NaOH, CaO, H₂SO₄, or HCl) enhanced solubilization of manure and degradation of glucan/xylan in dairy manure. However, sulfuric acid-based pretreatment resulted in low anaerobic digestibility, probably due to sulfur inhibition and side reactions. The pretreatments released 20-40% soluble phosphorus and 9-14% ammonium. However, CaO-based pretreatment resulted in lower orthophosphate release and struvite precipitation efficiency, as calcium reacts with phosphate to form calcium phosphate. Collectively, microwave heating combined with NaOH or HCl led to high anaerobic digestibility and phosphorus recovery. Using these two chemicals, the performance of microwave and conventional heating in thermochemical pretreatment was further compared. Microwave heating performed better in terms of COD solubilization, glucan/xylan reduction, phosphorus solubilization and anaerobic digestibility. Lastly, the temperature and heating time used in microwave treatment were optimized. The optimal values of temperature and heating time were 147°C and 25.3 min for methane production, and 135°C and 26 min for orthophosphate release, respectively.
  Applying manure or slurry directly to the land can contribute to pathogen contamination of land, freshwater and groundwater. Thus it is important to study the fate of pathogens in dairy manure anaerobic digestion systems. The goal of the project was to establish a molecular-based quantitative method for pathogen identification and quantification, compare the molecular-based method with the culture-based method, and study pathogen fate in dairy manure and different anaerobic digesters. Results showed that the molecular-based method detected more E. coli than the culture-based method, with less variability. Thermophilic anaerobic digestion achieved a pathogen removal rate of more than 95%, while the mesophilic anaerobic digester had higher E. coli numbers than fresh manure, indicating that temperature is a key factor for pathogen removal. Overall, the goal of the study was to develop an integrated dairy manure treatment system. The microwave-based pretreatment enhanced the subsequent biogas production and struvite precipitation, and the molecular-based method provided a more precise and faster way to study pathogen fate in various anaerobic digestion systems.
- New Algorithms for Mining Network Datasets: Applications to Phenotype and Pathway Modeling
  Jin, Ying (Virginia Tech, 2009-12-08)
  Biological network data is plentiful, with practically every experimental methodology giving 'network views' into cellular function and behavior. Bioinformatic screens that yield network data include, for example, genome-wide deletion screens, protein-protein interaction assays, RNA interference experiments, and methods to probe metabolic pathways. Efficient and comprehensive computational approaches are required to model these screens and gain insight into the nature of biological networks. This thesis presents three new algorithms to model and mine network datasets. First, we present an algorithm that models genome-wide perturbation screens by deriving relations between phenotypes and subsequently using these relations in a local manner to derive gene-phenotype relationships. We show how this algorithm outperforms all previously described algorithms for gene-phenotype modeling. We also present theoretical insight into the convergence and accuracy properties of this approach. Second, we define a new data mining problem, constrained minimal separator mining, and propose algorithms as well as applications to modeling gene perturbation screens by viewing the perturbed genes as a graph separator. Both of these data mining applications are evaluated on network datasets from S. cerevisiae and C. elegans. Finally, we present an approach to model the relationship between metabolic pathways and operon structure in prokaryotic genomes. In this approach, we present a new pattern class, biclusters over domains with supplied partial orders, and present algorithms for systematically detecting such biclusters. Together, our data mining algorithms provide a comprehensive arsenal of techniques for modeling gene perturbation screens and metabolic pathways.