CMGSDB: integrating heterogeneous Caenorhabditis elegans data sources using compositional data mining
CMGSDB (Database for Computational Modeling of Gene Silencing) integrates heterogeneous data sources about Caenorhabditis elegans and supports compositional data mining (CDM) across diverse domains. Besides gene, protein and functional annotations, CMGSDB currently uses a hierarchical scheme to unify information about 531 RNAi phenotypes obtained from heterogeneous databases. A phenotype browser at the CMGSDB website serves this hierarchy and relates phenotypes to other biological entities. Applying CDM to CMGSDB produces ‘chains’ of relationships in the data by finding two-way connections between sets of biological entities. Chains can, for example, relate the knockdown of a set of genes in an RNAi experiment to the disruption of a pathway or of specific gene expression through another set of genes that is not directly related to the former set. The web interface for CMGSDB is available at https://bioinformatics.cs.vt.edu/cmgs/CMGSDB/ and serves information on individual biological entities as well as details of all chains computed by CDM.
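The idea of a chain can be illustrated with a minimal sketch. It is not the CDM implementation behind CMGSDB. It assumes that a "two-way connection" can be approximated by a high Jaccard similarity between the gene sets attached to two descriptors from different domains, and that a chain is a sequence of such connections. The function names, the threshold, and all descriptor and gene identifiers below are hypothetical toy values, not data from CMGSDB.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def two_way_links(left, right, threshold):
    """Pairs of descriptors (one per domain) whose gene sets nearly coincide."""
    links = []
    for (lname, lset), (rname, rset) in product(left.items(), right.items()):
        score = jaccard(lset, rset)
        if score >= threshold:
            links.append((lname, rname, score))
    return links


def chains(domains, threshold):
    """Compose links between consecutive domains into chains of descriptor names."""
    partial = [[name] for name in domains[0]]
    for left, right in zip(domains, domains[1:]):
        links = two_way_links(left, right, threshold)
        # Extend every partial chain whose last descriptor has a link into the next domain.
        partial = [c + [rname] for c in partial
                   for lname, rname, _ in links if c[-1] == lname]
    return partial


# Toy data (assumed for illustration): descriptor -> set of gene identifiers.
phenotypes = {"phenotype_1": {"g1", "g2", "g3"}, "phenotype_2": {"g7", "g8"}}
clusters = {"cluster_A": {"g1", "g2", "g3", "g4"}, "cluster_B": {"g8", "g9", "g10"}}
pathways = {"pathway_X": {"g1", "g2", "g3", "g4", "g5"}, "pathway_Y": {"g9", "g10"}}

result = chains([phenotypes, clusters, pathways], threshold=0.7)
print(result)
```

In this toy input, phenotype_1 and pathway_X fall below the threshold when compared directly, yet they are linked through cluster_A. This mirrors, in a simplified way, how a chain can relate entities through an intermediate set.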