New Algorithms for Mining Network Datasets: Applications to Phenotype and Pathway Modeling
Biological network data is plentiful, with practically every experimental methodology giving "network views" into cellular function and behavior. Bioinformatic screens that yield network data include, for example, genome-wide deletion screens, protein-protein interaction assays, RNA interference experiments, and methods to probe metabolic pathways. Efficient and comprehensive computational approaches are required to model these screens and gain insight into the nature of biological networks. This thesis presents three new algorithms to model and mine network datasets. First, we present an algorithm that models genome-wide perturbation screens by deriving relations between phenotypes and subsequently using these relations in a local manner to derive gene-phenotype relationships. We show how this algorithm outperforms all previously described algorithms for gene-phenotype modeling. We also present theoretical insight into the convergence and accuracy properties of this approach. Second, we define a new data mining problem, constrained minimal separator mining, and propose algorithms as well as applications to modeling gene perturbation screens by viewing the perturbed genes as a graph separator. Both of these data mining applications are evaluated on network datasets from S. cerevisiae and C. elegans. Finally, we present an approach to model the relationship between metabolic pathways and operon structure in prokaryotic genomes. In this approach, we present a new pattern class, biclusters over domains with supplied partial orders, and present algorithms for systematically detecting such biclusters. Together, our data mining algorithms provide a comprehensive arsenal of techniques for modeling gene perturbation screens and metabolic pathways.
- Doctoral Dissertations