New Algorithms for Mining Network Datasets: Applications to Phenotype and Pathway Modeling

dc.contributor.author: Jin, Ying [en]
dc.contributor.committeechair: Ramakrishnan, Naren [en]
dc.contributor.committeemember: Fox, Edward A. [en]
dc.contributor.committeemember: Heath, Lenwood S. [en]
dc.contributor.committeemember: Murali, T. M. [en]
dc.contributor.committeemember: Helm, Richard F. [en]
dc.contributor.department: Computer Science [en]
dc.date.accessioned: 2014-03-14T21:23:44Z [en]
dc.date.adate: 2010-01-22 [en]
dc.date.available: 2014-03-14T21:23:44Z [en]
dc.date.issued: 2009-12-08 [en]
dc.date.rdate: 2010-01-22 [en]
dc.date.sdate: 2009-12-30 [en]
dc.description.abstract: Biological network data is plentiful, with practically every experimental methodology giving 'network views' into cellular function and behavior. Bioinformatic screens that yield network data include, for example, genome-wide deletion screens, protein-protein interaction assays, RNA interference experiments, and methods to probe metabolic pathways. Efficient and comprehensive computational approaches are required to model these screens and gain insight into the nature of biological networks. This thesis presents three new algorithms to model and mine network datasets. First, we present an algorithm that models genome-wide perturbation screens by deriving relations between phenotypes and subsequently using these relations in a local manner to derive gene-phenotype relationships. We show how this algorithm outperforms all previously described algorithms for gene-phenotype modeling. We also present theoretical insight into the convergence and accuracy properties of this approach. Second, we define a new data mining problem, constrained minimal separator mining, and propose algorithms as well as applications to modeling gene perturbation screens by viewing the perturbed genes as a graph separator. Both of these data mining applications are evaluated on network datasets from S. cerevisiae and C. elegans. Finally, we present an approach to model the relationship between metabolic pathways and operon structure in prokaryotic genomes. In this approach, we present a new pattern class (biclusters over domains with supplied partial orders) and present algorithms for systematically detecting such biclusters. Together, our data mining algorithms provide a comprehensive arsenal of techniques for modeling gene perturbation screens and metabolic pathways. [en]
dc.description.degree: Ph. D. [en]
dc.identifier.other: etd-12302009-142944 [en]
dc.identifier.sourceurl: http://scholar.lib.vt.edu/theses/available/etd-12302009-142944/ [en]
dc.identifier.uri: http://hdl.handle.net/10919/40493 [en]
dc.publisher: Virginia Tech [en]
dc.relation.haspart: Jin_Ying_D_2009.pdf [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: partial orders [en]
dc.subject: biclusters [en]
dc.subject: graph separators [en]
dc.subject: relative importance methods [en]
dc.subject: Biological networks [en]
dc.title: New Algorithms for Mining Network Datasets: Applications to Phenotype and Pathway Modeling [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Computer Science [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Ph. D. [en]

Files

Original bundle
Name: Jin_Ying_D_2009.pdf
Size: 1.52 MB
Format: Adobe Portable Document Format