dc.contributor.author: Kumar, Deept (en_US)
dc.date.accessioned: 2014-03-14T20:11:29Z
dc.date.available: 2014-03-14T20:11:29Z
dc.date.issued: 2007-04-19 (en_US)
dc.identifier.other: etd-05032007-223232 (en_US)
dc.identifier.uri: http://hdl.handle.net/10919/27518
dc.description.abstract: Scientific data mining purports to extract useful knowledge from massive datasets curated through computational science efforts, e.g., in bioinformatics, cosmology, geographic sciences, and computational chemistry. In the recent past, we have witnessed major transformations of these applied sciences into data-driven endeavors. In particular, scientists are now faced with an overload of vocabularies for describing domain entities. All of these vocabularies offer alternative and mostly complementary (sometimes, even contradictory) ways to organize information and each vocabulary provides a different perspective into the problem being studied. To further knowledge discovery, computational scientists need tools to help uniformly reason across vocabularies, integrate multiple forms of characterizing datasets, and situate knowledge gained from one study in terms of others. This dissertation defines a new pattern class called redescriptions that provides high level capabilities for reasoning across domain vocabularies. A redescription is a shift of vocabulary, or a different way of communicating the same information; redescription mining finds concerted sets of objects that can be defined in (at least) two ways using given descriptors. We present the CARTwheels algorithm for mining redescriptions by exploiting equivalences of partitions induced by distinct descriptor classes as well as applications of CARTwheels to several bioinformatics datasets. We then outline how we can build more complex data mining operations by cascading redescriptions to realize a story, leading to a new data mining capability called storytelling. Besides applications to characterizing gene sets, we showcase its uses in other datasets as well. Finally, we extend the core CARTwheels algorithm by introducing a theoretical framework, based on partitions, to systematically explore redescription space; generalizing from mining redescriptions (and stories) within a single domain to relating descriptors across different domains, to support complex relational data mining scenarios; and exploiting structure of the underlying descriptor space to yield more effective algorithms for specific classes of datasets. (en_US)
dc.publisher: Virginia Tech (en_US)
dc.relation.haspart: deept_redescs.pdf (en_US)
dc.rights: I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to Virginia Tech or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. (en_US)
dc.subject: bioinformatics (en_US)
dc.subject: storytelling (en_US)
dc.subject: redescription mining (en_US)
dc.subject: redescriptions (en_US)
dc.title: Redescription Mining: Algorithms and Applications in Bioinformatics (en_US)
dc.type: Dissertation (en_US)
dc.contributor.department: Computer Science (en_US)
dc.description.degree: Ph. D. (en_US)
thesis.degree.name: Ph. D. (en_US)
thesis.degree.level: doctoral (en_US)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en_US)
thesis.degree.discipline: Computer Science (en_US)
dc.contributor.committeechair: Ramakrishnan, Narendran (en_US)
dc.contributor.committeemember: North, Christopher L. (en_US)
dc.contributor.committeemember: Murali, T. M. (en_US)
dc.contributor.committeemember: Potts, Malcolm (en_US)
dc.contributor.committeemember: Helm, Richard Frederick (en_US)
dc.identifier.sourceurl: http://scholar.lib.vt.edu/theses/available/etd-05032007-223232/ (en_US)
dc.date.sdate: 2007-05-03 (en_US)
dc.date.rdate: 2007-05-10
dc.date.adate: 2007-05-10 (en_US)