Redescription Mining: Algorithms and Applications in Bioinformatics

Kumar, Deept

dc.contributor.author	Kumar, Deept	en
dc.contributor.committeechair	Ramakrishnan, Naren	en
dc.contributor.committeemember	North, Christopher L.	en
dc.contributor.committeemember	Murali, T. M.	en
dc.contributor.committeemember	Potts, Malcolm	en
dc.contributor.committeemember	Helm, Richard F.	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2014-03-14T20:11:29Z	en
dc.date.adate	2007-05-10	en
dc.date.available	2014-03-14T20:11:29Z	en
dc.date.issued	2007-04-19	en
dc.date.rdate	2007-05-10	en
dc.date.sdate	2007-05-03	en
dc.description.abstract	Scientific data mining aims to extract useful knowledge from massive datasets curated through computational science efforts, e.g., in bioinformatics, cosmology, geographic sciences, and computational chemistry. In the recent past, we have witnessed major transformations of these applied sciences into data-driven endeavors. In particular, scientists are now faced with an overload of vocabularies for describing domain entities. All of these vocabularies offer alternative and mostly complementary (sometimes even contradictory) ways to organize information, and each vocabulary provides a different perspective on the problem being studied. To further knowledge discovery, computational scientists need tools to help uniformly reason across vocabularies, integrate multiple forms of characterizing datasets, and situate knowledge gained from one study in terms of others. This dissertation defines a new pattern class called redescriptions that provides high-level capabilities for reasoning across domain vocabularies. A redescription is a shift of vocabulary, or a different way of communicating the same information; redescription mining finds concerted sets of objects that can be defined in (at least) two ways using given descriptors. We present the CARTwheels algorithm for mining redescriptions by exploiting equivalences of partitions induced by distinct descriptor classes, as well as applications of CARTwheels to several bioinformatics datasets. We then outline how more complex data mining operations can be built by cascading redescriptions to realize a story, leading to a new data mining capability called storytelling. Besides applications to characterizing gene sets, we showcase uses of storytelling on other datasets as well.
Finally, we extend the core CARTwheels algorithm by introducing a theoretical framework, based on partitions, to systematically explore redescription space; generalizing from mining redescriptions (and stories) within a single domain to relating descriptors across different domains, to support complex relational data mining scenarios; and exploiting the structure of the underlying descriptor space to yield more effective algorithms for specific classes of datasets.	en
dc.description.degree	Ph. D.	en
dc.identifier.other	etd-05032007-223232	en
dc.identifier.sourceurl	http://scholar.lib.vt.edu/theses/available/etd-05032007-223232/	en
dc.identifier.uri	http://hdl.handle.net/10919/27518	en
dc.publisher	Virginia Tech	en
dc.relation.haspart	deept_redescs.pdf	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	bioinformatics	en
dc.subject	storytelling	en
dc.subject	redescription mining	en
dc.subject	redescriptions	en
dc.title	Redescription Mining: Algorithms and Applications in Bioinformatics	en
dc.type	Dissertation	en
thesis.degree.discipline	Computer Science	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Ph. D.	en
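
The abstract defines a redescription as a set of objects that can be defined in (at least) two ways using given descriptors. The minimal sketch below illustrates only that definition, not the CARTwheels algorithm. It assumes a hypothetical toy dataset of four genes, each annotated with descriptors from two vocabularies, and it assumes the Jaccard coefficient as the measure of how closely two descriptions pick out the same set; the record does not say which measure the dissertation uses. All names (`GENES`, `has`, `support`, `jaccard`, `is_redescription`, `threshold`) are illustrative.

```python
from typing import Callable, Dict, FrozenSet, Set

# Hypothetical toy data (an assumption, not from the dissertation): each gene
# carries descriptors from two vocabularies, functional categories
# ("stress", "transport") and expression labels ("up_heat", "up_cold").
GENES: Dict[str, Set[str]] = {
    "g1": {"stress", "up_heat", "up_cold"},
    "g2": {"stress", "up_heat"},
    "g3": {"transport", "up_cold"},
    "g4": {"transport"},
}

# A description is modeled as a boolean predicate over an object's descriptors.
Descriptor = Callable[[Set[str]], bool]


def has(term: str) -> Descriptor:
    """Atomic description: true when the object is annotated with `term`."""
    return lambda descs: term in descs


def support(objects: Dict[str, Set[str]], expr: Descriptor) -> FrozenSet[str]:
    """Set of objects whose descriptors satisfy the description."""
    return frozenset(name for name, descs in objects.items() if expr(descs))


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard coefficient |A & B| / |A | B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def is_redescription(objects: Dict[str, Set[str]], left: Descriptor,
                     right: Descriptor, threshold: float = 1.0) -> bool:
    """True if the two descriptions induce (nearly) the same object set."""
    return jaccard(support(objects, left), support(objects, right)) >= threshold


# Left side uses vocabulary 1, right side uses vocabulary 2.
print(sorted(support(GENES, has("stress"))))                # ['g1', 'g2']
print(is_redescription(GENES, has("stress"), has("up_heat")))  # True
print(is_redescription(GENES, has("stress"), has("up_cold")))  # False
```

Here "stress" and "up_heat" describe exactly the same genes {g1, g2}, so they form an exact redescription, while "stress" and "up_cold" overlap only on g1 (Jaccard 1/3). Lowering the hypothetical `threshold` below 1.0 would admit approximate redescriptions, one way to tolerate noisy annotations.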

Files

Original bundle

Name: deept_redescs.pdf
Size: 2.74 MB
Format: Adobe Portable Document Format

Collections

Doctoral Dissertations