dc.contributor.author  Zhang, Baoping  en_US
dc.date.accessioned  2014-03-14T20:13:43Z
dc.date.available  2014-03-14T20:13:43Z
dc.date.issued  2006-06-20  en_US
dc.identifier.other  etd-07032006-152103  en_US
dc.identifier.uri  http://hdl.handle.net/10919/28198
dc.description.abstract  Automatic text classification using current approaches is known to perform poorly when documents are noisy or when only limited amounts of textual content are available. Yet many users need access to such documents, which are found in large numbers in digital libraries and on the World Wide Web. If documents are not classified, they are difficult to find when browsing. Further, search precision suffers when categories cannot be checked, since many retrieved documents may fail to meet category constraints. In this work, we study how different types of evidence from multiple sources can be intelligently fused to improve the classification of text documents into predefined categories. We present a classification framework that uses an inductive learning method, Genetic Programming (GP), to fuse evidence from multiple sources. We show that good classification is possible for documents that are noisy or that contain little text (e.g., short metadata records), provided that multiple sources of evidence are fused intelligently. The framework is validated through experiments on documents from two testbeds. One is the ACM Digital Library (using a subset available in connection with CITIDEL, part of NSF's National Science Digital Library). The other is Web data, in particular the portion associated with the Cadê Web directory. Our studies show that combining genetic programming methods with classifiers such as kNN can improve classification relative to other machine learning approaches. Extensive analysis was performed on the results generated by the GP-based fusion approach to understand the key factors that promote good classification.  en_US
dc.publisher  Virginia Tech  en_US
dc.relation.haspart  BaopingDissertationFinal.pdf  en_US
dc.rights  I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to Virginia Tech or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report.  en_US
dc.subject  experimentation  en_US
dc.subject  classification  en_US
dc.subject  Genetic Programming  en_US
dc.subject  digital libraries  en_US
dc.title  Intelligent Fusion of Evidence from Multiple Sources for Text Classification  en_US
dc.type  Dissertation  en_US
dc.contributor.department  Computer Science  en_US
dc.description.degree  Ph. D.  en_US
thesis.degree.name  Ph. D.  en_US
thesis.degree.level  doctoral  en_US
thesis.degree.grantor  Virginia Polytechnic Institute and State University  en_US
thesis.degree.discipline  Computer Science  en_US
dc.contributor.committeechair  Fox, Edward Alan  en_US
dc.contributor.committeemember  Spitzner, Dan J.  en_US
dc.contributor.committeemember  Lu, Chang-Tien  en_US
dc.contributor.committeemember  Fan, Weiguo Patrick  en_US
dc.contributor.committeemember  Calado, Pavel  en_US
dc.identifier.sourceurl  http://scholar.lib.vt.edu/theses/available/etd-07032006-152103/  en_US
dc.date.sdate  2006-07-03  en_US
dc.date.rdate  2006-09-06
dc.date.adate  2006-09-06  en_US