Intelligent Fusion of Evidence from Multiple Sources for Text Classification

dc.contributor.author: Zhang, Baoping [en]
dc.contributor.committeechair: Fox, Edward A. [en]
dc.contributor.committeemember: Spitzner, Dan J. [en]
dc.contributor.committeemember: Lu, Chang-Tien [en]
dc.contributor.committeemember: Fan, Weiguo Patrick [en]
dc.contributor.committeemember: Calado, Pavel [en]
dc.contributor.department: Computer Science [en]
dc.date.accessioned: 2014-03-14T20:13:43Z [en]
dc.date.adate: 2006-09-06 [en]
dc.date.available: 2014-03-14T20:13:43Z [en]
dc.date.issued: 2006-06-20 [en]
dc.date.rdate: 2006-09-06 [en]
dc.date.sdate: 2006-07-03 [en]
dc.description.abstract: Automatic text classification using current approaches is known to perform poorly when documents are noisy or when only a limited amount of textual content is available. Yet many users need access to such documents, which are found in large numbers in digital libraries and on the WWW. If documents are not classified, they are difficult to find when browsing. Further, search precision suffers when categories cannot be checked, since many documents may be retrieved that fail to meet category constraints. In this work, we study how different types of evidence from multiple sources can be intelligently fused to improve classification of text documents into predefined categories. We present a classification framework based on an inductive learning method -- Genetic Programming (GP) -- to fuse evidence from multiple sources. We show that good classification is possible with documents that are noisy or that have small amounts of text (e.g., short metadata records) -- if multiple sources of evidence are fused in an intelligent way. The framework is validated through experiments performed on documents in two testbeds. One is the ACM Digital Library (using a subset available in connection with CITIDEL, part of NSF's National Science Digital Library). The other is Web data, in particular the portion associated with the Cadê Web directory. Our studies show that improvement can be achieved relative to other machine learning approaches if genetic programming methods are combined with classifiers such as kNN. Extensive analysis was performed to study the results generated through the GP-based fusion approach and to understand key factors that promote good classification. [en]
dc.description.degree: Ph. D. [en]
dc.identifier.other: etd-07032006-152103 [en]
dc.identifier.sourceurl: http://scholar.lib.vt.edu/theses/available/etd-07032006-152103/ [en]
dc.identifier.uri: http://hdl.handle.net/10919/28198 [en]
dc.publisher: Virginia Tech [en]
dc.relation.haspart: BaopingDissertationFinal.pdf [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: experimentation [en]
dc.subject: classification [en]
dc.subject: Genetic Programming [en]
dc.subject: digital libraries [en]
dc.title: Intelligent Fusion of Evidence from Multiple Sources for Text Classification [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Computer Science [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Ph. D. [en]

Files

Original bundle
Name: BaopingDissertationFinal.pdf
Size: 3.61 MB
Format: Adobe Portable Document Format