
dc.contributor.author Yang, Seungwon en_US
dc.date.accessioned 2014-01-23T09:00:15Z
dc.date.available 2014-01-23T09:00:15Z
dc.date.issued 2014-01-22 en_US
dc.identifier.other vt_gsexam:1938 en_US
dc.identifier.uri http://hdl.handle.net/10919/25111
dc.description.abstract Identifying the topics of a textual document is useful for many purposes. In digital libraries, we can organize documents by topic and then browse and search for documents on specific topics. By examining the topics of a document, we can quickly understand what the document is about. To augment traditional manual topic tagging, which is labor-intensive, solutions using computers have been developed. This dissertation describes the design and development of a topic identification approach, in this case applied to disaster events. In a sense, this study represents the marriage of research analysis with an engineering effort, in that it combines inspiration from Cognitive Informatics with a practical model from Information Retrieval. One of the design constraints, however, is that the Web was used as a universal knowledge source, which was essential in accessing the information required for inferring topics from texts. Specific information of interest was retrieved from this vast information source by querying a search engine's application programming interface. The information gathered was processed mainly using the Vector Space Model from the Information Retrieval field. As a proof of concept, we developed and evaluated a prototype tool, Xpantrac, which can run in batch mode to process text documents automatically. A user interface for Xpantrac was also constructed to support an interactive, semi-automatic topic tagging application, which was assessed via a usability study. Throughout the design, development, and evaluation of these study components, we detail how the hypotheses and research questions of this dissertation have been supported and answered. We also show that our overarching goal, the identification of topics in a human-comparable way without depending on a large training set or a corpus, has been achieved. en_US
dc.format.medium ETD en_US
dc.publisher Virginia Tech en_US
dc.rights This Item is protected by copyright and/or related rights. Some uses of this Item may be deemed fair and permitted by law even without permission from the rights holder(s), or the rights holder(s) may have licensed the work for use under certain conditions. For other uses you need to obtain permission from the rights holder(s). en_US
dc.subject topic identification en_US
dc.subject tagging en_US
dc.subject cognitive informatics en_US
dc.subject vector space model en_US
dc.subject knowledge sources en_US
dc.subject natural language processing en_US
dc.subject digital libraries en_US
dc.subject usability study en_US
dc.title Automatic Identification of Topic Tags from Texts Based on Expansion-Extraction Approach en_US
dc.type Dissertation en_US
dc.contributor.department Computer Science en_US
dc.description.degree Ph. D. en_US
thesis.degree.name Ph. D. en_US
thesis.degree.level doctoral en_US
thesis.degree.grantor Virginia Polytechnic Institute and State University en_US
thesis.degree.discipline Computer Science and Applications en_US
dc.contributor.committeechair Fox, Edward Alan en_US
dc.contributor.committeemember Wildemuth, Barbara Marie en_US
dc.contributor.committeemember Ramakrishnan, Narendran en_US
dc.contributor.committeemember Moore, John F. en_US
dc.contributor.committeemember Fan, Weiguo en_US

