
dc.contributor.author | Park, Sung Hee | en_US
dc.date.accessioned | 2015-05-30T06:00:21Z
dc.date.available | 2015-05-30T06:00:21Z
dc.date.issued | 2013-07-11 | en_US
dc.identifier.other | vt_gsexam:1167 | en_US
dc.identifier.uri | http://hdl.handle.net/10919/52860
dc.description.abstract | In education and research, references play a key role. They give credit to prior works, and provide support for reviews, discussions, and arguments. The set of references attached to a publication can help describe that publication, can aid with its categorization and retrieval, can support bibliometric studies, and can guide interested readers and researchers. If suitably analyzed, that set can aid with the analysis of the publication itself, especially regarding all its citing passages. However, extracting and parsing references are difficult problems. One concern is that there are many styles of references, and identifying what style was employed is problematic, especially in heterogeneous collections of theses and dissertations, which cover many fields and disciplines, and where different styles may be used even in the same publication. We address these problems by drawing upon suitable knowledge found in the WWW. In particular, we use appropriate lists (e.g., of names, cities, and other types of entities). We use available information about the many reference styles found, in a type of reverse engineering. We use available references to guide machine learning. In particular, we research a two-stage classifier approach, with multi-class classification with respect to reference styles, and partially solve the problem of parsing surface representations of references. We describe empirical evidence for the effectiveness of our approach and plans for improvement of our method. | en_US
dc.format.medium | ETD | en_US
dc.publisher | Virginia Tech | en_US
dc.rights | This Item is protected by copyright and/or related rights. Some uses of this Item may be deemed fair and permitted by law even without permission from the rights holder(s), or the rights holder(s) may have licensed the work for use under certain conditions. For other uses you need to obtain permission from the rights holder(s). | en_US
dc.subject | Canonical Representation Extraction | en_US
dc.subject | Knowledge Acquisition | en_US
dc.subject | Reverse-Engineering | en_US
dc.subject | Domain-Independent Reference Metadata Ext | en_US
dc.title | Discipline-Independent Text Information Extraction from Heterogeneous Styled References Using Knowledge from the Web | en_US
dc.type | Dissertation | en_US
dc.contributor.department | Computer Science | en_US
dc.description.degree | Ph. D. | en_US
thesis.degree.name | Ph. D. | en_US
thesis.degree.level | doctoral | en_US
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en_US
thesis.degree.discipline | Computer Science and Applications | en_US
dc.contributor.committeechair | Fox, Edward Alan | en_US
dc.contributor.committeemember | Ramakrishnan, Narendran | en_US
dc.contributor.committeemember | Fan, Weiguo | en_US
dc.contributor.committeemember | Giles, C. Lee | en_US
dc.contributor.committeemember | Ehrich, Roger W. | en_US