Discipline-Independent Text Information Extraction from Heterogeneous Styled References Using Knowledge from the Web

dc.contributor.author: Park, Sung Hee [en]
dc.contributor.committeechair: Fox, Edward A. [en]
dc.contributor.committeemember: Ramakrishnan, Naren [en]
dc.contributor.committeemember: Fan, Weiguo [en]
dc.contributor.committeemember: Giles, C. Lee [en]
dc.contributor.committeemember: Ehrich, Roger W. [en]
dc.contributor.department: Computer Science [en]
dc.date.accessioned: 2015-05-30T06:00:21Z [en]
dc.date.available: 2015-05-30T06:00:21Z [en]
dc.date.issued: 2013-07-11 [en]
dc.description.abstract: In education and research, references play a key role. They give credit to prior works and provide support for reviews, discussions, and arguments. The set of references attached to a publication can help describe that publication, can aid with its categorization and retrieval, can support bibliometric studies, and can guide interested readers and researchers. If suitably analyzed, that set can also aid with the analysis of the publication itself, especially regarding all its citing passages. However, extracting and parsing references are difficult problems. One concern is that there are many reference styles, and identifying which style was employed is problematic, especially in heterogeneous collections of theses and dissertations, which cover many fields and disciplines, and where different styles may be used even within the same publication. We address these problems by drawing upon suitable knowledge found on the WWW. In particular, we use appropriate lists (e.g., of names, cities, and other types of entities). We use available information about the many reference styles found, as a form of reverse engineering. We use available references to guide machine learning. Specifically, we research a two-stage classifier approach, with multi-class classification with respect to reference styles, and partially solve the problem of parsing surface representations of references. We present empirical evidence of the effectiveness of our approach, as well as plans for improving our method. [en]
dc.description.degree: Ph. D. [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:1167 [en]
dc.identifier.uri: http://hdl.handle.net/10919/52860 [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Canonical Representation Extraction [en]
dc.subject: Knowledge Acquisition [en]
dc.subject: Reverse-Engineering [en]
dc.subject: Domain-Independent Reference Metadata Ext [en]
dc.title: Discipline-Independent Text Information Extraction from Heterogeneous Styled References Using Knowledge from the Web [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Computer Science and Applications [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Ph. D. [en]

Files

Original bundle
Name: Park_S_D_2013.pdf
Size: 6.66 MB
Format: Adobe Portable Document Format