Discipline-Independent Text Information Extraction from Heterogeneous Styled References Using Knowledge from the Web

Park, Sung Hee

dc.contributor.author	Park, Sung Hee	en
dc.contributor.committeechair	Fox, Edward A.	en
dc.contributor.committeemember	Ramakrishnan, Naren	en
dc.contributor.committeemember	Fan, Weiguo	en
dc.contributor.committeemember	Giles, C. Lee	en
dc.contributor.committeemember	Ehrich, Roger W.	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2015-05-30T06:00:21Z	en
dc.date.available	2015-05-30T06:00:21Z	en
dc.date.issued	2013-07-11	en
dc.description.abstract	In education and research, references play a key role. They give credit to prior works, and provide support for reviews, discussions, and arguments. The set of references attached to a publication can help describe that publication, can aid with its categorization and retrieval, can support bibliometric studies, and can guide interested readers and researchers. If suitably analyzed, that set can aid with the analysis of the publication itself, especially regarding all its citing passages. However, extracting and parsing references are difficult problems. One concern is that there are many styles of references, and identifying what style was employed is problematic, especially in heterogeneous collections of theses and dissertations, which cover many fields and disciplines, and where different styles may be used even in the same publication. We address these problems by drawing upon suitable knowledge found in the WWW. In particular, we use appropriate lists (e.g., of names, cities, and other types of entities). We use available information about the many reference styles found, in a type of reverse engineering. We use available references to guide machine learning. In particular, we research a two-stage classifier approach, with multi-class classification with respect to reference styles, and partially solve the problem of parsing surface representations of references. We describe empirical evidence for the effectiveness of our approach and plans for improvement of our method.	en
dc.description.degree	Ph. D.	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:1167	en
dc.identifier.uri	http://hdl.handle.net/10919/52860	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Canonical Representation Extraction	en
dc.subject	Knowledge Acquisition	en
dc.subject	Reverse-Engineering	en
dc.subject	Domain-Independent Reference Metadata Ext	en
dc.title	Discipline-Independent Text Information Extraction from Heterogeneous Styled References Using Knowledge from the Web	en
dc.type	Dissertation	en
thesis.degree.discipline	Computer Science and Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Ph. D.	en

Files

Original bundle

Name: Park_S_D_2013.pdf
Size: 6.66 MB
Format: Adobe Portable Document Format

Collections

Doctoral Dissertations