Browsing by Author "Park, Sung Hee"
- Discipline-Independent Text Information Extraction from Heterogeneous Styled References Using Knowledge from the Web
  Park, Sung Hee (Virginia Tech, 2013-07-11)
  In education and research, references play a key role. They give credit to prior works, and provide support for reviews, discussions, and arguments. The set of references attached to a publication can help describe that publication, can aid with its categorization and retrieval, can support bibliometric studies, and can guide interested readers and researchers. If suitably analyzed, that set can aid with the analysis of the publication itself, especially regarding all its citing passages. However, extracting and parsing references are difficult problems. One concern is that there are many styles of references, and identifying what style was employed is problematic, especially in heterogeneous collections of theses and dissertations, which cover many fields and disciplines, and where different styles may be used even in the same publication. We address these problems by drawing upon suitable knowledge found in the WWW. In particular, we use appropriate lists (e.g., of names, cities, and other types of entities). We use available information about the many reference styles found, in a type of reverse engineering. We use available references to guide machine learning. In particular, we research a two-stage classifier approach, with multi-class classification with respect to reference styles, and partially solve the problem of parsing surface representations of references. We describe empirical evidence for the effectiveness of our approach and plans for improvement of our method.
- File Formats, Transformation, and Migration
  Leidig, Jonathan; Alon, A. J.; Chigani, Amine; Gopalakrishnan, Mahima; Park, Sung Hee (2009-10-09)
  This module covers the principles and applications of the transformation and migration processes for the preservation of digital content, as well as key issues surrounding digital preservation strategies.
- Integrated Digital Library System for Long Documents and their Elements
  Chekuri, Satvik; Chandrasekar, Prashant; Banerjee, Bipasha; Park, Sung Hee; Masrourisaadat, Nila; Ahuja, Aman; Ingram, William A.; Fox, Edward A. (ACM, 2023)
  We describe a next-generation integrated Digital Library (DL) system that addresses the numerous goals associated with long documents such as Electronic Theses and Dissertations (ETDs). Our extensible workflow-centric design supports a variety of users/personas (e.g., researchers, curators, and experimenters) who can benefit from improved access to ETDs and the content buried therein. Our approach leverages natural language processing, deep learning, information retrieval, and software engineering methods. The services cover ingesting, storing, curating, analyzing, detecting, extracting, classifying, summarizing, topic modeling, browsing, searching, retrieving, recommending, visualizing/reporting, and interacting with ETDs and derivative text/image-based elements/objects. Workflows connect the services and their APIs, along with UI-based access. We believe our approach can guide others to combine tailored user support, research, and education by way of extensible DLs.
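
The first entry above describes a two-stage approach to reference handling: first determine the reference style, then parse the reference. The sketch below is only a toy illustration of that general shape, not the method from the dissertation. It substitutes hand-written rules and regular expressions for the learned classifiers the abstract mentions, and the style names (`apa_like`, `ieee_like`), the patterns, the function names, and the sample reference strings are all assumptions made up for this example.

```python
import re

# Hypothetical, deliberately simplified patterns for two made-up style families.
# Real reference styles have far more variation than these cover.
STYLE_PATTERNS = {
    # Shape: Authors (YYYY). Title. Venue.
    "apa_like": re.compile(
        r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
        r"(?P<title>.+?)\.\s+(?P<venue>.+?)\.?$"
    ),
    # Shape: Authors, "Title," Venue, YYYY.
    "ieee_like": re.compile(
        r'^(?P<authors>.+?),\s+"(?P<title>.+?),"\s+'
        r"(?P<venue>.+?),\s+(?P<year>\d{4})\.?$"
    ),
}


def classify_style(reference: str) -> str:
    """Stage 1: guess the style from surface cues.

    This is a rule-based stand-in for a trained multi-class classifier.
    """
    if re.search(r"\(\d{4}\)\.", reference):
        return "apa_like"
    if '"' in reference:
        return "ieee_like"
    return "unknown"


def parse_reference(reference: str) -> dict:
    """Stage 2: extract fields using the pattern selected in stage 1."""
    style = classify_style(reference)
    pattern = STYLE_PATTERNS.get(style)
    match = pattern.match(reference.strip()) if pattern else None
    fields = match.groupdict() if match else {}
    return {"style": style, **fields}


if __name__ == "__main__":
    # Invented sample references, used only to exercise the sketch.
    samples = [
        "Doe, J. (2010). A sample title. Sample Press.",
        'J. Doe, "A sample title," Sample Journal, 2010.',
        "An unrecognized string",
    ]
    for sample in samples:
        print(parse_reference(sample))
```

Splitting the work this way means the parsing step only has to handle one style at a time, which is the motivation the abstract gives for identifying the style first. References that match no known style fall through with `style` set to `unknown` and no extracted fields.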