Increasing Accessibility of Electronic Theses and Dissertations (ETDs) Through Chapter-level Classification

Jude, Palakh Mignonne

Files

Jude_P_T_2020.pdf (22.43 MB)

Date

2020-07-07

Authors

Jude, Palakh Mignonne

Publisher

Virginia Tech

Abstract

Great progress has been made in leveraging advances in natural language processing and machine learning to better mine data from journals, conference proceedings, and other digital library documents. However, these advances do not extend well to book-length documents such as electronic theses and dissertations (ETDs). ETDs contain extensive research data; stakeholders, including researchers, librarians, students, and educators, can benefit from increased access to this corpus. Working with this corpus is challenging because of the varied disciplines it covers and its use of domain-specific language, and prior systems are not tuned to it. This research aims to increase the accessibility of ETDs by automatically classifying the chapters of an ETD using machine learning and deep learning techniques. This work utilizes an ETD-centric target classification system. It demonstrates the use of custom-trained word and document embeddings to generate better vector representations of this corpus. It also describes a methodology that leverages extractive summaries of the chapters of an ETD to aid in the classification process. Our findings indicate that custom embeddings and summarization techniques can increase the performance of the classifiers. The chapter-level labels generated by this research help to identify the level of interdisciplinarity in the corpus. The automatic classifiers can also be used in a search engine interface that would help users find the most appropriate chapters.

Keywords

Electronic Theses and Dissertations, Classification, Machine learning, Deep learning (Machine learning), Natural Language Processing

Persistent link

http://hdl.handle.net/10919/99294

Collections

Masters Theses
