Segmenting Electronic Theses and Dissertations By Chapters

dc.contributor.author: Manzoor, Javaid Akbar [en]
dc.contributor.committeechair: Fox, Edward A. [en]
dc.contributor.committeemember: Wu, Jian [en]
dc.contributor.committeemember: Heath, Lenwood S. [en]
dc.contributor.department: Computer Science and Applications [en]
dc.date.accessioned: 2023-01-19T09:00:28Z [en]
dc.date.available: 2023-01-19T09:00:28Z [en]
dc.date.issued: 2023-01-18 [en]
dc.description.abstractgeneral: Electronic theses and dissertations (ETDs) are structured documents in which chapters are major components. There is a lack of any repository that contains chapter boundary details alongside these structured documents. Revealing these details of the documents can help increase accessibility. This research explores the manipulation of ETDs marked up using LaTeX to generate chapter boundaries. We use this to create a data set of 1,459 ETDs and their chapter boundaries. Additionally, for the task of automatic segmentation of unseen documents, we prototype three deep learning models that are trained using this data set. We hope to encourage researchers to incorporate LaTeX manipulation techniques to create similar data sets. [en]
dc.description.degree: Master of Science [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:35736 [en]
dc.identifier.uri: http://hdl.handle.net/10919/113246 [en]
dc.language.iso: en [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: segmentation [en]
dc.subject: deep learning [en]
dc.subject: natural language processing [en]
dc.subject: ETD [en]
dc.subject: digital libraries [en]
dc.title: Segmenting Electronic Theses and Dissertations By Chapters [en]
dc.type: Thesis [en]
thesis.degree.discipline: Computer Science and Applications [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: masters [en]
thesis.degree.name: Master of Science [en]

Files

Original bundle
Name: Manzoor_JA_T_2023.pdf
Size: 4.61 MB
Format: Adobe Portable Document Format
