Building A Large Collection of Multi-domain Electronic Theses and Dissertations
dc.contributor.author | Uddin, Sami | en |
dc.contributor.author | Banerjee, Bipasha | en |
dc.contributor.author | Wu, Jian | en |
dc.contributor.author | Ingram, William A. | en |
dc.contributor.author | Fox, Edward A. | en |
dc.date.accessioned | 2024-01-22T13:03:43Z | en |
dc.date.available | 2024-01-22T13:03:43Z | en |
dc.date.issued | 2021-12-15 | en |
dc.description.abstract | In this work, we report our progress on building a collection containing over 450k Electronic Theses and Dissertations (ETDs), including full text and metadata. Our goal is to close the accessibility gap between long-text and short-text documents, and to create a new research opportunity for the scholarly community. To that end, we developed an ETD Ingestion Framework (EIF) that automatically harvests metadata and PDFs of ETDs from university libraries. We faced multiple challenges and learned many lessons during the process, which led us to propose solutions to overcome or mitigate the limitations of the current data. We also describe the data that we have collected. We hope our methods will be useful for building similar collections from university libraries and that the data can be used for research and education. | en |
dc.description.notes | Peer reviewed: Yes, full paper | en |
dc.description.version | Published version | en |
dc.format.extent | Pages 6043-6045 | en |
dc.format.extent | 3 page(s) | en |
dc.format.mimetype | application/pdf | en |
dc.identifier.doi | https://doi.org/10.1109/bigdata52589.2021.9672058 | en |
dc.identifier.isbn | 9781665439022 | en |
dc.identifier.issn | 2639-1589 | en |
dc.identifier.orcid | Fox, Edward [0000-0003-1447-6870] | en |
dc.identifier.orcid | Ingram, William [0000-0002-8307-8844] | en |
dc.identifier.uri | https://hdl.handle.net/10919/117428 | en |
dc.language.iso | en | en |
dc.publisher | IEEE | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | ETD | en |
dc.subject | OAI-PMH | en |
dc.subject | Big data | en |
dc.title | Building A Large Collection of Multi-domain Electronic Theses and Dissertations | en |
dc.title.serial | 2021 IEEE International Conference on Big Data (Big Data) | en |
dc.type | Conference proceeding | en |
dc.type.dcmitype | Text | en |
dc.type.other | Proceedings Paper | en |
dc.type.other | Book in series | en |
pubs.finish-date | 2021-12-18 | en |
pubs.organisational-group | /Virginia Tech | en |
pubs.organisational-group | /Virginia Tech/Engineering | en |
pubs.organisational-group | /Virginia Tech/Engineering/Computer Science | en |
pubs.organisational-group | /Virginia Tech/Library | en |
pubs.organisational-group | /Virginia Tech/All T&R Faculty | en |
pubs.organisational-group | /Virginia Tech/Engineering/COE T&R Faculty | en |
pubs.organisational-group | /Virginia Tech/Library/Library assessment administrators | en |
pubs.organisational-group | /Virginia Tech/Library/Dean's office | en |
pubs.organisational-group | /Virginia Tech/Library/Information Technology | en |
pubs.organisational-group | /Virginia Tech/Graduate students | en |
pubs.organisational-group | /Virginia Tech/Graduate students/Doctoral students | en |
pubs.start-date | 2021-12-15 | en |