
Building A Large Collection of Multi-domain Electronic Theses and Dissertations

dc.contributor.author: Uddin, Sami
dc.contributor.author: Banerjee, Bipasha
dc.contributor.author: Wu, Jian
dc.contributor.author: Ingram, William A.
dc.contributor.author: Fox, Edward A.
dc.date.accessioned: 2024-01-22T13:03:43Z
dc.date.available: 2024-01-22T13:03:43Z
dc.date.issued: 2021-12-15
dc.description.abstract: In this work, we report our progress on building a collection of over 450k Electronic Theses and Dissertations (ETDs), including full text and metadata. Our goal is to close the accessibility gap between long-text and short-text documents and to create new research opportunities for the scholarly community. To that end, we developed an ETD Ingestion Framework (EIF) that automatically harvests metadata and PDFs of ETDs from university libraries. We faced multiple challenges and learned many lessons during this process, which led us to propose solutions that overcome or mitigate the limitations of the current data. We also describe the data we have collected. We hope our methods will be useful for building similar collections from university libraries and that the data can be used for research and education.
dc.description.notes: Yes, full paper (Peer reviewed?)
dc.description.version: Published version
dc.format.extent: Pages 6043-6045
dc.format.extent: 3 page(s)
dc.format.mimetype: application/pdf
dc.identifier.doi: https://doi.org/10.1109/bigdata52589.2021.9672058
dc.identifier.isbn: 9781665439022
dc.identifier.issn: 2639-1589
dc.identifier.orcid: Fox, Edward [0000-0003-1447-6870]
dc.identifier.orcid: Ingram, William [0000-0002-8307-8844]
dc.identifier.uri: https://hdl.handle.net/10919/117428
dc.language.iso: en
dc.publisher: IEEE
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: ETD
dc.subject: OAI-PMH
dc.subject: Big data
dc.title: Building A Large Collection of Multi-domain Electronic Theses and Dissertations
dc.title.serial: 2021 IEEE International Conference on Big Data (Big Data)
dc.type: Conference proceeding
dc.type.dcmitype: Text
dc.type.other: Proceedings Paper
dc.type.other: Book in series
pubs.finish-date: 2021-12-18
pubs.organisational-group: /Virginia Tech
pubs.organisational-group: /Virginia Tech/Engineering
pubs.organisational-group: /Virginia Tech/Engineering/Computer Science
pubs.organisational-group: /Virginia Tech/Library
pubs.organisational-group: /Virginia Tech/All T&R Faculty
pubs.organisational-group: /Virginia Tech/Engineering/COE T&R Faculty
pubs.organisational-group: /Virginia Tech/Library/Library assessment administrators
pubs.organisational-group: /Virginia Tech/Library/Dean's office
pubs.organisational-group: /Virginia Tech/Library/Information Technology
pubs.organisational-group: /Virginia Tech/Graduate students
pubs.organisational-group: /Virginia Tech/Graduate students/Doctoral students
pubs.start-date: 2021-12-15

Files

Original bundle
Name: 2021IEEE_BigDataETDcrawling.pdf
Size: 558.81 KB
Format: Adobe Portable Document Format
Description: Accepted version
License bundle
Name: license.txt
Size: 1.5 KB
Format: Plain Text