CS5604: Team 1 ETD Collection Management

Field | Value | Language
dc.contributor.author | Jain, Tanya | en
dc.contributor.author | Bhagat, Hirva | en
dc.contributor.author | Lee, Wen-Yu | en
dc.contributor.author | Thukkaraju, Ashrith Reddy | en
dc.contributor.author | Sethi, Raghav | en
dc.date.accessioned | 2023-03-10T18:01:52Z | en
dc.date.available | 2023-03-10T18:01:52Z | en
dc.date.issued | 2023-01-13 | en
dc.description.abstract | Academic institutions the world over produce hundreds of thousands of ETDs (Electronic Theses and Dissertations) every year. At the end of each academic year, we are left with large volumes of ETD data that are rarely used for further research or cited in future work, writings, or publications. As the class-wide project for CS5604: Information Storage and Retrieval, a graduate-level course at Virginia Polytechnic Institute and State University (Virginia Tech), we collectively built a search engine for a collection of more than 500,000 ETDs from academic institutions in the United States. The system enables users to ingest, pre-process, and store ETDs in a repository and to apply deep learning models for topic modeling, text segmentation, chapter summarization, and classification; it is supported by DevOps, user experience, and integration teams. We are Team 1, the “ETD Collection Management” team. During the Fall 2022 semester at Virginia Tech, we were responsible for setting up the ETD repository, which broadly comprises three components: (1) a database, (2) a file system for storing digital objects, and (3) a knowledge graph. Our work enabled the other teams to efficiently retrieve the stored ETD data, perform appropriate pre-processing operations, and, during the final months of the semester, apply the aforementioned deep learning models to the ETD collection we created. The key deliverable for Team 1 was an interactive user interface for performing CRUD (create, retrieve, update, and delete) operations on the ETD repository, extending work already undertaken at Virginia Tech’s Digital Library Research Laboratory.
Because the other teams had no direct access to the repository we set up, we designed a set of Application Programming Interfaces (APIs), which are described in depth in the subsequent sections of the report. Team 1's end goal was to set up an accessible repository of ETDs that can be used for further research. This goal reflects the fact that each ETD is a well-curated resource that may prove an excellent asset for in-depth analysis of a given topic, for academic and research purposes and beyond. | en
dc.description.notes | The following files are being uploaded as part of this submission: 1. Team1CollMgmntCode.zip - Code Repository Zip File; 2. Team1CollMgmntReport.zip - Final Report Zip File from Overleaf; 3. Team1CollMgmntReport.pdf - Final Report PDF; 4. Team1CollMgmntPresentation.pdf - Final Presentation PDF; 5. Team1CollMgmntPresentation.pptx - Final Presentation PPT File | en
dc.identifier.uri | http://hdl.handle.net/10919/114079 | en
dc.language.iso | en_US | en
dc.publisher | Virginia Tech | en
dc.rights | CC0 1.0 Universal | en
dc.rights.uri | http://creativecommons.org/publicdomain/zero/1.0/ | en
dc.subject | Electronic Theses and Dissertations (ETD) | en
dc.subject | Information Storage | en
dc.subject | Information Retrieval | en
dc.subject | Curation | en
dc.subject | File System | en
dc.subject | Database Schema | en
dc.title | CS5604: Team 1 ETD Collection Management | en
dc.type | Presentation | en
dc.type | Report | en
dc.type | Other | en
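
The abstract states that Team 1 exposed CRUD (create, retrieve, update, delete) operations on the ETD repository, backed by a database for metadata and a file system for the digital objects. As a rough illustration only, the sketch below shows what such a CRUD layer might look like. It is not taken from the team's code: the table name `etd`, its columns, the `ETD` dataclass, and the `ETDRepository` class are all hypothetical, SQLite stands in for whatever database the team actually used, and `pdf_path` is an assumed column pointing at each ETD's file in the file system.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema for illustration; the real Team 1 schema is in the report.
SCHEMA = """
CREATE TABLE IF NOT EXISTS etd (
    etd_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    title    TEXT NOT NULL,
    author   TEXT NOT NULL,
    year     INTEGER,
    pdf_path TEXT  -- assumed: location of the digital object in the file system
)
"""


@dataclass
class ETD:
    """Hypothetical metadata record for one ETD."""
    title: str
    author: str
    year: Optional[int] = None
    pdf_path: Optional[str] = None
    etd_id: Optional[int] = None


class ETDRepository:
    """Hypothetical minimal CRUD wrapper around a SQLite table of ETD metadata."""

    UPDATABLE = {"title", "author", "year", "pdf_path"}

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(SCHEMA)

    def create(self, etd: ETD) -> int:
        """Insert a record and return its new primary key."""
        cur = self.conn.execute(
            "INSERT INTO etd (title, author, year, pdf_path) VALUES (?, ?, ?, ?)",
            (etd.title, etd.author, etd.year, etd.pdf_path),
        )
        self.conn.commit()
        return cur.lastrowid

    def retrieve(self, etd_id: int) -> Optional[ETD]:
        """Return the record with this id, or None if it does not exist."""
        row = self.conn.execute(
            "SELECT title, author, year, pdf_path, etd_id FROM etd WHERE etd_id = ?",
            (etd_id,),
        ).fetchone()
        return ETD(*row) if row else None

    def update(self, etd_id: int, **fields) -> bool:
        """Update the given columns; return True if exactly one row changed."""
        unknown = set(fields) - self.UPDATABLE
        if unknown:
            raise ValueError(f"cannot update columns: {sorted(unknown)}")
        if not fields:
            return False
        cols = list(fields)
        # Column names come only from the UPDATABLE whitelist, so this is safe.
        assignments = ", ".join(f"{c} = ?" for c in cols)
        cur = self.conn.execute(
            f"UPDATE etd SET {assignments} WHERE etd_id = ?",
            [fields[c] for c in cols] + [etd_id],
        )
        self.conn.commit()
        return cur.rowcount == 1

    def delete(self, etd_id: int) -> bool:
        """Delete the record; return True if a row was removed."""
        cur = self.conn.execute("DELETE FROM etd WHERE etd_id = ?", (etd_id,))
        self.conn.commit()
        return cur.rowcount == 1


if __name__ == "__main__":
    repo = ETDRepository()
    new_id = repo.create(ETD("A Study of X", "Doe, Jane", 2020, "etds/0001.pdf"))
    repo.update(new_id, year=2021)
    print(repo.retrieve(new_id))
    repo.delete(new_id)
```

Keeping the file path in the database row rather than the PDF bytes themselves mirrors the abstract's split between a database and a file system for digital objects; an API layer of the kind the abstract mentions could then wrap these four methods.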

Files

Original bundle
Name | Size | Format
Team1CollMgmntCode.zip | 725.57 MB |
Team1CollMgmntReport.zip | 4.28 MB |
Team1CollMgmntReport.pdf | 2.12 MB | Adobe Portable Document Format
Team1CollMgmntPresentation.pdf | 738.33 KB | Adobe Portable Document Format
Team1CollMgmntPresentation.pptx | 1017.95 KB | Microsoft Powerpoint XML

License bundle
Name | Size | Format
license.txt | 1.5 KB | Item-specific license agreed upon to submission