Enhancing Digital Libraries as Communication Tools: LLMs for Automated Subject Classification of Electronic Theses and Dissertations

Klair, Hajra
2026-02-09
2026-02-09
2025-10-24
https://hdl.handle.net/10919/141204

Digital libraries are vital communication platforms that facilitate discoverability, collaboration, and strategic engagement among academics, administrators, funding agencies, and policymakers. Central to their effectiveness is accurate subject classification of Electronic Theses and Dissertations (ETDs), which enables clear information sharing and supports scholarly communication. However, author-supplied categories are frequently inconsistent or incorrect, often requiring manual review and complicating search and reporting. This study examines how Large Language Models (LLMs) can automate ETD subject classification, comparing prompt-based and fine-tuned approaches using over 9,200 records from Virginia Tech. Both methods are evaluated against established machine learning baselines, such as Support Vector Machines and multinomial Naive Bayes. Results indicate LLMs perform competitively in applied fields, but show systematic biases in more abstract or interdisciplinary categories, highlighting both their promise and the need for thoughtful communication system design in digital repositories.

Pages 243-245
3 page(s)
application/pdf
en
Creative Commons Attribution 4.0 International
Research Classification
Electronic Theses and Dissertations
Machine Learning and Artificial Intelligence
Conference proceeding
SIGDOC '25: Proceedings of the 43rd ACM International Conference on Design of Communication
https://doi.org/10.1145/3711670.3764654