Enhancing Digital Libraries as Communication Tools: LLMs for Automated Subject Classification of Electronic Theses and Dissertations
Abstract
Digital libraries are vital communication platforms that facilitate discoverability, collaboration, and strategic engagement among academics, administrators, funding agencies, and policymakers. Central to their effectiveness is accurate subject classification of Electronic Theses and Dissertations (ETDs), which enables clear information sharing and supports scholarly communication. However, author-supplied categories are frequently inconsistent or incorrect, often requiring manual review and complicating search and reporting. This study examines how Large Language Models (LLMs) can automate ETD subject classification, comparing prompt-based and fine-tuned approaches using over 9,200 records from Virginia Tech. Both methods are evaluated against established machine learning baselines, such as Support Vector Machines and multinomial Naive Bayes. Results indicate that LLMs perform competitively in applied fields but show systematic biases in more abstract or interdisciplinary categories, highlighting both their promise and the need for thoughtful communication system design in digital repositories.