Enhancing Digital Libraries as Communication Tools: LLMs for Automated Subject Classification of Electronic Theses and Dissertations

Date

2025-10-24

Publisher

ACM

Abstract

Digital libraries are vital communication platforms that facilitate discoverability, collaboration, and strategic engagement among academics, administrators, funding agencies, and policymakers. Central to their effectiveness is accurate subject classification of Electronic Theses and Dissertations (ETDs), which enables clear information sharing and supports scholarly communication. However, author-supplied categories are frequently inconsistent or incorrect, often requiring manual review and complicating search and reporting. This study examines how Large Language Models (LLMs) can automate ETD subject classification, comparing prompt-based and fine-tuned approaches using over 9,200 records from Virginia Tech. Both methods are evaluated against established machine learning baselines, such as Support Vector Machines and multinomial Naive Bayes. Results indicate that LLMs perform competitively in applied fields but show systematic biases in more abstract or interdisciplinary categories, highlighting both their promise and the need for thoughtful communication system design in digital repositories.

Keywords

Research Classification, Electronic Theses and Dissertations, Machine Learning and Artificial Intelligence
