Evaluating Human-LLM Alignment in ETD Subject Classification

dc.contributor.author: Klair, Hajra (en)
dc.contributor.author: German, Fausto (en)
dc.contributor.author: Banerjee, Bipasha (en)
dc.contributor.author: Ingram, William A. (en)
dc.date.accessioned: 2026-02-09T15:32:15Z (en)
dc.date.available: 2026-02-09T15:32:15Z (en)
dc.date.issued: 2025-09-27 (en)
dc.description.abstract: Author-assigned subject labels in Electronic Theses and Dissertations (ETDs) are often inconsistent, overly broad, or misaligned with the research focus. This hampers discovery, aggregation, and analysis, especially for interdisciplinary research. LLMs offer a scalable alternative for automated classification, but their labeling rationale is opaque and introduces systematic biases. This study compares subject labels generated by LLMs with human-assigned labels for over 9,000 ETDs across 21 academic categories to assess the disagreement. We evaluate multiple prompt-based and fine-tuned LLM configurations and analyze areas of agreement and disagreement to identify patterns of misclassification. LLMs achieve competitive performance overall but frequently misclassify theoretical or interdisciplinary texts, often due to overweighting lexical cues and disregarding context. We show such errors are not random but reflect structured semantic divergences from human interpretation. These findings suggest a need for hybrid frameworks that combine LLM scalability with human contextual judgment to improve subject labeling in academic repositories. (en)
dc.description.version: Accepted version (en)
dc.format.extent: Pages 57-69 (en)
dc.format.extent: 13 page(s) (en)
dc.format.mimetype: application/pdf (en)
dc.identifier.doi: https://doi.org/10.1007/978-3-032-06136-2_6 (en)
dc.identifier.eissn: 1865-0937 (en)
dc.identifier.isbn: 978-3-032-06135-5 (en)
dc.identifier.issn: 1865-0929 (en)
dc.identifier.orcid: Ingram, William [0000-0002-8307-8844] (en)
dc.identifier.orcid: Banerjee, Bipasha [0000-0003-4472-1902] (en)
dc.identifier.uri: https://hdl.handle.net/10919/141203 (en)
dc.identifier.volume: 2694 (en)
dc.language.iso: en (en)
dc.publisher: Springer (en)
dc.rights: Creative Commons Attribution 4.0 International (en)
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ (en)
dc.subject: Classification (en)
dc.subject: Large Language Models (en)
dc.title: Evaluating Human-LLM Alignment in ETD Subject Classification (en)
dc.title.serial: New Trends in Theory and Practice of Digital Libraries, TPDL 2025 (en)
dc.type: Conference proceeding (en)
dc.type.dcmitype: Text (en)
dc.type.other: Proceedings Paper (en)
dc.type.other: Book in series (en)
pubs.finish-date: 2025-09-26 (en)
pubs.organisational-group: Virginia Tech (en)
pubs.organisational-group: Virginia Tech/Engineering (en)
pubs.organisational-group: Virginia Tech/Engineering/Computer Science (en)
pubs.organisational-group: Virginia Tech/Library (en)
pubs.organisational-group: Virginia Tech/All T&R Faculty (en)
pubs.organisational-group: Virginia Tech/Library/Library assessment administrators (en)
pubs.organisational-group: Virginia Tech/Library/Dean's office (en)
pubs.organisational-group: Virginia Tech/Library/Information Technology (en)
pubs.organisational-group: Virginia Tech/Graduate students (en)
pubs.organisational-group: Virginia Tech/Graduate students/Doctoral students (en)
pubs.start-date: 2025-09-23 (en)

Files

Original bundle
Name: TPDL_Short_Paper_Camera_Ready_HajraKlair.pdf
Size: 350.81 KB
Format: Adobe Portable Document Format
Description: Accepted version
License bundle
Name: license.txt
Size: 1.5 KB
Format: Plain Text
Description: