Improving LLM Reasoning and Retrieval for Structured and Complex Information Spaces

dc.contributor.author: Yousuf, Raquib Bin
dc.contributor.committeechair: Ramakrishnan, Narendran
dc.contributor.committeemember: Wang, Xuan
dc.contributor.committeemember: Muthiah, Sathappan
dc.contributor.committeemember: Lu, Chang Tien
dc.contributor.committeemember: North, Christopher L.
dc.contributor.department: Computer Science & Applications
dc.date.accessioned: 2026-05-21T08:00:30Z
dc.date.available: 2026-05-21T08:00:30Z
dc.date.issued: 2026-05-20
dc.description.abstract: Large Language Models (LLMs) excel at fluent language generation but face critical challenges in high-stakes domains that require reasoning over long contexts, structured information use, grounded retrieval, and human-verifiable outputs. This dissertation explores how to improve LLM performance on complex, context-rich tasks through four contributions. First, we introduce memory-augmented architectures for multi-document reasoning, highlighting gaps between summarization and true inference. Second, we benchmark relational reasoning by reconstructing latent graphs from long texts, revealing a limitation we term "memory drift." Third, we show that incorporating structured metadata as a first-class signal in retrieval-augmented generation (RAG) systems improves retrieval consistency in large, repetitive corpora by better disambiguating context. Finally, we present a human-in-the-loop system for structured data analysis that enables transparent, code-centric interaction and supports iterative sensemaking over complex datasets. Together, these efforts advance LLM capabilities in analytical synthesis, structured retrieval, long-context evaluation, and explainability, offering practical tools for building more trustworthy and effective AI systems in real-world applications.
dc.description.abstractgeneral: Large Language Models (LLMs) can generate fluent text, but they often struggle with real-world analytical tasks that require following information over long contexts, using structured details, retrieving the right evidence, and allowing users to verify results. This dissertation explores how to make LLMs more reliable for such tasks. We first develop methods to help LLMs organize and connect information across multiple documents. We then show that LLMs have difficulty retaining and using relationships over long inputs, introducing a new way to measure this limitation, called "memory drift." Next, we improve how LLM systems retrieve relevant information by incorporating structured details that help distinguish similar documents. Finally, we present an interactive system that allows users to guide and refine structured data analysis, making the process more transparent, inspectable, and reliable. Together, these contributions show that improving LLMs requires not only better models, but also better ways to structure information, retrieve relevant context, and involve users in the analysis process.
dc.description.degree: Doctor of Philosophy
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:46600
dc.identifier.uri: https://hdl.handle.net/10919/143122
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Large Language Models
dc.subject: Long-Context Reasoning
dc.subject: Retrieval-Augmented Generation
dc.subject: Metadata-Aware Retrieval
dc.subject: Human-in-the-Loop Systems
dc.title: Improving LLM Reasoning and Retrieval for Structured and Complex Information Spaces
dc.type: Dissertation
thesis.degree.discipline: Computer Science & Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle
Name: Yousuf_R_D_2026.pdf
Size: 29.15 MB
Format: Adobe Portable Document Format