Non-linguistic Notions in Language Modeling: Learning, Retention, and Applications

dc.contributor.authorSharma, Mandaren
dc.contributor.committeechairRamakrishnan, Narendranen
dc.contributor.committeememberNorth, Christopher L.en
dc.contributor.committeememberLu, Chang Tienen
dc.contributor.committeememberHuang, Lifuen
dc.contributor.committeememberKumar, Srijanen
dc.contributor.departmentComputer Science & Applicationsen
dc.date.accessioned2024-09-12T08:00:20Zen
dc.date.available2024-09-12T08:00:20Zen
dc.date.issued2024-09-11en
dc.description.abstractLanguage modeling, especially through the use of transformer-based large language models (LLMs), has drastically changed how we view and use artificial intelligence (AI) and machine learning (ML) in our daily lives. Although LLMs have showcased remarkable linguistic proficiency in their abilities to write, summarize, and phrase, these models have yet to achieve comparable proficiency in quantitative reasoning. This deficiency is especially apparent in smaller models (fewer than 1 billion parameters) that can run natively on-device. Between the complementary capabilities of qualitative and quantitative reasoning, this thesis focuses on the latter, where the goal is to devise mechanisms to instill quantitative reasoning capabilities into these models. However, instilling this notion is not as straightforward as traditional end-to-end learning. Learning quantitative notions requires the model to discern between regular linguistic tokens and magnitude/scale-oriented non-linguistic tokens. The learning of these notions, especially after pre-training, comes at a cost for these models: catastrophic forgetting. Thus, learning needs to be followed by retention - making sure these models do not forget what they have learned. We first motivate the need for numeracy-enhanced models via their potential applications in the field of data-to-text generation (D2T), showcasing how these models behave as quantitative reasoners as-is. Then, we devise both token-level training interventions and information-theoretic training interventions to numerically enhance these models, with the latter specifically focused on combating catastrophic forgetting. Our information-theoretic interventions not only lead to numerically-enhanced models but also lend us critical insights into the learning behavior of these models, especially when it comes to adapting them from their pretraining distribution to the target task distribution. Finally, we extrapolate these insights to devise more effective strategies for transfer learning and unlearning in language modeling.en
dc.description.abstractgeneralLanguage modeling, especially through the use of transformer-based large language models (LLMs), has drastically changed how we view and use artificial intelligence (AI) and machine learning (ML) in our daily lives. Although LLMs have showcased remarkable linguistic proficiency in their abilities to write, summarize, and phrase, these models have yet to achieve comparable proficiency in quantitative reasoning. This deficiency is especially apparent in smaller models that can run natively on-device. This thesis focuses on instilling within these models the ability to perform quantitative reasoning - the ability to differentiate between words and numbers and understand the notions of magnitude tied to said numbers - while retaining their linguistic skills. The insights learned from our experiments are further used to devise models that better adapt to target tasks.en
dc.description.degreeDoctor of Philosophyen
dc.format.mediumETDen
dc.identifier.othervt_gsexam:41290en
dc.identifier.urihttps://hdl.handle.net/10919/121122en
dc.language.isoenen
dc.publisherVirginia Techen
dc.rightsCreative Commons Attribution 4.0 Internationalen
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/en
dc.subjectMultitask Learningen
dc.subjectTransfer Learningen
dc.subjectFinetuningen
dc.subjectCatastrophic Forgettingen
dc.subjectFisheren
dc.subjectNatural Language Processingen
dc.titleNon-linguistic Notions in Language Modeling: Learning, Retention, and Applicationsen
dc.typeDissertationen
thesis.degree.disciplineComputer Science & Applicationsen
thesis.degree.grantorVirginia Polytechnic Institute and State Universityen
thesis.degree.leveldoctoralen
thesis.degree.nameDoctor of Philosophyen

Files

Original bundle
Name: Sharma_M_D_2024.pdf
Size: 41.3 MB
Format: Adobe Portable Document Format