Non-linguistic Notions in Language Modeling: Learning, Retention, and Applications

Sharma, Mandar

dc.contributor.author	Sharma, Mandar	en
dc.contributor.committeechair	Ramakrishnan, Narendran	en
dc.contributor.committeemember	North, Christopher L.	en
dc.contributor.committeemember	Lu, Chang Tien	en
dc.contributor.committeemember	Huang, Lifu	en
dc.contributor.committeemember	Kumar, Srijan	en
dc.contributor.department	Computer Science & Applications	en
dc.date.accessioned	2024-09-12T08:00:20Z	en
dc.date.available	2024-09-12T08:00:20Z	en
dc.date.issued	2024-09-11	en
dc.description.abstract	Language modeling, especially through the use of transformer-based large language models (LLMs), has drastically changed how we view and use artificial intelligence (AI) and machine learning (ML) in our daily lives. Although LLMs have showcased remarkable linguistic proficiency in their abilities to write, summarize, and phrase, these models have yet to achieve comparable proficiency in quantitative reasoning. This deficiency is especially apparent in smaller models (fewer than 1 billion parameters) that can run natively on-device. Between the complementary capabilities of qualitative and quantitative reasoning, this thesis focuses on the latter, where the goal is to devise mechanisms to instill quantitative reasoning capabilities into these models. However, instilling this notion is not as straightforward as traditional end-to-end learning. The learning of quantitative notions includes the ability of the model to discern between regular linguistic tokens and magnitude/scale-oriented non-linguistic tokens. The learning of these notions, especially after pretraining, comes at a cost for these models: catastrophic forgetting. Learning therefore needs to be followed by retention: making sure these models do not forget what they have learned. To that end, we first motivate the need for numeracy-enhanced models via their potential applications in the field of data-to-text generation (D2T), showcasing how these models behave as quantitative reasoners as-is. Then, we devise both token-level training interventions and information-theoretic training interventions to numerically enhance these models, with the latter specifically focused on combating catastrophic forgetting. Our information-theoretic interventions not only lead to numerically enhanced models but also lend us critical insights into the learning behavior of these models, especially when it comes to adapting these models to the target task distribution from their pretraining distribution.
Finally, we extrapolate these insights to devise more effective strategies for transfer learning and unlearning in language modeling.	en
dc.description.abstractgeneral	Language modeling, especially through the use of transformer-based large language models (LLMs), has drastically changed how we view and use artificial intelligence (AI) and machine learning (ML) in our daily lives. Although LLMs have showcased remarkable linguistic proficiency in their abilities to write, summarize, and phrase, these models have yet to achieve comparable proficiency in quantitative reasoning. This deficiency is especially apparent in smaller models that can run natively on-device. This thesis focuses on instilling within these models the ability to perform quantitative reasoning: the ability to differentiate between words and numbers and to understand the notions of magnitude tied to those numbers, while retaining their linguistic skills. The insights learned from our experiments are further used to devise models that better adapt to target tasks.	en
dc.description.degree	Doctor of Philosophy	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:41290	en
dc.identifier.uri	https://hdl.handle.net/10919/121122	en
dc.language.iso	en	en
dc.publisher	Virginia Tech	en
dc.rights	Creative Commons Attribution 4.0 International	en
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	en
dc.subject	Multitask Learning	en
dc.subject	Transfer Learning	en
dc.subject	Finetuning	en
dc.subject	Catastrophic Forgetting	en
dc.subject	Fisher	en
dc.subject	Natural Language Processing	en
dc.title	Non-linguistic Notions in Language Modeling: Learning, Retention, and Applications	en
dc.type	Dissertation	en
thesis.degree.discipline	Computer Science & Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Doctor of Philosophy	en

Files

Original bundle

Name: Sharma_M_D_2024.pdf
Size: 41.3 MB
Format: Adobe Portable Document Format

Collections

Doctoral Dissertations