Enabling Small Language Models as Efficient and Capable Agents

Srivastava, Gaurav

dc.contributor.author	Srivastava, Gaurav	en
dc.contributor.committeechair	Wang, Xuan	en
dc.contributor.committeemember	Thomas, Christopher Lee	en
dc.contributor.committeemember	Ramakrishnan, Narendran	en
dc.contributor.committeemember	Vu, Tu	en
dc.contributor.department	Computer Science & Applications	en
dc.date.accessioned	2026-05-22T08:00:26Z	en
dc.date.available	2026-05-22T08:00:26Z	en
dc.date.issued	2026-05-21	en
dc.description.abstract	Most agentic systems today are built around large language models (LLMs) accessed through proprietary APIs (for example, GPT, Claude, and Gemini), which raises concerns about cost, latency, and privacy. This thesis argues that small language models (SLMs), typically under 30 billion parameters, can serve as efficient and capable agents when paired with the right system design choices. The work proceeds as a sequence of five studies that together support this case. We begin with ThinkSLM, a study of 72 small models across 17 reasoning tasks, which shows that training methodology and data quality drive reasoning more than parameter count. This motivates Debate, Train, Evolve (DTE), a self-evolution framework that turns multi-agent debate traces into reinforcement learning signals, improving small-model reasoning without ground-truth supervision and matching or surpassing the multi-agent system at single-model inference cost. The limits we observe in DTE prompt a closer look at how models allocate compute, leading to our overthinking analysis (LLMThinkBench), which shows that reasoning-trained models often produce around 18 times more tokens on basic math while achieving lower accuracy. To investigate memorization further, we develop BeyondBench, a contamination-resistant evaluation framework that algorithmically generates problems from combinatorial spaces of more than 10^15 instances across 44 tasks and 117 variations. Evaluating 101 models shows that hard-suite language-only performance remains low for many strong models, such as Gemini-2.5-Pro at 56.21%, while tool-augmented GPT-5 reaches 71.68% on the same suite, suggesting that agentic capabilities are a useful complement to raw scale. We synthesize these insights into EffGen, an open-source agentic framework built from the ground up for SLMs. 
EffGen contributes prompt optimization that compresses context by 70 to 80%, complexity-based routing, task decomposition into parallel and sequential subgraphs, a unified three-tier memory system, and the first unified implementation of the MCP, A2A, and ACP protocols. Across 13 benchmarks, EffGen consistently outperforms LangChain, AutoGen, and Smolagents in success rate, latency, and memory use. Together, these results show that with the right system design, small models combined with tools, memory, and intelligent orchestration can perform competitively with much larger models on a meaningful range of tasks. The contribution of this thesis is to identify the regimes in which SLMs are effective, to characterize where they break, and to provide an open framework that lets practitioners deploy them responsibly.	en
dc.description.abstractgeneral	Artificial intelligence systems powered by language models have become highly capable, but the largest and most powerful systems are expensive to run, slow to respond, and raise privacy concerns because they often require sending personal data to outside servers. This thesis asks a simple question: can smaller, more affordable language models reason well enough to be useful, and if not, how can we build systems that make them effective? The work proceeds through five connected studies. We first measured how well 72 small models reason across a wide range of tasks and found that how a model is trained matters more than how large it is. Building on this finding, we developed a method that allows small models to improve their own reasoning by debating with copies of themselves and learning from the result, without requiring human-provided answers. We then observed that some models trained for deeper thinking overthink simple problems, generating much longer answers while in some cases producing incorrect ones. We next constructed a new way to test reasoning that prevents models from succeeding through recall of memorized answers. Across more than a hundred models, this test showed that pure language-based reasoning encounters a ceiling on hard problems, and that the gap is closed not by enlarging models but by providing them with tools such as calculators and code execution. Finally, we incorporated these findings into EffGen, an open-source software framework that enables small language models to function as capable AI assistants by combining them with tools, memory, and intelligent task management. Our work demonstrates that, with appropriate system design, smaller models combined with tools, memory, and intelligent task management can be competitive with much larger models across a useful range of tasks. 
The contribution of this thesis is to identify where small models perform well, where they fail, and how to build systems around them responsibly.	en
dc.description.degree	Master of Science	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:46340	en
dc.identifier.uri	https://hdl.handle.net/10919/143129	en
dc.language.iso	en	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Small Language Models	en
dc.subject	Reasoning	en
dc.subject	Benchmarking	en
dc.subject	Agentic Systems	en
dc.subject	Language Model Evaluation	en
dc.subject	EffGen	en
dc.subject	BeyondBench	en
dc.subject	Self-Evolution	en
dc.subject	Multi-Agent Debate	en
dc.subject	Tool Use	en
dc.title	Enabling Small Language Models as Efficient and Capable Agents	en
dc.type	Thesis	en
thesis.degree.discipline	Computer Science & Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	masters	en
thesis.degree.name	Master of Science	en

Files

Original bundle

Name: Srivastava_G_T_2026.pdf
Size: 11.21 MB
Format: Adobe Portable Document Format

Collections

Masters Theses