Evaluating Function Calling in Language Models through a Conversational Agent
Abstract
This study investigates the effectiveness of function calling in Large Language Models (LLMs) for data-driven conversational systems. While LLMs excel at natural language understanding, reliably mapping user queries to structured computational functions remains a key challenge. To address this, the study develops the India Policy Insights (IPI) Chatbot, a GenAI-powered system that enables natural language interaction with complex, spatiotemporal public policy datasets. The chatbot integrates LLMs with backend functions to translate user queries into structured parameters, execute database operations, and generate multi-modal outputs, including text, charts, and maps. The proposed workflow demonstrates how natural language queries are converted into actionable analytical tasks. A systematic evaluation of GPT-4o and GPT-4o-mini is conducted across three dimensions: prompt specificity, query difficulty, and query type. Results show that function-calling accuracy improves significantly with more explicit and structured prompts, with GPT-4o achieving the highest performance. Spatial and constraint-based queries yield consistently high accuracy, while complex multi-indicator tasks remain challenging, particularly for the smaller model. Overall, this study highlights the importance of prompt design, parameter clarity, and model selection in optimizing function-calling performance. It also demonstrates the potential of conversational AI to improve accessibility to policy-relevant data, contributing to more inclusive and data-informed decision-making.
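The workflow summarized above (a model emits a structured function call, a backend executes it, and the call is scored against a reference) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the study: the function `get_indicator_value`, its parameters, the placeholder data, and the exact-match scoring rule are all hypothetical, and no real LLM is called.

```python
import json

# Hypothetical tool description in the JSON-Schema style commonly used
# for LLM function calling. Names and parameters are illustrative only.
TOOL_SCHEMA = {
    "name": "get_indicator_value",
    "description": "Look up one indicator for one district and year.",
    "parameters": {
        "type": "object",
        "properties": {
            "indicator": {"type": "string"},
            "district": {"type": "string"},
            "year": {"type": "integer"},
        },
        "required": ["indicator", "district", "year"],
    },
}

# Placeholder standing in for the policy database; the value is invented.
TOY_DATA = {("indicator_a", "district_x", 2021): 12.5}


def get_indicator_value(indicator, district, year):
    """Backend function: return the stored value, or None if absent."""
    return TOY_DATA.get((indicator, district, year))


BACKEND = {"get_indicator_value": get_indicator_value}


def dispatch(call_json):
    """Parse a model-emitted call, check required parameters, run it."""
    call = json.loads(call_json)
    func = BACKEND[call["name"]]
    args = call["arguments"]
    required = TOOL_SCHEMA["parameters"]["required"]
    missing = [p for p in required if p not in args]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    return func(**args)


def call_matches(predicted, expected):
    """Assumed scoring rule: function name and all arguments must match."""
    return (predicted["name"] == expected["name"]
            and predicted["arguments"] == expected["arguments"])


def accuracy(pairs):
    """Fraction of (predicted, expected) call pairs that match exactly."""
    return sum(call_matches(p, e) for p, e in pairs) / len(pairs)
```

In a setup like this, the string passed to `dispatch` would come from the model's function-calling output, and `accuracy` would be computed separately for each prompt style, difficulty level, and query type. Exact matching is only one possible criterion; the study may score calls differently.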