Are Bigger LLMs Always Better? A Study of Open and Closed-Source Models in Code Generation and Translation

Date

2025-04-29

Publisher

Virginia Tech

Abstract

As Large Language Models (LLMs) advance, their roles in both code generation and translation are gaining increasing attention in software engineering, yet evaluating their effectiveness across different programming languages remains a critical challenge. This paper presents the results of a study that evaluates LLMs in generating and translating code snippets across Java, Go, and Python, with a focus on accuracy, efficiency, and quality. We conduct a comparative analysis of both open-source and closed-source LLMs, including GPT-3.5, Google Gemini, Gemma 2, and Llama-3.1, using a curated dataset of LeetCode solutions. Problems were selected across three difficulty levels (easy, medium, and hard), with solutions randomly sourced from GitHub and verified on the LeetCode platform. Our investigation assesses the feasibility and cost-effectiveness of code translation tasks, particularly under resource constraints, and examines methodologies suitable for such conditions. Our findings indicate that both open-source and closed-source LLMs hallucinate when solving LeetCode problems and translating code. However, some closed-source LLMs produce a greater number of unhelpful explanations, particularly by generating non-existent programming constructs. We identify instances in which LLMs fail to translate code correctly, as well as the language pairs in which these failures occur, uncovering novel insights. Notably, smaller open-source models demonstrate unexpectedly strong performance on some LeetCode problems. Although LLMs show great promise for modernizing legacy codebases, our results suggest that these models in their current form may lack the accuracy and speed necessary for real-world applications.
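To make the translation task concrete, the sketch below shows how a verified LeetCode solution might be submitted to a model for Java-to-Python translation. This is a minimal illustration, not the study's actual harness: the model identifier (gpt-3.5-turbo, standing in for the GPT-3.5 family named above), the prompt wording, and the sample snippet are all assumptions; it uses the OpenAI Python client, with translated output then checked against the platform's test cases.

```python
# Hypothetical sketch: the study's actual prompts, model versions, and
# evaluation harness are not reproduced here; names below are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A verified Java solution (LeetCode "Two Sum") used as the translation input.
JAVA_SNIPPET = """\
import java.util.*;

class Solution {
    public int[] twoSum(int[] nums, int target) {
        Map<Integer, Integer> seen = new HashMap<>();
        for (int i = 0; i < nums.length; i++) {
            if (seen.containsKey(target - nums[i])) {
                return new int[] { seen.get(target - nums[i]), i };
            }
            seen.put(nums[i], i);
        }
        return new int[0];
    }
}
"""

def translate(source_code: str, target_lang: str = "Python") -> str:
    """Ask the model for a direct translation and nothing else."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for one of the closed-source models
        messages=[
            {"role": "system",
             "content": f"Translate the given Java code to {target_lang}. "
                        "Return only the translated code."},
            {"role": "user", "content": source_code},
        ],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    # The translated snippet would then be run against the problem's
    # test cases to check functional correctness.
    print(translate(JAVA_SNIPPET))
```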

Keywords

Efficient Large Language Model, Code Generation, Code Translation
