Are Bigger LLMs Always Better? A Study of Open and Closed-Source Models in Code Generation and Translation
dc.contributor.author | Shiung, Tian-Yu | en |
dc.contributor.committeechair | Brown, Dwayne Christian | en |
dc.contributor.committeemember | Tilevich, Eli | en |
dc.contributor.committeemember | Seyam, Mohammed Saad Mohamed Elmahdy | en |
dc.contributor.department | Computer Science & Applications | en |
dc.date.accessioned | 2025-04-30T08:00:17Z | en |
dc.date.available | 2025-04-30T08:00:17Z | en |
dc.date.issued | 2025-04-29 | en |
dc.description.abstract | As Large Language Models (LLMs) advance, their roles in both code generation and translation are gaining increasing attention in software engineering. Evaluating their effectiveness across different programming languages remains a critical challenge. This paper presents the results of a study that evaluates LLMs in generating and translating code snippets across Java, Go, and Python, with a focus on accuracy, efficiency, and quality. We conduct a comparative analysis of both open-source and closed-source LLMs, including GPT-3.5, Google Gemini, Gemma 2, and Llama-3.1, using a curated dataset of LeetCode solutions. Problems were selected across three difficulty levels (easy, medium, and hard), with solutions randomly sourced from GitHub and verified on the LeetCode platform. Our investigation assesses the feasibility and cost-effectiveness of code translation tasks, particularly under resource constraints, and examines different methodologies suitable for such conditions. Our findings indicate that both open-source and closed-source LLMs exhibit hallucinations in solving LeetCode problems and translating code. However, some closed-source LLMs produce more unhelpful explanations, particularly by generating non-existent programming constructs. We identify the specific instances and language pairs in which LLMs fail to translate code correctly, uncovering novel insights. Notably, smaller, open-source models demonstrate unexpectedly commendable performance on some LeetCode problems. Although LLMs show great promise for modernizing legacy codebases, our results suggest that these models in their current form may lack the necessary accuracy and speed for real-world applications. | en |
dc.description.abstractgeneral | As software development advances, enterprises and developers increasingly leverage sophisticated tools to enhance efficiency. Among these, large language models (LLMs) such as ChatGPT have gained significant attention. As LLMs grow in size and capability, more developers and enterprises incorporate them into their workflows for various tasks, including code generation, which helps accelerate the development process and improve efficiency. However, many enterprises and novice programmers hold misconceptions about the coding capabilities of closed-source LLMs. Closed-source models (e.g., GPT-3.5, Gemini) are proprietary, while open-source models (e.g., Llama-3.1, Gemma) provide transparency and flexibility. Many assume that closed-source LLMs and larger models inherently outperform their open-source and smaller counterparts in code generation and translation. These assumptions may lead organizations to invest in expensive models without fully evaluating their real-world performance. In this research, we systematically evaluate LLMs in generating and translating code across Java, Go, and Python. Our comparative analysis examines both open-source and closed-source models, considering their architectures, parameter sizes, and accessibility. Using LeetCode, a well-known platform for technical assessments, we assess code generation and translation. Our findings reveal that while large closed-source models often achieve higher accuracy, some smaller open-source models perform comparably at lower computational cost. These insights help developers and enterprises choose LLMs wisely, balancing accuracy, cost, and efficiency. | en |
dc.description.degree | Master of Science | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:42884 | en |
dc.identifier.uri | https://hdl.handle.net/10919/127260 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Efficient Large Language Model | en |
dc.subject | Code Generation | en |
dc.subject | Code Translation | en |
dc.title | Are Bigger LLMs Always Better? A Study of Open and Closed-Source Models in Code Generation and Translation | en |
dc.type | Thesis | en |
thesis.degree.discipline | Computer Science & Applications | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | masters | en |
thesis.degree.name | Master of Science | en |