FixEval: Execution-based Evaluation of Program Fixes for Competitive Programming Problems

Haque, Md Mahim Anjum

FixEval: Execution-based Evaluation of Program Fixes for Competitive Programming Problems

Files

Haque_MA_T_2023.pdf (505.88 KB)

Downloads: 640

Date

2023-11-14

Authors

Haque, Md Mahim Anjum

Publisher

Virginia Tech

Abstract

In a software life-cycle Source code repositories serve as vast storage areas for program code, ensuring its maintenance and version control throughout the development process. It is not uncommon for these repositories to house programs with hidden errors, which only manifest under specific input conditions, causing the program to deviate from its intended functionality. The growing intricacy of software design has amplified the time and resources required to pinpoint and rectify these issues. These errors, often unintended by developers, can be challenging to identify and correct. While there are techniques to auto-correct faulty code, the expansive realm of potential solutions for a single bug means there's a scarcity of tools and datasets for effective evaluation of the corrected code. This study presents FIXEVAL, a benchmark that includes flawed code entries from competitive coding challenges and their corresponding corrections. FIXEVAL offers an extensive test suite that not only gauges the accuracy of fixes generated by models but also allows for the assessment of a program's functional correctness. This suite further sheds light on time, memory limits, and acceptance based on specific outcomes. We utilize cutting-edge language models, trained on coding languages, as our reference point and juxtapose them using match-based (essentially token similarity) and execution-based (focusing on functional assessment) criteria. Our research indicates that while match-based criteria might not truly represent the functional precision of fixes generated by models, execution-based approaches offer a comprehensive evaluation tailored to the solution. Consequently, we posit that FIXEVAL paves the way for practical automated error correction and assessment of code generated by models. Dataset and models for all of our experiments are made publicly available at https://github.com/mahimanzum/FixEval.

Keywords

Automated Software Engineering

Persistent link

http://hdl.handle.net/10919/116666

Collections

Masters Theses

Full item page

FixEval: Execution-based Evaluation of Program Fixes for Competitive Programming Problems

Files

TR Number

Date

Authors

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Description

Keywords

Citation

Persistent link

Collections