FixEval: Execution-based Evaluation of Program Fixes for Competitive Programming Problems
dc.contributor.author | Haque, Md Mahim Anjum | en |
dc.contributor.committeechair | Brown, Dwayne Christian | en |
dc.contributor.committeemember | Lourentzou, Ismini | en |
dc.contributor.committeemember | Tilevich, Eli | en |
dc.contributor.department | Computer Science and Applications | en |
dc.date.accessioned | 2023-11-15T09:00:27Z | en |
dc.date.available | 2023-11-15T09:00:27Z | en |
dc.date.issued | 2023-11-14 | en |
dc.description.abstract | In a software life-cycle, source code repositories serve as vast storage areas for program code, ensuring its maintenance and version control throughout the development process. It is not uncommon for these repositories to house programs with hidden errors, which only manifest under specific input conditions, causing the program to deviate from its intended functionality. The growing intricacy of software design has amplified the time and resources required to pinpoint and rectify these issues. These errors, often unintended by developers, can be challenging to identify and correct. While there are techniques to auto-correct faulty code, the expansive realm of potential solutions for a single bug means there is a scarcity of tools and datasets for effective evaluation of the corrected code. This study presents FIXEVAL, a benchmark that includes flawed code entries from competitive coding challenges and their corresponding corrections. FIXEVAL offers an extensive test suite that not only gauges the accuracy of fixes generated by models but also allows for the assessment of a program's functional correctness. The suite further reports verdicts that reflect time limits, memory limits, and acceptance based on specific outcomes. We use cutting-edge language models trained on programming languages as our baselines and compare them using match-based (essentially token-similarity) and execution-based (focusing on functional assessment) criteria. Our research indicates that while match-based criteria may not truly represent the functional correctness of fixes generated by models, execution-based approaches offer a comprehensive evaluation tailored to the solution. Consequently, we posit that FIXEVAL paves the way for practical automated error correction and assessment of code generated by models. The dataset and models for all of our experiments are made publicly available at https://github.com/mahimanzum/FixEval. | en |
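To make the execution-based criterion concrete, the sketch below shows one way a candidate fix can be judged against a test suite with per-test time limits, in the spirit the abstract describes. This is a minimal illustration only: the function names, the 2-second limit, and the verdict labels are assumptions for exposition, not FixEval's actual interface.

```python
import subprocess

def run_test_case(program_path, test_input, expected_output, time_limit=2.0):
    """Run one test case and return a competitive-programming-style verdict.

    Hypothetical helper: names and the time_limit default are illustrative.
    """
    try:
        result = subprocess.run(
            ["python", program_path],
            input=test_input,
            capture_output=True,
            text=True,
            timeout=time_limit,  # enforce a per-test time limit
        )
    except subprocess.TimeoutExpired:
        return "Time Limit Exceeded"
    if result.returncode != 0:
        return "Runtime Error"
    if result.stdout.strip() == expected_output.strip():
        return "Accepted"
    return "Wrong Answer"

def evaluate_fix(program_path, test_cases):
    """A fix counts as functionally correct only if every test passes."""
    verdicts = [run_test_case(program_path, inp, out) for inp, out in test_cases]
    return all(v == "Accepted" for v in verdicts), verdicts

# Example usage: a program that doubles its integer input.
# passed, verdicts = evaluate_fix("fix.py", [("3\n", "6"), ("10\n", "20")])
```

Under this framing, a match-based metric would score a fix by token overlap with the reference solution, whereas the harness above accepts any program whose observed behavior satisfies every test, which is why the two criteria can disagree.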
dc.description.abstractgeneral | Think of source code repositories as big digital libraries where computer programs are kept safe and updated. Sometimes these programs have hidden mistakes that only show up under certain conditions, making the program act differently than planned; these mistakes are called bugs or errors. As software gets more complex, it takes more time and effort to find and fix them. Even though there are ways to automatically fix these errors, finding the best solution can be like looking for a needle in a haystack. That's why there aren't many tools to check whether the automatic fixes are right. Enter FIXEVAL: our new tool that tests and compares faulty computer code from coding competitions and its fixes. It has a set of tests to see how well the fixed code works and gives insights into its performance and results. We used the latest computer language tools to see how well they fix code, comparing them in two ways: by looking at the code's structure and by testing its function. Our findings? Just looking at the code's structure isn't enough; we need to test how it works in action. We believe FIXEVAL is a big step forward in making sure automatic code fixes are spot-on. The dataset and models for all of our experiments are made publicly available at https://github.com/mahimanzum/FixEval. | en |
dc.description.degree | Master of Science | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:38773 | en |
dc.identifier.uri | http://hdl.handle.net/10919/116666 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Automated Software Engineering | en |
dc.title | FixEval: Execution-based Evaluation of Program Fixes for Competitive Programming Problems | en |
dc.type | Thesis | en |
thesis.degree.discipline | Computer Science and Applications | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | masters | en |
thesis.degree.name | Master of Science | en |