From Prompts to Properties: Rethinking LLM Code Generation with Property-Based Testing

Bose, Dibyendu Brinto

Date

2025-06-23

Authors

Bose, Dibyendu Brinto

Publisher

ACM

Abstract

Large Language Models (LLMs) have shown promise in automated code generation, but ensuring correctness remains a significant challenge. Traditional unit testing evaluates functional correctness but often fails to capture deeper logical constraints. We apply Property-Based Testing (PBT) as an alternative evaluation strategy for code generated by StarCoder and CodeLlama on the MBPP and HumanEval benchmarks. Our results reveal that while pass@k evaluation shows moderate success, PBT exposes additional correctness gaps. A significant portion of generated solutions (30–32%) adheres only partially to correctness properties, while 18–23% fail outright. Property extraction is also imperfect, with 9–13% of constraints missing. These findings indicate that unit test-based evaluations may overestimate solution correctness because they miss fundamental logical errors. Our study demonstrates that combining unit testing with PBT can offer a more comprehensive assessment of generated code correctness, revealing limitations that traditional verification approaches miss.
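
To make the contrast between unit testing and PBT concrete, the following is a minimal sketch and not the paper's pipeline. It uses a hand-rolled random-input checker built on the Python standard library rather than a PBT framework. The task (removing duplicates while keeping first-occurrence order), the function `candidate_dedupe`, the three properties, and the helper `check_properties` are all hypothetical names chosen for illustration. The candidate passes a typical example-based unit test yet satisfies only two of the three properties, which is the kind of partial adherence the abstract describes.

```python
import random


def candidate_dedupe(xs):
    # Hypothetical model output: removes duplicates but also sorts the result,
    # which silently drops the "keep original order" requirement.
    return sorted(set(xs))


def reference_dedupe(xs):
    # A correct version for comparison: dict keys keep insertion order.
    return list(dict.fromkeys(xs))


def unit_test(fn):
    # A typical example-based test; the input happens to be sorted already,
    # so the ordering bug stays hidden.
    return fn([1, 1, 2, 3, 3]) == [1, 2, 3]


def prop_no_duplicates(xs, out):
    return len(out) == len(set(out))


def prop_same_elements(xs, out):
    return set(out) == set(xs)


def prop_keeps_first_occurrence_order(xs, out):
    seen, expected = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            expected.append(x)
    return out == expected


PROPERTIES = [
    prop_no_duplicates,
    prop_same_elements,
    prop_keeps_first_occurrence_order,
]


def check_properties(fn, trials=200, seed=0):
    """Run each property on random inputs; map property name to the first
    counterexample found, or None if every trial passed."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    failures = {p.__name__: None for p in PROPERTIES}
    for _ in range(trials):
        xs = [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]
        out = fn(list(xs))
        for p in PROPERTIES:
            if failures[p.__name__] is None and not p(xs, out):
                failures[p.__name__] = xs
    return failures


report = check_properties(candidate_dedupe)
print("unit test passed:", unit_test(candidate_dedupe))
for name, counterexample in report.items():
    status = "ok" if counterexample is None else f"fails on {counterexample}"
    print(f"{name}: {status}")
```

Running `check_properties(reference_dedupe)` in the same way reports no counterexamples. A dedicated PBT library would add input shrinking and richer generators; the sketch leaves those out to stay dependency-free.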

Persistent link

https://hdl.handle.net/10919/137468

Collections

Journal Articles, Association for Computing Machinery (ACM)
Scholarly Works, Computer Science
