Demystifying regular expression bugs: A comprehensive study on regular expression bug causes, fixes, and testing

dc.contributor.author | Wang, Peipei | en
dc.contributor.author | Brown, Chris | en
dc.contributor.author | Jennings, Jamie A. | en
dc.contributor.author | Stolee, Kathryn T. | en
dc.date.accessioned | 2022-06-13T19:18:22Z | en
dc.date.available | 2022-06-13T19:18:22Z | en
dc.date.issued | 2021-11-05 | en
dc.date.updated | 2022-06-13T16:36:02Z | en
dc.description.abstract | Regular expressions can cause string-related bugs and introduce security vulnerabilities exploitable by denial-of-service (DoS) attacks. However, beyond ReDoS (regular expression denial of service), little is known about the extent to which regular expression issues affect software development and how these issues are addressed in practice. We conduct an empirical study of 356 regex-related bugs from merged pull requests in Apache, Mozilla, Facebook, and Google GitHub repositories. We identify and classify the nature of the regular expression problems, their fixes, and the related changes in the test code. The most important findings of this paper are as follows: (1) incorrect regular expression semantics is the dominant root cause of regular expression bugs (165/356, 46.3%), and additional root causes are incorrect API usage (9.3%) and other code issues that require regular expression changes in the fix (29.5%); (2) fixing regular expression bugs is nontrivial, as these fixes take more time and more lines of code than general pull requests; (3) most (51%) regex-related pull requests do not contain test code changes, and certain regex bug types (e.g., compile errors, performance issues, regex representation) are less likely than others to include test code changes; and (4) the dominant type of test code change in regex-related pull requests is test case addition (75%). The results of this study contribute to a broader understanding of the practical problems developers face when using, fixing, and testing regular expressions. | en
dc.description.version | Accepted version | en
dc.format.extent | 35 page(s) | en
dc.format.mimetype | application/pdf | en
dc.identifier | ARTN 21 (Article number) | en
dc.identifier.doi | https://doi.org/10.1007/s10664-021-10033-1 | en
dc.identifier.eissn | 1573-7616 | en
dc.identifier.issn | 1382-3256 | en
dc.identifier.issue | 1 | en
dc.identifier.orcid | Brown, Dwayne [0000-0002-6036-4733] | en
dc.identifier.uri | http://hdl.handle.net/10919/110759 | en
dc.identifier.volume | 27 | en
dc.language.iso | en | en
dc.publisher | Springer | en
dc.relation.uri | http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000714913900001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=930d57c9ac61a043676db62af60056c1 | en
dc.rights | In Copyright | en
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en
dc.subject | Bug fixes | en
dc.subject | Pull requests | en
dc.subject | Regular expression bug characteristics | en
dc.subject | Test code | en
dc.title | Demystifying regular expression bugs: A comprehensive study on regular expression bug causes, fixes, and testing | en
dc.title.serial | Empirical Software Engineering | en
dc.type | Article - Refereed | en
dc.type.dcmitype | Text | en
dc.type.other | Article | en
dcterms.dateAccepted | 2021-04-27 | en
pubs.organisational-group | /Virginia Tech | en
pubs.organisational-group | /Virginia Tech/Engineering | en
pubs.organisational-group | /Virginia Tech/Engineering/Computer Science | en
pubs.organisational-group | /Virginia Tech/All T&R Faculty | en
pubs.organisational-group | /Virginia Tech/Engineering/COE T&R Faculty | en

Files

Original bundle
Name: regex2.pdf
Size: 1.94 MB
Format: Adobe Portable Document Format
Description: Accepted version