Exploring the Process and Challenges of Programming with Regular Expressions
Michael, Louis Guy IV
Regular expressions (regexes) are a powerful mechanism for solving string-matching problems and are supported by all modern programming languages. While regular expressions are highly expressive, they are also often perceived to be highly complex and hard to read. Although existing studies have focused on improving the readability of regular expressions, little is known about the other difficulties that developers face when programming with them. In this paper, we aim to provide a deeper understanding of the process of programming regular expressions by studying: (1) how developers make decisions throughout the process, (2) what difficulties they face, and (3) how aware they are of the serious risks involved in programming regexes. We surveyed 158 professional developers from diverse backgrounds, and we conducted a series of interviews to learn more about the difficulties participants face in this process and the solutions they adopt. This mixed-methods approach revealed that some of the difficulties of regexes take the form of an inability to effectively search for, fully validate, and document them. Developers also reported cascading impacts of poor readability, a lack of universal portability, and struggles with overall problem comprehension. The majority of the developers we studied were unaware of critical security risks that can occur when using regexes, and those who were aware of potential problems felt that they lacked the ability to identify problematic regexes. Our findings have multiple implications for future work, including the development of semantic regex search engines for regex reuse and improved input generators for regex validation.
General Audience Abstract
Regular expressions (regexes) are a method of searching for a set of matching text. They can be understood as a way to search flexibly beyond exact matches and are frequently encountered in the find functionality invoked with Ctrl+F. Regexes are also very common in source code for a range of tasks, including form validation, where a program needs to confirm that user-provided information conforms to a specific structure, such as that of an email address. Despite regexes being a widely supported programming feature, little is known about how developers go about creating them or what they struggle with when doing so. To gain a better understanding of how regexes are created and reused, we surveyed 158 professional developers with diverse backgrounds and experience levels about their processes and their perceptions of regexes. As a follow-up to the survey, we conducted a series of interviews focusing on the challenges developers face when tackling problems for which they felt that a regex was worth using. Through the combination of these studies, we developed a detailed understanding of how professional developers create regexes, as well as many of the struggles that they face when doing so. These challenges take the form of an inability to effectively search for, fully validate, and document regexes, as well as the cascading impacts of poor readability, a lack of universal portability, and difficulty with overall problem comprehension.
- Masters Theses