On the Impact and Defeat of Regular Expression Denial of Service

TR Number

Date

2020-05-28

Journal Title

Journal ISSN

Volume Title

Publisher

Virginia Tech

Abstract

Regular expressions (regexes) are a widely-used yet little-studied software component. Engineers use regexes to match domain-specific languages of strings. Unfortunately, many regex engine implementations perform these matches with worst-case polynomial or exponential time complexity in the length of the string. Because they are commonly used in user-facing contexts, super-linear regexes are a potential denial of service vector known as Regular expression Denial of Service (ReDoS). Part I gives the necessary background to understand this problem.

In Part II of this dissertation, I present the first large-scale empirical studies of super-linear regex use. Guided by case studies of ReDoS issues in practice (Chapter 3), I report that the risk of ReDoS affects up to 10% of the regexes used in practice (Chapter 4), and that these findings generalize to software written in eight popular programming languages (Chapter 5). ReDoS appears to be a widespread vulnerability, motivating the consideration of defenses.

In Part III I present the first systematic comparison of ReDoS defenses. Based on the necessary conditions for ReDoS, a ReDoS defense can be erected at the application level, the regex engine level, or the framework/runtime level. In my experiments I report that application-level defenses are difficult and error prone to implement (Chapter 6), that finding a compatible higher-performing regex engine is unlikely (Chapter 7), that optimizing an existing regex engine using memoization incurs (perhaps acceptable) space overheads (Chapter 8), and that incorporating resource caps into the framework or runtime is feasible but faces barriers to adoption (Chapter 9).

In Part IV of this dissertation, we reflect on our findings. By leveraging empirical software engineering techniques, we have exposed the scope of potential ReDoS vulnerabilities, and given strong motivation for a solution. To assist practitioners, we have conducted a systematic evaluation of the solution space. We hope that our findings assist in the elimination of ReDoS, and more generally that we have provided a case study in the value of data-driven software engineering.

Description

Keywords

Regular expressions, denial of service, ReDoS, empirical software engineering, software security

Citation