Member-only story
ReDoS — Denial of Service by RegEx 😈

Regular expressions (RegEx) are a formal language to define simple patterns. It is commonly used to find interesting parts within a larger body of text or to validate data. It is typically fast and sometimes a clean solution. However, in some cases, an attacker can craft input to a vulnerable regular expression which causes the time to check the input for the regex to be way longer than expected.
Why you should care
I’ve only found one example of a successful ReDoS “attack”. In 2016, 34 minutes of outage of StackOverflow was caused by ReDoS (source).
Wikipedia and OWASP don’t mention a single successful attack. I guess the reason for that is that RegEx is not used that often on the server-side 🤷♂️ There are a lot of parsing tools for Python, but I only vaguely remember using pyparsing once.
In many cases when you might want to use a regex, doing it completely correctly is just overkill. For example, email regexes. Instead of jumping in that rabbit hole, you can just do some super basic checks, e.g. if the length is at least 3 and if there is exactly one @
symbol in there. Then you just send a confirmation email which verifies that the user actually has access to this email address.
For URLs, you can typically just ping them. For HTML, you want to use an HTML parser. For many other cases, using split
goes a long way.
What I want to say: It might be that you don’t need to worry as you might not use RegEx in most of your code. It’s interesting nevertheless 😁
An Introduction to PCRE
Perl Compatible Regular Expressions (PCRE) are the most common style of regular expressions. They are a way to specify types of strings. They are way more specific than just wildcard pattern matching.
For example, the following pattern matches first a single uppercase letter and then arbitrary many lowercase letters:
[A-Z][a-z]*
The characters within the square brackets [...]
define the character class. In this case, all characters from A
to Z
. The *
is a quantifier and means “at least zero, but up to arbitrary many”.