InfoSec Write-ups

A collection of write-ups from the best hackers in the world on topics ranging from bug bounties and CTFs to vulnhub machines, hardware challenges and real life encounters. Subscribe to our weekly newsletter for the coolest infosec updates: https://weekly.infosecwriteups.com/

Follow publication

ReDoS — Denial of Service by RegEx 😈

Martin Thoma
InfoSec Write-ups
Published in
5 min readMar 7, 2022

Regular expressions (RegEx) are a formal language to define simple patterns. It is commonly used to find interesting parts within a larger body of text or to validate data. It is typically fast and sometimes a clean solution. However, in some cases, an attacker can craft input to a vulnerable regular expression which causes the time to check the input for the regex to be way longer than expected.

Why you should care

I’ve only found one example of a successful ReDoS “attack”. In 2016, 34 minutes of outage of StackOverflow was caused by ReDoS (source).

Wikipedia and OWASP don’t mention a single successful attack. I guess the reason for that is that RegEx is not used that often on the server-side 🤷‍♂️ There are a lot of parsing tools for Python, but I only vaguely remember using pyparsing once.

In many cases when you might want to use a regex, doing it completely correctly is just overkill. For example, email regexes. Instead of jumping in that rabbit hole, you can just do some super basic checks, e.g. if the length is at least 3 and if there is exactly one @ symbol in there. Then you just send a confirmation email which verifies that the user actually has access to this email address.

For URLs, you can typically just ping them. For HTML, you want to use an HTML parser. For many other cases, using split goes a long way.

What I want to say: It might be that you don’t need to worry as you might not use RegEx in most of your code. It’s interesting nevertheless 😁

An Introduction to PCRE

Perl Compatible Regular Expressions (PCRE) are the most common style of regular expressions. They are a way to specify types of strings. They are way more specific than just wildcard pattern matching.

For example, the following pattern matches first a single uppercase letter and then arbitrary many lowercase letters:

[A-Z][a-z]*

The characters within the square brackets [...] define the character class. In this case, all characters from A to Z . The * is a quantifier and means “at least zero, but up to arbitrary many”.

RegEx Engines

Create an account to read the full story.

The author made this story available to Medium members only.
If you’re new to Medium, create a new account to read this story on us.

Or, continue in mobile web

Already have an account? Sign in

Published in InfoSec Write-ups

A collection of write-ups from the best hackers in the world on topics ranging from bug bounties and CTFs to vulnhub machines, hardware challenges and real life encounters. Subscribe to our weekly newsletter for the coolest infosec updates: https://weekly.infosecwriteups.com/

Written by Martin Thoma

I’m a Software Engineer with over 10 years of Python experience (Backend/ML/AI). Support me via https://martinthoma.medium.com/membership

No responses yet

Write a response