ReDoS: Regular Expression Denial of Service -- How Catastrophic Backtracking Crashes Your API

ReDoS is the denial-of-service attack that hides inside input validation. Most developers trust their regex patterns as safe — after all, validation code is supposed to stop attacks, not enable them. But certain regex constructs cause most regex engines to exhibit exponential time complexity on specially crafted input strings, allowing a single HTTP request with a few kilobytes of payload to pin a server’s CPU at 100% for seconds, minutes, or hours.

The attack is purely computational. No memory corruption, no injection — just a carefully shaped string that exploits the way backtracking NFA engines search for matches.

How Catastrophic Backtracking Works

Most regex engines use a backtracking NFA (Non-deterministic Finite Automaton). When a pattern match fails, the engine backs up and tries alternative paths. For well-written patterns, this is fast. For patterns with nested quantifiers and ambiguity, the number of paths the engine must explore grows exponentially with input length.

The canonical vulnerable pattern:

(a+)+$

Against the input "aaaaaaaaaaaaaaaaaaaaaaaX", this pattern must try every way to partition the a characters into groups. For n characters, there are 2^(n-1) possible partitions. 30 as followed by an X results in over 500 million attempted matches.

Why it’s exploitable:

The attacker only needs to know (or guess) what validation regex the server uses, then craft a string that maximises backtracking. The crafted string often follows the valid pattern for most of its length, then contains a character at the end that causes the final match to fail — forcing the engine to exhaustively explore all alternatives.

Identifying Vulnerable Patterns

The structural markers of potentially vulnerable patterns are:

Nested quantifiers: (a+)+, (a*)*, (a+)*, (a{1,10})+
Alternation within a quantifier: (a|b)+X, ([a-z]|\d)+!
Overlapping character classes in adjacent quantifiers: \w+\s+\w+ (overlapping when input has many \w\s combinations)

Python — testing a pattern for ReDoS:

import re
import time

def test_redos(pattern: str, evil_input: str, timeout_seconds: float = 2.0) -> bool:
    """Returns True if the pattern is likely vulnerable to ReDoS."""
    start = time.time()
    try:
        re.match(pattern, evil_input, re.TIMEOUT)  # Python 3.11+ supports timeout
    except TimeoutError:
        return True
    elapsed = time.time() - start
    return elapsed > timeout_seconds

# Test for the classic nested quantifier pattern
pattern = r'^(a+)+$'
evil = 'a' * 30 + 'X'
print(test_redos(pattern, evil))  # True -- vulnerable

JavaScript — the same problem:

// Vulnerable pattern in email validation (simplified)
const vulnerableEmailPattern = /^([a-zA-Z0-9]+\.)*[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z]+$/;

// Crafted input that causes catastrophic backtracking
const evil = 'a'.repeat(30) + '@';

console.time('regex');
vulnerableEmailPattern.test(evil);  // Will hang the event loop
console.timeEnd('regex');
// For 30 'a's: takes ~5 seconds in V8
// For 40 'a's: minutes

Node.js runs regex on the main thread. A hanging regex blocks the event loop and makes the entire Node process unresponsive to all requests — denial of service for every user, from a single crafted HTTP request.

Real-World Vulnerable Patterns

These pattern types appear regularly in production code:

# Email validation -- common ReDoS vector
r'^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$'
# This is roughly the RFC 5321 pattern -- some implementations of it are vulnerable

# URL validation
r'^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$'
# The nested ([\/\w \.-]*)* is a classic catastrophic backtracking target

# Password strength checking
r'^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])((?:[A-Za-z0-9]|[^\x00-\x7F])+)$'
# Mixed Unicode + ASCII quantifier -- often slow on crafted input

Fixes: Safe Regex Patterns

Principle 1: Eliminate nested quantifiers

# Vulnerable
r'(a+)+'

# Fixed -- atomic group (Python 3.11+ with re module, or use regex package)
import regex
r'(?>a+)+'  # Atomic group: once matched, never backtrack into it

Principle 2: Use possessive quantifiers / atomic groups

# Python regex package supports possessive quantifiers
import regex

# Vulnerable
bad = regex.compile(r'(\w+\s+)+')

# Fixed with possessive quantifier -- no backtracking
good = regex.compile(r'(\w++\s++)+')  # ++ means possessive

Principle 3: Use linear-time regex engines

The re2 library (Google) guarantees O(n) time complexity for any input. It does not support lookaheads or backreferences, which are the features that enable catastrophic backtracking.

# Install: pip install google-re2
import re2

# Drop-in replacement for re module
pattern = re2.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
result = pattern.match(user_input)  # Guaranteed O(n) regardless of input

// Node.js: use the re2 package
const RE2 = require('re2');
const emailPattern = new RE2('^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$');
emailPattern.test(userInput);  // Linear time, safe for untrusted input

Principle 4: Enforce input length limits before regex evaluation

def validate_email(email: str) -> bool:
    if len(email) > 254:  # RFC 5321 maximum email length
        return False
    # Now the regex runs on bounded input, limiting worst-case time
    return bool(EMAIL_PATTERN.match(email))

Even a vulnerable pattern becomes practically safe when the input is bounded to a small maximum. A pattern that takes 2^n operations becomes manageable when n is capped at 254.

Detection and Testing Tools

safe-regex (npm): Statically analyses JavaScript regex patterns for catastrophic backtracking potential.

npx safe-regex '(a+)+'
# Output: unsafe! (exponential)

vuln-regex-detector (Weideman, 2017): More comprehensive static analysis, finds ReDoS in nested and polynomial backtracking cases.

Semgrep rules: The semgrep registry includes rules for common ReDoS patterns in Python, JavaScript, and Java. Run as part of CI to catch new vulnerable patterns before they reach production.

semgrep --config=r/python.lang.security.audit.regex-dos python/

OWASP’s ReDoS Cheat Sheet: Documents safe alternatives for the most common use cases (email, URL, IP address validation) where ReDoS is regularly found.

Apply input length limits, prefer re2 or equivalent linear-time engines for user-controlled inputs, and include static regex analysis in your SAST pipeline. Catastrophic backtracking is deterministic — it will be found by anyone who looks for it.