Skip to content

Conversation

Crozzers
Copy link
Contributor

This PR fixes #633, which showcases a ReDoS vulnerability in _sorta_html_tokenize_re. This PR also adds a regression test suite for ReDoS attacks and integrates it as part of the CICD checks.

The fix

The problem was with the section of regex that matches tags with attributes, specifically line 1276:

(?: # attributes
\s+ # whitespace after tag
(?:[^\t<>"'=/]+:)?
[^<>"'=/]+= # attr name
(?:".*?"|'.*?'|[^<>"'=/\s]+) # value, quoted or unquoted. If unquoted, no spaces allowed
)*
\s*/?>

The problem was with the section that matches quoted attribute values using ".*?". Take the following input:

<p m="1"<p m="1"<p m="1"<p m="1"<p m="1"<p m="1"<p m="1" ...[x 5000]...         </div

The first m="1" matches the attribute regex just fine, but then we hit another <p right after. The regex expects a closing bracket for this tag, so it assumes this is part of the attribute as well. It ends up consuming the whole string and matching m="1"<p m="1"<p m="1" ...[x5000]...<p m="1" as the attribute until it reaches the end of the input, at which point it fails to find a match and catastrophically backtracks.

By changing the attribute matching criteria to "[^"]*?", we negate this. The regex reads m="1" as the attribute, <p then breaks the match immediately and the regex can exit.

This is what I believe happened, based on what I saw stepping through it with debuggex.

The new test suite

Since alot of these redos attacks rely on creating a massive string to force extensive backtracking, I thought it would be inefficient to include them with the normal testcase files.

What I've done instead is added a separate test suite that will generate these inputs on the fly and pass it to the library. It uses a time limit to decide pass/fail, with no test case taking longer than 3s.

I searched the repo for any ReDoS related issues/PRs and added a test case for each of them.

I've added this to the makefile as make testredos and also added it to the CICD workflow.

@Crozzers
Copy link
Contributor Author

@nicholasserra on a related note, I was looking at the pending changes and thinking it might be time for a release. I can see 1 other security fix in there, as well as some other stuff. According to PyPI the last release was Jan so probably due for one

@nicholasserra
Copy link
Collaborator

Thanks for the writeup on this, and the new test suite! Also first time seeing that debuggex tool, that's pretty awesome. Definitely gonna use that at some point.

I'll get a release out after this merge

@nicholasserra nicholasserra merged commit 4840300 into trentm:master Jul 27, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Protential ReDoS vulnerability in markdown2.py
2 participants