@serhiy-storchaka serhiy-storchaka commented Sep 4, 2025

  • Don't fail trying to parse weird patterns.
  • Don't fail trying to decode non-UTF-8 "robots.txt" files.
  • No longer ignore trailing "?" in patterns and URLs.
  • Distinguish raw special characters "?", "=" and "&" from the percent-encoded ones.
  • Remove tests that do nothing.
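
A minimal sketch of why the raw and percent-encoded forms need to be kept apart, using only `urllib.parse`. This is an illustration, not the code from this PR; the example paths are made up.

```python
from urllib.parse import unquote, urlsplit

# A raw "?" starts the query; "%3F" is just a character inside the path.
raw = urlsplit("/search?q=1")
encoded = urlsplit("/search%3Fq=1")

# Unquoting before comparing erases that difference.
collapsed = unquote("/search?q=1") == unquote("/search%3Fq=1")

# With default arguments, an empty query and a missing query
# look the same after splitting, so a trailing "?" can get lost.
with_qmark = urlsplit("/page?")
without_qmark = urlsplit("/page")

print(raw.path, repr(raw.query))
print(encoded.path, repr(encoded.query))
print(collapsed)
print(repr(with_qmark.query), repr(without_qmark.query))
```

A rule matcher that normalizes by unquoting everything first would treat `/search%3Fq=1` and `/search?q=1` as the same pattern, which is the kind of confusion the bullets above describe.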

@serhiy-storchaka

This PR also fixes #88375.

…obotparser

* Distinguish the query separator from a percent-encoded ?.
* Fix support of non-UTF-8 robots.txt files.
* Don't fail trying to parse weird paths.
@serhiy-storchaka force-pushed the robotparser-percent-encoding branch from 9c026fa to b43f987 on September 4, 2025 16:11
@serhiy-storchaka

Also, this PR removes tests that do nothing.

@serhiy-storchaka changed the title from "gh-111788: Fix parsing and normalization of rules and URLs in robotparser" to "gh-88375, gh-111788: Fix parsing and normalization of rules and URLs in robotparser" on Sep 5, 2025
@serhiy-storchaka serhiy-storchaka merged commit cb7ef18 into python:main Sep 5, 2025
45 checks passed
@miss-islington-app

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 5, 2025
… in robotparser (pythonGH-138502)

* Don't fail trying to parse weird patterns.
* Don't fail trying to decode non-UTF-8 "robots.txt" files.
* No longer ignore trailing "?" in patterns and URLs.
* Distinguish raw special characters "?", "=" and "&" from the
  percent-encoded ones.
* Remove tests that do nothing.
(cherry picked from commit cb7ef18)

Co-authored-by: Serhiy Storchaka <[email protected]>
@serhiy-storchaka serhiy-storchaka deleted the robotparser-percent-encoding branch September 5, 2025 15:58
bedevere-app bot commented Sep 5, 2025

GH-138548 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Sep 5, 2025
bedevere-app bot commented Sep 5, 2025

GH-138549 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Sep 5, 2025
@serhiy-storchaka changed the title from "gh-88375, gh-111788: Fix parsing and normalization of rules and URLs in robotparser" to "gh-88375, gh-111788: Fix parsing errors and normalization in robotparser" on Sep 5, 2025
serhiy-storchaka added a commit that referenced this pull request Sep 5, 2025
…obotparser (GH-138502) (GH-138549)

* Don't fail trying to parse weird patterns.
* Don't fail trying to decode non-UTF-8 "robots.txt" files.
* No longer ignore trailing "?" in patterns and URLs.
* Distinguish raw special characters "?", "=" and "&" from the
  percent-encoded ones.
* Remove tests that do nothing.
(cherry picked from commit cb7ef18)

Co-authored-by: Serhiy Storchaka <[email protected]>
lkollar pushed a commit to lkollar/cpython that referenced this pull request Sep 9, 2025
… in robotparser (pythonGH-138502)

* Don't fail trying to parse weird patterns.
* Don't fail trying to decode non-UTF-8 "robots.txt" files.
* No longer ignore trailing "?" in patterns and URLs.
* Distinguish raw special characters "?", "=" and "&" from the
  percent-encoded ones.
* Remove tests that do nothing.