Closed
Labels: 3.13 (bugs and security fixes), 3.14 (bugs and security fixes), 3.15 (new features, bugs and security fixes), stdlib (Standard Library Python modules in the Lib/ directory), type-bug (An unexpected behavior, bug, or error)
Description
Bug report
Bug description:
urllib.robotparser fails to parse an entire robots.txt file if a single line in it cannot be parsed. I don't think this is valid behavior; ideally, unparsable or invalid lines should be skipped. The norobots RFC says as much: "Implementors should pay particular attention to the robustness in parsing of the /robots.txt file."
  File "/usr/local/lib/python3.11/urllib/robotparser.py", line 123, in parse
    entry.rulelines.append(RuleLine(line[1], False))
                           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/robotparser.py", line 222, in __init__
    path = urllib.parse.urlunparse(urllib.parse.urlparse(path))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/parse.py", line 395, in urlparse
    splitresult = urlsplit(url, scheme, allow_fragments)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/parse.py", line 500, in urlsplit
    _check_bracketed_host(bracketed_host)
  File "/usr/local/lib/python3.11/urllib/parse.py", line 446, in _check_bracketed_host
    ip = ipaddress.ip_address(hostname) # Throws Value Error if not IPv6 or IPv4
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/ipaddress.py", line 54, in ip_address
    raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: '[routes.productDetail(product.sku, product.slug)' does not appear to be an IPv4 or IPv6 address
I know '[routes.productDetail(product.sku, product.slug)' is clearly not a valid URL, but I don't think parsing of the whole file should fail because of this one line.
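For anyone hitting this on an affected version, one possible workaround is to pre-filter the lines before handing them to RobotFileParser.parse(). The sketch below is only an illustration, not the fix applied in CPython: the helper name drop_unparsable_rules and the sample robots.txt lines are made up, and it assumes the failure always surfaces as a ValueError from urllib.parse.urlsplit on the rule's path, as in the traceback above.

```python
import urllib.parse
import urllib.robotparser


def drop_unparsable_rules(lines):
    """Hypothetical helper: drop Allow/Disallow lines whose path urlsplit rejects."""
    kept = []
    for raw in lines:
        # Strip comments and split on the first colon, roughly as robotparser does.
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() in ("allow", "disallow"):
            try:
                # robotparser unquotes the value before passing it to urlparse.
                urllib.parse.urlsplit(urllib.parse.unquote(value.strip()))
            except ValueError:
                continue  # skip only the offending rule, keep everything else
        kept.append(raw)
    return kept


# Made-up input: the second line has an unbalanced '[' in the host position,
# which urlsplit rejects with ValueError.
sample = [
    "User-agent: *",
    "Disallow: //[broken",
    "Disallow: /private",
]

parser = urllib.robotparser.RobotFileParser()
parser.parse(drop_unparsable_rules(sample))
print(parser.can_fetch("*", "https://example.com/private"))
print(parser.can_fetch("*", "https://example.com/public"))
```

The filter is deliberately narrow: it only removes rule lines that urlsplit cannot handle, so the remaining rules in the same User-agent group still apply.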
CPython versions tested on:
3.11
Operating systems tested on:
Linux