
Document - Correct when instantiated with bytes content instead of bytes #195


Open: wants to merge 1 commit into master

tomdpsrd commented on Jul 28, 2025

Actual Behavior

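Document.summary() does not work under Python 3 when the document is built from bytes rather than str content. Document.title() fails the same way, as the traceback below shows.
The newly released version (0.8.4.1) includes an older change that compiled the charset-detection regular expressions as str patterns instead of bytes patterns.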
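Python's `re` module refuses to mix pattern and subject types: a str pattern cannot search bytes, and a bytes pattern cannot search str. The sketch below reproduces the failure in isolation. The regex is an illustrative stand-in modelled on the `RE_CHARSET` name from the traceback, not the library's actual pattern.

```python
import re

# Illustrative charset regex (assumption: not readability's real RE_CHARSET).
pattern = r'<meta[^>]*?charset=["\']?([^"\'>;\s]+)'
str_re = re.compile(pattern, re.I)                    # compiled as a str pattern
bytes_re = re.compile(pattern.encode("ascii"), re.I)  # same regex, compiled as bytes

page = b'<html><head><meta charset="utf-8"></head></html>'  # e.g. response.content

# A str pattern cannot search a bytes page...
try:
    str_re.findall(page)
except TypeError as exc:
    print(exc)  # cannot use a string pattern on a bytes-like object

# ...but a bytes pattern can, and its matches are bytes too.
print(bytes_re.findall(page))  # [b'utf-8']
```

The str pattern only works once the page is already decoded, which is exactly what charset detection is supposed to make possible. That is why the patterns need to be bytes.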

Linked issue: #194

Steps to Reproduce the Problem

Follow the steps in the README:

>>> import requests
>>> from readability import Document

>>> response = requests.get('http://example.com')
>>> doc = Document(response.content)
>>> doc.title()
Traceback (most recent call last):
...
    RE_CHARSET.findall(page) + RE_PRAGMA.findall(page) + RE_XML.findall(page)
    ^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot use a string pattern on a bytes-like object
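The failing line combines `RE_CHARSET`, `RE_PRAGMA` and `RE_XML` matches on the raw page. Below is a minimal sketch of a bytes-based version of that lookup. The three regexes are illustrative stand-ins written for this example, not the library's actual patterns, and `declared_charsets` is a hypothetical helper that is not part of readability's API.

```python
from __future__ import annotations

import re

# Hypothetical bytes patterns standing in for RE_CHARSET, RE_PRAGMA and RE_XML.
RE_CHARSET = re.compile(rb'<meta[^>]*?charset=["\']?([^"\'>;\s]+)', re.I)
RE_PRAGMA = re.compile(rb'<meta[^>]*?content=["\'][^"\'>]*?charset=([^"\'>;\s]+)', re.I)
RE_XML = re.compile(rb'^<\?xml[^>]*?encoding=["\']([^"\']+)["\']')


def declared_charsets(page: bytes) -> list[str]:
    """Return the charset names declared in raw HTML bytes, in order of appearance
    per pattern, lower-cased and without duplicates (hypothetical helper)."""
    matches = RE_CHARSET.findall(page) + RE_PRAGMA.findall(page) + RE_XML.findall(page)
    # Matches are bytes; charset labels are ASCII, so decode them for the caller.
    names = [m.decode("ascii", "replace").lower() for m in matches]
    # dict.fromkeys keeps the first occurrence of each name and preserves order.
    return list(dict.fromkeys(names))


page = b'<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">'
print(declared_charsets(page))  # ['windows-1252']
```

Decoding the matches at the end keeps the regex work entirely in bytes. The page itself is only decoded after a charset has been chosen.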

tomdpsrd changed the title "Correction bytes" to "Document - Correct when instantiated with bytes content instead of bytes" on Jul 28, 2025