`python-magic` Incorrectly Classifies HTML as `text/plain`

Problem
Our content type detection relies on the python-magic library, which reports HTML content as text/plain when the <html> tag is missing, even if <div> or other HTML tags are present. As a result, HTML fragments are handled incorrectly.

Solution
We'll supplement python-magic with a manual check for <html> or <div> tags. If either is found, we'll explicitly set the MIME type to text/html, overriding the value python-magic returned.

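```
import magic

# content: raw bytes of the file being classified
mime = magic.from_buffer(content, mime=True)

# If the content contains HTML tags, override the detected MIME type with text/html
lowered = content.lower()  # lowercase once so the tag check is case-insensitive
if b"<html" in lowered or b"<div" in lowered:
    mime = "text/html"
```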
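
One possible refinement, offered as a sketch rather than the agreed fix: a bare substring check would also override types that python-magic got right (for example a PDF whose bytes happen to contain `<div`), and it would match non-tags such as `<divider`. The helper below only overrides a `text/plain` result and uses a regular expression with a word boundary. The function name `refine_mime`, the constant `_HTML_TAG_RE`, and the tag list are hypothetical and not part of the current codebase.

```python
import re

# Hypothetical tag list (an assumption); extend it to cover the fragments we actually receive.
_HTML_TAG_RE = re.compile(
    rb"<(html|body|div|span|p|table|a)\b[^>]*>",
    re.IGNORECASE,
)


def refine_mime(detected: str, content: bytes) -> str:
    """Return text/html when the detector said text/plain but HTML tags are present.

    `detected` is the MIME type reported by the detector (e.g. python-magic);
    any result other than text/plain is passed through unchanged.
    """
    if detected == "text/plain" and _HTML_TAG_RE.search(content):
        return "text/html"
    return detected
```

With python-magic this would be called as `mime = refine_mime(magic.from_buffer(content, mime=True), content)`. Keeping the detector's result as a parameter also lets the override be unit-tested without libmagic installed.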

`python-magic` Incorrectly Classifies HTML as `text/plain` #207
