UnicodeDecodeError when decoding message body with non-UTF-8 encoding #1030

@suribe06

Hi team,

First, thank you for your work on this integration — it's been very helpful.

I'm encountering a UnicodeDecodeError when using the GmailSearch tool to retrieve and parse messages. The error occurs in the _parse_messages method when the message body is decoded as UTF-8, but the actual encoding of the message is different (e.g., Latin-1 or Windows-1252). Here's the traceback:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 13503: invalid start byte

The issue seems to happen at this line (from gmail/search.py):

message_body = email_msg.get_payload(decode=True).decode("utf-8")

While multipart messages have a fallback to Latin-1 decoding, non-multipart messages are always decoded as UTF-8, which can cause the tool to crash when the content is in a different encoding.
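For reference, here is a minimal sketch that reproduces the same class of failure with only the standard library. The message below is hand-built for illustration and is not taken from my mailbox; it assumes a non-multipart text/plain part whose body is Latin-1 encoded, where "Á" is the single byte 0xc1.

```python
import base64
from email import message_from_bytes

# Hypothetical non-multipart message; the body is Latin-1, not UTF-8.
body = "\u00c1ngel".encode("latin-1")  # "Ángel" -> b'\xc1ngel'
raw = (
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + base64.b64encode(body) + b"\r\n"
)

email_msg = message_from_bytes(raw)
payload = email_msg.get_payload(decode=True)  # bytes after base64 decoding

error = None
try:
    payload.decode("utf-8")  # same call pattern as the line quoted above
except UnicodeDecodeError as exc:
    error = exc  # 0xc1 is never a valid UTF-8 start byte

print(error)
```

Decoding the same payload with "latin-1" succeeds, which matches the behavior I see for multipart messages.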

To improve robustness, I recommend wrapping the decoding step with a fallback. A more general solution could try multiple encodings in turn, or pass errors="replace" or errors="ignore" so that malformed characters do not cause a hard crash.
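A rough sketch of what such a fallback could look like is below. The function name decode_body and the order of the fallback encodings are my own assumptions, not existing code in gmail/search.py. It tries the charset declared in the message first, then a short list of common encodings, and only then falls back to errors="replace".

```python
import base64
from email import message_from_bytes
from email.message import Message


def decode_body(email_msg: Message, fallbacks=("utf-8", "cp1252", "latin-1")) -> str:
    """Hypothetical helper: decode a non-multipart body without raising."""
    raw = email_msg.get_payload(decode=True)
    if raw is None:  # get_payload(decode=True) returns None for multipart messages
        return ""
    declared = email_msg.get_content_charset()  # None if no charset is declared
    candidates = ([declared] if declared else []) + list(fallbacks)
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # LookupError covers a declared charset that Python does not know.
            continue
    # Last resort: keep what can be decoded and mark the rest with U+FFFD.
    return raw.decode("utf-8", errors="replace")


# Usage with a hand-built message that declares no charset at all.
raw_msg = (
    b"Content-Type: text/plain\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + base64.b64encode("\u00c1ngel".encode("cp1252")) + b"\r\n"
)
text = decode_body(message_from_bytes(raw_msg))
print(text)
```

One caveat: single-byte encodings such as Latin-1 accept any byte sequence, so a wrong guess yields garbled text rather than an exception. Preferring the declared charset keeps that case to a minimum.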

Let me know if you'd like me to open a PR with a patch.

Thanks again!
