[Bug]: Special characters in a model's .md file cause the lineage tool to fail #594

@craigchurch

What happened?

AI tells me:

Perfect! I found it. At byte position 1901709, there's the byte sequence 0xC3 0x81, which is the UTF-8 encoding for the character "Á" (Latin Capital Letter A with Acute).

The context shows this is part of the text: "Area México, Área Brasil, Área Centroaméric..."

This appears to be in a description or documentation field that includes geographic regions with Portuguese/Spanish names containing accented characters. The word "Área" means "Area" in Portuguese/Spanish.

The problem: The dbt MCP server is trying to read the manifest.json file using the Windows default encoding (CP-1252, which Python error messages report as the 'charmap' codec) instead of UTF-8. The byte 0x81 has no mapping in CP-1252, so decoding these valid UTF-8 accented characters fails.
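The failure described above can be reproduced in isolation. This is only a minimal sketch, assuming the server is Python and decodes the file with the CP-1252 codec; `try_decode` is a hypothetical helper for illustration, not part of dbt or the MCP server.

```python
# The first four bytes of "Área" as stored in a UTF-8 encoded manifest.json:
# 0xC3 0x81 is the UTF-8 encoding of "Á" (U+00C1).
raw = b"\xc3\x81rea"


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short description of the decode error.

    Hypothetical helper used only to show both outcomes side by side.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"{exc.encoding} error at byte {exc.start}: {exc.reason}"


# Decoding as UTF-8 succeeds and yields the accented word.
utf8_result = try_decode(raw, "utf-8")

# Decoding as CP-1252 fails on the second byte: 0xC3 maps to a character,
# but 0x81 is undefined in that code page.
cp1252_result = try_decode(raw, "cp1252")

print(utf8_result)
print(cp1252_result)
```

The same bytes are valid under one codec and invalid under the other, which is why the error only shows up on machines where the default encoding is not UTF-8.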

Let me search for where this text appears in your dbt project:

Found it! The special characters are in cdol_mdm_org_dim_init.md at line 433:

These are valid UTF-8 characters (Spanish/Portuguese accented letters and Japanese katakana/kanji), but the dbt MCP server is trying to read the compiled manifest.json using the Windows default CP-1252 ('charmap') encoding instead of UTF-8.
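One likely shape of a fix is to pass the encoding explicitly whenever the manifest is opened. The sketch below assumes the manifest is read with Python's built-in file API; `load_manifest` and the tiny stand-in manifest are hypothetical and do not reflect the server's actual code or the real manifest schema.

```python
import json
import tempfile
from pathlib import Path


def load_manifest(path: Path) -> dict:
    """Read a JSON manifest as UTF-8 regardless of the platform default.

    Hypothetical helper: without encoding="utf-8", open() falls back to the
    locale encoding, which is typically CP-1252 on Windows.
    """
    with path.open("r", encoding="utf-8") as fh:
        return json.load(fh)


# Write a small stand-in manifest containing an accented description,
# then read it back with the explicit encoding.
with tempfile.TemporaryDirectory() as tmp:
    manifest_path = Path(tmp) / "manifest.json"
    manifest_path.write_text(
        json.dumps({"docs": {"description": "Área Brasil"}}, ensure_ascii=False),
        encoding="utf-8",
    )
    manifest = load_manifest(manifest_path)

print(manifest["docs"]["description"])
```

As a possible workaround until such a change lands, setting the environment variable `PYTHONUTF8=1` enables Python's UTF-8 mode, which makes `open()` default to UTF-8. Whether that applies here depends on how and where the server process is launched.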

Steps to Reproduce

Ask AI to get lineage for a model

Deployment

Remote MCP server

Environment

Windows 11
VSCode

Relevant log output

Labels: bug
