
Decode encoded unicode characters #10

Open
silkeh wants to merge 1 commit into Zaryob:main from silkeh:unicode-decode

Conversation


silkeh commented May 27, 2025

Add support for decoding `&#xhhhh;` and `&#nnnn;` encoded characters. This ensures that XML encoded with Iksemel can actually be decoded without loss.

Resolves #9
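
For illustration, decoding the two reference forms mentioned above (`&#nnnn;` decimal and `&#xhhhh;` hexadecimal) can be sketched roughly as follows. This is a minimal Python sketch, not the C code in this PR; the function name `decode_char_refs` and the regular expression are assumptions made for the example.

```python
import re

# Matches &#nnnn; (decimal) and &#xhhhh; (hexadecimal) character references.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def decode_char_refs(text):
    """Replace numeric character references with the characters they encode."""

    def _replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits, 10)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Code point outside the Unicode range: leave the reference as-is.
            return match.group(0)

    # Only the matched references are replaced; surrounding text is untouched.
    return _CHAR_REF.sub(_replace, text)
```

With this sketch, `decode_char_refs("caf&#233;!")` returns `"café!"`. Named entities such as `&amp;` are deliberately left alone here, since the description only covers numeric references.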

silkeh force-pushed the unicode-decode branch 5 times, most recently from 518e104 to e617c32 on May 27, 2025 17:38

silkeh commented Jun 11, 2025

This contains a bug where the characters after the special character disappear. I've added a unit test showing this behaviour (which currently fails). I have no idea what causes it, but I'll do some more digging soon.
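
A check of roughly this shape captures the behaviour described: text on both sides of a character reference must survive decoding. The sketch uses Python's standard `html.unescape` as a stand-in decoder; it is not the Iksemel parser or the unit test added in this PR, and the helper name is hypothetical.

```python
from html import unescape


def text_after_reference_survives(prefix, reference, suffix):
    """Return True if decoding keeps the text on both sides of the reference."""
    decoded = unescape(prefix + reference + suffix)
    return (
        decoded.startswith(prefix)
        and decoded.endswith(suffix)
        # The reference itself must have been replaced by a single character.
        and len(decoded) == len(prefix) + 1 + len(suffix)
    )
```

A decoder with the bug described above would fail this check because the suffix would be missing from the decoded string.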

silkeh marked this pull request as draft on June 11, 2025 21:15
Add support for decoding `&#xhhhh;` and `&#nnnn;` encoded characters.
This ensures that XML encoded with Iksemel can actually be decoded without loss.

silkeh commented Jun 11, 2025

Fixed! 🙂

@Neustradamus

Any progress on it?

cc: @Zaryob.


Development

Successfully merging this pull request may close these issues.

HTML escaped characters get replaced with question marks ?
