
Incorrect Unicode Handling in String Diff Functions #963

@olegmingaleev

Description

When comparing strings that contain complex Unicode characters, such as emoji joined into ZWJ (zero-width joiner) sequences, the diff functions produce incorrect results because the strings are not segmented into user-perceived characters (grapheme clusters).
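To make the symptom concrete, here is a minimal sketch (not the project's code) of a simple common-prefix/suffix diff run at three segmentation levels. `units` and `prefixSuffixDiff` are hypothetical helpers written for this illustration. It compares 👨‍🍳 with 👨‍🎨, which share the man emoji, the ZWJ and, in UTF-16, even the high surrogate of the last emoji.

```javascript
// Split a string at one of three granularities (hypothetical helper).
function units(str, mode) {
  if (mode === "codeUnit") return str.split("");       // UTF-16 code units
  if (mode === "codePoint") return Array.from(str);    // Unicode code points
  const seg = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(seg.segment(str), (s) => s.segment); // grapheme clusters
}

// Trim the common prefix and suffix; whatever remains is the changed part.
function prefixSuffixDiff(a, b, mode) {
  const x = units(a, mode);
  const y = units(b, mode);
  let start = 0;
  while (start < x.length && start < y.length && x[start] === y[start]) start++;
  let endX = x.length;
  let endY = y.length;
  while (endX > start && endY > start && x[endX - 1] === y[endY - 1]) {
    endX--;
    endY--;
  }
  return {
    common: x.slice(0, start).join(""),
    removed: x.slice(start, endX).join(""),
    added: y.slice(start, endY).join(""),
  };
}

const chef = "\u{1F468}\u200D\u{1F373}";   // man + ZWJ + cooking
const artist = "\u{1F468}\u200D\u{1F3A8}"; // man + ZWJ + artist palette

for (const mode of ["codeUnit", "codePoint", "grapheme"]) {
  const { removed, added } = prefixSuffixDiff(chef, artist, mode);
  console.log(mode, JSON.stringify(removed), JSON.stringify(added));
}
```

In code-unit mode the removed and added parts are the lone low surrogates `\uDF73` and `\uDFA8`, which are not valid text on their own. Code-point mode yields well-formed strings but still splits one visible glyph into a "common" man + ZWJ and a changed trailing emoji. Only grapheme mode reports the whole 👨‍🍳 as replaced by 👨‍🎨.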

🌍 Affected Unicode Characters

Emoji ZWJ sequences: 👨‍🍳, 👩‍💻, 👨‍🎨, etc.
Multi-byte characters: accented characters, CJK characters
Surrogate pairs: any emoji or other character outside the Basic Multilingual Plane (BMP)
Combining characters: base characters followed by combining diacritical marks
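One way to see which kind of splitting each category above breaks is to measure a representative string from each at four granularities. This is a sketch assuming a runtime that provides `Intl.Segmenter` and `TextEncoder`; `measure` and the sample strings are illustrative choices, not taken from the issue.

```javascript
// Count a string's length in UTF-8 bytes, UTF-16 code units,
// code points and grapheme clusters.
function measure(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    utf8Bytes: new TextEncoder().encode(str).length,
    utf16Units: str.length,
    codePoints: Array.from(str).length,
    graphemes: Array.from(segmenter.segment(str)).length,
  };
}

// One sample per category (illustrative choices).
const samples = {
  zwjEmoji: "\u{1F468}\u200D\u{1F373}", // man + ZWJ + cooking
  accented: "\u00E9",                   // precomposed e with acute
  cjk: "\u6F22",                        // CJK ideograph in the BMP
  surrogatePair: "\u{1F600}",           // grinning face, outside the BMP
  combining: "e\u0301",                 // e + COMBINING ACUTE ACCENT
};

for (const [name, str] of Object.entries(samples)) {
  console.log(name, measure(str));
}
```

Every sample is exactly one grapheme, yet the ZWJ emoji is 5 UTF-16 code units and 3 code points, the surrogate pair is 2 code units, and the combining sequence is 2 code points. In JavaScript the precomposed é and the BMP CJK ideograph are a single UTF-16 code unit, so they only split incorrectly when a string is processed as UTF-8 bytes.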

Related

Unicode Standard: https://unicode.org/
Intl.Segmenter MDN: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/S
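Since the issue points to `Intl.Segmenter`, one possible direction is to diff arrays of grapheme clusters instead of code units. The following is a minimal sketch under that assumption, not the project's actual fix: `graphemes` and `diffGraphemes` are hypothetical names, and the textbook LCS table used here costs O(n*m) time and memory. A real implementation would more likely keep its existing diff algorithm and only change how strings are tokenized.

```javascript
// Split a string into user-perceived characters (extended grapheme clusters).
function graphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(str), (s) => s.segment);
}

// LCS-based diff over grapheme clusters. Returns a list of
// { type: "equal" | "remove" | "add", value } operations.
function diffGraphemes(a, b) {
  const x = graphemes(a);
  const y = graphemes(b);
  const n = x.length;
  const m = y.length;

  // lcs[i][j] = length of the longest common subsequence of x[i..] and y[j..]
  const lcs = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = x[i] === y[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const ops = [];
  // Append a grapheme, merging it into the previous op if the type matches.
  const push = (type, value) => {
    const last = ops[ops.length - 1];
    if (last && last.type === type) last.value += value;
    else ops.push({ type, value });
  };

  // Walk the table, preferring removals over additions on ties.
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (x[i] === y[j]) { push("equal", x[i]); i++; j++; }
    else if (lcs[i + 1][j] >= lcs[i][j + 1]) { push("remove", x[i]); i++; }
    else { push("add", y[j]); j++; }
  }
  while (i < n) push("remove", x[i++]);
  while (j < m) push("add", y[j++]);
  return ops;
}

console.log(diffGraphemes("Chef \u{1F468}\u200D\u{1F373}!", "Chef \u{1F468}\u200D\u{1F3A8}!"));
```

Because each token is a whole grapheme, a ZWJ sequence is always removed or added as a unit. Likewise, `diffGraphemes("cafe\u0301", "cafe")` reports the decomposed `é` as removed and `e` as added, rather than a dangling combining accent.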
