fix: add UTF-16 encoding detection and conversion to prevent assertion failures #4347
…n failures

Universal Ctags crashed with assertion failure in vStringPutImpl() when encountering files with UTF-16 encoding. The assertion `c >= 0 && c <= 0xff` failed because ctags expected all characters to fit within single byte range, but UTF-16 files contain multi-byte sequences that violate this assumption.

This fix adds:
- Detection of UTF-16 BOM (both LE and BE) in file reading
- Automatic conversion from UTF-16 to UTF-8 using iconv when UTF-16 is detected
- Force memory stream processing for UTF-16 files to enable conversion
- Test cases for both UTF-16 LE and BE files

Resolves issue universal-ctags#4342

Signed-off-by: Bernát Gábor <[email protected]>
55764aa to c041872
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##           master   #4347   +/-   ##
=======================================
  Coverage   85.87%   85.88%
=======================================
  Files         252      252
  Lines       62597    62631    +34
=======================================
+ Hits        53755    53789    +34
  Misses       8842     8842
Enables test execution for existing UTF-16 test files by adding the required args.ctags configuration file. This ensures the UTF-16 LE and UTF-16 BE files are processed during test runs, improving code coverage for the UTF-16 to UTF-8 conversion functionality. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Adds test for UTF-16 conversion failure path using malformed UTF-16 data with invalid surrogate sequences. This triggers the iconv() failure path and tests the fallback mechanism that preserves original data when UTF-16 to UTF-8 conversion fails. This ensures 100% coverage of the UTF-16 conversion error handling code including the eFree(converted_data) cleanup logic. Signed-off-by: Bernát Gábor <[email protected]>
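The fallback described in this commit can be sketched roughly as follows. This is a minimal sketch, not the code from the patch: `utf16_to_utf8` and its parameters are hypothetical names, it uses plain `malloc`/`free` rather than ctags' own allocation helpers such as `eFree`, and it assumes the caller has already skipped the two-byte BOM. Only the standard `iconv_open`/`iconv`/`iconv_close` calls are relied on.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert BOM-less UTF-16 bytes to a NUL-terminated
 * UTF-8 string. Returns a malloc'd buffer, or NULL if the conversion fails,
 * in which case the caller is expected to keep using the original data. */
static char *utf16_to_utf8(const unsigned char *data, size_t len,
                           int big_endian, size_t *out_len)
{
    iconv_t cd = iconv_open("UTF-8", big_endian ? "UTF-16BE" : "UTF-16LE");
    if (cd == (iconv_t)-1)
        return NULL;

    /* A 2-byte UTF-16 unit expands to at most 3 UTF-8 bytes, so 2 * len
     * bytes of output (plus the terminator) is always enough. */
    size_t cap = len * 2 + 1;
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)data;   /* iconv() takes a non-const pointer */
    size_t in_left = len;
    char *out_ptr = out;
    size_t out_left = cap - 1;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1 || in_left != 0) {
        /* Invalid or truncated input: release the partial result. */
        free(out);
        return NULL;
    }

    *out_ptr = '\0';
    *out_len = (size_t)(out_ptr - out);
    return out;
}
```

A caller would treat a NULL return as "leave the input untouched", which mirrors the behaviour the commit message describes for malformed surrogate sequences. On platforms where iconv is not part of libc, linking with `-liconv` may be needed.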
Adds specific test for UTF-16 Big Endian BOM detection (FE FF) to ensure complete coverage of line 899: (bom[0] == 0xFE && bom[1] == 0xFF). This test completes 100% coverage of all UTF-16 BOM detection paths including both LE (FF FE) and BE (FE FF) byte order markers. Signed-off-by: Bernát Gábor <[email protected]>
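For readers unfamiliar with the two byte order marks, the check described above amounts to something like the sketch below. The enum and function names are hypothetical and are not taken from the patch; only the byte values (FF FE for LE, FE FF for BE) come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, used for illustration only. */
enum utf16_bom { BOM_NONE = 0, BOM_UTF16_LE, BOM_UTF16_BE };

/* Inspect the first two bytes of a buffer for a UTF-16 byte order mark. */
static enum utf16_bom detect_utf16_bom(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);

    if (len < 2)
        return BOM_NONE;                 /* too short to hold a BOM */
    if (data[0] == 0xFF && data[1] == 0xFE)
        return BOM_UTF16_LE;             /* FF FE: little endian */
    if (data[0] == 0xFE && data[1] == 0xFF)
        return BOM_UTF16_BE;             /* FE FF: big endian */
    return BOM_NONE;
}
```

One caveat of a two-byte check like this: the UTF-32 LE byte order mark (FF FE 00 00) begins with the same two bytes as the UTF-16 LE one, so a stricter detector might look at four bytes before deciding.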
@masatake any updates on this?

Sorry for the late response. I will work on this request next.

Ideally you can just review and accept this PR. Is anything wrong with the solution in it? 🤔

The change for getMioFull() is excellent. This change requires a new section like:
I need time to think about the new test cases.