@gaborbernat gaborbernat commented Nov 13, 2025

Universal Ctags crashed with an assertion failure in vStringPutImpl() when it encountered UTF-16 encoded files. The assertion c >= 0 && c <= 0xff failed because ctags expected every character to fit within a single byte, but UTF-16 files contain multi-byte sequences that violate this assumption.

This fix adds:

  • Detection of UTF-16 BOM (both LE and BE) in file reading
  • Automatic conversion from UTF-16 to UTF-8 using iconv when UTF-16 is detected
  • Forced memory-stream processing for UTF-16 files so the conversion can run
  • Test cases for both UTF-16 LE and BE files

Resolves issue #4342

Signed-off-by: Bernát Gábor [email protected]
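
For readers following along, the BOM check named in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the enum `Utf16Bom` and the function `detect_utf16_bom` are hypothetical names chosen for this example, and the real change lives in ctags' file-reading code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the identifiers used in ctags. */
typedef enum { UTF16_NONE, UTF16_LE, UTF16_BE } Utf16Bom;

/* Inspect the first two bytes of a buffer for a UTF-16 byte order mark. */
static Utf16Bom detect_utf16_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len < 2)
        return UTF16_NONE;          /* too short to hold a BOM */
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return UTF16_LE;            /* FF FE: little endian */
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return UTF16_BE;            /* FE FF: big endian */
    return UTF16_NONE;
}
```

A caller would run this on the first bytes of the file and only take the conversion path when the result is not `UTF16_NONE`. Note that a buffer starting with FF FE 00 00 is also a valid UTF-32 LE BOM; this sketch does not try to tell the two apart.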

@gaborbernat gaborbernat force-pushed the fix-utf16-encoding-crash branch from 55764aa to c041872 on November 13, 2025 08:18

codecov bot commented Nov 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.88%. Comparing base (d48558f) to head (fa7d3e7).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4347   +/-   ##
=======================================
  Coverage   85.87%   85.88%           
=======================================
  Files         252      252           
  Lines       62597    62631   +34     
=======================================
+ Hits        53755    53789   +34     
  Misses       8842     8842           

gaborbernat and others added 3 commits November 13, 2025 06:45
Enables test execution for existing UTF-16 test files by adding the
required args.ctags configuration file. This ensures the UTF-16 LE and
UTF-16 BE files are processed during test runs, improving code coverage
for the UTF-16 to UTF-8 conversion functionality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Adds a test for the UTF-16 conversion failure path using malformed UTF-16 data
with invalid surrogate sequences. This triggers the iconv() failure path
and tests the fallback mechanism that preserves original data when
UTF-16 to UTF-8 conversion fails.

This ensures 100% coverage of the UTF-16 conversion error handling code
including the eFree(converted_data) cleanup logic.

Signed-off-by: Bernát Gábor <[email protected]>
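
The failure path this commit exercises can be pictured with a small sketch like the one below. It is an assumption-laden illustration rather than the patch itself: `utf16_to_utf8` and `convert_or_keep` are hypothetical helpers, plain `malloc`/`free` stand in for ctags' own allocation wrappers such as `eFree`, and the input is assumed to have its BOM already stripped. Only standard `iconv_open`, `iconv`, and `iconv_close` calls are used.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert BOM-less UTF-16 ("UTF-16LE" or "UTF-16BE")
 * to UTF-8. Returns a malloc'ed, NUL-terminated buffer, or NULL on failure. */
static char *utf16_to_utf8(const char *from, const char *in, size_t in_len,
                           size_t *out_len)
{
    assert(from != NULL && in != NULL && out_len != NULL);

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return NULL;

    /* Each 2-byte code unit expands to at most 3 UTF-8 bytes. */
    size_t cap = in_len / 2 * 3 + 1;
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *src = (char *)in;   /* iconv() takes a non-const input pointer */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = cap - 1;
    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (rc == (size_t)-1 || src_left != 0) {
        free(out);            /* discard the partial result on failure */
        return NULL;
    }
    *out_len = (size_t)(dst - out);
    out[*out_len] = '\0';
    return out;
}

/* Hypothetical fallback wrapper: return the converted text when conversion
 * succeeds, otherwise hand back the original bytes untouched.
 * *owned is non-NULL only when the caller must free() the result. */
static const char *convert_or_keep(const char *from, const char *in,
                                   size_t in_len, char **owned,
                                   size_t *out_len)
{
    *owned = utf16_to_utf8(from, in, in_len, out_len);
    if (*owned != NULL)
        return *owned;
    *out_len = in_len;
    return in;
}
```

With this shape, a lone surrogate in the input makes `iconv()` fail, the partially filled buffer is freed, and the caller continues with the original data, which matches the behavior the commit message describes.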

Adds a specific test for UTF-16 Big Endian BOM detection (FE FF) to ensure
complete coverage of line 899: (bom[0] == 0xFE && bom[1] == 0xFF).

This test completes 100% coverage of all UTF-16 BOM detection paths
including both LE (FF FE) and BE (FE FF) byte order markers.

Signed-off-by: Bernát Gábor <[email protected]>
@gaborbernat (Contributor, Author) commented:

@masatake any updates on this?

@masatake masatake self-assigned this Nov 26, 2025
@masatake (Member) commented:

Sorry for the late response. I will work on this request next.

@gaborbernat (Contributor, Author) commented:

Ideally you could just review and accept this PR. Is anything wrong with the solution in it? 🤔

@masatake (Member) commented:

The change to getMioFull() is excellent.
Could you describe this change in docs/news/HEAD.rst?

This change requires a new section like:

Bug fixes
-----------------------------

I need time to think about the new test cases.
I struggled with this once in #4268, but I burned out.
Now is the time to focus on the topic again: what we should do with .gitattributes.
