[Clang] Do not warn on UTF-16 -> UTF-32 conversions. (#163927) #164654

cor3ntin · 2025-10-22T16:11:16Z

UTF-16 to UTF-32 conversions seems widespread,
and lone surrogate have a distinct representation in UTF-32.

Lets not warn on this case to make the warning easier to adopt. This follows SG-16 guideline

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3695r2.html#changes-since-r1

UTF-16 to UTF-16 conversions seems widespread, and lone surrogate have a distinct representation in UTF-32. Lets not warn on this case to make the warning easier to adopt. This follows SG-16 guideline https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3695r2.html#changes-since-r1 Fixes llvm#163719

llvmbot · 2025-10-22T16:12:04Z

@llvm/pr-subscribers-clang

Author: Corentin Jabot (cor3ntin)

Changes

UTF-16 to UTF-16 conversions seems widespread,
and lone surrogate have a distinct representation in UTF-32.

Lets not warn on this case to make the warning easier to adopt. This follows SG-16 guideline

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3695r2.html#changes-since-r1

Fixes #163719

Full diff: https://github.com/llvm/llvm-project/pull/164654.diff

2 Files Affected:

(modified) clang/lib/Sema/SemaChecking.cpp (+8-1)
(modified) clang/test/SemaCXX/warn-implicit-unicode-conversions.cpp (+4-4)

diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index dd5b710d7e1d4..41bcf8fd493fc 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -12014,13 +12014,20 @@ static void DiagnoseMixedUnicodeImplicitConversion(Sema &S, const Type *Source,
                                                    SourceLocation CC) {
   assert(Source->isUnicodeCharacterType() && Target->isUnicodeCharacterType() &&
          Source != Target);
+
+  // Lone surrogates have a distinct representation in UTF-32.
+  // Converting between UTF-16 and UTF-32 codepoints seems very widespread,
+  // so don't warn on such conversion.
+  if (Source->isChar16Type() && Target->isChar32Type())
+    return;
+
   Expr::EvalResult Result;
   if (E->EvaluateAsInt(Result, S.getASTContext(), Expr::SE_AllowSideEffects,
                        S.isConstantEvaluatedContext())) {
     llvm::APSInt Value(32);
     Value = Result.Val.getInt();
     bool IsASCII = Value <= 0x7F;
-    bool IsBMP = Value <= 0xD7FF || (Value >= 0xE000 && Value <= 0xFFFF);
+    bool IsBMP = Value <= 0xDFFF || (Value >= 0xE000 && Value <= 0xFFFF);
     bool ConversionPreservesSemantics =
         IsASCII || (!Source->isChar8Type() && !Target->isChar8Type() && IsBMP);
 
diff --git a/clang/test/SemaCXX/warn-implicit-unicode-conversions.cpp b/clang/test/SemaCXX/warn-implicit-unicode-conversions.cpp
index fcff006d0e028..f17f20ca25295 100644
--- a/clang/test/SemaCXX/warn-implicit-unicode-conversions.cpp
+++ b/clang/test/SemaCXX/warn-implicit-unicode-conversions.cpp
@@ -14,7 +14,7 @@ void test(char8_t u8, char16_t u16, char32_t u32) {
     c16(u32); // expected-warning {{implicit conversion from 'char32_t' to 'char16_t' may lose precision and change the meaning of the represented code unit}}
 
     c32(u8);  // expected-warning {{implicit conversion from 'char8_t' to 'char32_t' may change the meaning of the represented code unit}}
-    c32(u16); // expected-warning {{implicit conversion from 'char16_t' to 'char32_t' may change the meaning of the represented code unit}}
+    c32(u16);
     c32(u32);
 
 
@@ -30,7 +30,7 @@ void test(char8_t u8, char16_t u16, char32_t u32) {
     c16(char32_t(0x7f));
     c16(char32_t(0x80));
     c16(char32_t(0xD7FF));
-    c16(char32_t(0xD800)); // expected-warning {{implicit conversion from 'char32_t' to 'char16_t' changes the meaning of the code unit '<0xD800>'}}
+    c16(char32_t(0xD800));
     c16(char32_t(0xE000));
     c16(char32_t(U'🐉')); // expected-warning {{implicit conversion from 'char32_t' to 'char16_t' changes the meaning of the code point '🐉'}}
 
@@ -44,8 +44,8 @@ void test(char8_t u8, char16_t u16, char32_t u32) {
     c32(char16_t(0x80));
 
     c32(char16_t(0xD7FF));
-    c32(char16_t(0xD800)); // expected-warning {{implicit conversion from 'char16_t' to 'char32_t' changes the meaning of the code unit '<0xD800>'}}
-    c32(char16_t(0xDFFF)); // expected-warning {{implicit conversion from 'char16_t' to 'char32_t' changes the meaning of the code unit '<0xDFFF>'}}
+    c32(char16_t(0xD800));
+    c32(char16_t(0xDFFF));
     c32(char16_t(0xE000));
     c32(char16_t(u'☕'));

cor3ntin · 2025-10-23T09:58:30Z

@AaronBallman

AaronBallman · 2025-10-23T12:11:29Z

UTF-16 to UTF-16 conversions seems widespread,

UTF-16 to UTF-32, right?

AaronBallman

LGTM aside from the PR summary, should have a release note (perhaps on the release branch).

cor3ntin requested a review from AaronBallman October 22, 2025 16:11

cor3ntin added the release:backport label Oct 22, 2025

llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Oct 22, 2025

efriedma-quic added this to the LLVM 21.x Release milestone Oct 22, 2025

github-project-automation bot added this to LLVM Release Status Oct 22, 2025

github-project-automation bot moved this to Needs Triage in LLVM Release Status Oct 22, 2025

c-rhodes moved this from Needs Triage to Needs Review in LLVM Release Status Oct 23, 2025

AaronBallman approved these changes Oct 23, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Clang] Do not warn on UTF-16 -> UTF-32 conversions. (#163927) #164654

[Clang] Do not warn on UTF-16 -> UTF-32 conversions. (#163927) #164654

Uh oh!

cor3ntin commented Oct 22, 2025 •

edited

Loading

Uh oh!

llvmbot commented Oct 22, 2025

Uh oh!

cor3ntin commented Oct 23, 2025

Uh oh!

AaronBallman commented Oct 23, 2025

Uh oh!

AaronBallman left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[Clang] Do not warn on UTF-16 -> UTF-32 conversions. (#163927) #164654

Are you sure you want to change the base?

[Clang] Do not warn on UTF-16 -> UTF-32 conversions. (#163927) #164654

Uh oh!

Conversation

cor3ntin commented Oct 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Oct 22, 2025

Uh oh!

cor3ntin commented Oct 23, 2025

Uh oh!

AaronBallman commented Oct 23, 2025

Uh oh!

AaronBallman left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

cor3ntin commented Oct 22, 2025 •

edited

Loading