RFC: ID_Compat_Math characters allowed in identifiers #3840

Danvil · 2025-07-16T19:17:40Z

This RFC extends the set of Unicode characters which can be used in identifiers with ID_Compat_Math_Start and ID_Compat_Math_Continue, most notably: ∇, ∂, ∞, superscripts ⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ and subscripts ₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎.

This can be a boon to implementers of scientific concepts, as they can write, for example, `let ∇E₁₂ = 0.5;`.

clarfonthey · 2025-07-16T22:11:21Z

While I mostly sympathise with this and think that it's probably fine to do this, I think that an RFC suggesting this should at minimum:

* Reference the actual section of UAX 31 that defines these groups of characters: https://www.unicode.org/reports/tr31/#Standard_Profiles
* Reference the section of UTS 55 linked in the above section that explains why you might not want to use these groups of characters, which currently cites Rust as an existing example: https://www.unicode.org/reports/tr55/#General-Security-Profile
* Reference the section of UTS 39 linked in the above section that explains the exact mechanisms by which the above can be made safe: https://www.unicode.org/reports/tr39/#General_Security_Profile

Note that your reference to NFKC is technically correct: Not_NFKC is one of the restricted security profile cases that is covered by UTS 39, but it's not the only one, and it's worth discussing whether Rust's handling would need to be expanded because of this case.

FWIW, I very much sympathise with both the desire to have more scientific characters in variables and the desire to hand-wave away the issues as being already solved. It's also harder than ever before to do proper research online due to the shift of focus toward crystal-ball-based decisionmaking. I mostly want to clarify where you can find the relevant Unicode resources discussing this issue, and I think that the RFC should be updated to directly reference them so that we don't try and reinvent the wheel and redo all their hard work.

Also, I think it's pretty great that Rust is explicitly mentioned in the Unicode standard as someone who does this right! I didn't know this was the case until now.

* Added links to UAX31 and others as requested in CR
* Fixed typos as requested in CR
* Extended the drawbacks section
* Other improvements

Danvil · 2025-07-16T23:23:10Z

@clarfonthey Thanks for the review! I made the requested changes and added more links to the Unicode resources and expanded some sections.
@programmerjake Thanks for the review - typos are fixed.

Noratrieb · 2025-07-17T10:57:06Z

text/0000-compat-math-identifiers.md

+
+* Rust might want to decide in the future to give certain superscripts and subscripts syntactic meaning. For example they might want to interpret `a²` as `a * a` or `a₁` as `a[0]`. The latter sounds especially unlikely though due to the general disagreement of 0-based vs 1-based indexing.
+
+# Rationale and alternatives


Rust currently just follows Unicode's recommendation on what should be allowed as a programming language identifier: https://rust-lang.github.io/rfcs/2457-non-ascii-idents.html (Annex 31).

This seems like a reasonable choice, letting the Unicode Consortium handle Unicode decisions, so while I can certainly see the motivation you presented, I am cautious about this change.

It would be very good to have a description here of why Annex 31 does not contain these symbols, if such discussion can be found anywhere, to ensure that we are not missing something important and are sure about our choice to deviate from the recommendation.

Changed formulation related to UAX31 a bit.

Noratrieb · 2025-07-17T10:58:20Z

cc @Manishearth as our Unicode person

Manishearth

Overall seems fine to me. I didn't include this in the original RFC since IIRC the mathematical profile was still being worked on, and I didn't wish to have this facet be another thing that needed to be discussed.

Manishearth · 2025-07-17T14:25:09Z

text/0000-compat-math-identifiers.md

+
+* Rust might want to decide in the future to give certain superscripts and subscripts syntactic meaning. For example they might want to interpret `a²` as `a * a` or `a₁` as `a[0]`. The latter sounds especially unlikely though due to the general disagreement of 0-based vs 1-based indexing.
+
+# Rationale and alternatives


> It would be very good to have a description here of why Annex 31 does not contain these symbols

UAX 31 does contain these symbols, that's what this profile comes from: https://www.unicode.org/reports/tr31/#Mathematical_Compatibility_Notation_Profile

For the question "why are they not in the default profile", the answer is basically to leave room for languages that want to do custom operators, or use these as builtin operators.

It's also just caution in expanding the set to include new meanings: while the XID set expands with each Unicode release as new characters get added, it would not be good for new types of characters to get included. If a programming language cared only about linguistic content in identifiers, it would perhaps be surprised if mathematical subscripts entered the fray. This separate profile allows for explicit choice.

> This seems like a reasonable choice, letting the Unicode Consortium handle Unicode decisions, so while I can certainly see the motivation you presented, I am cautious about this change.

The mathematical profile is included in UAX 31, the identifiers standard: that is the Unicode Consortium making a Unicode decision that these are acceptable in identifiers. It's a menu that programming languages may choose from. Rust is currently following Unicode's recommendation, and this RFC would have Rust continue to follow Unicode's recommendation.
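
A rough sketch of the "choice from a menu" idea, assuming a deliberately simplified lexer: the default rule is approximated with ASCII letters, digits and `_` (real Rust uses XID_Start/XID_Continue), and the math profile is an explicit opt-in. The function names and the `math_profile` flag are hypothetical, and the character lists cover only the characters named in this thread; the authoritative sets are the ID_Compat_Math_Start and ID_Compat_Math_Continue properties in the Unicode data files.

```rust
/// Characters this thread names as allowed at the start of an identifier
/// under the math profile: ∂, ∇, ∞ and the styled ∇/∂ variants.
/// Illustrative only; not generated from Unicode data.
fn is_math_start(c: char) -> bool {
    matches!(c, '∂' | '∇' | '∞') || "𝛁𝛛𝛻𝜕𝜵𝝏𝝯𝞉𝞩𝟃".contains(c)
}

/// Continue characters additionally include the superscripts and subscripts.
fn is_math_continue(c: char) -> bool {
    is_math_start(c)
        || "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾".contains(c)
        || "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎".contains(c)
}

/// Simplified identifier check. ASCII stands in for XID_Start/XID_Continue
/// (an assumption made to keep the sketch dependency-free).
fn is_identifier(s: &str, math_profile: bool) -> bool {
    let mut chars = s.chars();
    let Some(first) = chars.next() else { return false };
    let start_ok = first.is_ascii_alphabetic()
        || first == '_'
        || (math_profile && is_math_start(first));
    start_ok
        && chars.all(|c| {
            c.is_ascii_alphanumeric() || c == '_' || (math_profile && is_math_continue(c))
        })
}
```

With the profile switched off, `∇E₁₂` is rejected; with it switched on, the same string is accepted, while a leading subscript such as `₁a` is still rejected because subscripts are only in the continue set.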

Manishearth · 2025-07-17T14:26:57Z

text/0000-compat-math-identifiers.md

+# Drawbacks
+[drawbacks]: #drawbacks
+
+* Characters like `𝛁𝛛𝛻𝜕𝜵𝝏𝝯𝞉𝞩𝟃` are easily confusable with their base versions `∂∇` and can lead to subtle bugs. However the precedent in Rust seems to be to add them alongside their base version but trigger the NFKC warning.


... also the confusables warning if they get mixed. We have redundant protections here.
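
To illustrate why mixing the styled variants with their base versions is a hazard, here is a minimal sketch that folds each styled character to its base symbol and flags two distinct identifiers that collapse to the same folded form. The table in `fold_math_variant` is hand-written from the characters listed in the diff above, and all helper names are hypothetical; rustc's actual confusables detection works from Unicode's confusables data, not from this table.

```rust
/// Fold the styled nabla / partial-differential variants to the base symbol.
/// Hand-written for this example; covers only the ten styled characters.
fn fold_math_variant(c: char) -> char {
    match c {
        // bold, italic, bold italic, sans-serif bold, sans-serif bold italic nabla
        '\u{1D6C1}' | '\u{1D6FB}' | '\u{1D735}' | '\u{1D76F}' | '\u{1D7A9}' => '∇',
        // the same five styles of the partial differential sign
        '\u{1D6DB}' | '\u{1D715}' | '\u{1D74F}' | '\u{1D789}' | '\u{1D7C3}' => '∂',
        other => other,
    }
}

/// Folded form of a whole identifier.
fn fold(ident: &str) -> String {
    ident.chars().map(fold_math_variant).collect()
}

/// Two identifiers are flagged when they differ as written
/// but are identical after folding.
fn confusable(a: &str, b: &str) -> bool {
    a != b && fold(a) == fold(b)
}
```

For instance, `∇E` and `𝛁E` differ as strings but fold to the same form, so a check like this would flag them if both appeared in one crate.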

Manishearth · 2025-07-17T14:33:42Z

text/0000-compat-math-identifiers.md

+
+The characters 5) are added to the set of Rust identifiers, but will trigger an NFKC warning when used:
+```
+warning: identifier contains a non normalized (NFKC) character: '𝛁'


Probably mention that this will be `uncommon_codepoints`.

If more characters are added to this set, they may not always be non-NFKC, but they will very likely still trigger `uncommon_codepoints`.

Basically we should make it clear that using these characters will very likely trigger lints, even if more get added to the set.

* Clarified choice between syntactic and identifier use
* Added link to a similar C++ proposal
* Expanded the alternatives section discussing how characters could be given syntactic meaning instead

Danvil · 2025-07-17T19:42:45Z

@Manishearth @kennytm @Noratrieb thank you for the review! I added your suggestions and comments to the draft.

programmerjake · 2025-07-17T20:14:10Z

text/0000-compat-math-identifiers.md

 # Rationale and alternatives
 [rationale-and-alternatives]: #rationale-and-alternatives

 If this RFC is not implemented then everyone has to keep using ASCII characters for identifiers in scientific code, for example `gradient_energy` or `a_12`.

 The impact of not implementing it should be fairly small, but implementing it could invite more scientific oriented people to the Rust language and make it easier for them to implement complex concepts.

+Alternatively Rust could decide to give the proposed characters syntactic meaning.
+
+Superscript characters could be interpreted as potentiation, for example `let a = 2; let b = a²;` could be a synonym to `let a = 2; let b = a * a;`.


I think a better word is exponentiation rather than potentiation (I've never heard of the latter in the context of mathematics).

programmerjake · 2025-07-17T20:26:34Z

text/0000-compat-math-identifiers.md

+
+`∞` could be a synonym or replacement to `f32::INFINITY`, however there is no precedent for using non-ASCII characters in `core`/`std` and this would likely meet considerable opposition.
+
+Derivatives could be added as a language feature via auto-differentiation techniques thus giving `∇` and `∂` syntactic meaning, however there is no precedent of this in other languages and similar features are usually provided by libraries.


there is experimental support for automatic differentiation being worked on for rustc.
Mathematica supports the syntax $∂_{x}f$ for $\frac{∂f}{∂x}$ and the syntax $∇_{x}f$ for the gradient of $f$ with respect to $x$.

Thanks for the very interesting link! Done
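
As an illustration of the library-style approach mentioned in the RFC text, here is a minimal sketch of a gradient helper. It uses central finite differences rather than automatic differentiation, and `gradient` and its step size `h` are names chosen for this example; they are not part of any existing crate or of the experimental rustc autodiff work. Under this RFC, a helper like this could plausibly use `∇` in its name instead of the ASCII spelling.

```rust
/// Approximate the gradient of `f` at `x` with central differences:
/// df/dx_i ≈ (f(x + h·e_i) - f(x - h·e_i)) / (2h).
/// Hypothetical helper for illustration; not an autodiff implementation.
fn gradient<F: Fn(&[f64]) -> f64>(f: F, x: &[f64], h: f64) -> Vec<f64> {
    // Working copy of the evaluation point, perturbed one coordinate at a time.
    let mut point = x.to_vec();
    (0..x.len())
        .map(|i| {
            let original = point[i];
            point[i] = original + h;
            let forward = f(point.as_slice());
            point[i] = original - h;
            let backward = f(point.as_slice());
            // Restore the coordinate before moving to the next one.
            point[i] = original;
            (forward - backward) / (2.0 * h)
        })
        .collect()
}
```

For `f(x, y) = x² + 3y` evaluated at `(1, 2)`, this returns values close to `(2, 3)`, up to floating-point error.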

Jules-Bertholet · 2025-07-19T16:53:48Z

text/3840-compat-math-identifiers.md

+Having these symbols available as Rust identifiers could simplify the implementation of these concepts and stay closer to a reference publication, thus reducing confusion and implementation errors.
+
+For example instead of:
+```


Suggested change: replace the opening fence `` ``` `` with `` ```rust ``.

And so on for other code blocks. Makes syntax highlighting work.

Thanks! Feel free to hit “resolve” on this review thread (and others you have addressed) so it doesn’t clutter the page

clarfonthey · 2025-07-20T06:30:25Z

text/3840-compat-math-identifiers.md

+Similarly `let a = [2, 0]; let b = a₁;` will naturally give a compiler error that `a₁` is an unknown identifier and not be interpreted as `let b = a[0];`.
+`∞` will just be a character usable in identifiers and not be a synonym to the likes of `f32::INFINITY`.
+
+The characters 5) are added to the set of Rust identifiers, but will trigger an NFKC or `uncommon_codepoints` warning when used depending on their Unicode classification.


FWIW, the characters from 3) and 4) would also be included in the NFKC warning; if you look at the definition of NFKC, superscripts and subscripts are an explicit example: https://unicode.org/reports/tr15/#Compatibility_Composite_Figure

So only three characters from this set (∇, ∂ and ∞) wouldn't trigger the warning. These are still three good characters to include, but it's worth noting for accuracy.

Another side note, also to clarify the above: NFKC effectively removes super/subscripts when normalizing, and I personally think it's kind of weird that this results in some normalized characters which are not normally allowed in identifiers (for example, parentheses and +/- signs).

I think it's particularly strange that these are included in the definition at all, but I guess it kind of makes sense.
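
The effect described here can be sketched with a small hand-written table that mirrors the compatibility mappings of the super- and subscript characters. Standard Rust has no NFKC routine, so `nfkc_fold_char` below is a hypothetical stand-in that covers only these thirty characters; real NFKC handles far more, including the styled ∇/∂ variants.

```rust
/// Map a superscript or subscript character to the plain character that
/// NFKC normalization would produce for it. Everything else, including
/// ∇, ∂ and ∞, is returned unchanged. Hand-written table, illustration only.
fn nfkc_fold_char(c: char) -> char {
    const SUPER: &str = "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾";
    const SUB: &str = "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎";
    // The super/subscript minus maps to U+2212 MINUS SIGN, not ASCII '-'.
    const PLAIN: &str = "0123456789+\u{2212}=()";
    for table in [SUPER, SUB] {
        if let Some(i) = table.chars().position(|t| t == c) {
            return PLAIN.chars().nth(i).unwrap();
        }
    }
    c
}

/// Apply the per-character folding to a whole string.
fn nfkc_fold(s: &str) -> String {
    s.chars().map(nfkc_fold_char).collect()
}
```

Here `nfkc_fold("a⁽¹⁾")` yields `a(1)`, which contains parentheses and so is no longer a valid identifier, while `∇∂∞` comes back unchanged.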

RFC: ID_Compat_Math characters allowed in identifiers

32f0414

programmerjake reviewed Jul 16, 2025

Links to Unicode and various improvements

60dabca

ehuss added the T-lang label (relevant to the language team, which will review and decide on the RFC) Jul 16, 2025

programmerjake reviewed Jul 16, 2025

Danvil added 2 commits July 16, 2025 17:24

Improved grammar of a sentence

b52ddab

Added another, longer motivating example

fe3d007

kennytm reviewed Jul 17, 2025

Noratrieb reviewed Jul 17, 2025

Manishearth reviewed Jul 17, 2025

Changes suggested by CR and expanded alternatives

e6d4ec2

programmerjake reviewed Jul 17, 2025

Danvil added 2 commits July 17, 2025 14:01

Added link to a Rust autodiff experiment and improved some wording

c1fe8b4

Updated file name with issue number

b58cbb6

Jules-Bertholet reviewed Jul 19, 2025

Annotated code blocks with ```rust to enable syntax highlighting

1113b8a

clarfonthey reviewed Jul 20, 2025
