
Implement optional modifiers #105


Draft: wants to merge 6 commits into base: main


T0mstone (Collaborator) commented Jul 4, 2025

Closes #51.

This is mostly done, but there are some unresolved questions and open issues, and it still needs finishing touches such as documentation, which is why it's a draft for now.

This is also a breaking change, since it changes how ModifierSet works, so I should remember to bump the version in Cargo.toml before we merge this.

Changes to observable syntax

The way I've implemented it, the modifier? syntax applies directly in the ModifierSet, so it would be exposed to typst users too (or typst would have to do some extra sanitizing).
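As a rough illustration of what the modifier? syntax could mean inside a modifier set, here is a minimal Rust sketch. It assumes that a trailing ? marks a modifier as optional, and that a query matches a variant when it contains every required modifier and nothing outside the required and optional ones. Modifiers, parse_modifiers and matches are hypothetical names, not codex's actual ModifierSet API.

```rust
use std::collections::BTreeSet;

/// Hypothetical split of a variant's modifiers into those that must be
/// present and those marked optional with a trailing `?`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Modifiers {
    pub required: BTreeSet<String>,
    pub optional: BTreeSet<String>,
}

/// Parses a dot-separated modifier string such as "as.au?".
/// An empty string yields no modifiers (the default variant).
pub fn parse_modifiers(spec: &str) -> Modifiers {
    let mut required = BTreeSet::new();
    let mut optional = BTreeSet::new();
    for part in spec.split('.').filter(|p| !p.is_empty()) {
        match part.strip_suffix('?') {
            Some(name) => optional.insert(name.to_string()),
            None => required.insert(part.to_string()),
        };
    }
    Modifiers { required, optional }
}

/// Assumed matching rule: every required modifier is in the query, and
/// every modifier in the query is either required or optional.
pub fn matches(variant: &Modifiers, query: &BTreeSet<String>) -> bool {
    variant.required.is_subset(query)
        && query
            .iter()
            .all(|m| variant.required.contains(m) || variant.optional.contains(m))
}
```

Under this rule, "as.au?" accepts {as} and {as, au}, but neither {au} nor the empty set.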

It's also not clear how this new resolution algorithm would interact with user-defined symbols from typst. I've kept the code in best_match_in mostly unchanged for now, but we could simplify it considerably if the match were guaranteed to be unique. (Of course, this may not hold for user-defined symbols.)

Changes to variant resolution

I have another branch where I wrote some ad-hoc code (mostly identical to what is now the no_overlap test) to brute-force check every possible modifier set for every symbol and generate a list of which ones differ between the old algorithm and the new one.

Here is that list:

emoji.arrow : Some(("↙", None)) => Some(("➡", None))
emoji.bubble : Some(("💭", None)) => Some(("💬", None))
emoji.cloud.hidden : Some(("🌥", None)) => None
emoji.dancing.bunny : Some(("👯", None)) => None
emoji.dancing.women : Some(("👯", None)) => None
emoji.face.not : Some(("🫢", None)) => None
emoji.face.slight : Some(("🙁", None)) => None
emoji.face.withheld : Some(("🥹", None)) => None
emoji.faith : Some(("✝", None)) => None
emoji.faith.dot : Some(("🔯", None)) => None
emoji.finger.alt : Some(("☝", None)) => None
emoji.globe.af : Some(("🌍", None)) => None
emoji.globe.as : Some(("🌏", None)) => None
emoji.globe.au : Some(("🌏", None)) => None
emoji.globe.eu : Some(("🌍", None)) => None
emoji.handholding : Some(("👬", None)) => None
emoji.leaf.four : Some(("🍀", None)) => None
emoji.leaf.three : Some(("☘", None)) => None
emoji.monkey.not : Some(("🙉", None)) => None
emoji.moon.one : Some(("🌖", None)) => None
emoji.moon.face.three : Some(("🌜", None)) => None
emoji.moon.face.two : Some(("🌛", None)) => None
emoji.playback.once : Some(("🔂", None)) => None
emoji.playback.v : Some(("🔃", None)) => None
emoji.suit : Some(("♣", None)) => None
sym.angle.t : Some(("⦡", None)) => None
sym.angle.top : Some(("⦡", Some("`angle.spheric.top` is deprecated, use `angle.spheric.t` instead"))) => None
sym.arrow.half : Some(("↶", None)) => None
sym.arrows.stop : Some(("↹", None)) => None
sym.ballot.heavy : Some(("🗹", None)) => None
sym.colon.op : Some(("⫶", None)) => None
sym.dash.double : Some(("〰", None)) => None
sym.dash.three : Some(("⸻", None)) => None
sym.dash.two : Some(("⸺", None)) => None
sym.divides.rev : Some(("⫮", None)) => None
sym.dot.big : Some(("⨀", None)) => None
sym.emptyset.l : Some(("⦴", None)) => None
sym.emptyset.r : Some(("⦳", None)) => None
sym.eq.down : Some(("≒", None)) => None
sym.eq.up : Some(("≓", None)) => None
sym.gender.male.r : Some(("⚩", None)) => None
sym.gender.male.t : Some(("⚨", None)) => None
sym.gt.nested : Some(("⫸", None)) => None
sym.gt.slant : Some(("⩾", None)) => None
sym.integral.hook : Some(("⨗", None)) => None
sym.lt.nested : Some(("⫷", None)) => None
sym.lt.slant : Some(("⩽", None)) => None
sym.note.alt : Some(("♩", None)) => None
sym.note.beamed : Some(("♫", None)) => None
sym.note.slash : Some(("𝆔", None)) => None
sym.nothing.l : Some(("⦴", None)) => None
sym.nothing.r : Some(("⦳", None)) => None
sym.parallel.slanted : Some(("⧣", None)) => None
sym.parallel.slanted.tilde : Some(("⧤", None)) => None
sym.plus.arrow : Some(("⟴", None)) => None
sym.plus.big : Some(("⨁", None)) => None
sym.prec.curly : Some(("≼", None)) => None
sym.prec.curly.not : Some(("⋠", None)) => None
sym.prec.eq.not : Some(("⋠", None)) => None
sym.rest.measure : Some(("𝄩", None)) => None
sym.space.narrow : Some(("\u{202f}", None)) => None
sym.subset.not.sq : Some(("⋢", None)) => None
sym.succ.curly : Some(("≽", None)) => None
sym.succ.curly.not : Some(("⋡", None)) => None
sym.succ.eq.not : Some(("⋡", None)) => None
sym.suit : Some(("♣", None)) => None
sym.suit.filled : Some(("♣", None)) => None
sym.suit.stroked : Some(("♧", None)) => None
sym.supset.not.sq : Some(("⋣", None)) => None

Most of these changes should be improvements, but there are also cases like emoji.globe, where a variant has two optional modifiers of which at least one must be present, since it is not the default variant. This could in theory be encoded by duplicating the variant, but that would be observable in the repr and is also a bit hacky, so I haven't done it for now.
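A brute-force overlap check in the spirit of the no_overlap test could look roughly like the sketch below. It assumes the same matching rule as above (all required modifiers present, nothing outside required and optional), and parse_spec and brute_force_overlaps are hypothetical helpers. Because it enumerates every subset of the modifiers a symbol uses, its cost is exponential in that number, which is consistent with the check being slow.

```rust
use std::collections::BTreeSet;

/// Splits "as.au?" into (required, optional) sets; `?` marks optional.
fn parse_spec(spec: &str) -> (BTreeSet<&str>, BTreeSet<&str>) {
    let (mut required, mut optional) = (BTreeSet::new(), BTreeSet::new());
    for part in spec.split('.').filter(|p| !p.is_empty()) {
        match part.strip_suffix('?') {
            Some(name) => optional.insert(name),
            None => required.insert(part),
        };
    }
    (required, optional)
}

/// Enumerates every subset of all modifiers used by a symbol's variants and
/// returns each subset that more than one variant would accept.
pub fn brute_force_overlaps(variants: &[&str]) -> Vec<Vec<String>> {
    let parsed: Vec<_> = variants.iter().map(|&v| parse_spec(v)).collect();
    // Sorted, deduplicated universe of modifier names.
    let universe: Vec<&str> = parsed
        .iter()
        .flat_map(|(r, o)| r.iter().chain(o.iter()).copied())
        .collect::<BTreeSet<_>>()
        .into_iter()
        .collect();
    assert!(universe.len() < 64, "bitmask enumeration needs < 64 modifiers");

    let mut overlaps: Vec<Vec<String>> = Vec::new();
    for mask in 0u64..(1u64 << universe.len()) {
        // The query is the subset of the universe selected by `mask`.
        let query: BTreeSet<&str> = universe
            .iter()
            .enumerate()
            .filter(|(i, _)| mask & (1u64 << i) != 0)
            .map(|(_, m)| *m)
            .collect();
        let hits = parsed
            .iter()
            .filter(|(req, opt)| {
                req.is_subset(&query)
                    && query.iter().all(|m| req.contains(m) || opt.contains(m))
            })
            .count();
        if hits > 1 {
            overlaps.push(query.into_iter().map(String::from).collect());
        }
    }
    overlaps
}
```

For example, using an empty string for the default variant, ["", "as.au?", "au"] yields no overlaps, while ["", "as?.au", "as.au?"] reports {as, au}, which both non-default variants accept.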

Other stuff

I have some other minor things I'd like to say, but I'm too tired now, so I'll add them later.

T0mstone added the labels meta (Discussion about the structure of this repo) and breaking (This involves a breaking change) on Jul 4, 2025.
knuesel (Collaborator) commented Jul 4, 2025

This looks great! The list shows how useful this is to avoid ambiguous cases that would easily break in the future.

For the globe this is a bit unfortunate, as it makes sense to write globe.eu for "the globe that shows Europe, and I don't care what else".

Maybe a solution would be to define

globe
  .as?.au 🌏
  .as.au? 🌏
  .eu?.af 🌍
  .eu.af? 🌍

and have the validation code recognize that .as?.au and .as.au? don't clash because it's the same variant?

T0mstone (Collaborator, Author) commented Jul 4, 2025

> Maybe a solution would be to define
>
> globe
>   .as?.au 🌏
>   .as.au? 🌏
>   .eu?.af 🌍
>   .eu.af? 🌍
>
> and have the validation code recognize that .as?.au and .as.au? don't clash because it's the same variant?

Yes, that's what I meant by "duplicating the variant". However, your example wouldn't work, since globe.as.au would be ambiguous. The following would work:

globe
  .as.au? 🌏
  .au 🌏
  .eu.af? 🌍
  .af 🌍

The problem with this ties into the next thing I want to discuss (see my next comment).

T0mstone (Collaborator, Author) commented Jul 4, 2025

These changes to ModifierSet require extra code in typst if we don't want them to be observable to end users: if we use ModifierSet as-is, the modifier? syntax will be exposed. (I already mentioned this above.)

My point here is that I actually think exposing this is a good thing: we should give typst users access to the new resolution system too, so that custom symbols aren't second-class citizens.
But that raises the question of how to verify that abbreviations are unique.

In codex, we can just run the no_overlap test, which is very inefficient and takes a few seconds on my machine, but typst has stricter performance requirements.
I see two options there:

  1. Implement uniqueness checking for typst as well (maybe by pulling it out of the test into a general function that works with any Symbol-like type). This would probably require heavy optimization, and I'm not sure that's possible.
  2. Let users define symbols with overlapping abbreviations, and retain the extra code in the loop in best_match_in to handle these ambiguities when they exist. This has a huge performance benefit, but means package authors don't get the same forward-compatibility guarantees that codex now has. (Maybe we could expose a uniqueness check explicitly as a method on typst's symbol type, so package authors can incorporate it into their tests?)
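Option 2 means tolerating ambiguity at lookup time. A minimal sketch of such a lenient lookup is below, assuming a variant accepts a query when the query contains all of its required modifiers and nothing outside its required and optional ones. The tie-break (fewest declared modifiers, then declaration order) is a made-up rule for illustration, and Variant and resolve_lenient are hypothetical; this is not what best_match_in actually does.

```rust
use std::collections::BTreeSet;

/// Hypothetical variant shape: required and optional modifiers plus the
/// symbol value. Not codex's actual data layout.
pub struct Variant<'a> {
    pub required: BTreeSet<&'a str>,
    pub optional: BTreeSet<&'a str>,
    pub value: &'a str,
}

/// Collects every variant that accepts `query` and, if several do, picks the
/// one declaring the fewest modifiers; remaining ties go to the earliest one.
pub fn resolve_lenient<'a>(
    variants: &[Variant<'a>],
    query: &BTreeSet<&'a str>,
) -> Option<&'a str> {
    variants
        .iter()
        // Accepts: all required modifiers present, nothing unknown in the query.
        .filter(|v| {
            v.required.is_subset(query)
                && query
                    .iter()
                    .all(|m| v.required.contains(m) || v.optional.contains(m))
        })
        // `min_by_key` returns the first minimum, so earlier variants win ties.
        .min_by_key(|v| v.required.len() + v.optional.len())
        .map(|v| v.value)
}
```

With variants "eq" and "eq.not?", the query {eq} is accepted by both, and this sketch resolves it to the first one; codex's no_overlap test would reject such a pair outright.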

T0mstone (Collaborator, Author) commented Jul 4, 2025

Another small thing that confused me a bit (and is why I had to double-check that sym.supset.not.sq should be invalid): DejaVu Sans Mono renders ⋣ without the "or equal" part. I guess that's technically not totally wrong, since that's also a notation some set theorists use, but it really feels like an error.

knuesel (Collaborator) commented Jul 14, 2025

> Somehow implement uniqueness checking for typst as well (maybe pull it from the test into a global function that works with any Symbol-likes). This would probably require some heavy optimization of which I'm not sure whether it's possible.

I think a fast implementation in Typst should be possible. To detect overlaps, it should be sufficient to populate a dictionary whose keys are all "aliases" of all variants. The initial list of keys could be generated statically when building the Typst executable. Checking a new symbol for overlaps would then just be one dict insertion per new alias (negligible runtime), which would fail if the key already exists (unless I'm missing something and the check must be more sophisticated).
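The dict idea can be sketched as follows. This assumes the same matching rule as the rest of this discussion implies: a variant's "aliases" are its required modifiers plus every subset of its optional ones, which under that rule is exactly the set of queries it accepts, so two variants overlap precisely when they share an alias. aliases and find_overlap are hypothetical names, and a Rust HashMap stands in for the dictionary; since HashMap::insert overwrites silently, the sketch looks the key up first.

```rust
use std::collections::{BTreeSet, HashMap};

/// Expands one variant's modifier string (e.g. "as.au?") into all of its
/// aliases: the required modifiers plus each subset of the optional ones.
fn aliases(spec: &str) -> Vec<BTreeSet<String>> {
    let mut required = BTreeSet::new();
    let mut optional = Vec::new();
    for part in spec.split('.').filter(|p| !p.is_empty()) {
        match part.strip_suffix('?') {
            Some(name) => optional.push(name.to_string()),
            None => {
                required.insert(part.to_string());
            }
        }
    }
    // One alias per bitmask over the optional modifiers (2^k in total).
    (0u64..(1u64 << optional.len()))
        .map(|mask| {
            let mut set = required.clone();
            for (i, m) in optional.iter().enumerate() {
                if mask & (1u64 << i) != 0 {
                    set.insert(m.clone());
                }
            }
            set
        })
        .collect()
}

/// Inserts every alias of every variant into a map and stops at the first
/// alias that is already present, returning it with both variant indices.
pub fn find_overlap(variants: &[&str]) -> Option<(BTreeSet<String>, usize, usize)> {
    let mut seen: HashMap<BTreeSet<String>, usize> = HashMap::new();
    for (idx, spec) in variants.iter().enumerate() {
        for alias in aliases(spec) {
            if let Some(&prev) = seen.get(&alias) {
                return Some((alias, prev, idx));
            }
            seen.insert(alias, idx);
        }
    }
    None
}
```

For instance, find_overlap(&["", "as?.au", "as.au?"]) reports the alias {as, au} for variants 1 and 2, while the layout with .as.au? and .au passes.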

T0mstone (Collaborator, Author)

Keeping a dict might be fine in practice, but it does have exponential space complexity in the worst case: symbol(("a?.b?.c?.d?.e?.f?.g?.h?", "🫨")) has 256 "aliases".
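The worst case can be made concrete with a short count. Assuming a variant v with required set R_v and a disjoint optional set O_v of distinct modifier names accepts exactly the queries R_v ∪ S with S ⊆ O_v, each variant contributes one alias per subset of O_v:

```latex
% Aliases of variant v: R_v \cup S for every S \subseteq O_v.
\[
  \lvert \operatorname{aliases}(v) \rvert = 2^{\lvert O_v \rvert},
  \qquad
  \text{keys per symbol} = \sum_{v} 2^{\lvert O_v \rvert}.
\]
% "a?.b?.c?.d?.e?.f?.g?.h?": |O_v| = 8, so 2^8 = 256 aliases.
```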

knuesel (Collaborator) commented Jul 15, 2025

If that's a concern, what about an algorithm that dynamically generates a dict for the symbol under consideration? I.e. when the user wants to define a.b?.c?.d, we can generate the dict of "aliases" for a. This should be very fast.

T0mstone (Collaborator, Author)

> If that's a concern, what about an algorithm that dynamically generates a dict for the symbol under consideration? I.e. when the user wants to define a.b?.c?.d, we can generate the dict of "aliases" for a. This should be very fast.

That's already what I meant. But I suppose the memory usage isn't actually that bad for any sane application.
And after all, we specifically don't need such a dict for the built-in symbols, since those are already checked by the no_overlap test.

Labels: breaking (This involves a breaking change), meta (Discussion about the structure of this repo)
Development: successfully merging this pull request may close the issue "Improve forward compatibility with a notion of minimal modifier set".
2 participants