user202729
Contributor

@user202729 user202729 commented Aug 15, 2025

The PDF build does not handle Unicode characters correctly. Some of the changes also fix genuine bugs.

With this change, there should be fewer occurrences of

Missing character: There is no τ (U+03C4) in font cmmi10!

in the documentation.
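
To gauge how often this message shows up, the LaTeX log can be scanned for it. The following is a minimal sketch, assuming the log lines follow exactly the format quoted above; the function name `count_missing_characters` and the sample log are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   Missing character: There is no τ (U+03C4) in font cmmi10!
MISSING_CHAR = re.compile(
    r"Missing character: There is no (?P<char>.) "
    r"\(U\+(?P<code>[0-9A-F]+)\) in font (?P<font>[^!\s]+)!"
)


def count_missing_characters(log_text):
    """Count (character, font) pairs reported as missing in a LaTeX log."""
    counts = Counter()
    for match in MISSING_CHAR.finditer(log_text):
        counts[(match.group("char"), match.group("font"))] += 1
    return counts


# Hypothetical log excerpt, for illustration only.
sample_log = """\
Missing character: There is no τ (U+03C4) in font cmmi10!
Missing character: There is no σ (U+03C3) in font cmmi10!
Missing character: There is no τ (U+03C4) in font cmmi10!
"""

for (char, font), n in sorted(count_missing_characters(sample_log).items()):
    print(f"{char!r} missing from {font}: {n} time(s)")
```

Comparing the counts before and after a change gives a rough measure of whether the number of warnings actually went down.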

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

github-actions bot commented Aug 15, 2025

Documentation preview for this PR (built with commit 83b9197; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

@user202729 user202729 force-pushed the doc-use-latex branch 2 times, most recently from 916ea36 to 389559a on August 15, 2025 18:10
@vincentmacri
Member

Some of this is definitely good, like the change from \^ to \wedge.

For things like \sigma and \tau, I think this makes the documentation slightly less readable when viewing it with a text interface like a Sage session in the terminal. Not a huge deal, but I think it should be possible to deal with this when the PDF is generated rather than in the docstrings. It looks like the mathletters option from the ucs LaTeX package can do this.

What commands exactly did you run to build the PDF doc? I've always had trouble with the Sage docbuild, especially with meson.

@user202729
Contributor Author

What commands exactly did you run to build the PDF doc? I've always had trouble with the Sage docbuild, especially with meson.

I have trouble too. See #40290. It would be nice if someone contributed to https://doc.sagemath.org/html/en/developer/sage_manuals.html to explain more clearly what needs to be done, I guess.

@user202729
Contributor Author

it should be possible to deal with this when the PDF is generated rather than in the docstrings. It looks like the mathletters option from the ucs LaTeX package can do this.

The official LaTeX team doesn't want to support it, see https://tex.stackexchange.com/a/628285/250119 section 2.2.

We could just \usepackage{unicode-math} and change from the Computer Modern font to the Latin Modern font, but this would also slow down the code. In any case, LaTeX is the language defined by the LaTeX team, and they're mostly in favor of typing out the command, so it may be better to live with it after all.

How about fixing this on IPython?

@user202729
Contributor Author

Actually, Sage already has logic to deal with this:

https://doc.sagemath.org/html/en/reference/misc/sage/misc/sagedoc.html

You may try it out like this:


sage: def f():
....:     r"""
....:     `1 \ge 0`
....:     """
sage: f?
Signature:      f()
Docstring:         1 >= 0
Init docstring: Initialize self.  See help(type(self)) for accurate signature.
File:           ...
Type:           function

So what remains is to modify that module to also substitute Unicode characters. Anyway, that is out of scope for this pull request.
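
For reference, the kind of substitution discussed here could look roughly like the following. This is a standalone sketch, not Sage's actual implementation: the table `UNICODE_SUBSTITUTES` and the function `latex_to_unicode` are hypothetical names, and the table covers only a handful of commands.

```python
import re

# Hypothetical table mapping LaTeX command names to Unicode characters.
UNICODE_SUBSTITUTES = {
    "sigma": "σ",
    "tau": "τ",
    "ge": "≥",
    "le": "≤",
}

# A backslash followed by a run of letters, e.g. \sigma or \ge.
COMMAND = re.compile(r"\\([A-Za-z]+)")


def latex_to_unicode(text):
    """Replace known LaTeX commands by Unicode; leave unknown ones alone."""
    def replace(match):
        # Fall back to the original text when the command is not in the table.
        return UNICODE_SUBSTITUTES.get(match.group(1), match.group(0))
    return COMMAND.sub(replace, text)


print(latex_to_unicode(r"\sigma \ge \tau"))  # σ ≥ τ
print(latex_to_unicode(r"\geq"))             # \geq (not in the table)
```

Matching the whole command name before looking it up avoids turning `\geq` into `≥q`, which a naive string replacement of `\ge` would do.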

@tobiasdiez
Contributor

The pdf build uses lualatex. Hence, the package unicode-math should do the trick to allow unicode in docstrings.
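
A rough idea of what this could look like in a Sphinx `conf.py`. `latex_engine` and the `preamble` key of `latex_elements` are standard Sphinx settings, but whether this fits Sage's actual docbuild configuration is an assumption, and the fragment has not been tested against the Sage PDF build.

```python
# Sketch of a Sphinx conf.py fragment (untested against Sage's docbuild).

# Use LuaLaTeX, which reads UTF-8 input natively.
latex_engine = "lualatex"

latex_elements = {
    # Load unicode-math so that characters such as τ in math mode are
    # expected to come from an OpenType math font rather than cmmi10.
    "preamble": r"\usepackage{unicode-math}",
}
```

Note that loading unicode-math changes the math font, so the rest of the PDF output would change as well.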

@user202729
Contributor Author

user202729 commented Aug 17, 2025

unicode-math has a few other quirks. For example, it uses the Latin Modern Math font instead of Computer Modern Math; see https://tex.stackexchange.com/questions/125175/how-to-get-xelatex-unicode-math-output-as-close-as-possible-to-that-of-pdfla . As another example, \varnothing gives the same symbol (uglier, in my opinion) as \emptyset in unicode-math.

Another issue: MathJax does not support several Unicode characters (e.g. $x²$ does not render as well as $x^2$), but unicode-math does. So using unicode-math might hide some issues in the rendered HTML.

In any case, everywhere else in the code base LaTeX commands are used. Discussion on whether to use unicode-math can be left for later.
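
To find remaining Unicode characters in math, something like the following could be run over docstrings. It is a minimal sketch, assuming inline math is written between single backticks as in Sage docstrings; the function name `non_ascii_in_math` and the sample string are hypothetical.

```python
import re

# Inline math between single backticks; double backticks (code) are skipped.
INLINE_MATH = re.compile(r"(?<!`)`([^`\n]+)`(?!`)")


def non_ascii_in_math(docstring):
    """Return (math_span, offending_characters) for each inline math span
    that contains at least one non-ASCII character."""
    hits = []
    for match in INLINE_MATH.finditer(docstring):
        span = match.group(1)
        bad = sorted({ch for ch in span if ord(ch) > 127})
        if bad:
            hits.append((span, bad))
    return hits


# Hypothetical docstring text, for illustration only.
doc = r"Apply `σ \circ τ` to `x^2`, then use ``code``."
for span, bad in non_ascii_in_math(doc):
    print(f"{span!r} contains {bad}")
```

Display math and multi-line spans are not covered by this pattern, so the result is a lower bound rather than a complete list.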

@tobiasdiez
Contributor

but I think it should be possible to deal with this when the PDF is generated rather than in the docstrings

I agree with this sentiment of @vincentmacri. So maybe just leave the taus and sigmas untouched in this PR, and see in a follow-up PR whether unicode-math (perhaps with a different font) solves the PDF issues?

@user202729
Contributor Author

There's another option: use unicode-math-input. As the maintainer of that package, I encourage people to use it... not really, since I'm the only maintainer and I don't look at it much, it may break at any time.

Also, there's the ². If you don't change it to ^2, it renders badly in the HTML as well. At least everyone can agree to change that one, right?

@vincentmacri
Member

vincentmacri commented Aug 18, 2025

There's another option: use unicode-math-input. As the maintainer of that package, I encourage people to use it... not really, since I'm the only maintainer and I don't look at it much, it may break at any time.

I don't really care which package is used. LaTeX is stable enough that I'm not concerned about a package published on CTAN breaking, and I'm even less worried if the package maintainer is an active Sage contributor as well (like yourself).

Also, there's the ². If you don't change it to ^2, it renders badly in the HTML as well. At least everyone can agree to change that one, right?

Changing to ^2 is fine. It would be nice to get that working in Unicode but it seems difficult and I don't think ^2 is bad for readability.
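
The ² to ^2 change could be automated along these lines. This is only a sketch: the function name `superscripts_to_latex` is made up, and it handles superscript digits only.

```python
import re

# Unicode superscript digits and their ASCII counterparts.
SUPERSCRIPTS = str.maketrans("⁰¹²³⁴⁵⁶⁷⁸⁹", "0123456789")
SUPERSCRIPT_RUN = re.compile("[⁰¹²³⁴⁵⁶⁷⁸⁹]+")


def superscripts_to_latex(text):
    """Rewrite runs of superscript digits as LaTeX exponents."""
    def replace(match):
        digits = match.group(0).translate(SUPERSCRIPTS)
        # A single digit needs no braces: x² -> x^2, y¹⁰ -> y^{10}.
        return "^" + digits if len(digits) == 1 else "^{" + digits + "}"
    return SUPERSCRIPT_RUN.sub(replace, text)


print(superscripts_to_latex("x² + y¹⁰"))  # x^2 + y^{10}
```

Applied blindly, this would also rewrite superscripts outside math, so it should only be run on math spans.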

@user202729
Contributor Author

LaTeX is stable enough that I'm not concerned about a package published on CTAN breaking

Bad news: my packages on CTAN break all the time. For example, when pdflatex was changed to disallow obtaining the \endwrite token, my hacks that required obtaining it broke. Another example is when I used an experimental feature in expl3 that was removed in a later version.

More bad news: if it breaks, there isn't really a way to fix it quickly. If I understand correctly, the CI currently uses the TeX Live distribution, and that one only gets updated once in a while (depending on the Linux distribution).

Anyway, we can discuss these options later. For now I'll just revert the σ and τ changes here.

@user202729
Contributor Author

Okay, reverted.

I think the PDF build is currently broken (discussed in #40586), so you can't really test this apart from the HTML output.

@vincentmacri
Member

LGTM

@vincentmacri
Member

Mentioning this for future reference, just came across it.

Possibly useful for the LaTeX Greek letters, there is a textgreek feature in Sphinx, documented on this page: https://www.sphinx-doc.org/en/master/latex.html
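
As a sketch of how that option might be set in `conf.py`: `textgreek` and `fontenc` are keys of `latex_elements`, but the values below are assumptions based on my reading of the Sphinx documentation. They target pdflatex and are untested for Sage.

```python
# Sketch of a Sphinx conf.py fragment for a pdflatex-based build (untested).
latex_engine = "pdflatex"

latex_elements = {
    # Assumption: the LGR font encoding has to be listed for Greek text.
    "fontenc": r"\usepackage[LGR,T1]{fontenc}",
    # Assumption: textalpha is the package to load for this key.
    "textgreek": r"\usepackage{textalpha}",
}
```

This concerns Greek letters in running text. Whether it helps with Greek letters inside math, which is where the cmmi10 message comes from, is unclear, and it may not apply at all if the build uses lualatex.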

vbraun pushed a commit to vbraun/sage that referenced this pull request Aug 21, 2025
sagemathgh-40589: Use LaTeX commands instead of Unicode characters
    
Because PDF build does not handle Unicode characters correctly. And some
are genuine bugs.

With this change, there should be less

```
Missing character: There is no τ (U+03C4) in font cmmi10!
```

in the documentation.

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [ ] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation and checked the documentation
preview.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on. For example,
-->
<!-- - sagemath#12345: short description why this is a dependency -->
<!-- - sagemath#34567: ... -->
    
URL: sagemath#40589
Reported by: user202729
Reviewer(s): Vincent Macri
@vbraun vbraun merged commit b08d1e2 into sagemath:develop Aug 27, 2025
23 of 25 checks passed