tobiasdiez
Contributor

@tobiasdiez tobiasdiez commented Aug 19, 2025

Add the typing information that is present in the method signature automatically to the docs.
So for def format_unit(value: float, unit: str) -> str: both the type of the parameters, as well as the return type are extracted and shown in the docs. This is done using the https://github.com/tox-dev/sphinx-autodoc-typehints package. Fixes #30894.
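As a rough illustration of the information involved (not the extension's actual implementation), the standard library can already recover both annotations from the signature mentioned above. The function body is made up for the example:

```python
import inspect
from typing import get_type_hints


def format_unit(value: float, unit: str) -> str:
    """Return ``value`` followed by ``unit`` (illustrative body only)."""
    return f"{value} {unit}"


# Parameter and return annotations, as read from the signature.
hints = get_type_hints(format_unit)
signature = inspect.signature(format_unit)

print(hints)
print(signature)
```

This is the data that ends up next to each parameter and in the return type entry of the rendered docs.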

For this to work, one needs to parse Sage's custom docstring syntax and convert it to standard Sphinx fields like :param:. This is accomplished in the newly added sage transform extension. As a nice byproduct, the headers of these sections are now shown as intended by the furo style (i.e. as headings), and one additionally gets stronger uniformity, as the docstrings are now syntax-checked (at least to some degree).
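To give an idea of what such a conversion involves, here is a minimal sketch. It is not the code of the extension in this PR: the helper name `input_block_to_fields` and the regular expression are assumptions made for illustration, and only single-name items on one line are handled.

```python
import re

# Matches one Sage-style item such as "- ``n`` -- integer (default: ``1``)".
ITEM = re.compile(r"^\s*-\s+``(?P<name>[^`]+)``\s+--\s+(?P<desc>.+)$")


def input_block_to_fields(lines):
    """Convert the items of a Sage ``INPUT:`` block into ``:param:`` fields.

    Hypothetical helper for illustration; continuation lines, nested lists
    and comma-separated names are not handled.
    """
    fields = []
    for line in lines:
        match = ITEM.match(line)
        if match is None:
            # Section headers, blank lines and free text are skipped here.
            continue
        fields.append(f":param {match['name']}: {match['desc']}")
    return fields


doc_lines = [
    "INPUT:",
    "",
    "- ``tau`` -- (optional) a function from ``B`` to ``Z``",
    "- ``n`` -- integer (default: ``1``)",
]
fields = input_block_to_fields(doc_lines)
print("\n".join(fields))
```

Once the items are expressed as `:param:` fields, Sphinx and the typehints extension can attach the types from the signature to them.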

The parsing also works for other sections (like authors and references). Similar work in this direction had been done in #37614.

A good example that already contains quite a bit of typing info is at https://doc-pr-40634--sagemath.netlify.app/html/en/reference/manifolds/sage/manifolds/differentiable/symplectic_form.html#sage.manifolds.differentiable.symplectic_form.SymplecticForm.

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

github-actions bot commented Aug 19, 2025

Documentation preview for this PR (built with commit bd9b771; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

@mantepse
Contributor

mantepse commented Aug 20, 2025

There is something odd with the rendering: some of the items below Parameters are indented and bold, the remaining ones get a bullet and are not indented. Also, the first (indented line) should actually be a single line.

For example, in https://doc-pr-40634--sagemath.netlify.app/html/en/reference/combinat/sage/combinat/bijectionist

A toolbox to list all possible bijections between two finite sets under various constraints.

Parameters:

        A – sets of equal size, given as a list

        B – sets of equal size, given as a list

    tau – (optional) a function from B to Z, in case of None, the identity map lambda x: x is used

@vincentmacri
Member

image

Some formatting inconsistencies here. None is a link to the python docs for the type, but not for the default value.

The default value of 'omega' is displayed as omega which is the same style as code variables like name.

Can you apply whatever formatting you applied to None in the typing info to the default value? And put quotes around or otherwise differentiate string default values from other default values?

@vincentmacri
Member

Some formatting inconsistencies here. None is a link to the python docs for the type, but not for the default value.

The default value of 'omega' is displayed as omega which is the same style as code variables like name.

This might be because what the docstring gives as the default value of name (omega) does not match what the actual function definition uses as the default value (None).

@vincentmacri
Member

vincentmacri commented Aug 20, 2025

I think this closes #30894, so you can add "Fixes #30894" to the PR description.

@mantepse
Contributor

The parsing for https://doc-pr-40634--sagemath.netlify.app/html/en/reference/combinat/sage/combinat/bijectionist#sage-combinat-bijectionist looks mostly correct now, except that, apparently,

    - ``A``, ``B`` -- sets of equal size, given as a list

is turned into

    - ``A`` -- sets of equal size, given as a list
    - ``B`` -- sets of equal size, given as a list

which is confusing.

@tobiasdiez
Contributor Author

image

Some formatting inconsistencies here. None is a link to the python docs for the type, but not for the default value.

That comes from the fact that None here is used in two different ways: once as a type (that one is linked) and once as a value (that one is formatted as a code variable).

I should also note that the typing extension has the ability to automatically add "default: xyz" to the parameter description in the docs based on the function signature. I've disabled this for now, since otherwise one gets a duplication when the existing docs already point out the default value. Any suggestions on how to proceed here? Perhaps, it's okay to accept that duplication until someone finds the time to cleanup the docstring?
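For reference, the option in question is `typehints_defaults` of sphinx-autodoc-typehints. A sketch of a `conf.py` fragment with the feature switched off follows; it is not necessarily the exact configuration used in this PR:

```python
# conf.py (fragment)
extensions = [
    "sphinx.ext.autodoc",
    "sphinx_autodoc_typehints",
]

# None (the package default) adds no "default: ..." text to parameters;
# "comma", "braces" or "braces-after" would append it in different styles.
typehints_defaults = None
```

Setting it to one of the string values would produce the duplication described above wherever a docstring already states the default.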

I think this closes #30894, so you can add "Fixes #30894" to the PR description.

Right, thanks. Gosh, it only took me 5 years to finally work on this ;-)

The parsing for https://doc-pr-40634--sagemath.netlify.app/html/en/reference/combinat/sage/combinat/bijectionist#sage-combinat-bijectionist looks mostly correct now, except that, apparently,

    - ``A``, ``B`` -- sets of equal size, given as a list

is turned into

    - ``A`` -- sets of equal size, given as a list
    - ``B`` -- sets of equal size, given as a list

which is confusing.

I agree this is a bit confusing and suboptimal. I followed the numpydoc style here (https://numpydoc.readthedocs.io/en/latest/format.html#parameters), which allows specifying the description in a similar way:

x1, x2 :
    Input arrays, description of `x1`, `x2`.

And according to https://github.com/sphinx-doc/sphinx/blob/e8ab5cf1c57d106f57cb6c77b60dc4a5ae2c9a37/tests/test_extensions/test_ext_napoleon_docstring.py#L1696-L1699
this code translates into

:param x1: Input arrays, description of ``x1``, ``x2``.
:param x2: Input arrays, description of ``x1``, ``x2``.

so a complete duplication of the description. I think the reason is that the type could be different, e.g.

def print(a: int, b: float):
   """
   INPUT:
      a,b -- value to be printed
   """

would need separate bullet points for a and b due to their different types.
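A small sketch of why the split is needed, under the assumption that types are attached per parameter. The helper `expand_shared_item` is hypothetical, and the function is called `show` here only to avoid shadowing the built-in `print`:

```python
from typing import get_type_hints


def show(a: int, b: float):
    """
    INPUT:
       a,b -- value to be printed
    """


def expand_shared_item(func, names, description):
    """Emit one ``:param:``/``:type:`` pair per name in a shared item.

    Hypothetical helper for illustration, not part of Sphinx or of this PR.
    """
    hints = get_type_hints(func)
    lines = []
    for name in (part.strip() for part in names.split(",")):
        lines.append(f":param {name}: {description}")
        if name in hints:
            # Each parameter carries its own type, so the item cannot stay merged.
            lines.append(f":type {name}: {hints[name].__name__}")
    return lines


fields = expand_shared_item(show, "a,b", "value to be printed")
print("\n".join(fields))
```

The description is duplicated, but `a` and `b` each receive their own type entry.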


I clicked around the docs and I'm quite happy with how they look now. It could very well be that certain docstrings are not parsed completely correctly - if you see any, let me know and I'll fix them here. Otherwise we can always improve things later.

@vincentmacri
Member

I should also note that the typing extension has the ability to automatically add "default: xyz" to the parameter description in the docs based on the function signature. I've disabled this for now, since otherwise one gets a duplication when the existing docs already point out the default value. Any suggestions on how to proceed here? Perhaps, it's okay to accept that duplication until someone finds the time to cleanup the docstring?

Is it possible to add the default value from the function signature only if a default value is not specified? And then update the relevant part of the developer guide to say that docstrings should not include the default value unless they need to override what the function signature says. Ideally we should never need to override the default value in the function signature in the docstring, but I'm sure there's some weird edge case I'm not thinking of.

When updating the developer guide I'd also mention that we previously included default values in docstrings but it is no longer needed and they can be removed as part of refactoring/formatting now.

I clicked around the docs and I'm quite happy with how they look now. It could very well be that certain docstrings are not parsed completely correctly - if you see any, let me know and I'll fix them here. Otherwise we can always improve things later.

These two look weird, the parameter is described by a bulleted list which I think is causing formatting issues. There are probably other docstrings with this issue as well:

@mantepse
Contributor

The parsing for https://doc-pr-40634--sagemath.netlify.app/html/en/reference/combinat/sage/combinat/bijectionist#sage-combinat-bijectionist looks mostly correct now, except that, apparently,

    - ``A``, ``B`` -- sets of equal size, given as a list

is turned into

    - ``A`` -- sets of equal size, given as a list
    - ``B`` -- sets of equal size, given as a list

which is confusing.

I agree this is a bit confusing and suboptimal. I followed the numpydoc style here (https://numpydoc.readthedocs.io/en/latest/format.html#parameters), which allows specifying the description in a similar way:

x1, x2 :
    Input arrays, description of `x1`, `x2`.

And according to https://github.com/sphinx-doc/sphinx/blob/e8ab5cf1c57d106f57cb6c77b60dc4a5ae2c9a37/tests/test_extensions/test_ext_napoleon_docstring.py#L1696-L1699 this code translates into

:param x1: Input arrays, description of ``x1``, ``x2``.
:param x2: Input arrays, description of ``x1``, ``x2``.

so a complete duplication of the description. I think the reason is that the type could be different, e.g.

def print(a: int, b: float):
   """
   INPUT:
      a,b -- value to be printed
   """

would need separate bullet points for a and b due to their different types.

So, how should this be fixed (in this particular instance)? I wouldn't be surprised if several parameters that share descriptions (possibly depending on each other) occur frequently. It would be good to have a standard pattern for this situation. Note that, in the case above, it will be hard for the reader to figure out what is meant.

@tobiasdiez
Contributor Author

tobiasdiez commented Aug 23, 2025

I should also note that the typing extension has the ability to automatically add "default: xyz" to the parameter description in the docs based on the function signature. I've disabled this for now, since otherwise one gets a duplication when the existing docs already point out the default value. Any suggestions on how to proceed here? Perhaps, it's okay to accept that duplication until someone finds the time to cleanup the docstring?

Is it possible to add the default value from the function signature only if a default value is not specified?

Maybe, but this looks somewhat complicated. One could patch inspect.signature to not extract the default value if it's set in the description, which then hopefully propagates via https://github.com/sphinx-doc/sphinx/blob/e8ab5cf1c57d106f57cb6c77b60dc4a5ae2c9a37/sphinx/util/inspect.py#L721 to the typing extension (https://github.com/tox-dev/sphinx-autodoc-typehints/blob/7dd4a4e73f64840611aa5931050a52f418dea62c/src/sphinx_autodoc_typehints/__init__.py#L773). But the default value in the description is also not standardized. Sometimes it's "(default: xyz)", sometimes "defaults to xyz", sometimes "default: xyz" at the end...

These two look weird, the parameter is described by a bulleted list which I think is causing formatting issues. There are probably other docstrings with this issue as well:

* https://doc-pr-40634--sagemath.netlify.app/html/en/reference/curves/sage/schemes/curves/affine_curve.html#sage.schemes.curves.affine_curve.AffinePlaneCurve_finite_field.rational_points

* https://doc-pr-40634--sagemath.netlify.app/html/en/reference/finite_rings/sage/rings/finite_rings/element_base.html#sage.rings.finite_rings.element_base.FinitePolyExtElement.charpoly

This is sadly a sphinx bug, see sphinx-doc/sphinx#4220, sphinx-doc/sphinx#2768. (Side note: at least the rational_points description looks better than with the current develop branch: https://doc-develop--sagemath.netlify.app/html/en/reference/curves/sage/schemes/curves/affine_curve.html#sage.schemes.curves.affine_curve.AffinePlaneCurve_finite_field.rational_points)

would need separate bullet points for a and b due to their different types.

So, how should this be fixed (in this particular instance)? I wouldn't be surprised if several parameters that share descriptions (possibly depending on each other) occur frequently. It would be good to have a standard pattern for this situation. Note that, in the case above, it will be hard for the reader to figure out what is meant.

If it's really the same description, you can use a comma between the parameters like in

- ``*args``, ``**kwargs`` -- passed verbatim to another function

If the description is actually not the same, one needs to use the more verbose way:

- ``A`` -- a list of something
- ``B`` -- a list of something, has to have the same size as ``A``

In general, there is a slight change of perspective. Sage's INPUT is more used to generally describe the input of the function as a whole, while Sphinx's Parameters section is a list of parameters, that then each are described in detail.
Most often that boils down to the same thing, but in a handful of places an INPUT section contains a formulation like

either specify 
- x -- coord 
- y -- coord
or 
- vec -- tuple of coords

This is not possible with a Sphinx parameter list and needs to be reworked to display nicely.

@vincentmacri
Member

vincentmacri commented Aug 25, 2025

the default value in the description is also not standardized. Sometimes it's "(default: xyz)", sometimes "defaults to xyz", sometimes "default: xyz" at the end...

The (default: xyz) syntax is what our style guide says is the correct syntax, so I'm comfortable with it only working for that case. If there are only a handful of cases where the other syntax is used then it can be changed in this PR, otherwise leave it for later.

That said, if it's too complicated we can leave that for future work. The default value shows up in the function signature in the docs anyway.
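A minimal sketch of that restricted approach, assuming only the "(default: xyz)" form is recognised. The helper `with_default` is hypothetical and does not hook into Sphinx or sphinx-autodoc-typehints; it only shows the decision logic:

```python
import inspect
import re

# Only the "(default: xyz)" form is recognised; other spellings are ignored.
DEFAULT_IN_TEXT = re.compile(r"\(default:\s*[^)]*\)")


def with_default(func, name, description):
    """Append the signature default unless the description already has one.

    Hypothetical helper for illustration only.
    """
    if DEFAULT_IN_TEXT.search(description):
        return description
    default = inspect.signature(func).parameters[name].default
    if default is inspect.Parameter.empty:
        # Required parameter: nothing to add.
        return description
    return f"{description} (default: ``{default!r}``)"


def example(form, name=None, steps=3):
    """Placeholder function used to exercise the helper."""


documented = with_default(example, "name", "name of the form (default: ``'omega'``)")
undocumented = with_default(example, "steps", "number of steps")
required = with_default(example, "form", "the form itself")
print(documented, undocumented, required, sep="\n")
```

Descriptions using "defaults to xyz" would still get a second default appended, which is the part that would need cleanup in the docstrings.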


This next question is somewhat open-ended, not sure if there's a right answer.

The style guide says this about types in docstrings:

The type names should be descriptive, but do not have to represent the exact Sage/Python types. For example, use “integer” for anything that behaves like an integer, rather than “int” or “Integer”.

This is because sometimes the typing information is more technical than users might need, or the actual valid inputs are more restrictive than the type (if the function accepts a prime integer then the type is still just Integer). On the other hand, for type annotations we do need to say int or Integer (or int | Integer).

Consider the parameter g in DiffieHellman. The user only needs to know that the function accepts something that behaves like an integer. When I was writing this code, I expected users to call this constructor with g either an Integer or an element of GF(p). The annotation that covers all possible types for a member of GF(p) is IntegerMod_abstract. So I made the type annotation Integer | IntegerMod_abstract. With this PR the DiffieHellman parameter g now renders as (Integer | IntegerMod_abstract). I don't think the user needs to know about IntegerMod_abstract, they only need to know that it accepts an element of GF(p), but IntegerMod_abstract is the correct type annotation for this. Do you think displaying this might be confusing for users? Similarly, we have many functions that work for either Integer or int and those would now render as Integer | int.

Either way, I think with this PR it is appropriate to update the General Conventions in our Developer Guide to say that the type information can be omitted from the docstring if it is included in a type annotation, at least for basic Python types like bool and str. I'm not sure where that leaves Integer/int though.

@tobiasdiez
Contributor Author

That said, if it's too complicated we can leave that for future work. The default value shows up in the function signature in the docs anyway.

I think this would be a nice addition, but would leave it for a follow-up PR.

The issue you raise about the various forms of integers is a good one, and I don't really have a solution for now. My intuition would be to create a type alias (say IntegerLike = int | Integer | some other ones?), document it, and then use that alias in other functions. But as your example showed, there might be a need for further specialized "integer" type aliases, or perhaps a completely different solution. Maybe the problem and its solution will crystallize once we gain more experience with the typing system.

Either way, I think with this PR it is appropriate to update the General Conventions in our Developer Guide to say that the type information can be omitted from the docstring if it is included in a type annotation

Good point, done now.

@vincentmacri
Member

Either way, I think with this PR it is appropriate to update the General Conventions in our Developer Guide to say that the type information can be omitted from the docstring if it is included in a type annotation

Good point, done now.

That description looks good. I would like to see something added about the Integer | int situation before merging, but I'm not entirely sure what it should be yet. More thoughts on that next.

The issue you raise about the various forms of integers is a good one, and I don't really have a solution for now. My intuition would be to create a type alias (say IntegerLike = int | Integer | some other ones?), document it, and then use that alias in other functions. But as your example showed, there might be a need for further specialized "integer" type aliases, or perhaps a completely different solution. Maybe the problem and its solution will crystallize once we gain more experience with the typing system.

Sometimes we actually want only one of int or Integer. A function that calls n.factor() works if n is an Integer but not an int. Some low-level things in Cython or functions like __len__ and __hash__ need to use int and not Integer.

This is also complicated by differences between Sage the programming language and Sage the Python library. When used as a Python library int will be used frequently. When used as Sage the programming language Integer will be used pretty much everywhere except for things like __len__ and __hash__.

I think we should generally use Integer except for situations where we actually want to use int. I would write the type annotations for the purpose of type-checking Sage itself, not for documenting every possible input type that currently works for user code. Then if a function that previously worked for int needs to be changed to only work for Integer it's not an API breaking change since we never claimed to officially support input of type int in the documentation. Of course, Python doesn't do runtime type-checking so we won't break anything by doing this if someone was calling some Sage function that happens to work with int.

If there is some function where we expect both will be used by Sage itself, or where we want to explicitly advertise int as supported then use Integer | int in those situations so it's clear that supporting both was deliberate.

@tobiasdiez
Contributor Author

tobiasdiez commented Aug 28, 2025

I generally agree, but would say that int | Integer should be the default (except of course if really only one of them is supported). Otherwise you get typing errors when calling an Integer-annotated function using plain integers as in
image

Personally, the main advantage of the typing info is for the convenience while developing for sage. Thus, I would say the typing info should cater for the use case of "Sage the Python library". (sage the language also doesn't support type checking, right?)


Do you have a suggestion where one should add the type alias for int | Integer, and how one should call it?

@vincentmacri
Member

vincentmacri commented Aug 28, 2025

Personally, the main advantage of the typing info is for the convenience while developing for sage. Thus, I would say the typing info should cater for the use case of "Sage the Python library". (sage the language also doesn't support type checking, right?)

While developing for Sage (i.e. what we do in this GitHub repo), I think we should usually be using Integer and not int. So in most cases I would say your linter is correct to flag that.

int | Integer should be the default

I assume you only mean for parameters, not return types, since the return type should usually be one of the two (unless perhaps if it returns the same type that was input, but I'd prefer if that was rare). I'm still not entirely convinced the default for parameters should be int | Integer instead of Integer but you've been working on this stuff longer so I'll defer to you.

Wouldn't this make compatibility with int part of the API then and require us to go through a deprecation period if a function that worked for int is changed to only work for Integer? Although this is probably a rare enough situation that it's not worth worrying about. One could always convert to Integer in the function itself if really necessary.

Do you have a suggestion where one should add the type alias for int | Integer, and how one should call it?

I think I would make a new file for all type aliases, maybe sage/misc/types.py. Then any type unions or things like that are all documented in one place. We could probably use TYPE_CHECKING to avoid import overhead at runtime. Might even be worth adding aliases for some of the most common type imports in there as well like Integer. As for the name of Integer | int, I think Int should work.

Something like this:

types.py

"""This module defines and documents common type aliases and unions used in Sage."""

from typing import TYPE_CHECKING

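if TYPE_CHECKING:
    # Import modules, define type unions, aliases, etc.
    # All those imports might be expensive at runtime, hence why this is wrapped in `if TYPE_CHECKING`
    from sage.rings.finite_rings.integer_mod import IntegerMod_abstract
    from sage.rings.integer import Integer

    # The `type` statement requires Python 3.12 or later.
    type Int = Integer | int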
    """Type alias for Sage integers or Python integers, used to annotate parameters of functions that accept both."""

    type ModInt = Int | IntegerMod_abstract
    """Type alias for types that can be used in Sage's modular arithmetic."""

    # etc.

@vincentmacri
Member

If we want to force people to also wrap the call to import sage.misc.types in an if TYPE_CHECKING: block we could add this to the end of types.py:

else:
    assert False, 'types.py can only be imported in an `if TYPE_CHECKING:` block, not at runtime'

The overhead of importing types.py might be negligible if the body of the module is wrapped in if TYPE_CHECKING so not sure if this is necessary.
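On the consumer side, the pattern might look like the following sketch. The module `sage.misc.types` and the alias `Int` are the hypothetical names from the proposal above; because the import sits under `if TYPE_CHECKING:` and the annotations are quoted, the snippet runs even where that module does not exist.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Hypothetical module from the proposal above; never imported at runtime.
    from sage.misc.types import Int


def double(n: "Int") -> "Int":
    """Quoted annotations are not evaluated when the function is defined."""
    return 2 * n


result = double(21)
print(result, double.__annotations__)
```

A type checker resolves `Int` through the guarded import, while at runtime the annotations stay plain strings and no import cost is paid.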

Member

@vincentmacri vincentmacri left a comment

Since there is pending work (also testing the labels for https://groups.google.com/g/sage-devel/c/jTpHiM7YmTM)

Member

@vincentmacri vincentmacri left a comment

s: needs work

Development

Successfully merging this pull request may close these issues.

Add typing info to documentation using sphinx_autodoc_typehints
3 participants