Skip to content

✨: add CanArrayX protocols #32

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from
Draft

Conversation

nstarman
Copy link
Collaborator

@nstarman nstarman commented Jun 22, 2025

No description provided.

@nstarman
Copy link
Collaborator Author

Ok This PR is doing too much. Let me pair it down to just a few Protocols and do the rest as a series of followups.

@nstarman nstarman force-pushed the has_x branch 5 times, most recently from 96067a4 to a1be18e Compare June 23, 2025 19:19
@nstarman nstarman marked this pull request as ready for review June 23, 2025 21:59
@nstarman nstarman requested a review from jorenham June 23, 2025 21:59
@nstarman
Copy link
Collaborator Author

Ping @NeilGirdhar, given related discussions.

@nstarman nstarman changed the title ✨: add HasArrayX protocols ✨: add CanArrayX protocols Jun 23, 2025
@nstarman
Copy link
Collaborator Author

Should all the Protocols inherit from HasArrayNamespace?
Also should it be rename to CanArrayNamespace ?

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Jun 23, 2025

Should all the Protocols inherit from HasArrayNamespace?
Also should it be rename to CanArrayNamespace ?

I don't know what Joren will say, but I would guess no and no? (I think you got it right in this PR?)

Also, I'm guessing you're aware that int | float is float, and you're intentionally specifying both?

@nstarman
Copy link
Collaborator Author

nstarman commented Jun 24, 2025

don't know what Joren will say, but I would guess no

My thought was for building stuff like

class Positive(Protocol):
    def __call__(self, array: CanArrayPos, /) -> CanArrayPos: ...

is wrong.

It should be something like

class Positive(Protocol):
    def __call__(self, array: HasArrayNamespace, /) -> HasArrayNamespace: ...

But I think we want

class Positive(Protocol):
    def __call__(self, array: CanArrayPos, /) -> HasArrayNamespace: ...

Which I think works best if it's

class CanArrayPos(HasArrayNamespace, Protocol): ...

Also, I'm guessing you're aware that int | float is float, and you're intentionally specifying both?

Yes. :).

@NeilGirdhar
Copy link
Contributor

I see, you're kind of using it as a poor man's intersection?

Also, I'm guessing you're aware that int | float is float, and you're intentionally specifying both?

Yes. :).

Okay, is that because you're going to generate some documentation from these annotations? Or you find it less confusing?

Also, are you going to add complex to the union?

@nstarman
Copy link
Collaborator Author

Okay, is that because you're going to generate some documentation from these annotations? Or you find it less confusing?

It's for 2 reasons: the array api does it in their docs and because I think the Python numerical tower is a mess and since ints and floats aren't subclasses of each other, it makes little sense for them to be interchangeable at the static type level. 😤😆

Also, are you going to add complex to the union?

Worth discussing. The array api does not.

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Jun 24, 2025

It's for 2 reasons: the array api does it in their docs

The docs are that way to help beginners who might be confused. (At least that was the argument that was presented.) But you aren't expecting beginners to read your code, are you?

And, you aren't using this repo to build docs?

The downside of populating the unions unnecessarily is overcomplicated type errors. So from a user standpoint, I think this is worse.

From a developer standpoint, it's a matter of taste. Personally, I think more succinct is easier to understand.

because I think the Python numerical tower is a mess and since ints and floats aren't subclasses of each other, it makes little sense for them to be interchangeable at the static type level.

As much as you might like to turn back time and change the typing decisions that were made, the fact is that the static type int is a subclass of float as far as type checkers are concerned, and that will not change for the foreseeable future.

I think I understand what you're doing and why. I spent years writing if x != 0 for a similar reason. But I think this is a fact that you just have to accept even if you dislike it.

Worth discussing. The array api does not.

Does it not?

array.__add__(other: int | float | complex | array, /) → array

Have I misunderstood the documentation?

@nstarman
Copy link
Collaborator Author

nstarman commented Jun 24, 2025

Ah. We're building towards v2021 first.
A release branch for every major version.
The versions have almost been entirely additive, so it's not too onerous.
This also makes backporting easier.

Copy link
Member

@jorenham jorenham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be easier to use optype for this, as it already provides single-method generic protocols for each of the special dunders:

https://github.com/jorenham/optype/blob/master/optype/_core/_can.py

There's even documentation: https://github.com/jorenham/optype#binary-operations

And of course it's tested and thoroughly type-checked and stuff

@nstarman
Copy link
Collaborator Author

nstarman commented Jun 24, 2025

Sounds good to me...
It's good to have in-house expertise.

CleanShot 2025-06-24 at 10 22 36@2x

@nstarman
Copy link
Collaborator Author

nstarman commented Jul 1, 2025

@jorenham is this prep for using optype?

@nstarman nstarman mentioned this pull request Jul 1, 2025
@jorenham
Copy link
Member

jorenham commented Jul 1, 2025

@jorenham is this prep for using optype?

Yea, pretty much.

@nstarman
Copy link
Collaborator Author

nstarman commented Jul 9, 2025

And, you aren't using this repo to build docs?

I think the plan is to build docs, which would show the types.

@nstarman
Copy link
Collaborator Author

nstarman commented Jul 9, 2025

@jorenham do you want to switch some of these to be optype objects, or does the Self and docstring mean we should go ahead with rolling our own Protocols ?

@jorenham
Copy link
Member

jorenham commented Jul 9, 2025

@jorenham do you want to switch some of these to be optype objects, or does the Self and docstring mean we should go ahead with rolling our own Protocols ?

I've thought about this, but I'm not sure what the best approach is. I considered four approaches:

  1. Use optype but monkeypatch the __doc__ of the protocols. The downside is that we'd pollute these protocols, which might be annoying for users that use optype for other things as well.
  2. Bundle optype as git submodule, so that we can monkeypatch __doc__ without polluting the "actual" optype protocols.
  3. We write our own protocols (copy-pasting those of optype). This won't pollute optype, but we'd have to do quite a lot of work to write- test- and maintain them.
  4. Use optype, but ignore the docstrings. If we later want docstrings after all, then we can revisit the 3 options above.

Now that I've written these down, I think I feel most for option 4. As far as I'm concerned, docstrings are a "should-have", not a "must-have" (MoSCow jargon). By postponing worrying about docstrings, we can focus on building the actual functionality first. This feels like the most agile approach to me.

Thoughts?

Signed-off-by: Nathaniel Starkman <[email protected]>
@nstarman nstarman added this to the v2021-12-0.0 milestone Jul 21, 2025
@nstarman nstarman added the ✨ feature Introduce new features. label Jul 21, 2025
jorenham and others added 2 commits July 21, 2025 09:28
📌 pin the correct python version
⬆️ update optype version

Co-authored-by: Nathaniel Starkman <[email protected]>
@nstarman nstarman force-pushed the has_x branch 7 times, most recently from 083f5dc to fe8dc7c Compare July 21, 2025 14:09
@nstarman
Copy link
Collaborator Author

nstarman commented Jul 21, 2025

Almost got it. Struggling with numpy < 2 atm.
np.array_api doesn't have class-based dtypes. This caused a lot of issues. It also didn't like np.dtype subclasses, only python dtypes (e.g. float).

@nstarman nstarman force-pushed the has_x branch 8 times, most recently from 78edc18 to d4485f4 Compare July 21, 2025 14:39
@nstarman
Copy link
Collaborator Author

@jorenham. This PR might finally be ready.

@nstarman nstarman added ➕ deps Add a dependency. 🚚 mv Move or rename resources (e.g.: files, paths, routes). labels Jul 21, 2025
Comment on lines +30 to +31
"optype>=0.9.3; python_version < '3.11'",
"optype>=0.12.2; python_version >= '3.11'",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes... Do you think we really need 3.10 support, or could we get away with dropping it? I mean, I could add py310 support back to optype, but I'd really like to avoid that if I can 😅.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering. This wasn't my commit...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol I wasn't sure what I was thinking when I did that.

But either way, my question still stands.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we can drop 3.10!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since CI doesn't seem to mind this, we can do that in a follow-up then.

"FBT", # flake8-boolean-trap
"FIX", # flake8-fixme
"ISC001", # Conflicts with formatter
"PLW1641", # Object does not implement `__hash__` method
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"PLW1641", # Object does not implement `__hash__` method

with _docstrings_path.open("rb") as f:
_array_docstrings = tomllib.load(f)["docstrings"]

NS_co = TypeVar("NS_co", covariant=True, default=ModuleType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about calling this NamespaceT_co instead?

Personally, I've kinda grown attached to having a trailing T in type parameters. That way you can always tell at a glance if something is a type parameter or something else. Especially in case of numpy, where the _co suffix is also used for array-likes with "coercible" dtypes, such as numpy._typing._ArrayLikeInt_co.

But that being said, in this context here, it's probably won't be much of a problem. So it'd mostly be for the sake of consistency I suppose.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

Comment on lines +24 to +25
with _docstrings_path.open("rb") as f:
_array_docstrings = tomllib.load(f)["docstrings"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit of a mouthful, but it helps with namespace pollution (I'm talking about the f).

Suggested change
with _docstrings_path.open("rb") as f:
_array_docstrings = tomllib.load(f)["docstrings"]
_array_docstrings = tomllib.loads(_docstrings_path.read_text())["docstrings"]

And _array_docstrings will now be inferred as Any, so it could use an annotation (e.g. one wrapped as Final).

Also, since these are constants (the path too), uppercase names seem appropriate.



class HasArrayNamespace(Protocol[NS_co]):
"""Protocol for classes that have an `__array_namespace__` method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should mention that this is only intended for static typing, not for runtime things. And maybe also that this is intended as purely structural type, not a nominal one (i.e. not intended as base class).

op.CanMulSame[T_contra, R_co],
op.CanTruedivSame[T_contra, R_co],
op.CanFloordivSame[T_contra, R_co],
op.CanModSame[T_contra, R_co],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>> np.array([True]) % np.array([True])
array([0], dtype=int8)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is covered by R_co, right?

Array[bool, Array[float | int, _, _], _]

op.CanTruedivSame[T_contra, R_co],
op.CanFloordivSame[T_contra, R_co],
op.CanModSame[T_contra, R_co],
op.CanPowSame[T_contra, R_co],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>> np.array([True]) ** np.array([True])
array([1], dtype=int8)

Copy link
Collaborator Author

@nstarman nstarman Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is covered by R_co, right?

Array[bool, Array[float | int, _, _], _]

Copy link
Member

@jorenham jorenham Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But R_co defaults to Never, in which case it would reject boolean arrays.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the inner R_co?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant in general

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the specific case of Array[bool, Array[Any, _, _], _]
This should work for everything, but erases a lot of information on the return type.

x: Array[bool, Array[Any, _, _], _] = np.array([True])
y = x ** x  # Array[Any, _, _] 

Something more specific would be

x: Array[bool, Array[float | int, _, _], _] = np.array([True])
y = x ** x  # Array[Any, _, _] 

op.CanFloordivSame[T_contra, R_co],
op.CanModSame[T_contra, R_co],
op.CanPowSame[T_contra, R_co],
Protocol[T_contra, R_co, NS_co],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put NS_co first?

And what is the purpose of R_co here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put NS_co first?

It's the least likely to be used.

And what is the purpose of R_co here?

Return types. e.g BoolArray = Array[DType_co, bool, Array[Any, float, Never, NS_co], NS_co]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the least likely to be used.

The array namespace is where most of the static information lives, so I expect it will be used quite a lot, actually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I'm also fine with separating the two, i.e. the CanArrayNamespace and Array

Copy link
Collaborator Author

@nstarman nstarman Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expect it will be used quite a lot, actually.

In what way? In #34 we get the DType_co parameter

Array[+DType, -Other = Never, +R = Never, +NS = ModuleType] = Array[DType, Self | Other, Self | R, NS]

So the parameters are dtype, other allowed types for binary ops, the return types, and the array namespace. This will predominantly just be a Module, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what way?

getting rid of the ND_co typar

This will predominantly just be a Module, right?

It'll be something that we can match against with protocols, so that we can obtain a bunch of juicy static typing details, like for example the return type of their abs() function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be something that we can match against with protocols, so that we can obtain a bunch of juicy static typing details, like for example the return type of their abs() function.

That will be very nice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried about whether tools like Pylance will be able to figure out the docstrings like this, given the dynamic nature. Did you check that already?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet. It worked in Jupyter, but that's not static.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it you're a data scientist :P ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Computational astrophysicist. Close enough. 🤷.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity; are you on team "dark matter", or on one of the other ones?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dark matter. Though that doesn't stop me writing papers on the other ones 😆.

@nstarman nstarman mentioned this pull request Jul 22, 2025
@nstarman nstarman marked this pull request as draft July 22, 2025 14:24
@nstarman nstarman mentioned this pull request Jul 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
➕ deps Add a dependency. ✨ feature Introduce new features. 🚚 mv Move or rename resources (e.g.: files, paths, routes).
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants