-
Notifications
You must be signed in to change notification settings - Fork 5
✨: add CanArrayX protocols #32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Ok This PR is doing too much. Let me pair it down to just a few Protocols and do the rest as a series of followups. |
96067a4
to
a1be18e
Compare
Ping @NeilGirdhar, given related discussions. |
Should all the Protocols inherit from |
I don't know what Joren will say, but I would guess no and no? (I think you got it right in this PR?) Also, I'm guessing you're aware that |
My thought was for building stuff like class Positive(Protocol):
def __call__(self, array: CanArrayPos, /) -> CanArrayPos: ... is wrong. It should be something like class Positive(Protocol):
def __call__(self, array: HasArrayNamespace, /) -> HasArrayNamespace: ... But I think we want class Positive(Protocol):
def __call__(self, array: CanArrayPos, /) -> HasArrayNamespace: ... Which I think works best if it's class CanArrayPos(HasArrayNamespace, Protocol): ...
Yes. :). |
I see, you're kind of using it as a poor man's intersection?
Okay, is that because you're going to generate some documentation from these annotations? Or you find it less confusing? Also, are you going to add |
It's for 2 reasons: the array api does it in their docs and because I think the Python numerical tower is a mess and since ints and floats aren't subclasses of each other, it makes little sense for them to be interchangeable at the static type level. 😤😆
Worth discussing. The array api does not. |
The docs are that way to help beginners who might be confused. (At least that was the argument that was presented.) But you aren't expecting beginners to read your code, are you? And, you aren't using this repo to build docs? The downside of populating the unions unnecessarily is overcomplicated type errors. So from a user standpoint, I think this is worse. From a developer standpoint, it's a matter of taste. Personally, I think more succinct is easier to understand.
As much as you might like to turn back time and change the typing decisions that were made, the fact is that the static type I think I understand what you're doing and why. I spent years writing
array.__add__(other: int | float | complex | array, /) → array Have I misunderstood the documentation? |
Ah. We're building towards v2021 first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be easier to use optype for this, as it already provides single-method generic protocols for each of the special dunders:
https://github.com/jorenham/optype/blob/master/optype/_core/_can.py
There's even documentation: https://github.com/jorenham/optype#binary-operations
And of course it's tested and thoroughly type-checked and stuff
@jorenham is this prep for using optype? |
Yea, pretty much. |
I think the plan is to build docs, which would show the types. |
@jorenham do you want to switch some of these to be |
I've thought about this, but I'm not sure what the best approach is. I considered four approaches:
Now that I've written these down, I think I feel most for option 4. As far as I'm concerned, docstrings are a "should-have", not a "must-have" (MoSCow jargon). By postponing worrying about docstrings, we can focus on building the actual functionality first. This feels like the most agile approach to me. Thoughts? |
Signed-off-by: Nathaniel Starkman <[email protected]>
📌 pin the correct python version ⬆️ update optype version Co-authored-by: Nathaniel Starkman <[email protected]>
083f5dc
to
fe8dc7c
Compare
|
78edc18
to
d4485f4
Compare
Signed-off-by: Nathaniel Starkman <[email protected]>
@jorenham. This PR might finally be ready. |
"optype>=0.9.3; python_version < '3.11'", | ||
"optype>=0.12.2; python_version >= '3.11'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes... Do you think we really need 3.10 support, or could we get away with dropping it? I mean, I could add py310 support back to optype, but I'd really like to avoid that if I can 😅.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering. This wasn't my commit...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lol I wasn't sure what I was thinking when I did that.
But either way, my question still stands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we can drop 3.10!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since CI doesn't seem to mind this, we can do that in a follow-up then.
"FBT", # flake8-boolean-trap | ||
"FIX", # flake8-fixme | ||
"ISC001", # Conflicts with formatter | ||
"PLW1641", # Object does not implement `__hash__` method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"PLW1641", # Object does not implement `__hash__` method |
with _docstrings_path.open("rb") as f: | ||
_array_docstrings = tomllib.load(f)["docstrings"] | ||
|
||
NS_co = TypeVar("NS_co", covariant=True, default=ModuleType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about calling this NamespaceT_co
instead?
Personally, I've kinda grown attached to having a trailing T
in type parameters. That way you can always tell at a glance if something is a type parameter or something else. Especially in case of numpy, where the _co
suffix is also used for array-likes with "coercible" dtypes, such as numpy._typing._ArrayLikeInt_co
.
But that being said, in this context here, it's probably won't be much of a problem. So it'd mostly be for the sake of consistency I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
with _docstrings_path.open("rb") as f: | ||
_array_docstrings = tomllib.load(f)["docstrings"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit of a mouthful, but it helps with namespace pollution (I'm talking about the f
).
with _docstrings_path.open("rb") as f: | |
_array_docstrings = tomllib.load(f)["docstrings"] | |
_array_docstrings = tomllib.loads(_docstrings_path.read_text())["docstrings"] |
And _array_docstrings
will now be inferred as Any
, so it could use an annotation (e.g. one wrapped as Final
).
Also, since these are constants (the path too), uppercase names seem appropriate.
|
||
|
||
class HasArrayNamespace(Protocol[NS_co]): | ||
"""Protocol for classes that have an `__array_namespace__` method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should mention that this is only intended for static typing, not for runtime things. And maybe also that this is intended as purely structural type, not a nominal one (i.e. not intended as base class).
op.CanMulSame[T_contra, R_co], | ||
op.CanTruedivSame[T_contra, R_co], | ||
op.CanFloordivSame[T_contra, R_co], | ||
op.CanModSame[T_contra, R_co], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> np.array([True]) % np.array([True])
array([0], dtype=int8)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is covered by R_co
, right?
Array[bool, Array[float | int, _, _], _]
op.CanTruedivSame[T_contra, R_co], | ||
op.CanFloordivSame[T_contra, R_co], | ||
op.CanModSame[T_contra, R_co], | ||
op.CanPowSame[T_contra, R_co], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> np.array([True]) ** np.array([True])
array([1], dtype=int8)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is covered by R_co
, right?
Array[bool, Array[float | int, _, _], _]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But R_co
defaults to Never
, in which case it would reject boolean arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the inner R_co?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant in general
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the specific case of Array[bool, Array[Any, _, _], _]
This should work for everything, but erases a lot of information on the return type.
x: Array[bool, Array[Any, _, _], _] = np.array([True])
y = x ** x # Array[Any, _, _]
Something more specific would be
x: Array[bool, Array[float | int, _, _], _] = np.array([True])
y = x ** x # Array[Any, _, _]
op.CanFloordivSame[T_contra, R_co], | ||
op.CanModSame[T_contra, R_co], | ||
op.CanPowSame[T_contra, R_co], | ||
Protocol[T_contra, R_co, NS_co], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put NS_co
first?
And what is the purpose of R_co
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put NS_co first?
It's the least likely to be used.
And what is the purpose of R_co here?
Return types. e.g BoolArray = Array[DType_co, bool, Array[Any, float, Never, NS_co], NS_co]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the least likely to be used.
The array namespace is where most of the static information lives, so I expect it will be used quite a lot, actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I'm also fine with separating the two, i.e. the CanArrayNamespace and Array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expect it will be used quite a lot, actually.
In what way? In #34 we get the DType_co
parameter
Array[+DType, -Other = Never, +R = Never, +NS = ModuleType] = Array[DType, Self | Other, Self | R, NS]
So the parameters are dtype, other allowed types for binary ops, the return types, and the array namespace. This will predominantly just be a Module, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what way?
getting rid of the ND_co
typar
This will predominantly just be a Module, right?
It'll be something that we can match against with protocols, so that we can obtain a bunch of juicy static typing details, like for example the return type of their abs()
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll be something that we can match against with protocols, so that we can obtain a bunch of juicy static typing details, like for example the return type of their abs() function.
That will be very nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried about whether tools like Pylance will be able to figure out the docstrings like this, given the dynamic nature. Did you check that already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet. It worked in Jupyter, but that's not static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it you're a data scientist :P ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Computational astrophysicist. Close enough. 🤷.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity; are you on team "dark matter", or on one of the other ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dark matter. Though that doesn't stop me writing papers on the other ones 😆.
No description provided.