Skip to content

Conversation

@rmnskb
Copy link
Contributor

@rmnskb rmnskb commented Nov 8, 2025

Rationale for this change

Please see #32007, currently, neither arrays nor scalars support Python-native arithmetic operations, such as array + array, it has to be done via pyarrow.compute API. This PR strives to fix this with custom dunder methods.

What changes are included in this PR?

Implemented dunder methods

Are these changes tested?

Yes

Are there any user-facing changes?

Possibility to use Python operators directly instead of calling the pyarrow.compute API.

@rmnskb
Copy link
Contributor Author

rmnskb commented Nov 8, 2025

@pitrou @AlenkaF @rok
Sorry for pinging like this, I wanted to understand the scope correctly. So far, I went with the most straightforward implementation and added dunders for all operations that were listed in the compute docs and had respective dunders available. Do you think this implementation is enough or should I go for other dunders as well, e.g bitwise operations? After reviewing the original discussion, I've seen @jorisvandenbossche suggesting checked versions of the functions, I think I will go with them as well.
Regarding the documentation:

  1. Should I add docstrings to dunder methods? Given that user won't see them in their IDE, for example.
  2. Should the documentation for arrays/scalars be updated to reflect that the native operators are now supported?
    Thanks for your input!

@rok
Copy link
Member

rok commented Nov 11, 2025

Sorry for late reply @rmnskb. I think all dunders that semantically map to our existing kernels are fair game! I'm not sure what these would be, but the set you have now looks decently sized.

Should I add docstrings to dunder methods? Given that user won't see them in their IDE, for example.

We might be able to add docstrings via type annotations. Or perhaps it's possible to copy them at runtime from the respective kernels? (unsure that is possible). In any case we probably don't want to duplicate docstrings in general.

Should the documentation for arrays/scalars be updated to reflect that the native operators are now supported?

Yes, that would be good to have!

@rmnskb
Copy link
Contributor Author

rmnskb commented Nov 11, 2025

Sorry for late reply @rmnskb. I think all dunders that semantically map to our existing kernels are fair game! I'm not sure what these would be, but the set you have now looks decently sized.

Ok, I was also thinking about covering as many dunder methods as possible, but did not want to go out of scope for this issue.

We might be able to add docstrings via type annotations. Or perhaps it's possible to copy them at runtime from the respective kernels? (unsure that is possible). In any case we probably don't want to duplicate docstrings in general.

Copying the docstrings sounds like a good idea, I will look further into that.

@AlenkaF
Copy link
Member

AlenkaF commented Nov 12, 2025

Thank you for working on this @rmnskb and sorry for replying late!
Pinging us when you have a question is totally fine and in fact necessary ;)

I think the current scope is solid. I am thinking of basic comparisons and the discussion in the issue around __eq__. That would also be good to tackle but maybe as a separate PR?

Yes, it would be important to update the documentation (User Guide which is the part that is not API reference related).

One comment after looking at the code would be to add tests that cover various type combinations (array, scalar, Python types, unsupported types ...)

@rok
Copy link
Member

rok commented Nov 12, 2025

I think the current scope is solid. I am thinking of basic comparisons and the discussion in the issue around __eq__. That would also be good to tackle but maybe as a separate PR?

I think we already have __eq__: https://github.com/apache/arrow/pull/7737/files.

One comment after looking at the code would be to add tests that cover various type combinations (array, scalar, Python types, unsupported types ...)

That's a good point, __eq__ currently raises if the types of compared don't match. Good test coverage would also help find edge cases.

    def __eq__(self, other):
        try:
            return self.equals(other)
        except TypeError:
            # This also handles comparing with None
            # as Array.equals(None) raises a TypeError.
            return NotImplemented

@AlenkaF
Copy link
Member

AlenkaF commented Nov 12, 2025

I think we already have __eq__: https://github.com/apache/arrow/pull/7737/files.

Yes, true. I was under the impression that we might want to change the use of equals() method with the equal compute function? From the issue:

I think if we add those, we should also change the behaviour of __eq__, although that is something that will require a long deprecation cycle.

@rok
Copy link
Member

rok commented Nov 12, 2025

Oh, I managed to forget this and thought array.equals dispatches to compute.equals already 🤦.

It seems like a good opportunity to change this! :)

@rmnskb
Copy link
Contributor Author

rmnskb commented Nov 13, 2025

We might be able to add docstrings via type annotations. Or perhaps it's possible to copy them at runtime from the respective kernels? (unsure that is possible). In any case we probably don't want to duplicate docstrings in general.

I've looked more into this, and based on these two discussions, we'd have to copy them directly from the underlying functions at runtime via importing the pyarrow.compute module, which AFAIK we don't have access to, since we call them with _pc().call_function().
I've added the docstrings to respective classes which notes that the API now support calling the Python-native operators directly, as well as provides some examples of that. Please let me know if that's sufficient.

@rmnskb
Copy link
Contributor Author

rmnskb commented Nov 13, 2025

I think the current scope is solid. I am thinking of basic comparisons and the discussion in the issue around __eq__. That would also be good to tackle but maybe as a separate PR?

I'll be happy to work on it, once I'm done with this implementation :)

Yes, it would be important to update the documentation (User Guide which is the part that is not API reference related).

Are you talking about this file? Or is there somewhere else, where I can put this information?

One comment after looking at the code would be to add tests that cover various type combinations (array, scalar, Python types, unsupported types ...)

Yes, I agree. Are there any particular unsupported types I should include that you have in mind?

@AlenkaF
Copy link
Member

AlenkaF commented Nov 14, 2025

I'll be happy to work on it, once I'm done with this implementation :)

Thank you!

Are you talking about this file? Or is there somewhere else, where I can put this information?

I was thinking more about the compute page in our User Guide: https://github.com/apache/arrow/blob/main/docs/source/python/compute.rst

Are there any particular unsupported types I should include that you have in mind?

No, not really. What comes to mind are strings and/or nested for some arithmetic functions.

@rmnskb
Copy link
Contributor Author

rmnskb commented Nov 14, 2025

Are you talking about this file? Or is there somewhere else, where I can put this information?

I was thinking more about the compute page in our User Guide: https://github.com/apache/arrow/blob/main/docs/source/python/compute.rst

Operators Documentation

I mentioned the newly implemented operators in the documentation. Please let me know if it makes sense. I'm not sure whether we should list all the implemented methods, on the other hand, I don't want to leave users guessing what exactly can they use.

@rok
Copy link
Member

rok commented Nov 17, 2025

One comment after looking at the code would be to add tests that cover various type combinations (array, scalar, Python types, unsupported types ...)

Yes, I agree. Are there any particular unsupported types I should include that you have in mind?

An integer based extension type maybe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants