Skip to content

Conversation

@mpiannucci
Copy link
Contributor

First pass at design for bringing async API back to python #620

This currently makes no decisions but starts to lay out options to iterate on.

@mpiannucci mpiannucci requested review from dcherian and paraseba April 15, 2025 20:22
@mpiannucci mpiannucci changed the title First pass at async api options [Design Document] First pass at async api options design Apr 15, 2025
@mpiannucci
Copy link
Contributor Author

Tagging @TomNicholas @kylebarron in case they have some insight or opinion!

Copy link
Collaborator

@paraseba paraseba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great @mpiannucci . It very clearly explains the options.

I don't think anybody will like "Classes with Async methods", or I don't at least (but it's great to keep it in the doc as an option).

I feel the option with _async_repo and python doing the blocking is very compelling because of its simplicity and small code footprint. But, I'm concerned about having more async stuff in python, I have fought those async loops way more than I want to remember. For example, we would need to explore how well this interacts with multithreaded python code.

The everything-in-rust option is by far what takes more code, but also what feels safer. I'm afraid to ask, but ... how about writing a macro that can generate both versions of every method?

Co-authored-by: Sebastián Galkin <[email protected]>
@mpiannucci
Copy link
Contributor Author

mpiannucci commented Apr 16, 2025

But, I'm concerned about having more async stuff in python, I have fought those async loops way more than I want to remember. For example, we would need to explore how well this interacts with multithreaded python code.

This is my exact worry as well.

A macro actually might be the best way to do this, because the inner async closure will be the same for both sync and async and the only thing that changes are the call signatures and the runtime wrapper. I'll look into that

@paraseba
Copy link
Collaborator

the inner async closure will be the same for both sync and async and the only thing that changes are the call signatures and the runtime wrapper.

Exactly.

Maybe we can do a small proof of concept and see how it goes. The other concern with this is that we still need to create both python classes calling to the proper Rust object. I wonder if there are any tricks we play there without killing performance.

@kylebarron
Copy link
Contributor

kylebarron commented Apr 16, 2025

I don't think anybody will like "Classes with Async methods", or I don't at least (but it's great to keep it in the doc as an option).

FWIW this is my preferred option 😄. E.g. in obstore the same operations are synchronous with def get_range or async with async def get_range_async.

I don't think there should be two separate classes that represent the same underlying concept and data on the rust side. E.g. for obstore the act of creating an S3Store is totally indifferent to whether you plan to use it synchronously or asynchronously. And the underlying Rust data (Arc<dyn ObjectStore>) is the same whether or not you use async methods on it. So I think it would be more confusing to have two separate classes that do the same thing.

Even when you might want an async constructor, I prefer having a single class. E.g. in kylebarron/arro3#313 I'm updating my ParquetFile API. But since the underlying data held by ParquetFile is the same in a sync or async case, I prefer having alternate constructors: open and open_async (or potentially both __enter__ and __aenter__ to use as a sync or async context manager).

I also prefer fetch_config_async instead of async_fetch_config in your example so you get quicker narrowing of autocompletion in IDE type hinting.

@paraseba
Copy link
Collaborator

Thanks @kylebarron . This gives me a new appreciation for that option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants