From 621cbc2869f9eb24acbe7ed791a17620b8ac6aa5 Mon Sep 17 00:00:00 2001
From: Matthew Iannucci
Date: Tue, 15 Apr 2025 16:21:05 -0400
Subject: [PATCH 1/3] First pass at async api options

---
 design-docs/010-async-python-api.md | 112 ++++++++++++++++++++++++++++
 1 file changed, 112 insertions(+)
 create mode 100644 design-docs/010-async-python-api.md

diff --git a/design-docs/010-async-python-api.md b/design-docs/010-async-python-api.md
new file mode 100644
index 000000000..8660d4efc
--- /dev/null
+++ b/design-docs/010-async-python-api.md
@@ -0,0 +1,112 @@
+# Async Python API
+
+The Icechunk rust API for `Repository` and `Session` are both async using `tokio`. Originally, the python API was also async before the transition to separate `Repository`,` Session`, and `Store` classes.
+
+These changes were originally made to ease the typical python developer experience which may not be running from within an async context. However, Icechunk has many applications that may require an async runtime such as use within web servers. In these cases, blocking the main thread for 200 ms to perform IO is not acceptabl.
+
+This design document seeks to plan out the ability to perform async lifecycle functions from python, specifically in the `Repository` and `Session` interfaces.
+
+## API Options
+
+There are a few different ways this interface can be achieved, we will iterate them here and add links to external references where appropriate.
+
+### Separate Classes
+
+We will have a separate `asyn` module within the `icechunk` python module, leaving an API listing like this:
+
+```
+icechunk
+├── asyn
+│   ├── async_repository.py
+│   ├── async_session.py
+│   └── __init__.py
+├── repository.py
+├── session.py
+└── __init__.py
+```
+
+Looking at the `async_repository.py` file, it would have the following structure:
+
+```python
+class AsyncRepository:
+    async def fetch_config(self) -> RepositoryConfig | None:
+        ...
+
+...
+class Repository:
+    def fetch_config(self) -> RepositoryConfig | None:
+        ...
+```
+
+#### Alternative Implementation
+
+This approach leaves a few alternatives to maaximize reuse of code between the `Repository` and `AsyncRepository` classes.
+
+```python
+class AsyncRepository:
+    async def fetch_config(self) -> RepositoryConfig | None:
+        ...
+
+class Repository:
+
+    _async_repo: AsyncRepository
+    def fetch_config(self) -> RepositoryConfig | None:
+        loop = asyncio.get_event_loop()
+        return loop.run_until_complete(self._async_repo.fetch_config())
+```
+
+We could do similar on the rust layer instead. looking like this. The performance and GIL impact of this apprach is unknown at this time:
+
+```rust
+
+#[pyclass]
+pub struct PyAsyncRepository(Arc>);
+
+#[pymethods]
+impl PyAsyncRepository {
+    #[staticmethod]
+    fn fetch_config(py: Python<'_>, storage: PyStorage) -> PyResult> {
+        pyo3_async_runtimes::tokio::future_into_py(py, async move {
+            let res = Repository::fetch_config(storage.0.as_ref())
+                .await
+                .map_err(PyIcechunkStoreError::RepositoryError)?;
+            let res: Option = res.map(|res| res.0.into());
+            Ok(res)
+        })
+    }
+}
+
+#[pyclass]
+pub struct PyRepository(PyAsyncRepository);
+
+#[pymethods]
+impl PyRepository {
+    #[staticmethod]
+    fn fetch_config(py: Python<'_>, storage: PyStorage) -> PyResult> {
+        // This function calls block_on, so we need to allow other thread python to make progress
+        py.allow_threads(move || {
+            pyo3_async_runtimes::tokio::get_runtime().block_on(async move {
+                let coro = PyAsyncRepository::fetch_config(py, storage)?;
+                let res = pyo3_async_runtimes::tokio::into_future(coro)?.await?
+                Ok(res)
+            })
+        })
+    }
+}
+
+```
+
+### Classes with Async Methods
+
+We will add async methods to the existing classes, keeping the existing API structure. Taking the existing `Repository` class as an example, it would be extended to include the following:
+
+```python
+class Repository:
+    def fetch_config(self) -> RepositoryConfig | None:
+        ...
+
+    async def async_fetch_config(self) -> RepositoryConfig | None:
+        ...
+```
+
+This is certainly similar but it will make the API more cluttered.
\ No newline at end of file

From 0592461ee0c3174a1c1f264fc84a2566822cd9d6 Mon Sep 17 00:00:00 2001
From: Matthew Iannucci
Date: Tue, 15 Apr 2025 16:23:27 -0400
Subject: [PATCH 2/3] lint

---
 design-docs/010-async-python-api.md | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/design-docs/010-async-python-api.md b/design-docs/010-async-python-api.md
index 8660d4efc..3410d0691 100644
--- a/design-docs/010-async-python-api.md
+++ b/design-docs/010-async-python-api.md
@@ -1,18 +1,18 @@
 # Async Python API
 
-The Icechunk rust API for `Repository` and `Session` are both async using `tokio`. Originally, the python API was also async before the transition to separate `Repository`,` Session`, and `Store` classes.
+The Icechunk rust API for `Repository` and `Session` are both async using `tokio`. Originally, the python API was also async before the transition to separate `Repository`,` Session`, and `Store` classes.
 
-These changes were originally made to ease the typical python developer experience which may not be running from within an async context. However, Icechunk has many applications that may require an async runtime such as use within web servers. In these cases, blocking the main thread for 200 ms to perform IO is not acceptabl.
+These changes were originally made to ease the typical python developer experience which may not be running from within an async context. However, Icechunk has many applications that may require an async runtime such as use within web servers. In these cases, blocking the main thread for 200 ms to perform IO is not acceptabl.
 
-This design document seeks to plan out the ability to perform async lifecycle functions from python, specifically in the `Repository` and `Session` interfaces.
+This design document seeks to plan out the ability to perform async lifecycle functions from python, specifically in the `Repository` and `Session` interfaces.
 
 ## API Options
 
-There are a few different ways this interface can be achieved, we will iterate them here and add links to external references where appropriate.
+There are a few different ways this interface can be achieved, we will iterate them here and add links to external references where appropriate.
 
 ### Separate Classes
 
-We will have a separate `asyn` module within the `icechunk` python module, leaving an API listing like this:
+We will have a separate `asyn` module within the `icechunk` python module, leaving an API listing like this:
 
 ```
 icechunk
@@ -25,7 +25,7 @@ icechunk
 └── __init__.py
 ```
 
-Looking at the `async_repository.py` file, it would have the following structure:
+Looking at the `async_repository.py` file, it would have the following structure:
 
 ```python
 class AsyncRepository:
@@ -40,7 +40,7 @@ class Repository:
 
 #### Alternative Implementation
 
-This approach leaves a few alternatives to maaximize reuse of code between the `Repository` and `AsyncRepository` classes.
+This approach leaves a few alternatives to maaximize reuse of code between the `Repository` and `AsyncRepository` classes.
 
 ```python
 class AsyncRepository:
@@ -55,7 +55,7 @@ class Repository:
         return loop.run_until_complete(self._async_repo.fetch_config())
 ```
 
-We could do similar on the rust layer instead. looking like this. The performance and GIL impact of this apprach is unknown at this time:
+We could do similar on the rust layer instead. looking like this. The performance and GIL impact of this approach is unknown at this time:
 
 ```rust
 
@@ -90,7 +90,7 @@ impl PyRepository {
                 let res = pyo3_async_runtimes::tokio::into_future(coro)?.await?
                 Ok(res)
             })
-        })
+        })
     }
 }
 
@@ -98,7 +98,7 @@ impl PyRepository {
 
 ### Classes with Async Methods
 
-We will add async methods to the existing classes, keeping the existing API structure. Taking the existing `Repository` class as an example, it would be extended to include the following:
+We will add async methods to the existing classes, keeping the existing API structure. Taking the existing `Repository` class as an example, it would be extended to include the following:
 
 ```python
 class Repository:
@@ -109,4 +109,4 @@ class Repository:
         ...
 ```
 
-This is certainly similar but it will make the API more cluttered.
\ No newline at end of file
+This is certainly similar but it will make the API more cluttered.

From 8b05850134e5d529fad3123fcc9e7323b57941fc Mon Sep 17 00:00:00 2001
From: Matthew Iannucci
Date: Tue, 15 Apr 2025 21:34:59 -0400
Subject: [PATCH 3/3] Update 010-async-python-api.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Sebastián Galkin
---
 design-docs/010-async-python-api.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/design-docs/010-async-python-api.md b/design-docs/010-async-python-api.md
index 3410d0691..3aea1be0f 100644
--- a/design-docs/010-async-python-api.md
+++ b/design-docs/010-async-python-api.md
@@ -2,7 +2,7 @@
 
 The Icechunk rust API for `Repository` and `Session` are both async using `tokio`. Originally, the python API was also async before the transition to separate `Repository`,` Session`, and `Store` classes.
 
-These changes were originally made to ease the typical python developer experience which may not be running from within an async context. However, Icechunk has many applications that may require an async runtime such as use within web servers. In these cases, blocking the main thread for 200 ms to perform IO is not acceptabl.
+These changes were originally made to ease the typical python developer experience which may not be running from within an async context. However, Icechunk has many applications that may require an async runtime such as use within web servers. In these cases, blocking the main thread for 200 ms to perform IO is not acceptable.
 
 This design document seeks to plan out the ability to perform async lifecycle functions from python, specifically in the `Repository` and `Session` interfaces.
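
The "Alternative Implementation" option in the design doc, where a blocking `Repository` delegates to an `AsyncRepository`, could look roughly like the sketch below. This is a minimal illustration under stated assumptions, not Icechunk's implementation: `RepositoryConfig` is a stand-in dataclass with a hypothetical `example_setting` field, `asyncio.sleep(0)` stands in for real storage IO, and `asyncio.run` is swapped in for the doc's `get_event_loop().run_until_complete(...)` so the sketch does not depend on an ambient event loop.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class RepositoryConfig:
    # Hypothetical stand-in for icechunk's RepositoryConfig; the field is invented.
    example_setting: int = 0


class AsyncRepository:
    async def fetch_config(self) -> RepositoryConfig | None:
        # Placeholder for the real async storage read.
        await asyncio.sleep(0)
        return RepositoryConfig()


class Repository:
    """Blocking facade that reuses the async implementation."""

    def __init__(self) -> None:
        self._async_repo = AsyncRepository()

    def fetch_config(self) -> RepositoryConfig | None:
        # asyncio.run starts a fresh event loop for this call and closes it after.
        # It raises RuntimeError when invoked from a thread that already has a
        # running loop, so async callers should use AsyncRepository directly.
        return asyncio.run(self._async_repo.fetch_config())


async def main() -> RepositoryConfig | None:
    # Async callers (for example a web server handler) await the async class.
    return await AsyncRepository().fetch_config()


sync_config = Repository().fetch_config()
async_config = asyncio.run(main())
```

One consequence of this shape is that the blocking `Repository` cannot be called from inside a running event loop, which matches the web-server motivation in the doc: code already in an async context would use the async class instead.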