ReadManyItems API #42167
Pull Request Overview
This PR implements the ReadManyItems API for Azure Cosmos DB, providing an efficient method to retrieve multiple items in a single request. The implementation intelligently groups items by partition and constructs optimized backend queries to reduce network round trips and RU costs compared to individual point reads.
Key changes:
- Added async read_many_items method to container client with intelligent query optimization
- Implemented helper classes for query building and partition-aware request batching
- Added comprehensive test coverage including partition splits, retry scenarios, and various partition key configurations
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| azure/cosmos/aio/_container.py | Added read_many_items method with single-item optimization and partition key handling |
| azure/cosmos/aio/_cosmos_client_connection_async.py | Added ReadManyItems connection method that delegates to helper class |
| azure/cosmos/_read_many_items_helper.py | Core helper class managing partition grouping, query chunking, and concurrent execution |
| azure/cosmos/_query_builder.py | Query optimization logic for ID-based, single-partition, and parameterized queries |
| azure/cosmos/_routing/routing_range.py | Added get_full_range utility method for partition range operations |
| azure/cosmos/exceptions.py | Fixed bug in partition split detection logic |
| tests/test_read_many_items_async.py | Comprehensive async test suite covering various scenarios and edge cases |
| tests/test_read_many_items_partition_split.py | Specialized tests for partition split scenarios |
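The helper row above mentions query chunking and concurrent execution. As a rough illustration only (not the PR's actual helper), the sketch below runs one coroutine per chunk with a bounded number in flight and flattens the results. The names run_chunks_concurrently, fetch_chunk, fake_fetch, and max_concurrency are hypothetical and chosen for this example.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Sequence

# Hypothetical type alias for this sketch: a coroutine that fetches one chunk of IDs.
FetchChunk = Callable[[Sequence[str]], Awaitable[List[Dict[str, str]]]]


async def run_chunks_concurrently(
    chunks: Sequence[Sequence[str]],
    fetch_chunk: FetchChunk,
    max_concurrency: int = 4,
) -> List[Dict[str, str]]:
    """Fetch every chunk, allowing at most max_concurrency fetches at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(chunk: Sequence[str]) -> List[Dict[str, str]]:
        # The semaphore caps how many chunk queries are in flight at once.
        async with semaphore:
            return await fetch_chunk(chunk)

    # gather returns results in the same order as the input chunks.
    per_chunk = await asyncio.gather(*(guarded(c) for c in chunks))
    return [item for result in per_chunk for item in result]


async def fake_fetch(chunk: Sequence[str]) -> List[Dict[str, str]]:
    """Stand-in for a backend query; assumed here purely for demonstration."""
    await asyncio.sleep(0)
    return [{"id": item_id} for item_id in chunk]


if __name__ == "__main__":
    items = asyncio.run(
        run_chunks_concurrently([["a", "b"], ["c"]], fake_fetch, max_concurrency=2)
    )
    print(items)
```

If any chunk raises, asyncio.gather propagates the first exception to the caller, which is one simple way to avoid silently returning partial results.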
Left some different comments and agree with Tomas on several of his. The core logic for aggregating and creating the queries seems perfect though. Thanks!
API Change Check
APIView identified API level changes in this PR and created the following API reviews
/azp run python - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
/azp run python - cosmos - tests
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.
LGTM, thanks
LGTM
The ReadManyItems API provides an efficient method for retrieving multiple items from a container in a single request.
Instead of fetching items one by one, you provide a list of item ID and partition key pairs. The API intelligently groups these items and constructs optimized backend queries, often using an IN clause, to fetch them in batches. This significantly reduces the number of network round trips and latency compared to multiple individual point reads in most cases.
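To make the grouping and IN-clause idea concrete, here is a minimal sketch. It is an illustration under simplifying assumptions, not the SDK's implementation: it groups by logical partition key value only, and the function names (group_ids_by_partition_key, chunk_ids, build_in_query) and the chunk size are hypothetical. The query dictionary uses the query/parameters shape that Cosmos DB parameterized queries accept.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List, Sequence, Tuple


def group_ids_by_partition_key(
    items: Iterable[Tuple[str, Hashable]]
) -> Dict[Hashable, List[str]]:
    """Group (item_id, partition_key) pairs into {partition_key: [item_id, ...]}."""
    grouped: Dict[Hashable, List[str]] = defaultdict(list)
    for item_id, partition_key in items:
        grouped[partition_key].append(item_id)
    return dict(grouped)


def chunk_ids(ids: Sequence[str], size: int) -> List[List[str]]:
    """Split a list of IDs into chunks of at most `size` entries."""
    return [list(ids[i:i + size]) for i in range(0, len(ids), size)]


def build_in_query(ids: Sequence[str]) -> Dict[str, Any]:
    """Build a parameterized query that selects items whose id is in `ids`."""
    names = [f"@id{i}" for i in range(len(ids))]
    return {
        "query": f"SELECT * FROM c WHERE c.id IN ({', '.join(names)})",
        "parameters": [{"name": n, "value": v} for n, v in zip(names, ids)],
    }


if __name__ == "__main__":
    pairs = [("a", "pk1"), ("b", "pk2"), ("c", "pk1")]
    for pk, ids in group_ids_by_partition_key(pairs).items():
        for chunk in chunk_ids(ids, size=2):  # chunk size is an arbitrary choice here
            print(pk, build_in_query(chunk))
```

Using parameters rather than string-interpolated IDs keeps the query text stable across calls and avoids quoting problems with unusual ID values.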
This PR also fixes a typo in sdk/cosmos/azure-cosmos/azure/cosmos/exceptions.py, where sub_status was not being compared correctly.
It also includes a critical fix for a bug in sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/base_execution_context.py that caused exceptions to be swallowed in the synchronous flow.
The lines if not self._has_started: self._has_started = True now come after the fetch_function call, so the _has_started flag is set to True only after a successful fetch operation.
Here is the execution flow that caused the issue:
1. On the first attempt, _has_started is False. The while loop condition not self._has_started is met.
2. Before the fix: _has_started was immediately set to True.
3. The fetch_function was called and it raised an exception.
4. The retry mechanism would catch the exception and attempt to call the function again.
5. On the retry attempt, _has_started is now True. The while loop condition not self._has_started is now False, so the loop is skipped entirely.
6. The function would then return an empty list, effectively "swallowing" the exception and preventing further retries.
By moving the code block, if fetch_function fails, _has_started remains False. On the next retry, the while loop condition is still met, allowing the fetch_function to be called again as intended.
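The behavior described above can be reproduced with a small stand-alone simulation. This is a simplified sketch, not the SDK's code: the real loop condition also involves continuation state, and the class and function names here (FlakyFetcher, BuggyContext, FixedContext, call_with_retry) are hypothetical.

```python
from typing import Callable, List, Optional


class FlakyFetcher:
    """Hypothetical fetch function that fails on the first call, then succeeds."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self) -> List[str]:
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("transient failure")
        return ["item-1", "item-2"]


class BuggyContext:
    """Sets _has_started before fetching, so a failed fetch is never retried."""

    def __init__(self, fetch_function: Callable[[], List[str]]) -> None:
        self._fetch_function = fetch_function
        self._has_started = False

    def fetch_items(self) -> List[str]:
        fetched: List[str] = []
        while not self._has_started:
            self._has_started = True  # flag flipped too early
            fetched = self._fetch_function()
        return fetched


class FixedContext(BuggyContext):
    """Sets _has_started only after the fetch has succeeded."""

    def fetch_items(self) -> List[str]:
        fetched: List[str] = []
        while not self._has_started:
            fetched = self._fetch_function()
            self._has_started = True  # reached only if no exception was raised
        return fetched


def call_with_retry(func: Callable[[], List[str]], attempts: int = 2) -> List[str]:
    """Minimal retry wrapper standing in for the SDK's retry mechanism."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return func()
        except RuntimeError as error:
            last_error = error
    assert last_error is not None
    raise last_error


if __name__ == "__main__":
    buggy = BuggyContext(FlakyFetcher())
    fixed = FixedContext(FlakyFetcher())
    print(call_with_retry(buggy.fetch_items))  # empty list: the failure is swallowed
    print(call_with_retry(fixed.fetch_items))  # the items from the second attempt
```

In the buggy variant the retry finds _has_started already True and skips the loop, so the caller sees an empty result instead of either data or an error.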
This revised flow now matches aio/base_execution_context.py.
Attached is the design document:
ReadManyItemsAsync Flow.docx