
jayantsing-db
Collaborator

Description

This PR introduces lazy loading support for inline Arrow results to improve memory efficiency when handling large result sets.

Previously, InlineChunkProvider eagerly fetched all Arrow batches upfront whenever a result had hasMoreRows = true, so the whole result set was held in memory at once, which could cause memory issues with large datasets. This change splits the handling into two separate paths:

  1. Lazy path (new): For Thrift-based inline Arrow results (when ARROW_BASED_SET is returned), we now use LazyThriftInlineArrowResult, which fetches Arrow batches on demand as the client iterates through rows. This is similar to how LazyThriftResult works for columnar data.
  2. Remote path (existing): For URL-based Arrow results (URL_BASED_SET), we continue to use ArrowStreamResult with RemoteChunkProvider, which downloads chunks from cloud storage.
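The lazy path can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the actual LazyThriftInlineArrowResult code: `Batch`, `BatchFetcher` and `LazyInlineRows` are hypothetical names, each row is modelled as a single `Integer`, and one `fetchNext()` call stands in for one Thrift fetch round trip. It shows the key property: only the current batch is held, and the next one is requested only after every row of the current batch has been returned.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for one fetched Arrow batch. Each row is a single Integer
// to keep the sketch short; the real driver works with Arrow record batches.
record Batch(List<Integer> rows, boolean hasMoreRows) {}

// Hypothetical fetch callback, standing in for one Thrift fetch round trip.
interface BatchFetcher {
    Batch fetchNext();
}

// Lazy iteration: only the current batch is held, and the next batch is
// requested only after every row of the current one has been returned.
final class LazyInlineRows implements Iterator<Integer> {
    private final BatchFetcher fetcher;
    private Batch current;
    private int position;   // index of the next row within the current batch
    private int fetchCount; // batches requested after the first one (for illustration)

    LazyInlineRows(Batch firstBatch, BatchFetcher fetcher) {
        this.current = firstBatch;
        this.fetcher = fetcher;
    }

    @Override
    public boolean hasNext() {
        // Fetch (skipping empty batches) only while the server reports more rows.
        while (position >= current.rows().size()) {
            if (!current.hasMoreRows()) {
                return false;
            }
            current = fetcher.fetchNext();
            position = 0;
            fetchCount++;
        }
        return true;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.rows().get(position++);
    }

    int fetchCount() {
        return fetchCount;
    }
}
```

Because `hasNext()` fetches only when the current batch is exhausted, a client that stops iterating early never triggers the remaining fetches, whereas the previous eager approach paid for all of them upfront.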

InlineChunkProvider is now used only for SEA results with JSON_ARRAY format and INLINE disposition, which contain all data inline (the hasMoreRows flag is not set).
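The resulting routing can be summarized as a small lookup. `ResponseShape` and `ResultRouting` are hypothetical names used only for illustration; the driver presumably makes this decision from the Thrift row set type and the SEA format/disposition fields rather than from an enum like this one.

```java
// Hypothetical labels for the response shapes discussed above; the driver itself
// distinguishes them via the Thrift row set type and the SEA format/disposition.
enum ResponseShape {
    THRIFT_ARROW_BASED_SET, // inline Arrow batches, possibly with hasMoreRows = true
    THRIFT_URL_BASED_SET,   // Arrow chunks downloaded from cloud storage
    SEA_JSON_ARRAY_INLINE   // SEA JSON_ARRAY format + INLINE disposition, all data inline
}

final class ResultRouting {
    private ResultRouting() {}

    // Name of the component that handles each shape after this change.
    static String handlerFor(ResponseShape shape) {
        return switch (shape) {
            case THRIFT_ARROW_BASED_SET -> "LazyThriftInlineArrowResult";
            case THRIFT_URL_BASED_SET -> "ArrowStreamResult + RemoteChunkProvider";
            case SEA_JSON_ARRAY_INLINE -> "InlineChunkProvider";
        };
    }
}
```

Only the SEA JSON_ARRAY + INLINE case still goes through InlineChunkProvider, and since that response carries all of its data inline, it never needs a follow-up fetch.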

Similar to #975, this should reduce memory consumption and improve performance when handling large inline Arrow result sets.

Testing

  • Unit tests
  • Integration tests
  • Manual testing

Additional Notes to the Reviewer

jayantsing-db and others added 2 commits September 30, 2025 14:15
@jayantsing-db
Collaborator Author

I need to make some changes related to the JDBC spec around row count, because that data point isn't available when results are fetched lazily.
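The row-count problem follows directly from lazy fetching: until the batch with hasMoreRows = false arrives, only a lower bound is known, and any API that needs the total upfront would have to fetch everything first, defeating the purpose of the lazy path. A minimal sketch of that constraint, using a hypothetical `LazyRowCount` helper that is not part of the driver and does not show how the JDBC-facing methods will actually change:

```java
import java.util.OptionalLong;

// Hypothetical helper: tracks how many rows a lazily fetched result has produced.
// The total is only known once the final batch (hasMoreRows == false) has arrived.
final class LazyRowCount {
    private long rowsFetched;
    private boolean lastBatchSeen;

    // Called each time a batch is fetched from the server.
    void onBatch(int batchRowCount, boolean hasMoreRows) {
        rowsFetched += batchRowCount;
        lastBatchSeen = !hasMoreRows;
    }

    // Rows fetched so far; only a lower bound on the total while more rows remain.
    long rowsFetchedSoFar() {
        return rowsFetched;
    }

    // Total row count, or empty while more batches may still arrive.
    OptionalLong totalRowCount() {
        return lastBatchSeen ? OptionalLong.of(rowsFetched) : OptionalLong.empty();
    }
}
```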
