@sgrebnov commented May 8, 2024

This PR adds an exec_streamed method that returns Arrow batches via a stream while the response is still being downloaded and converted. This reduces memory usage for large datasets, since records can be processed in chunks, and improves performance by giving access to records that have already been loaded.

  • Like the Snowflake Go driver, it uses MAX_CHUNK_DOWNLOAD_WORKERS (10) download workers: https://github.com/snowflakedb/gosnowflake/blob/master/rows.go#L22
  • I originally made RawQueryResult always return its result via a stream, but then realized that the polars dependency requires RawQueryResult as bytes: it defines TryFrom, which is always sync, so it can't use async functionality to convert a stream to bytes.
  • With this change I was finally able to run queries against the very large snowflake_sample_data.tpch_sf100 dataset.
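
For reviewers who want a picture of the worker-pool idea, here is a minimal sketch of it using only std threads and a bounded channel. It is not the code in this PR, which is async and returns real Arrow batches. `Batch`, `fetch_chunk`, `stream_chunks`, and `Ordered` are hypothetical names invented for illustration; only the worker count of 10 comes from the description above. Yielding batches in chunk order is also an assumption of the sketch.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{sync_channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

/// Worker count taken from the PR description.
const MAX_CHUNK_DOWNLOAD_WORKERS: usize = 10;

/// Hypothetical stand-in for an Arrow record batch.
#[derive(Debug, PartialEq)]
pub struct Batch {
    pub chunk_index: usize,
    pub rows: Vec<u64>,
}

/// Hypothetical stand-in for downloading and decoding one result chunk.
fn fetch_chunk(chunk_index: usize) -> Batch {
    Batch { chunk_index, rows: vec![chunk_index as u64; 3] }
}

/// Iterator that buffers out-of-order batches and yields them in chunk order.
struct Ordered {
    rx: Receiver<Batch>,
    pending: BTreeMap<usize, Batch>,
    next_index: usize,
}

impl Iterator for Ordered {
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        loop {
            // Yield the next expected chunk as soon as it is available.
            if let Some(batch) = self.pending.remove(&self.next_index) {
                self.next_index += 1;
                return Some(batch);
            }
            // Otherwise wait for another worker result; Err means all workers are done.
            match self.rx.recv() {
                Ok(batch) => {
                    self.pending.insert(batch.chunk_index, batch);
                }
                Err(_) => return None,
            }
        }
    }
}

/// Starts up to MAX_CHUNK_DOWNLOAD_WORKERS threads that pull chunk indices
/// from a shared counter and send finished batches over a bounded channel.
pub fn stream_chunks(chunk_count: usize) -> impl Iterator<Item = Batch> {
    let (tx, rx) = sync_channel::<Batch>(MAX_CHUNK_DOWNLOAD_WORKERS);
    let next = Arc::new(Mutex::new(0usize));

    for _ in 0..MAX_CHUNK_DOWNLOAD_WORKERS.min(chunk_count) {
        let tx = tx.clone();
        let next = Arc::clone(&next);
        thread::spawn(move || loop {
            // Claim the next unprocessed chunk index, or stop when none remain.
            let index = {
                let mut guard = next.lock().unwrap();
                if *guard >= chunk_count {
                    break;
                }
                let claimed = *guard;
                *guard += 1;
                claimed
            };
            // Stop early if the consumer dropped the receiving side.
            if tx.send(fetch_chunk(index)).is_err() {
                break;
            }
        });
    }
    // Drop the original sender so the channel closes once all workers finish.
    drop(tx);

    Ordered { rx, pending: BTreeMap::new(), next_index: 0 }
}
```

A caller would consume it with something like `for batch in stream_chunks(25) { /* process batch */ }`, handling each batch as soon as it arrives instead of waiting for the whole result. The bounded channel limits how far workers can run ahead of the consumer. The reorder buffer can still grow if one early chunk is much slower than the rest.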

@sgrebnov force-pushed the sgrebnov/arrow-streaming branch from 742cebd to 241e4bf on September 24, 2024 21:47