
Improve StreamingResult iteration performance with parallel prefetching#157

Open
douglas wants to merge 4 commits into rinsed-org:master from douglas:dsa/improve-streaming-result-iteration-performance

Conversation


@douglas douglas commented Dec 5, 2025

Overview

This PR dramatically improves StreamingResult iteration performance (4-5x faster) through parallel partition prefetching while maintaining memory efficiency.

The idea to use a thread pool came from a performance profiling session at work, while replacing ODBC with the REST API for large Snowflake queries.

While comparing memory usage between the two approaches (ODBC loads everything into memory vs. REST API streaming), I noticed ODBC's iteration was dramatically faster (~0.4s vs ~25s for 300k rows) despite the REST API's superior memory efficiency.

Investigating the StreamingResult implementation revealed it hardcoded a single-threaded pool for partition fetching, meaning partitions were fetched sequentially.

Since the gem already had ThreadedInMemoryStrategy with configurable thread pools for non-streaming queries and calculated optimal thread counts via number_of_threads_to_use, it seemed natural to apply the same parallel-fetching approach to streaming.

A quick prototype showed 4-5x faster iteration while maintaining the memory benefits, achieving the best of both worlds: ODBC-like speed with streaming's efficiency.

Important: I used Claude Code to generate most of this PR, but I reviewed the code to ensure I'm not introducing any stupid errors or AI slop.

Problem

StreamingResult currently hardcodes a single-threaded pool for prefetching partitions:

thread_pool = Concurrent::FixedThreadPool.new 1

This causes sequential partition fetching, leading to slow iteration for large result sets:

  • 300k rows: ~25-27s iteration time
  • Performance limited by network latency per partition fetch
  • Underutilizes available connection pool
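The bottleneck follows from a back-of-envelope model: sequential fetching costs roughly partitions × per-fetch latency, while N-way prefetching divides that by N. A hedged sketch (the 0.4s per-fetch latency is illustrative, chosen to mirror the observed ~25s for 64 partitions):

```ruby
# Idealized lower bound: time ≈ ceil(partitions / threads) * per-fetch latency.
# Real timings also include JSON parsing and row yielding, so measured numbers
# (e.g. 5-6s at 8 threads) sit above this bound.
def estimated_iteration_seconds(partitions, per_fetch_seconds, threads)
  (partitions / threads.to_f).ceil * per_fetch_seconds
end
```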

Solution

Added configurable prefetch_threads parameter that:

  1. Enables parallel partition fetching (N partitions at once)
  2. Automatically calculates optimal thread count using existing number_of_threads_to_use logic
  3. Maintains memory efficiency by still clearing processed partitions
  4. Ensures proper thread pool shutdown
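The approach can be sketched with only the Ruby stdlib (the PR itself uses `Concurrent::FixedThreadPool`; the method name, the `fetch:` callable, and the `SizedQueue` bookkeeping here are illustrative stand-ins, not the gem's actual code):

```ruby
# Bounded parallel prefetching: at most prefetch_threads fetches in flight,
# rows yielded strictly in partition order.
def stream_partitions(partition_count, prefetch_threads: 1, fetch:)
  unless prefetch_threads.is_a?(Integer) && prefetch_threads.positive?
    raise ArgumentError, "prefetch_threads must be a positive integer"
  end

  # SizedQueue caps in-flight fetches, so memory stays bounded by
  # prefetch_threads rather than by partition_count.
  slots = SizedQueue.new(prefetch_threads)
  workers = Array.new(partition_count) do |index|
    slots.push(true) # blocks once prefetch_threads fetches are in flight
    Thread.new do
      rows = fetch.call(index)
      slots.pop # free a slot for the next partition
      rows
    end
  end

  # Consume in partition order; consumed partitions become eligible for GC.
  workers.each { |worker| worker.value.each { |row| yield row } }
end
```

Yielding from `worker.value` in order preserves the existing iteration semantics while the fetches themselves overlap.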

Changes

  • StreamingResult#initialize: Added prefetch_threads: parameter (default: 1)
  • StreamingResult#each: Updated to prefetch N partitions in parallel
  • StreamingResultStrategy.result: Passes through prefetch_threads
  • Client#retrieve_result_set: Uses same thread calculation as non-streaming
  • New test file: Comprehensive spec with 10 test cases

Performance Results

Real-world measurements with 300k rows (64 partitions):

| Configuration | Iteration Time | Memory Growth | Total Time | Improvement |
|-------------------|----------------|---------------|------------|---------------|
| Before (1 thread) | 25-27s | 37 MB | ~28s | baseline |
| After (4 threads) | 6-7s | 35 MB | 8s | 4x faster |
| After (8 threads) | 5-6s | 24-36 MB | 7s | 4.5x faster |
| ODBC (comparison) | 0.4s | 75 MB | 6.4s | all in memory |

Key Metrics

  • Speed: Now comparable to non-streaming approaches (7s vs 6.4s)
  • Memory: Still maintains 50-70% memory savings vs ODBC
  • Best of both worlds: ODBC-like speed with streaming efficiency

Backward Compatibility

Fully backward compatible

  • Default prefetch_threads: 1 maintains existing behavior
  • All existing tests pass
  • No breaking changes to API

Implementation Details

Thread Pool Calculation

Uses the same logic as ThreadedInMemoryStrategy:

num_threads = number_of_threads_to_use(partition_count)
# Respects max_threads_per_query and thread_scale_factor

Memory Safety

  • Old partitions still cleared after iteration (marked as :finished)
  • Multiple partitions temporarily in memory during prefetch
  • Total memory impact: minimal increase, often better than single-threaded

Resource Management

  • Proper thread pool shutdown with thread_pool.shutdown and wait_for_termination
  • Prevents resource leaks
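The shutdown guarantee can be sketched with stdlib threads standing in for the pool (`Concurrent::FixedThreadPool#shutdown` and `#wait_for_termination` are the real calls in the PR; the poison-pill queue below is an illustrative substitute):

```ruby
# Sketch of the begin/ensure shutdown pattern using plain stdlib threads.
def with_workers(count)
  queue = Queue.new
  threads = Array.new(count) do
    Thread.new do
      while (job = queue.pop) # nil terminates the loop
        job.call
      end
    end
  end
  begin
    yield queue
  ensure
    # Runs even if the block raises, so worker threads are never leaked.
    count.times { queue.push(nil) } # poison pills: tell each worker to exit
    threads.each { |t| t.join(5) }  # bounded wait, akin to wait_for_termination(5)
  end
end
```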

Testing

Added spec/ruby_snowflake/streaming_result_spec.rb with comprehensive coverage:

✅ Single thread backward compatibility
✅ Multi-threaded prefetching behavior
✅ Partition clearing/memory management
✅ Thread pool shutdown
✅ Edge cases (more threads than partitions)
✅ Enumerator support
✅ Concurrent fetch verification

All unit tests pass (22 examples, 0 failures).

Use Cases

This improvement particularly benefits:

  • Large data exports (100k+ rows)
  • CSV/report generation from Snowflake
  • Data pipelines processing streaming results
  • Any application where iteration time is significant

Configuration

Automatically configured based on partition count and existing settings:

  • max_threads_per_query (env: SNOWFLAKE_MAX_THREADS_PER_QUERY, default: 8)
  • thread_scale_factor (env: SNOWFLAKE_THREAD_SCALE_FACTOR, default: 4)

Formula: min(partition_count / thread_scale_factor, max_threads_per_query)
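A hedged sketch of that formula (the real `number_of_threads_to_use` lives in the gem's client; the floor-at-1 clamp below is an assumption about how sub-`thread_scale_factor` partition counts are handled):

```ruby
# min(partition_count / thread_scale_factor, max_threads_per_query),
# floored at 1 so small result sets still get a worker (assumed behavior).
def number_of_threads_to_use(partition_count, thread_scale_factor: 4, max_threads_per_query: 8)
  [[partition_count / thread_scale_factor, 1].max, max_threads_per_query].min
end
```

With the defaults, the 64-partition query from the benchmarks lands on 8 threads.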

Real-World Impact

For our prescriber coverage data exports:

  • Query returns 300k rows in 64 partitions
  • Before: 28s total (25s iteration + 3s overhead)
  • After: 7s total (5s iteration + 2s overhead)
  • Memory: 35 MB vs 75 MB ODBC (53% savings)

Result: Achieves comparable speed to ODBC while using half the memory.

Questions?

Happy to answer any questions or make adjustments!

…reamingResult

Improves streaming result iteration performance by 4-5x through parallel
partition fetching while maintaining memory efficiency.

StreamingResult hardcoded a single-threaded pool, causing partitions to be
fetched sequentially. For large result sets (100k+ rows), this led to slow
iteration times (~25s for 300k rows) despite efficient memory usage.

- Add prefetch_threads parameter to StreamingResult#initialize (default: 1)
- Update #each to prefetch N partitions in parallel (N = prefetch_threads)
- Automatically calculate optimal thread count using existing number_of_threads_to_use logic
- Ensure proper thread pool shutdown after iteration

Real-world measurements with 300k rows (64 partitions):

| Threads | Iteration Time | Memory Growth | Speedup |
|---------|---------------|---------------|---------|
| 1 (before) | 25-27s | 37 MB | baseline |
| 4 | 6-7s | 35 MB | 4x faster |
| 8 | 5-6s | 24-36 MB | 4.5x faster |

Total time now comparable to non-streaming approaches (7s vs 6.4s) while
maintaining 50-70% memory savings.

- Default behavior unchanged (prefetch_threads: 1)
- All existing tests pass
- No breaking changes to API

- Uses same thread calculation as ThreadedInMemoryStrategy (number_of_threads_to_use)
- Respects max_threads_per_query and thread_scale_factor settings
- Maintains memory efficiency by still clearing processed partitions
- Properly shuts down thread pool to prevent resource leaks

Added comprehensive spec with 10 test cases covering:
- Single thread backward compatibility
- Multi-threaded prefetching
- Thread pool cleanup
- Edge cases (more threads than partitions)
- Enumerator support
@douglas douglas force-pushed the dsa/improve-streaming-result-iteration-performance branch from 2636b15 to fc23b78 Compare December 5, 2025 06:20
Wraps iteration logic in begin/ensure to guarantee thread pool shutdown
even when exceptions occur during iteration. Prevents resource leaks.

Also adds test to verify thread pool cleanup on exception.
@douglas douglas force-pushed the dsa/improve-streaming-result-iteration-performance branch from fc23b78 to 3f18852 Compare December 5, 2025 06:20

reidnimz commented Jan 7, 2026

Thanks Douglas, I'll try to take a detailed look into this in the next week or so - seems like a good idea to me!

@reidnimz reidnimz left a comment

This is a great change! There are a couple of small items, but nothing major.

Can you also update the Unreleased section of the changelog with these changes? We should mention there that streaming mode will now use a bit more memory but will be much, much faster.

end
ensure
# Ensure thread pool is properly shut down even if an exception occurs
thread_pool.shutdown
Contributor

ooh, this is a good catch - we should have been doing that already

Author

👍

module RubySnowflake
  class StreamingResult < Result
-   def initialize(partition_count, row_type_data, retreive_proc)
+   def initialize(partition_count, row_type_data, retreive_proc, prefetch_threads: 1)
Contributor

I think we need to raise an error if you tried to pass in a non positive number for prefetch_threads (I could reasonably see someone trying 0) which I think would cause us to never fetch data

Author

Good call. Added an ArgumentError in initialize if prefetch_threads isn't a positive integer — covers 0, negatives, and non-integer values. Added specs for it too.
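For reference, a minimal sketch of such a guard (the helper name and message are illustrative; the PR places the equivalent check directly in `StreamingResult#initialize`):

```ruby
# Hypothetical helper showing the validation described above.
# Rejects 0, negatives, and non-integers such as 2.5 or nil.
def validate_prefetch_threads!(prefetch_threads)
  unless prefetch_threads.is_a?(Integer) && prefetch_threads.positive?
    raise ArgumentError,
          "prefetch_threads must be a positive integer, got #{prefetch_threads.inspect}"
  end
  prefetch_threads
end
```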

ensure
# Ensure thread pool is properly shut down even if an exception occurs
thread_pool.shutdown
thread_pool.wait_for_termination
Contributor

Let's put a timeout parameter on this, or it will wait indefinitely. Since we're already handling an exception, maybe something short like 5 seconds is appropriate?

Author

Done! Added wait_for_termination(5) — 5 seconds should be plenty for in-flight fetches to wrap up during exception handling.

if streaming
StreamingResultStrategy.result(json_body, retrieve_proc)
# Use same thread calculation logic for streaming to enable parallel prefetching
# This dramatically improves iteration performance while maintaining memory efficiency
Contributor

Claude Code has a tendency to insert comments like this that seem very important and relevant to it in the current context but end up sort of redundant later. If you don't mind, could you clean these up unless there's something surprising or tricky the reader should keep in mind - that'll keep it matching the rest of the codebase. I think I have like 6 lines in my ~/.claude/CLAUDE.md file to try to keep it from doing this (and sometimes it even works!)

Author

Ha, fair enough — cleaned up the redundant comments in both client.rb and streaming_result.rb. Left the existing ones that were already there (partition clearing explanation) since those seem genuinely useful.

- Add 5-second timeout to wait_for_termination to avoid indefinite hangs
- Validate prefetch_threads is a positive integer in initialize
- Remove redundant AI-generated comments
- Update CHANGELOG with streaming performance improvement

douglas commented Feb 20, 2026

All feedback addressed in 80498ec — added the CHANGELOG entry under Unreleased as well. Thanks for the review!

@douglas douglas requested a review from reidnimz February 20, 2026 20:51