Add a SpillingPool to manage collections of spill files #18207
Conversation
Marking as draft for now. Open to input, but it needs a bit more work. I'm still familiarizing myself with the spilling infrastructure.
This PR sets a size limit on spill files: when the size exceeds the threshold, the spiller rotates to a new file. I'm wondering why this design? The spill writer and reader can now do streaming reads/writes, so a large spill file usually isn't an issue, unless more parallelism is needed somewhere.
The issue with using a single FIFO file is that you accumulate dead data, bloating disk usage considerably. The idea is to cap that at, say, 100MB and then start a new file, so that once all of the original file has been consumed we can garbage collect it.
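To make the intent concrete, here is a minimal, self-contained sketch of the rotation-and-garbage-collection policy being discussed. It is not the PR's actual `SpillPool` code; names such as `RotatingSpillQueue`, `SpillFileHandle`, and `max_file_size` are hypothetical stand-ins.

```rust
use std::collections::VecDeque;

/// Hypothetical handle for one spill file: how many bytes were written to it
/// and whether the reader has already consumed all of them.
struct SpillFileHandle {
    bytes_written: usize,
    fully_consumed: bool,
}

/// Sketch of the rotation policy: append to the newest file until it reaches
/// `max_file_size`, then start a new one; files whose contents have all been
/// read are dropped, reclaiming disk space.
struct RotatingSpillQueue {
    files: VecDeque<SpillFileHandle>,
    max_file_size: usize,
}

impl RotatingSpillQueue {
    fn new(max_file_size: usize) -> Self {
        Self { files: VecDeque::new(), max_file_size }
    }

    /// Record a write of `nbytes`, rotating to a fresh file when the current
    /// one would exceed the size cap.
    fn write(&mut self, nbytes: usize) {
        let needs_new_file = match self.files.back() {
            Some(f) => f.bytes_written + nbytes > self.max_file_size,
            None => true,
        };
        if needs_new_file {
            self.files.push_back(SpillFileHandle {
                bytes_written: 0,
                fully_consumed: false,
            });
        }
        self.files.back_mut().unwrap().bytes_written += nbytes;
    }

    /// Garbage-collect files the reader has finished with (FIFO order), so a
    /// long-lived spill never accumulates unbounded dead data on disk.
    fn gc_consumed(&mut self) {
        while matches!(self.files.front(), Some(f) if f.fully_consumed) {
            // In a real implementation, dropping the temp-file handle here
            // would delete the underlying file.
            self.files.pop_front();
        }
    }
}

fn main() {
    let mut queue = RotatingSpillQueue::new(100 * 1024 * 1024); // ~100MB cap per file
    queue.write(60 * 1024 * 1024);
    queue.write(60 * 1024 * 1024); // would exceed the cap, so a second file starts
    assert_eq!(queue.files.len(), 2);

    queue.files.front_mut().unwrap().fully_consumed = true;
    queue.gc_consumed(); // the first file can now be reclaimed
    assert_eq!(queue.files.len(), 1);
}
```

The point of the cap is that a fully consumed file can be dropped as soon as the reader moves past it, so disk usage stays bounded by what is still unread plus at most one partially written file, rather than growing with everything ever spilled.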
@2010YOUY01 let me know if that makes sense; there's an example of this issue in #18011.
Pull Request Overview
This PR introduces a `SpillPool` abstraction to centralize the management of spill files with FIFO semantics. The pool handles file rotation, batches multiple record batches into single files up to a configurable size limit, and provides streaming read access to spilled data (a minimal model of these read semantics is sketched after the list below).
Key changes:
- Adds a new `SpillPool` module with FIFO queue semantics for managing spill files
- Integrates `SpillPool` into `RepartitionExec` to replace the previous one-file-per-batch approach
- Adds a new configuration option `max_spill_file_size_bytes` (default 100MB) to control when spill files rotate
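The streaming-read behavior described above can be summarized with a small in-memory model. This is not the PR's API: `SpillPoolModel`, `poll_read`, and `finalized` are illustrative stand-ins, and `Batch` stands in for an Arrow `RecordBatch`. The idea is that a reader yields batches in write order, signals `Pending` while the writer may still append, and only ends the stream after the pool is finalized.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Stand-in for an Arrow RecordBatch.
type Batch = Vec<u8>;

/// Minimal model of the reader side of a FIFO spill pool: batches come back
/// in the order they were written, and the stream only ends once the writer
/// has finalized the pool and everything has been consumed.
struct SpillPoolModel {
    queued: VecDeque<Batch>, // batches spilled but not yet read
    finalized: bool,         // writer promised not to append more
}

impl SpillPoolModel {
    fn poll_read(&mut self) -> Poll<Option<Batch>> {
        if let Some(batch) = self.queued.pop_front() {
            Poll::Ready(Some(batch)) // data is available right now
        } else if self.finalized {
            Poll::Ready(None) // writer is done: clean end of stream
        } else {
            Poll::Pending // writer may still append: wait
        }
    }
}

fn main() {
    let mut pool = SpillPoolModel {
        queued: VecDeque::from([vec![1u8], vec![2u8]]),
        finalized: false,
    };
    assert_eq!(pool.poll_read(), Poll::Ready(Some(vec![1u8])));
    assert_eq!(pool.poll_read(), Poll::Ready(Some(vec![2u8])));
    assert_eq!(pool.poll_read(), Poll::Pending); // writer not finalized yet
    pool.finalized = true;
    assert_eq!(pool.poll_read(), Poll::Ready(None)); // end of stream
}
```

This three-way distinction (data ready, finalized and empty, or still waiting) is exactly what several of the review comments below probe in the `RepartitionExec` integration.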
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| datafusion/physical-plan/src/spill/spill_pool.rs | New module implementing `SpillPool` and `SpillPoolStream` with comprehensive tests |
| datafusion/physical-plan/src/spill/mod.rs | Exports the new `spill_pool` module |
| datafusion/physical-plan/src/repartition/mod.rs | Refactored to use `SpillPool` instead of one-file-per-batch spilling |
| datafusion/common/src/config.rs | Adds the `max_spill_file_size_bytes` configuration option |
| docs/source/user-guide/configs.md | Documents the new `max_spill_file_size_bytes` configuration |
| datafusion/sqllogictest/test_files/information_schema.slt | Updates test expectations to include the new configuration option |
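As a hedged illustration of how the new knob might be adjusted once merged: the exact key path is an assumption here (most execution options live under `datafusion.execution.*`), and the updated `docs/source/user-guide/configs.md` is the authoritative reference for the real name and default.

```rust
use datafusion::prelude::{SessionConfig, SessionContext};

fn main() {
    // Assumed key: `datafusion.execution.max_spill_file_size_bytes`; the PR's
    // configs.md entry defines the actual name (default 100MB per the PR).
    let config = SessionConfig::new()
        .set_u64("datafusion.execution.max_spill_file_size_bytes", 64 * 1024 * 1024);

    // Queries run through this context would then rotate spill files at ~64MB.
    let _ctx = SessionContext::new_with_config(config);
}
```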
```rust
}
// If input is finished, don't poll channel anymore
if self.input_finished {
    continue;
```
Copilot AI · Oct 22, 2025:
This `continue` statement creates an infinite loop when `input_finished` is true and `spill_stream` returns `Pending`. Since the channel won't be polled and `spill_stream` already returned `Pending`, the loop will continue indefinitely without waiting. The logic at lines 1404-1406 should have returned `Poll::Ready(None)` when input is finished and no spilled data is available, but this code path bypasses that check.
Suggested change:
```diff
- continue;
+ return Poll::Ready(None);
```
```rust
}

impl SpillFile {
    fn new(file: RefCountedTempFile, _total_batches: usize, _total_size: usize) -> Self {
```
Copilot AI · Oct 22, 2025:
The parameters `_total_batches` and `_total_size` are unused. If these are intended for future use or debugging, consider adding a comment explaining their purpose. Otherwise, remove them to simplify the API.
Suggested change:
```diff
- fn new(file: RefCountedTempFile, _total_batches: usize, _total_size: usize) -> Self {
+ fn new(file: RefCountedTempFile) -> Self {
```
```rust
// All inputs finished, wait for spill stream to have more data or finish
return Poll::Pending;
```
Copilot AI · Oct 22, 2025:
When `all_inputs_finished` is true and `spill_stream` returns `Pending`, this returns `Pending` without checking whether the spill stream will ever produce more data. According to the `SpillPool` implementation, once finalized, the stream should return `None` when there's no more data. However, if `finalize()` fails or wasn't called, this could hang indefinitely. Consider adding a check to ensure the spill pool was properly finalized before returning `Pending`.
Suggested change:
```diff
- // All inputs finished, wait for spill stream to have more data or finish
- return Poll::Pending;
+ // All inputs finished, check if the spill pool is finalized.
+ let is_finalized = {
+     let pool = self.spill_pool.lock();
+     pool.is_finalized()
+ };
+ if is_finalized {
+     // If finalized and still pending, treat as end of stream to avoid hanging.
+     return Poll::Ready(None);
+ } else {
+     // Not finalized yet, wait for more data or finalization.
+     return Poll::Pending;
+ }
```
```rust
// Flush and finalize the SpillPool
{
    let mut pool = self.spill_pool.lock();
    pool.flush().ok();
```
Copilot AI · Oct 22, 2025:
Silently ignoring flush errors with `.ok()` could lead to data loss if the flush fails. Consider logging the error or returning it to the caller for proper error handling.
Suggested change:
```diff
- pool.flush().ok();
+ if let Err(e) = pool.flush() {
+     return Poll::Ready(Some(Err(DataFusionError::External(Box::new(e)))));
+ }
```
```rust
// Flush and finalize the SpillPool
{
    let mut pool = self.spill_pool.lock();
    pool.flush().ok();
```
Copilot AI · Oct 22, 2025:
Silently ignoring flush errors with `.ok()` could lead to data loss if the flush fails. Consider logging the error or returning it to the caller for proper error handling.
Suggested change:
```diff
- pool.flush().ok();
+ if let Err(e) = pool.flush() {
+     return Poll::Ready(Some(Err(e)));
+ }
```
Addresses #18014 (comment); potentially paves the way to solving #18011 for other operators as well.