
perf: Add native streaming interpolate #27185

Open
coastalwhite wants to merge 4 commits into pola-rs:main from coastalwhite:perf/streaming-interpolate

coastalwhite (Collaborator)

xref: #20947.

This adds a native interpolate node for the streaming engine. The node is heavily based on `BackwardFillNode`, with a few small adjustments.
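For readers unfamiliar with `BackwardFillNode`, here is a minimal single-threaded sketch of the idea as we read it, not the node's actual code, shown for linear interpolation. Only the last non-null value and a count of pending nulls cross chunk boundaries; a run of nulls is filled once its right endpoint arrives, and trailing nulls with no right endpoint are flushed as nulls when the input ends. `LinearInterpolate`, `push`, and `finish` are hypothetical names, and plain `Vec<Option<f64>>` chunks stand in for morsels.

```rust
/// Hypothetical per-node state carried from one chunk to the next.
#[derive(Default)]
struct LinearInterpolate {
    left: Option<f64>, // last non-null value seen so far
    pending: usize,    // nulls seen since `left`, still waiting for a right endpoint
}

impl LinearInterpolate {
    /// Consume one chunk and return every value that can already be emitted.
    fn push(&mut self, chunk: &[Option<f64>]) -> Vec<Option<f64>> {
        let mut out = Vec::with_capacity(chunk.len() + self.pending);
        for x in chunk {
            match *x {
                None => self.pending += 1,
                Some(r) => {
                    // Fill the pending run between `left` and `r`.
                    // Leading nulls (no `left` yet) stay null.
                    let n = self.pending;
                    for k in 1..=n {
                        out.push(self.left.map(|l| l + (r - l) * k as f64 / (n + 1) as f64));
                    }
                    out.push(Some(r));
                    self.left = Some(r);
                    self.pending = 0;
                }
            }
        }
        out
    }

    /// Input exhausted: trailing nulls have no right endpoint, so emit them as nulls.
    fn finish(&mut self) -> Vec<Option<f64>> {
        let out = vec![None; self.pending];
        self.pending = 0;
        out
    }
}

fn main() {
    let mut node = LinearInterpolate::default();
    let mut out = Vec::new();
    // The null run between 1.0 and 4.0 spans a chunk boundary.
    out.extend(node.push(&[None, Some(1.0), None]));
    out.extend(node.push(&[None, Some(4.0), None]));
    out.extend(node.finish());
    assert_eq!(out, vec![None, Some(1.0), Some(2.0), Some(3.0), Some(4.0), None]);
    println!("{:?}", out);
}
```

Holding back only a count, rather than the null rows themselves, keeps the carried state constant-size no matter how long a null run gets.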

I added a parametric test that checks its behavior against the in-memory engine. That test uncovered #27184, so this PR also fixes #27184.
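A Rust analogue of that parametric check, as a sketch: a streaming nearest-value fill is compared against a whole-column reference for every chunk size. It does not reproduce #27184, whose root cause isn't described here; it only shows one way to pick the nearer neighbour with unsigned indices without ever subtracting past zero. `Nearest`, `nearest_in_memory`, and `nearest_streaming` are hypothetical, and sending ties to the right neighbour is an assumption that may not match Polars' tie-breaking.

```rust
/// Reference ("in-memory") nearest fill over the whole column at once.
/// Interior nulls take the nearer non-null neighbour; ties go right (assumption).
/// Leading and trailing nulls stay null.
fn nearest_in_memory(xs: &[Option<u64>]) -> Vec<Option<u64>> {
    let mut out = xs.to_vec();
    let mut last: Option<(usize, u64)> = None;
    for (j, x) in xs.iter().enumerate() {
        if let Some(r) = *x {
            if let Some((i, l)) = last {
                let n = j - i - 1; // nulls strictly between i and j; j > i, so no underflow
                for k in 1..=n {
                    out[i + k] = Some(if 2 * k < n + 1 { l } else { r });
                }
            }
            last = Some((j, r));
        }
    }
    out
}

/// Streaming version: only `left` and a pending-null count cross chunk boundaries.
#[derive(Default)]
struct Nearest {
    left: Option<u64>,
    pending: usize,
}

impl Nearest {
    fn push(&mut self, chunk: &[Option<u64>], out: &mut Vec<Option<u64>>) {
        for x in chunk {
            match *x {
                None => self.pending += 1,
                Some(r) => {
                    let n = self.pending;
                    for k in 1..=n {
                        // Distances are k (to left) and n + 1 - k (to right);
                        // comparing 2k < n + 1 avoids any unsigned subtraction.
                        out.push(self.left.map(|l| if 2 * k < n + 1 { l } else { r }));
                    }
                    out.push(Some(r));
                    self.left = Some(r);
                    self.pending = 0;
                }
            }
        }
    }

    fn finish(&mut self, out: &mut Vec<Option<u64>>) {
        // No right endpoint: trailing nulls stay null.
        out.extend(std::iter::repeat(None).take(self.pending));
        self.pending = 0;
    }
}

fn nearest_streaming(xs: &[Option<u64>], chunk_size: usize) -> Vec<Option<u64>> {
    let mut node = Nearest::default();
    let mut out = Vec::with_capacity(xs.len());
    for chunk in xs.chunks(chunk_size) {
        node.push(chunk, &mut out);
    }
    node.finish(&mut out);
    out
}

fn main() {
    // Decreasing values exercise the "right < left" case for unsigned data.
    let xs = [None, Some(9), None, None, None, Some(1), None, None, Some(0), None];
    let expected = nearest_in_memory(&xs);
    for chunk_size in 1..=xs.len() {
        assert_eq!(nearest_streaming(&xs, chunk_size), expected);
    }
    println!("{:?}", expected);
}
```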

I used AI assistance and verified its output.

github-actions bot added the A-streaming (Related to the streaming engine), performance (Performance issues or improvements), python (Related to Python Polars), and rust (Related to Rust Polars) labels on Apr 4, 2026
let source_token = SourceToken::new();

let Some(recv) = recv else {
// Input exhausted. Flush pending trailing nulls as actual nulls — no right endpoint
coastalwhite (Collaborator, Author)

AI emdash detected ✈️

codecov bot commented on Apr 4, 2026

Codecov Report

❌ Patch coverage is 95.23810% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.22%. Comparing base (efe654e) to head (c6e5c82).
⚠️ Report is 20 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/polars-stream/src/nodes/interpolate.rs | 95.13% | 7 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #27185      +/-   ##
==========================================
- Coverage   81.80%   81.22%   -0.58%     
==========================================
  Files        1816     1817       +1     
  Lines      250592   250730     +138     
  Branches     3144     3144              
==========================================
- Hits       204985   203646    -1339     
- Misses      44803    46280    +1477     
  Partials      804      804              

dsprenkels self-requested a review on April 5, 2026 08:30
xref: pola-rs#20947.

This adds a native interpolate node for the streaming engine. This node is
heavily based on the `BackwardFillNode`, but needs a few small adjustments.
orlp force-pushed the perf/streaming-interpolate branch from 5c903c5 to c6e5c82 on April 7, 2026 09:56
dsprenkels self-assigned this on Apr 7, 2026
@given(
data=series(
name="a",
allowed_dtypes=[
Collaborator

Nit: @given(data=series(name="a", allowed_dtypes=NUMERIC_DTYPES))

Or is there a reason to exclude pl.{U,}Int128? When I test with them locally, it works.

let pending = *pending_nulls;
let mut send = send.serial();
join_handles.push(scope.spawn_task(TaskPriority::High, async move {
let morsel_size = get_ideal_morsel_size();
dsprenkels (Collaborator), Apr 7, 2026

Please fix: could you check `stop_requested` on the source token (and update `pending`)? I'm concerned that some user might eventually pass a very unusual DataFrame with a long run of nulls at the end.
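A sketch of what the requested change could look like, assuming trailing nulls are emitted in morsels of at most the ideal morsel size: check for a stop before each morsel and decrement the pending count as you go, so a long run of trailing nulls can be interrupted between morsels and an early stop leaves an accurate count behind. An `AtomicBool` stands in for the source token's `stop_requested`; `flush_trailing_nulls` and `emit` are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Emit `*pending` trailing nulls as morsels of at most `morsel_size` rows,
/// checking `stop` (stand-in for `stop_requested`) before each morsel.
/// `*pending` is decremented as morsels go out, so after an early stop it
/// still holds the number of nulls not yet sent.
fn flush_trailing_nulls(
    pending: &mut usize,
    morsel_size: usize,
    stop: &AtomicBool,
    mut emit: impl FnMut(usize),
) {
    assert!(morsel_size > 0, "morsel_size must be positive");
    while *pending > 0 {
        if stop.load(Ordering::Relaxed) {
            break;
        }
        let len = (*pending).min(morsel_size);
        emit(len); // e.g. build and send a morsel of `len` nulls
        *pending -= len;
    }
}

fn main() {
    let stop = AtomicBool::new(false);

    // No stop: 10 nulls go out as 4 + 4 + 2.
    let mut pending = 10;
    let mut sizes = Vec::new();
    flush_trailing_nulls(&mut pending, 4, &stop, |n| sizes.push(n));
    assert_eq!(sizes, vec![4, 4, 2]);
    assert_eq!(pending, 0);

    // A downstream consumer requests a stop after the first morsel.
    let mut pending = 10;
    let mut sizes = Vec::new();
    flush_trailing_nulls(&mut pending, 4, &stop, |n| {
        sizes.push(n);
        stop.store(true, Ordering::Relaxed);
    });
    assert_eq!(sizes, vec![4]);
    assert_eq!(pending, 6);
}
```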


// Parallel worker threads.
for (mut send, mut recv) in senders.into_iter().zip(distr_receivers) {
let source_token = source_token.clone();
dsprenkels (Collaborator), Apr 7, 2026

Please fix: this `source_token` should not be created locally; instead, the serial distributing task should pass along the source token from the input morsels.
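To illustrate why the token's origin matters, here is a single-threaded sketch with hypothetical `StopToken` and `Morsel` types (not Polars' `SourceToken` API): a stop requested on a locally created token is never observed upstream, whereas a token forwarded with the morsel is shared with the source.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a source token: a shared stop flag.
#[derive(Clone, Default)]
struct StopToken(Arc<AtomicBool>);

impl StopToken {
    fn stop(&self) {
        self.0.store(true, Ordering::Relaxed);
    }
    fn stop_requested(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
}

/// A morsel carries the token of the source that produced it.
struct Morsel {
    rows: Vec<i64>,
    source_token: StopToken,
}

/// Worker that wants the source to stop once a morsel has at least `limit` rows.
/// `use_local` toggles between the two wirings discussed in the review.
fn worker(morsel: &Morsel, limit: usize, use_local: bool) {
    let token = if use_local {
        StopToken::default() // fresh flag that nobody upstream observes
    } else {
        morsel.source_token.clone() // same flag the source checks
    };
    if morsel.rows.len() >= limit {
        token.stop();
    }
}

fn main() {
    let source = StopToken::default();
    let morsel = Morsel { rows: vec![1, 2, 3], source_token: source.clone() };

    worker(&morsel, 2, true);
    assert!(!source.stop_requested()); // local token: the stop is lost

    worker(&morsel, 2, false);
    assert!(source.stop_requested()); // forwarded token: the source sees it
}
```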

if send.send(morsel).await.is_err() {
break;
}
wait_group.wait().await;
Collaborator

Hint: this is not strictly necessary, because this task never produces more morsels than it consumes, but waiting doesn't hurt either.

coastalwhite (Collaborator, Author)

I think my understanding is a bit off. We merge the pipelines into a single serial input and then distribute the morsels over the pipelines again. Without these wait groups, what stops two morsels from entering the same pipeline at the same time?
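One way to picture the wait group, as a thread-based analogy rather than Polars' async primitives: the distributor attaches a completion token (here an `mpsc::Sender<()>`) to each morsel and blocks until the consumer drops it, so it never hands out the next morsel before the previous one is finished. `distribute` is a hypothetical name.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Send `n` morsels to one worker, waiting for each to complete before the next.
/// Returns the interleaved event log.
fn distribute(n: u32) -> Vec<String> {
    let log = Arc::new(Mutex::new(Vec::new()));
    // Each message carries a morsel plus a completion token.
    let (tx, rx) = mpsc::channel::<(u32, mpsc::Sender<()>)>();

    let worker_log = Arc::clone(&log);
    let worker = thread::spawn(move || {
        for (morsel, done) in rx {
            worker_log.lock().unwrap().push(format!("done {morsel}"));
            drop(done); // completion: releases the distributor's wait
        }
    });

    for morsel in 0..n {
        let (done_tx, done_rx) = mpsc::channel::<()>();
        log.lock().unwrap().push(format!("send {morsel}"));
        tx.send((morsel, done_tx)).unwrap();
        // Analogue of `wait_group.wait()`: recv() returns Err once the only
        // sender, moved into the message, has been dropped by the worker.
        assert!(done_rx.recv().is_err());
    }
    drop(tx); // lets the worker's loop end
    worker.join().unwrap();

    let entries = log.lock().unwrap().clone();
    entries
}

fn main() {
    let entries = distribute(3);
    assert_eq!(entries, ["send 0", "done 0", "send 1", "done 1", "send 2", "done 2"]);
    println!("{:?}", entries);
}
```

In this analogy the log always alternates `send`/`done`; without the `recv()` wait, the distributor could enqueue several morsels on the same channel before the worker has finished the first.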

Development

Successfully merging this pull request may close these issues.

Underflow in interpolate(method="nearest")

2 participants