Add pandas queries q9-q22 and fix caching pattern #179

Open
Matt711 wants to merge 4 commits into pola-rs:main from Matt711:pdsh-queries-migration

Matt711 commented Jan 21, 2026

Supersedes #156. The nonlocal pattern caused query() to fail when run twice (a cold run followed by a hot run). For example, in Q1 the first call rebinds line_item_ds to a DataFrame, so on the second call we attempt to call that DataFrame and the query fails.
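For reviewers, here is a minimal sketch of the failure and of the _fn-suffix fix. It is an illustration, not the code in this PR: get_line_item_ds, make_broken_query and make_fixed_query are hypothetical names, and a plain list stands in for the DataFrame so the snippet runs without any data files.

```python
def get_line_item_ds() -> list[dict]:
    """Hypothetical loader; a list of rows stands in for a DataFrame."""
    return [{"l_quantity": 17}, {"l_quantity": 36}]


def make_broken_query():
    line_item_ds = get_line_item_ds  # starts out as a callable

    def query() -> int:
        nonlocal line_item_ds
        # Rebinds the enclosing name from the loader to the loaded data,
        # so the next call tries to call the data itself.
        line_item_ds = line_item_ds()
        return sum(row["l_quantity"] for row in line_item_ds)

    return query


def make_fixed_query():
    line_item_ds_fn = get_line_item_ds  # the loader keeps its own name

    def query() -> int:
        # Local name for the data; the loader binding is never overwritten.
        line_item_ds = line_item_ds_fn()
        return sum(row["l_quantity"] for row in line_item_ds)

    return query


broken = make_broken_query()
cold = broken()  # the cold run succeeds
try:
    broken()  # the hot run fails: the data object is not callable
    broken_error = None
except TypeError as exc:
    broken_error = exc

fixed = make_fixed_query()
fixed_results = (fixed(), fixed())  # both runs succeed
```

Keeping the loader under a separate name also removes the need for nonlocal entirely, since query() no longer assigns to a name in the enclosing scope.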

The new queries are largely copied from rapidsai/cudf#21108. I used Claude to convert the queries from our structure in polars[gpu] to the structure here.

Matt711 and others added 4 commits November 14, 2025 18:01
- Migrate 14 new pandas PDSH queries (q9-q22) from cudf repository
- Fix caching pattern bug in all queries (q1-q22) by using _fn suffix
  to avoid nonlocal variable reassignment issue that caused TypeError
  on repeated query execution
- Add proper pandas imports for queries using pd.NamedAgg
- Use pd.Timestamp for datetime comparisons with PyArrow backend
- All 22 queries validated at scale factor 0.01

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
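
As a rough illustration of the pd.NamedAgg and pd.Timestamp points in the commit message, the sketch below filters on a date cutoff and aggregates with named outputs. The table, column values, and cutoff are invented for the example, and it uses the default datetime64 dtype rather than the PyArrow backend, so it only shows the shape of the calls and not the backend-specific behaviour.

```python
import pandas as pd

# Toy stand-in for a lineitem table; all values are invented for illustration.
lineitem = pd.DataFrame(
    {
        "l_returnflag": ["A", "A", "N", "N"],
        "l_quantity": [10.0, 20.0, 5.0, 7.0],
        "l_shipdate": pd.to_datetime(
            ["1998-08-01", "1998-09-01", "1998-09-03", "1998-12-01"]
        ),
    }
)

# Compare the date column against a pd.Timestamp instead of a bare string.
cutoff = pd.Timestamp("1998-09-02")
filtered = lineitem[lineitem["l_shipdate"] <= cutoff]

# pd.NamedAgg gives each output column an explicit name and source column.
result = (
    filtered.groupby("l_returnflag", as_index=False)
    .agg(
        sum_qty=pd.NamedAgg(column="l_quantity", aggfunc="sum"),
        count_order=pd.NamedAgg(column="l_quantity", aggfunc="count"),
    )
    .sort_values("l_returnflag")
)
```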

Matt711 commented Jan 22, 2026

Ready for review!
CC @ritchie46
