init: m5 forecasting FE benchmark by MarcoGorelli · Pull Request #136 · pola-rs/polars-benchmark

MarcoGorelli · 2024-09-04T13:25:55Z

Some results: https://www.kaggle.com/code/marcogorelli/m5-forecasting-feature-engineering-benchmark

ritchie46 · 2024-09-14T11:10:30Z

m5-forecasting-feature-engineering/polars_queries.py

+
+
+def q2_polars(df):
+    return df.with_columns(


Can we use the select + explode mapping here?

ritchie46 · 2024-09-14T11:14:47Z

m5-forecasting-feature-engineering/README.md

+Participants typically used pandas (Polars was only just getting started at the time), so here we benchmark how long it would have
+taken to do the same feature engineering with Polars (and, coming soon, DuckDB).
+
+We believe this to be a useful task to benchmark, because:


I think we can remove L9-L12.

I think this can serve as a basis for more time-series related benchmarks on this dataset. I don't think we have to strictly limit ourselves to what was used in the Kaggle competition.

MarcoGorelli · 2024-11-10T14:37:53Z

Just got back to this - running locally, I'm seeing very good results for Polars:

*** polars lazy ***
q1 took: 20.075744923997263
q2 took: 24.77264992900018
q3 took: 28.980969234995428
(polars-benchmark) marcogorelli@DESKTOP-U8OKFP3:~/polars-benchmark/m5-forecasting-feature-engineering$ python duckdb_queries.py 
*** duckdb ***
q1 took: 176.87156045399752
q2 took: 101.69693301500229
q3 took: 115.6844151769983
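
To put the timings above side by side, the per-query ratio can be computed directly. This is just arithmetic on the numbers reported in this comment (a single local run), not an additional measurement; the variable names are illustrative.

```python
# Timings in seconds, copied from the output above.
polars_lazy = {
    "q1": 20.075744923997263,
    "q2": 24.77264992900018,
    "q3": 28.980969234995428,
}
duckdb = {
    "q1": 176.87156045399752,
    "q2": 101.69693301500229,
    "q3": 115.6844151769983,
}

# Ratio of DuckDB time to Polars lazy time for each query.
speedup = {q: duckdb[q] / polars_lazy[q] for q in polars_lazy}

for q, ratio in speedup.items():
    print(f"{q}: DuckDB time / Polars time = {ratio:.1f}x")
```

On these numbers the ratios come out at roughly 8.8x for q1, 4.1x for q2, and 4.0x for q3.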

init: m5 forecasting FE benchmark

bab1673

MarcoGorelli force-pushed the m5-fe branch from 62062f8 to bab1673 on September 4, 2024 13:26

lint

df2ccb5

MarcoGorelli mentioned this pull request Sep 4, 2024

group_by+explode more than 3x faster than over pola-rs/polars#18556

Closed

2 tasks

ritchie46 force-pushed the main branch from 6f7780e to cf31c4d on September 13, 2024 14:04

ritchie46 reviewed Sep 14, 2024


MarcoGorelli marked this pull request as draft September 14, 2024 11:15

add some tests, rewrite using select + mapping strategy explode

0bf9fcd

MarcoGorelli force-pushed the m5-fe branch from 3ac2b04 to 0bf9fcd on November 10, 2024 14:33

MarcoGorelli marked this pull request as ready for review November 10, 2024 14:34

MarcoGorelli wants to merge 3 commits into pola-rs:main from MarcoGorelli:m5-fe
