Open
Labels
A-io-csv, A-streaming, P-medium, accepted, bug, performance, python, regression
Description
Checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of Polars.
Reproducible example
Any query such as
import polars as pl
import time
target = "./tt.csv"
n_rows = 100_000_000
df = (
    pl.DataFrame()
    .with_columns(pl.int_range(n_rows).alias("a"))
    .with_columns(pl.col.a.cast(pl.String).alias("b"))
)
ref_schema = pl.Schema({"a": pl.Int64, "b": pl.String})
df.lazy().sink_csv(target)
print("__start__ scan_csv", flush=True)
start = time.perf_counter()
out = (
    pl.scan_csv(target, schema=ref_schema)
    .select(pl.len())
    .collect(engine="streaming")
)
end = time.perf_counter()
print(out)
print(f"duration: {((end - start) * 1000):.3f} milliseconds", flush=True)
Log output
$ for v in 1.35.1 1.35.2; do echo $v; pip install polars==$v --upgrade --target /tmp/polars_specific && PYTHONPATH=/tmp/polars_specific python perf_regression_mre.py; done
1.35.1
Collecting polars==1.35.1
Using cached polars-1.35.1-py3-none-any.whl.metadata (10 kB)
Collecting polars-runtime-32==1.35.1 (from polars==1.35.1)
Using cached polars_runtime_32-1.35.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.5 kB)
Using cached polars-1.35.1-py3-none-any.whl (783 kB)
Using cached polars_runtime_32-1.35.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (41.3 MB)
Installing collected packages: polars-runtime-32, polars
Successfully installed polars-1.35.1 polars-runtime-32-1.35.1
[notice] A new release of pip is available: 25.2 -> 26.0.1
[notice] To update, run: pip install --upgrade pip
__start__ scan_csv
shape: (1, 1)
┌───────────┐
│ len       │
│ ---       │
│ u32       │
╞═══════════╡
│ 100000000 │
└───────────┘
duration: 52.978 milliseconds
1.35.2
Collecting polars==1.35.2
Using cached polars-1.35.2-py3-none-any.whl.metadata (10 kB)
Collecting polars-runtime-32==1.35.2 (from polars==1.35.2)
Using cached polars_runtime_32-1.35.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.5 kB)
Using cached polars-1.35.2-py3-none-any.whl (783 kB)
Using cached polars_runtime_32-1.35.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (41.3 MB)
Installing collected packages: polars-runtime-32, polars
Successfully installed polars-1.35.2 polars-runtime-32-1.35.2
[notice] A new release of pip is available: 25.2 -> 26.0.1
[notice] To update, run: pip install --upgrade pip
__start__ scan_csv
shape: (1, 1)
┌───────────┐
│ len       │
│ ---       │
│ u32       │
╞═══════════╡
│ 100000000 │
└───────────┘
duration: 242.239 milliseconds
Issue description
There is a significant performance regression, from 53 ms to 242 ms (roughly 4.6x slower) in the given example, bisected to:
#25179
Since this PR, the line count runs single-threaded.
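For readers unfamiliar with why threading matters here, the following is a minimal sketch of a chunked, multi-worker line count. It is not Polars' implementation: `count_newlines_in_range` and `parallel_line_count` are hypothetical names, the file is split into equal byte ranges, and each worker simply counts newline bytes. It assumes no quoted fields contain embedded newlines, which a real CSV reader has to handle. In CPython the GIL may limit the actual speedup, so this only illustrates the partitioning.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_newlines_in_range(path: str, start: int, end: int,
                            block_size: int = 1 << 20) -> int:
    # Count newline bytes in the half-open byte range [start, end) of the file.
    count = 0
    with open(path, "rb") as f:
        f.seek(start)
        remaining = end - start
        while remaining > 0:
            block = f.read(min(block_size, remaining))
            if not block:
                break  # file ended earlier than expected
            count += block.count(b"\n")
            remaining -= len(block)
    return count


def parallel_line_count(path: str, n_workers: int = 4) -> int:
    # Hypothetical helper: split the file into byte ranges and sum the counts.
    size = os.path.getsize(path)
    if size == 0:
        return 0
    chunk = -(-size // n_workers)  # ceiling division, at least 1 byte per range
    ranges = [(s, min(s + chunk, size)) for s in range(0, size, chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(lambda r: count_newlines_in_range(path, *r), ranges))
```

Because a newline is a single byte, the ranges can be cut at arbitrary offsets without double counting. The result includes the header line, so the row count for a file like the one in the MRE would be one less.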
Expected behavior
No regression.
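When checking whether a fix restores the earlier timing, a best-of-N measurement is less sensitive to noise than the single run used in the MRE. A minimal sketch, where `best_of` is a hypothetical helper and not part of Polars:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def best_of(fn: Callable[[], T], repeats: int = 5) -> tuple[T, float]:
    # Run fn `repeats` times; return its last result and the fastest wall time in ms.
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    best_ms = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        elapsed_ms = (time.perf_counter() - start) * 1000
        best_ms = min(best_ms, elapsed_ms)
    return result, best_ms
```

With the names from the MRE, this could be called as `best_of(lambda: pl.scan_csv(target, schema=ref_schema).select(pl.len()).collect(engine="streaming"))` under each Polars version.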
Installed versions
See MRE.