-
-
Notifications
You must be signed in to change notification settings - Fork 378
chore!: replace direct download of nyc_taxi with hvsampledata #1462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
hoxbro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should remove pyct + setuptools from pyproject.toml + pixi.toml
| "import hvsampledata as hvs\n", | ||
| "\n", | ||
| "df = pd.read_csv('../data/nyc_taxi.csv', usecols=['dropoff_x', 'dropoff_y'])\n", | ||
| "df = hvs.nyc_taxi(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "df = hvs.nyc_taxi(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n", | |
| "df = hvs.nyc_taxi_remote(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n", |
Also, mention that this is the first time this cell is run.
pixi.toml
Outdated
| setuptools = "*" # distutils for pyct | ||
| toolz = "*" | ||
| xarray = "*" | ||
| fastparquet = "*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check that this has not indirectly pinned anything like pandas.
CodSpeed Performance ReportMerging #1462 will improve performances by 41.46%Comparing Summary
Benchmarks breakdown
|
maximlt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be first deprecated before being removed https://holoviz.org/about/heps/hep2.html.
Are you okay with not having pyct as a required dependency? but still having the entrypoint and explaining that:
This is also a little bit weird, because we don't break running code, but "only" the CLI. |
|
The CLI is part of public interface so I think it deserves the same treatment as the API. I don't see an urgent reason to immediately remove pyct from the dependencies. I'm in favor of simply adding deprecation warnings, and removing the whole thing in a few releases. |
datashader/__init__.py
Outdated
| from . import transfer_functions as tf # noqa (API import) | ||
| from . import data_libraries # noqa (API import) | ||
|
|
||
| with suppress(ImportError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the right place it should be in main.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK!
| 'numpy', | ||
| 'pandas', | ||
| 'param', | ||
| 'pyct', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think pyct should be removed yet from the dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not removed totally. Just moved to optional dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With these changes if I install datashader and run datashader example it breaks. So that's a breaking change. The goal is to deprecate it so people get a chance to stop using datashader example. Then pyct can be removed entirely from the dependencies.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1462 +/- ##
==========================================
- Coverage 88.34% 88.31% -0.03%
==========================================
Files 96 96
Lines 18932 18936 +4
==========================================
- Hits 16725 16724 -1
- Misses 2207 2212 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I added a temporary patch to allow the CI to pass pending a new hvsampledata release but otherwise I think it's ready for review now. |
| gds.get_path("geoda health") | ||
|
|
||
|
|
||
| with suppress(ImportError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hvs.nyc_taxi_remote("download-only") # return None or pathI think we should add something like this in hvsampledata to avoid all of this boilerplate. This will likely also make it so we can move pyarrow back in pixi.toml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the goal of this code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of the script is to pre-download the data, so it doesn't fail in CI tests later on, and to avoid download bars in examples.
The goal of the code was likely to download it, without the latest hvsampledata.
pyproject.toml
Outdated
| # 2025-09 | ||
| "ignore:Signature .* for <class 'numpy.longdouble'>.*:UserWarning", | ||
| # 2025-10 | ||
| "ignore:The 'pyct' package bundled as a datashader dependency is deprecated since version 0.19 and will be removed in version 0.20.*" # https://github.com/holoviz/datashader/pull/1462 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove this filtering. Running datashader in Python (and not CLI) should never show this warning.
This PR replaces the direct download of the large
nyc_taxidata with the version from hvsampledata.Testing with an editable install of hvsampledata