@Azaya89
Contributor

@Azaya89 Azaya89 commented Sep 26, 2025

This PR replaces the direct download of the large nyc_taxi data with the version from hvsampledata.

Testing with an editable install of hvsampledata

import hvsampledata as hvs
import datashader as ds

df = hvs.nyc_taxi_remote("pandas", engine_kwargs={"columns": ['dropoff_x', 'dropoff_y']})
print(df.head())

agg = ds.Canvas().points(df, 'dropoff_x', 'dropoff_y')
print(agg.shape)
   dropoff_x  dropoff_y
0 -8234835.5  4975627.0
1 -8237020.5  4976875.0
2 -8232279.0  4986477.0
3 -8238124.0  4971127.0
4 -8238107.5  4974457.0
(600, 600)

@Azaya89 Azaya89 requested a review from Copilot September 26, 2025 13:26

This comment was marked as outdated.

@Azaya89 Azaya89 self-assigned this Sep 26, 2025
Member

@hoxbro hoxbro left a comment

You should remove pyct + setuptools from pyproject.toml + pixi.toml

"import hvsampledata as hvs\n",
"\n",
"df = pd.read_csv('../data/nyc_taxi.csv', usecols=['dropoff_x', 'dropoff_y'])\n",
"df = hvs.nyc_taxi(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n",
Member

Suggested change
- "df = hvs.nyc_taxi(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n",
+ "df = hvs.nyc_taxi_remote(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n",

Also, mention that the data is downloaded the first time this cell is run.

pixi.toml Outdated
setuptools = "*" # distutils for pyct
toolz = "*"
xarray = "*"
fastparquet = "*"
Member

Please check that this has not indirectly pinned anything like pandas.
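
One way to check for an indirect pin is to record the resolved versions on main and on this branch and compare them. The sketch below uses only the standard library's importlib.metadata; the helper name `installed_versions` and the package list are illustrative and not part of this PR.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Run once per environment (main vs. this branch) and compare the output.
    for name, ver in installed_versions(["pandas", "numpy", "fastparquet"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

If pandas resolves to an older version only on this branch, the new dependency is a likely cause.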

@codspeed-hq

codspeed-hq bot commented Sep 26, 2025

CodSpeed Performance Report

Merging #1462 will improve performance by 41.46%

Comparing hvsampledata (a916499) with main (a5ae17c)

Summary

⚡ 1 improvement
✅ 42 untouched

Benchmarks breakdown

Mode             Benchmark                        BASE     HEAD   Change
Instrumentation  test_layout[forceatlas2_layout]  72.2 ms  51 ms  +41.46%

@hoxbro hoxbro changed the title chore: replace direct download of nyc_taxi with hvsampledata chore!: replace direct download of nyc_taxi with hvsampledata Sep 27, 2025
Member

@maximlt maximlt left a comment

I think this should be first deprecated before being removed https://holoviz.org/about/heps/hep2.html.

@hoxbro
Member

hoxbro commented Oct 9, 2025

I think this should be first deprecated before being removed holoviz.org/about/heps/hep2.html.

Are you okay with not having pyct as a required dependency, but still keeping the entrypoint and explaining that:

  1. the entrypoint is deprecated, but users need to install pyct for it to work, and
  2. maybe guide people to use hvsampledata if they want to download data?

This is also a little bit weird, because we don't break running code, but "only" the CLI.

@maximlt
Member

maximlt commented Oct 10, 2025

The CLI is part of the public interface, so I think it deserves the same treatment as the API. I don't see an urgent reason to remove pyct from the dependencies immediately. I'm in favor of simply adding deprecation warnings and removing the whole thing in a few releases.
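
A minimal sketch of the deprecate-first approach, assuming the warning is raised from the CLI entry point only. The message text is the one quoted from pyproject.toml elsewhere in this thread; the function name `main`, the FutureWarning category, and the return codes are assumptions for illustration, not the code merged in this PR.

```python
import importlib.util
import warnings

# Message text as quoted in this thread; everything else is illustrative.
PYCT_DEPRECATION = (
    "The 'pyct' package bundled as a datashader dependency is deprecated "
    "since version 0.19 and will be removed in version 0.20."
)


def main(args=None):
    """Hypothetical CLI entry point: warn first, then check that pyct is available."""
    warnings.warn(PYCT_DEPRECATION, FutureWarning, stacklevel=2)
    if importlib.util.find_spec("pyct") is None:
        print("pyct is not installed; install it to use this command.")
        return 1
    # A real entry point would hand `args` over to pyct here.
    return 0
```

Because the warning lives in the entry point rather than in the package's import path, importing the library from Python stays silent.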

from . import transfer_functions as tf # noqa (API import)
from . import data_libraries # noqa (API import)

with suppress(ImportError):
Member

This is not the right place; it should be in __main__.py.

Contributor Author

OK!

'numpy',
'pandas',
'param',
'pyct',
Member

I don't think pyct should be removed yet from the dependencies.

Contributor Author

It's not removed entirely, just moved to the optional dependencies.

Member

With these changes, if I install datashader and run datashader example, it breaks. So that's a breaking change. The goal is to deprecate it so people get a chance to stop using datashader example. Only then can pyct be removed entirely from the dependencies.

@codecov

codecov bot commented Oct 15, 2025

Codecov Report

❌ Patch coverage is 22.22222% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.31%. Comparing base (a5ae17c) to head (a916499).

Files with missing lines  Patch %  Lines
datashader/__main__.py    0.00%    5 Missing ⚠️
datashader/__init__.py    50.00%   2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1462      +/-   ##
==========================================
- Coverage   88.34%   88.31%   -0.03%     
==========================================
  Files          96       96              
  Lines       18932    18936       +4     
==========================================
- Hits        16725    16724       -1     
- Misses       2207     2212       +5     


@Azaya89 Azaya89 marked this pull request as ready for review October 15, 2025 17:32
@Azaya89
Contributor Author

Azaya89 commented Oct 15, 2025

I added a temporary patch to allow the CI to pass pending a new hvsampledata release, but otherwise I think it's ready for review now.

@Azaya89 Azaya89 requested review from Copilot, hoxbro and maximlt October 15, 2025 17:37

This comment was marked as spam.

@holoviz holoviz deleted a comment from Copilot AI Oct 15, 2025
gds.get_path("geoda health")


with suppress(ImportError):
Member

hvs.nyc_taxi_remote("download-only")  # return None or path

I think we should add something like this in hvsampledata to avoid all of this boilerplate. This will likely also make it so we can move pyarrow back in pixi.toml

Member

What is the goal of this code?

Member

The goal of the script is to pre-download the data, so it doesn't fail in CI tests later on, and to avoid download bars in examples.

The goal of the code was likely to download the data without needing the latest hvsampledata.
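
The pre-download boilerplate discussed here could look roughly like the sketch below: each dataset gets a small fetch function, and datasets whose optional dependency is missing are skipped with suppress(ImportError). The helper names `predownload` and `fetch_nyc_taxi` are hypothetical; the `hvs.nyc_taxi_remote` call is the one from the PR description, and the proposed "download-only" mode is not assumed to exist.

```python
from contextlib import suppress


def fetch_nyc_taxi():
    """Trigger the nyc_taxi download by loading two columns (call taken from the PR description)."""
    import hvsampledata as hvs  # raises ImportError if hvsampledata is not installed

    hvs.nyc_taxi_remote("pandas", engine_kwargs={"columns": ["dropoff_x", "dropoff_y"]})


def predownload(fetchers):
    """Call each zero-argument fetcher; skip any that fail with ImportError.

    Returns the names of the fetchers that completed.
    """
    done = []
    for name, fetch in fetchers.items():
        with suppress(ImportError):
            fetch()
            done.append(name)  # not reached if fetch() raised ImportError
    return done
```

A pre-download script would then call `predownload({"nyc_taxi": fetch_nyc_taxi})` once before the tests or examples run.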

pyproject.toml Outdated
# 2025-09
"ignore:Signature .* for <class 'numpy.longdouble'>.*:UserWarning",
# 2025-10
"ignore:The 'pyct' package bundled as a datashader dependency is deprecated since version 0.19 and will be removed in version 0.20.*" # https://github.com/holoviz/datashader/pull/1462
Member

I think we should remove this filtering. Running datashader in Python (and not CLI) should never show this warning.
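
A check along these lines could back this up in the test suite. This is only a sketch: `import_warnings` is a hypothetical helper, and it runs the import in a fresh interpreter so that already-imported modules cannot hide a warning.

```python
import subprocess
import sys


def import_warnings(module, category="FutureWarning"):
    """Import `module` in a fresh interpreter; return stderr lines that mention `category`."""
    proc = subprocess.run(
        [sys.executable, "-W", "always", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,  # fail loudly if the import itself errors
    )
    return [line for line in proc.stderr.splitlines() if category in line]
```

With datashader installed, `assert import_warnings("datashader") == []` would then confirm that a plain import stays silent, which makes the filter in pyproject.toml unnecessary.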

@hoxbro hoxbro added this to the v0.19.0 milestone Nov 13, 2025
4 participants