-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Improve consistency of default engine and return memoryview instead of bytes from to_netcdf() #10656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This PR introduces a bug fix and a breaking changes: 1. The default backend ``engine`` used by `Dataset.to_netcdf` and `DataTree.to_netcdf` is now chosen consistently with `open_dataset` and `open_datatree`, using whichever netCDF libraries are available and preferring netCDF4 to h5netcdf to scipy. Previously, `DataTree.to_netcdf` was hard-coded to use h5netcdf. 2. The return value of `Dataset.to_netcdf` without ``path`` is now a ``memoryview`` object instead of ``bytes``. This removes an unnecessary memory copy and ensures consistency when using either ``engine="scipy"`` or ``engine="h5netcdf"``. Fixes pydata#10654
@dataclass | ||
class BytesIOProxy(Generic[BytesOrMemory]): | ||
"""Proxy object for a write that returns either bytes or a memoryview.""" | ||
class BytesIOProxy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I'm keeping around BytesIOProxy because we'll need it for #10624
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize the PR fixing the issue was open already. Thanks for opening it. I have added a comment regarding adding this as a breaking change. I realized about the issue because code that has run successfully in CI since the introduction of DataTree in xarray stopped working with 2025.8.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
@flamingbear and I had encountered #10654 causing issues in the wild (trying to use dt.to_netcdf()
with the default engine when only netcdf4
was available as an engine). Looks like you beat me to a fix!
I tested this branch locally and it works on my example (a TEMPO netCDF4 granule).
Anyone else want to take a look here? I'd love to merge this in the next few days to fix this regression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good to me. One minor question about phrasing in whats-new.rst.
doc/whats-new.rst
Outdated
libraries are available and valid, and preferring netCDF4 to h5netcdf to scipy | ||
(:issue:`10654`). This will change the default backend in some edge cases | ||
(e.g., from scipy to netCDF4 when writing to a file-like object or bytes). To | ||
avoid around these new defaults, set ``engine`` explicitly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stumbled over this phrase, should it read, "to avoid these new defaults" or "to work around these new defaults"? Maybe "to bypass" or "to override"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through this, only found a typo. and had a probably incorrect assumption about a variable name.
xarray/backends/api.py
Outdated
if to_file_or_memoryview: | ||
candidates.remove("netcdf4") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this exclude netcdf4
for writing datatrees?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, nope. And I see memoryview is only available in h5netcdf. and only when writing to memoryview or filelike (non string) objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. We don't support writing memoryviews with netCDF4-python yet, but it it should work fine for writing netCDF files (including DataTree) to disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed to_file_or_memoryview
to to_fileobject_or_memoryview
. Hopefully that makes this more obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a while, but I got there, thanks. And I was able to use this version to write datatrees with just netcdf4.
Thanks everyone for the reviews! |
This PR introduces two breaking changes:
engine
used byDataset.to_netcdf
andDataTree.to_netcdf
is now chosen consistently withopen_dataset
andopen_datatree
, using whichever netCDF libraries are available and valid, and preferring netCDF4 to h5netcdf to scipy. Previously,DataTree.to_netcdf
was hard-coded to use scipy for writing to file-like objects or bytes, andDataTree.to_netcdf
was hard-coded to use h5netcdf.Dataset.to_netcdf
withoutpath
is now amemoryview
object instead ofbytes
. This removes an unnecessary memory copy and ensures consistency when using eitherengine="scipy"
orengine="h5netcdf"
.It also includes a minor bug-fix, raising an error when returning a memoryview with
compute=False
Fixes #10654
whats-new.rst