shoyer
Member

@shoyer shoyer commented Aug 19, 2025

This PR introduces two breaking changes:

  1. The default backend engine used by Dataset.to_netcdf and DataTree.to_netcdf is now chosen consistently with open_dataset and open_datatree, using whichever netCDF libraries are available and valid, and preferring netCDF4 to h5netcdf to scipy. Previously, Dataset.to_netcdf was hard-coded to use scipy for writing to file-like objects or bytes, and DataTree.to_netcdf was hard-coded to use h5netcdf.
  2. The return value of Dataset.to_netcdf without path is now a memoryview object instead of bytes. This removes an unnecessary memory copy and ensures consistency when using either engine="scipy" or engine="h5netcdf".

It also includes a minor bug fix: an error is now raised when returning a memoryview with compute=False.
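
For code that relied on the old bytes return type, the difference between the two types can be sketched with the standard library alone. This is an illustration under stated assumptions, not xarray's implementation: `io.BytesIO` stands in for the in-memory file, and the payload is a made-up placeholder rather than valid netCDF data.

```python
import io

# Stand-in for an in-memory file that a backend has written to.
# The payload is a placeholder, not a valid netCDF file.
buffer = io.BytesIO()
buffer.write(b"CDF\x01 placeholder payload")

# getvalue() returns an immutable bytes snapshot of the contents.
as_bytes = buffer.getvalue()

# getbuffer() returns a memoryview over the buffer's contents without copying.
as_view = buffer.getbuffer()

assert isinstance(as_bytes, bytes)
assert isinstance(as_view, memoryview)

# A memoryview compares equal to bytes with the same contents, and
# bytes(view) makes an explicit copy when downstream code needs real bytes.
assert as_view == as_bytes
restored = bytes(as_view)
assert restored == as_bytes

# Release the view before closing; a BytesIO with live exports cannot be closed.
as_view.release()
buffer.close()
```

Callers that only pass the result on to something accepting any bytes-like object (for example `file.write`) should not need the `bytes(...)` conversion.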

Fixes #10654

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

This PR introduces a bug fix and two breaking changes:

1. The default backend ``engine`` used by `Dataset.to_netcdf`
   and `DataTree.to_netcdf` is now chosen consistently with
   `open_dataset` and `open_datatree`, using whichever netCDF
   libraries are available and preferring netCDF4 to h5netcdf to scipy.
   Previously, `DataTree.to_netcdf` was hard-coded to use h5netcdf.
2. The return value of `Dataset.to_netcdf` without ``path`` is
   now a ``memoryview`` object instead of ``bytes``. This removes an unnecessary
   memory copy and ensures consistency when using either ``engine="scipy"`` or
   ``engine="h5netcdf"``.

Fixes pydata#10654
@github-actions github-actions bot added the topic-backends, topic-DataTree (Related to the implementation of a DataTree class), and io labels Aug 19, 2025
@dataclass
class BytesIOProxy(Generic[BytesOrMemory]):
"""Proxy object for a write that returns either bytes or a memoryview."""
class BytesIOProxy:
Member Author

Note: I'm keeping around BytesIOProxy because we'll need it for #10624

@shoyer shoyer changed the title from "Improve consistency and engine keyword argument for to_netcdf()" to "Improve consistency of default engine and return memoryview instead of bytes from to_netcdf()" Aug 19, 2025
Contributor

@OriolAbril OriolAbril left a comment

I didn't realize the PR fixing the issue was already open. Thanks for opening it. I have added a comment about listing this as a breaking change. I noticed the issue because code that has run successfully in CI since the introduction of DataTree in xarray stopped working with 2025.8.0.

Contributor

@owenlittlejohns owenlittlejohns left a comment

This looks great!

@flamingbear and I had encountered #10654 causing issues in the wild (trying to use dt.to_netcdf() with the default engine when only netcdf4 was available as an engine). Looks like you beat me to a fix!

I tested this branch locally and it works on my example (a TEMPO netCDF4 granule).

Member Author

shoyer commented Sep 1, 2025

Anyone else want to take a look here? I'd love to merge this in the next few days to fix this regression.

Contributor

@kmuehlbauer kmuehlbauer left a comment

This is looking good to me. One minor question about phrasing in whats-new.rst.

libraries are available and valid, and preferring netCDF4 to h5netcdf to scipy
(:issue:`10654`). This will change the default backend in some edge cases
(e.g., from scipy to netCDF4 when writing to a file-like object or bytes). To
avoid around these new defaults, set ``engine`` explicitly.
Contributor

Stumbled over this phrase, should it read, "to avoid these new defaults" or "to work around these new defaults"? Maybe "to bypass" or "to override"?

Member

@flamingbear flamingbear left a comment

I went through this and only found a typo. I also had a probably incorrect assumption about a variable name.

Comment on lines 122 to 123
if to_file_or_memoryview:
    candidates.remove("netcdf4")
Member

Doesn't this exclude netcdf4 for writing datatrees?

Member

Ok, nope. And I see memoryview is only available in h5netcdf, and only when writing to memoryview or file-like (non-string) objects.

Member Author

Correct. We don't support writing memoryviews with netCDF4-python yet, but it should work fine for writing netCDF files (including DataTree) to disk.
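
The selection rule discussed in this thread can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `pick_default_engine`, `is_installed`, `PREFERENCE`, and `MODULE_FOR_ENGINE` are hypothetical names, and the real implementation also checks that a library is valid, which this sketch reduces to "importable".

```python
from importlib.util import find_spec
from typing import Callable

# Preference order from the PR description: netCDF4, then h5netcdf, then scipy.
PREFERENCE = ("netcdf4", "h5netcdf", "scipy")

# Import name for each engine (netCDF4-python is imported as "netCDF4").
MODULE_FOR_ENGINE = {"netcdf4": "netCDF4", "h5netcdf": "h5netcdf", "scipy": "scipy"}


def is_installed(engine: str) -> bool:
    """Return True if the module backing ``engine`` can be found."""
    return find_spec(MODULE_FOR_ENGINE[engine]) is not None


def pick_default_engine(
    to_fileobject_or_memoryview: bool,
    available: Callable[[str], bool] = is_installed,
) -> str:
    """Hypothetical helper: return the first usable engine in preference order."""
    candidates = list(PREFERENCE)
    if to_fileobject_or_memoryview:
        # netCDF4-python is skipped for file objects and memoryviews.
        candidates.remove("netcdf4")
    for engine in candidates:
        if available(engine):
            return engine
    raise ValueError(f"no usable netCDF engine among {candidates}")


# Writing to a path with only netCDF4 available picks netcdf4.
engine_for_path = pick_default_engine(False, lambda e: e == "netcdf4")

# Writing to a memoryview with every library available skips netcdf4.
engine_for_memory = pick_default_engine(True, lambda e: True)
```

Passing the availability check in as a parameter keeps the sketch testable without installing any of the three libraries.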

Member Author

I renamed to_file_or_memoryview to to_fileobject_or_memoryview. Hopefully that makes this more obvious.

Member

It took me a while, but I got there, thanks. And I was able to use this version to write datatrees with just netcdf4.

@shoyer shoyer merged commit 722f0ad into pydata:main Sep 2, 2025
37 checks passed
Member Author

shoyer commented Sep 2, 2025

Thanks everyone for the reviews!

@shoyer shoyer deleted the to-netcdf-engine-fix branch September 2, 2025 20:42
Labels
io, topic-backends, topic-DataTree (Related to the implementation of a DataTree class)
Development

Successfully merging this pull request may close these issues.

DataTree.to_netcdf has h5netcdf hardcoded as default
5 participants