Improve consistency of default engine and return memoryview instead of bytes from to_netcdf() #10656

shoyer · 2025-08-19T19:15:37Z

This PR introduces two breaking changes:

1. The default backend engine used by Dataset.to_netcdf and DataTree.to_netcdf is now chosen consistently with open_dataset and open_datatree, using whichever netCDF libraries are available and valid, and preferring netCDF4 to h5netcdf to scipy. Previously, Dataset.to_netcdf was hard-coded to use scipy for writing to file-like objects or bytes, and DataTree.to_netcdf was hard-coded to use h5netcdf.
2. The return value of Dataset.to_netcdf without path is now a memoryview object instead of bytes. This removes an unnecessary memory copy and ensures consistency when using either engine="scipy" or engine="h5netcdf".

It also includes a minor bug fix: an error is now raised when returning a memoryview with compute=False.

Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst
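
To illustrate the second breaking change, here is a minimal standard-library sketch of why handing back a memoryview avoids a copy. The `write_to_memory` helper is hypothetical and only stands in for a writer that fills an in-memory buffer; it is not xarray's implementation.

```python
import io


def write_to_memory(payload: bytes) -> memoryview:
    """Hypothetical stand-in for a writer that fills an in-memory buffer."""
    buffer = io.BytesIO()
    buffer.write(payload)
    # getbuffer() exposes the BytesIO contents without copying them,
    # whereas getvalue() would allocate a second bytes object.
    return buffer.getbuffer()


payload = b"CDF\x01 example payload"
view = write_to_memory(payload)

# Callers that still need bytes can convert explicitly; this copies once.
as_bytes = bytes(view)
```

Code that relied on the old return type can presumably be adapted the same way, by wrapping the result of `to_netcdf()` in `bytes(...)`.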

This PR introduces a bug fix and two breaking changes:

1. The default backend ``engine`` used by `Dataset.to_netcdf` and `DataTree.to_netcdf` is now chosen consistently with `open_dataset` and `open_datatree`, using whichever netCDF libraries are available and preferring netCDF4 to h5netcdf to scipy. Previously, `DataTree.to_netcdf` was hard-coded to use h5netcdf.
2. The return value of `Dataset.to_netcdf` without ``path`` is now a ``memoryview`` object instead of ``bytes``. This removes an unnecessary memory copy and ensures consistency when using either ``engine="scipy"`` or ``engine="h5netcdf"``.

Fixes pydata#10654

shoyer · 2025-08-19T19:22:07Z

xarray/backends/common.py

 @dataclass
-class BytesIOProxy(Generic[BytesOrMemory]):
-    """Proxy object for a write that returns either bytes or a memoryview."""
+class BytesIOProxy:


Note: I'm keeping around BytesIOProxy because we'll need it for #10624

OriolAbril

I didn't realize the PR fixing the issue was already open. Thanks for opening it. I have added a comment about listing this as a breaking change. I noticed the issue because code that had run successfully in CI since the introduction of DataTree in xarray stopped working with 2025.8.0.

doc/whats-new.rst

owenlittlejohns

This looks great!

@flamingbear and I had encountered #10654 causing issues in the wild (trying to use dt.to_netcdf() with the default engine when only netcdf4 was available as an engine). Looks like you beat me to a fix!

I tested this branch locally and it works on my example (a TEMPO netCDF4 granule).

shoyer · 2025-09-01T22:44:07Z

Anyone else want to take a look here? I'd love to merge this in the next few days to fix this regression.

kmuehlbauer

This is looking good to me. One minor question about phrasing in whats-new.rst.

kmuehlbauer · 2025-09-02T05:03:35Z

doc/whats-new.rst

+  libraries are available and valid, and preferring netCDF4 to h5netcdf to scipy
+  (:issue:`10654`). This will change the default backend in some edge cases
+  (e.g., from scipy to netCDF4 when writing to a file-like object or bytes). To
+  avoid around these new defaults, set ``engine`` explicitly.


Stumbled over this phrase, should it read, "to avoid these new defaults" or "to work around these new defaults"? Maybe "to bypass" or "to override"?

flamingbear

I went through this and only found a typo, plus I had a probably incorrect assumption about a variable name.

doc/whats-new.rst

flamingbear · 2025-09-02T17:50:48Z

xarray/backends/api.py

+    if to_file_or_memoryview:
+        candidates.remove("netcdf4")


Doesn't this exclude netcdf4 for writing datatrees?

Ok, nope. And I see memoryview is only available in h5netcdf, and only when writing to memoryview or file-like (non-string) objects.

Correct. We don't support writing memoryviews with netCDF4-python yet, but it should work fine for writing netCDF files (including DataTree) to disk.

I renamed to_file_or_memoryview to to_fileobject_or_memoryview. Hopefully that makes this more obvious.

It took me a while, but I got there, thanks. And I was able to use this version to write datatrees with just netcdf4.
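
The selection logic discussed in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in xarray/backends/api.py: the `pick_default_engine` function, the `MODULES` mapping, and the `is_available` hook are hypothetical names introduced here. Only the preference order and the removal of netcdf4 for file objects and memoryviews come from the PR.

```python
from importlib.util import find_spec

# Preference order stated in the PR description.
PREFERENCE = ["netcdf4", "h5netcdf", "scipy"]

# Importable module name for each engine (an assumption for this sketch).
MODULES = {"netcdf4": "netCDF4", "h5netcdf": "h5netcdf", "scipy": "scipy"}


def pick_default_engine(to_fileobject_or_memoryview, is_available=None):
    """Return the first usable engine in preference order (hypothetical helper)."""
    if is_available is None:
        # Default check: is the backing library importable?
        def is_available(engine):
            return find_spec(MODULES[engine]) is not None

    candidates = list(PREFERENCE)
    if to_fileobject_or_memoryview:
        # netCDF4-python cannot write to file objects or memoryviews here,
        # so it is dropped only for that case; writing to disk keeps it.
        candidates.remove("netcdf4")

    for engine in candidates:
        if is_available(engine):
            return engine
    raise ValueError(f"no usable netCDF engine among {candidates}")
```

With only netCDF4 installed, this sketch still selects netcdf4 for writes to a path, which matches the behaviour reported above, and raises only when the target is a file object or memoryview.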

shoyer · 2025-09-02T20:42:48Z

Thanks everyone for the reviews!

github-actions bot added the topic-backends, topic-DataTree, and io labels Aug 19, 2025

shoyer added 2 commits August 19, 2025 12:16

Add PR number to whatsnew

3fd0de4

Consistently use BytesIOProxy

bfea52d

shoyer commented Aug 19, 2025


shoyer added 2 commits August 19, 2025 12:24

Fix test_engine

594b122

Clarify whats new

af7167b

shoyer mentioned this pull request Aug 19, 2025

Should Xarray prefer h5netcdf and scipy to netCDF4? #10657

Open

shoyer changed the title ~~Improve consistency and engine keyword argument for to_netcdf()~~ Improve consistency of default engine and return memoryview instead of bytes from to_netcdf() Aug 19, 2025

OriolAbril reviewed Aug 25, 2025

doc/whats-new.rst (outdated, resolved)

Clarify whats new

aab1155

owenlittlejohns approved these changes Aug 29, 2025


kmuehlbauer approved these changes Sep 2, 2025


shoyer added 2 commits September 2, 2025 09:41

tweak whats-new

6e4b6e0

Merge branch 'main' into to-netcdf-engine-fix

ebc4de7

flamingbear mentioned this pull request Sep 2, 2025

DAS-2411: Write Inherited Coordinates so that panoply can plot all variables in output file nasa/harmony-metadata-annotator#23

Closed

4 tasks

flamingbear approved these changes Sep 2, 2025


shoyer added 3 commits September 2, 2025 12:46

Fix release note typo

5628d32

Rename to_file_or_memoryview

34b0317

test fixes

b0f514f

shoyer merged commit 722f0ad into pydata:main Sep 2, 2025
37 checks passed

shoyer deleted the to-netcdf-engine-fix branch September 2, 2025 20:42
