
Add CLI for converting v2 metadata to v3 #3257


Open: wants to merge 66 commits into base: main
66 commits
45bb4e5
add rough cli converter structure
K-Meech Jul 1, 2025
456c9e7
allow zstd, gzip and numcodecs zarr 3 compression
K-Meech Jul 1, 2025
242a338
convert filters to v3
K-Meech Jul 1, 2025
1045c33
create BytesCodec with correct endian
K-Meech Jul 1, 2025
4e2442f
handle C vs F order in v2 metadata
K-Meech Jul 1, 2025
c63f0b8
save group and array metadata to file
K-Meech Jul 2, 2025
2947ce4
create overall conversion functions for store, array or group
K-Meech Jul 2, 2025
ba81755
add minimal typer cli
K-Meech Jul 3, 2025
67f9580
add initial tests for converter
K-Meech Jul 3, 2025
0d7c2c8
add tests for conversion of groups and nested groups and arrays
K-Meech Jul 3, 2025
cf39580
add tests for conversion of compressors and filters
K-Meech Jul 3, 2025
11499e7
test conversion of order and endianness
K-Meech Jul 3, 2025
90b0996
add tests for edge cases of incorrect codecs
K-Meech Jul 3, 2025
85159bb
add tests for / separator
K-Meech Jul 4, 2025
53ba166
draft of metadata remover and add test for internal paths
K-Meech Jul 7, 2025
d4cdc04
add clear command to cli with tests
K-Meech Jul 7, 2025
dfdc729
add test for metadata removal with path#
K-Meech Jul 7, 2025
ad60991
add verbose logging option
K-Meech Jul 7, 2025
66bae0d
add dry run option to cli
K-Meech Jul 8, 2025
97df9bf
add test for dry-run
K-Meech Jul 8, 2025
42e0435
add zarr-converter script and enable cli dep in tests
K-Meech Jul 9, 2025
9e20b39
use v2 chunk key encoding type
K-Meech Jul 9, 2025
6586e66
Merge branch 'main' of github.com:K-Meech/zarr-python into km/v2-v3-c…
K-Meech Jul 14, 2025
ce409a3
update endianness of test data type
K-Meech Jul 14, 2025
fb7136b
Merge branch 'main' of github.com:K-Meech/zarr-python into km/v2-v3-c…
K-Meech Jul 16, 2025
6585f24
check converted arrays can be accessed
K-Meech Jul 16, 2025
46e958d
Merge branch 'main' of github.com:K-Meech/zarr-python into km/v2-v3-c…
K-Meech Jul 16, 2025
08fc138
remove uses of pathlib walk, as it didn't exist in python 3.11
K-Meech Jul 16, 2025
3540434
include tags in checkout for gpu test, to avoid numcodecs.zarr3 reque…
K-Meech Jul 16, 2025
0889979
rename cli commands from review comments
K-Meech Jul 23, 2025
d906dba
remove path option
K-Meech Jul 23, 2025
5e03e3c
allow metadata to be written to a separate store location
K-Meech Jul 24, 2025
89aa095
add overwrite and remove-v2-metadata options
K-Meech Jul 24, 2025
ade9c3b
add force option
K-Meech Jul 24, 2025
218e8a8
use v2, v3 format for CLI
K-Meech Jul 24, 2025
49787f6
split into convert_group and convert_array functions
K-Meech Jul 24, 2025
488485c
update command names in converter tests
K-Meech Jul 24, 2025
18487c9
update test filename to reflect command name change
K-Meech Jul 24, 2025
a5cd760
fix tests for sub-groups
K-Meech Jul 24, 2025
bde452f
add tests for --force
K-Meech Jul 24, 2025
671c5e3
add test for migrating to separate output location
K-Meech Jul 24, 2025
0281cc1
add test for remove-v2-metadata option
K-Meech Jul 25, 2025
2ffe854
update test names to match command name
K-Meech Jul 25, 2025
432eae6
add test for --remove-v2-metadata with separate output location
K-Meech Jul 25, 2025
7cb42c5
merge upstream changes
K-Meech Jul 25, 2025
6e6788d
separate cli fixtures from the tests
K-Meech Jul 25, 2025
4abc84a
add test for overwrite option in separate location
K-Meech Jul 25, 2025
0bdd6f8
fix failing test
K-Meech Jul 25, 2025
f2fa389
small fixes to tests
K-Meech Jul 25, 2025
4d98121
Merge pull request #1 from K-Meech/km/v2-v2-conversion-review
K-Meech Jul 28, 2025
649bb20
fix pre-commit errors
K-Meech Jul 28, 2025
dba4073
update docstrings with review comments
K-Meech Aug 1, 2025
b702060
pass filters and compressors to processing functions, rather than ful…
K-Meech Aug 1, 2025
b900a0e
use Store as input rather than StoreLike
K-Meech Aug 1, 2025
42aa7db
move conversion functions into public api
K-Meech Aug 1, 2025
d3fc21e
Merge branch 'main' of github.com:K-Meech/zarr-python into km/v2-v3-c…
K-Meech Aug 1, 2025
5c05c0c
merge upstream changes
K-Meech Aug 4, 2025
f62fe31
fail on discovery of consolidated metadata
K-Meech Aug 4, 2025
71067ba
minor changes from review
K-Meech Aug 6, 2025
34e97f0
use same logger throughout zarr-python
K-Meech Aug 6, 2025
9f6b875
add release notes and docs for the cli
K-Meech Aug 6, 2025
1362cc6
tidy up formatting of zarr.metadata api docs
K-Meech Aug 6, 2025
4ae3491
Merge branch 'main' of github.com:K-Meech/zarr-python into km/v2-v3-c…
K-Meech Aug 6, 2025
f301172
fix failing tests
K-Meech Aug 6, 2025
0449ef7
add a section about --verbose to the docs
K-Meech Aug 7, 2025
14b9cfd
Merge branch 'main' into km/v2-v3-conversion
d-v-b Aug 11, 2025
2 changes: 2 additions & 0 deletions .github/workflows/gpu_test.yml
@@ -30,6 +30,8 @@ jobs:

steps:
  - uses: actions/checkout@v4
    with:
      fetch-depth: 0 # grab all branches and tags
  # - name: cuda-toolkit
  # uses: Jimver/[email protected]
  # id: cuda-toolkit
2 changes: 2 additions & 0 deletions changes/1798.feature.rst
@@ -0,0 +1,2 @@
Add a command-line interface to migrate v2 Zarr metadata to v3. Corresponding functions are also
provided under zarr.metadata.
127 changes: 127 additions & 0 deletions docs/user-guide/cli.rst
@@ -0,0 +1,127 @@
.. _user-guide-cli:

Command-line interface
========================

Zarr-Python provides a command-line interface that enables:

- migration of Zarr v2 metadata to v3
- removal of v2 or v3 metadata

To see the available commands, run the following in a terminal:

.. code-block:: bash

    $ zarr --help

or to get help on individual commands:

.. code-block:: bash

    $ zarr migrate --help

    $ zarr remove-metadata --help


Migrate metadata from v2 to v3
------------------------------

Migrate to a separate location
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To migrate a Zarr array/group's metadata from v2 to v3, run:

.. code-block:: bash

    $ zarr migrate v3 path/to/input.zarr path/to/output.zarr

This will write new ``zarr.json`` files to ``output.zarr``, leaving ``input.zarr`` untouched.
Note that this migrates the entire Zarr hierarchy: if ``input.zarr`` contains multiple groups/arrays,
a new ``zarr.json`` will be created for each of them.

Migrate in-place
~~~~~~~~~~~~~~~~

If you'd prefer to migrate the metadata in place, run:

.. code-block:: bash

    $ zarr migrate v3 path/to/input.zarr

This will write new ``zarr.json`` files to ``input.zarr``, leaving the existing v2 metadata untouched.

To open the array/group using the new metadata, use:

.. code-block:: python

    >>> import zarr
    >>> zarr_with_v3_metadata = zarr.open('path/to/input.zarr', zarr_format=3)

Once you are happy with the conversion, you can run the following to remove the old v2 metadata:

.. code-block:: bash

    $ zarr remove-metadata v2 path/to/input.zarr

There is also a shortcut to migrate and remove the v2 metadata in one step:

.. code-block:: bash

    $ zarr migrate v3 path/to/input.zarr --remove-v2-metadata


Remove metadata
----------------

Remove v2 metadata using:

.. code-block:: bash

    $ zarr remove-metadata v2 path/to/input.zarr

or v3 with:

.. code-block:: bash

    $ zarr remove-metadata v3 path/to/input.zarr

By default, this will only allow removal of metadata if a valid alternative exists. For example, you can't
remove v2 metadata unless v3 metadata exists at that location.

To override this behaviour, use ``--force``:

.. code-block:: bash

    $ zarr remove-metadata v3 path/to/input.zarr --force


Dry run
--------
All commands provide a ``--dry-run`` option that will log changes that would be made on a real run, without creating
or modifying any files.

.. code-block:: bash

    $ zarr migrate v3 path/to/input.zarr --dry-run

    Dry run enabled - no new files will be created or changed. Log of files that would be created on a real run:
    Saving metadata to path/to/input.zarr/zarr.json


Verbose
--------
You can also add ``--verbose`` **before** any command to see a full log of its actions:

.. code-block:: bash

    $ zarr --verbose migrate v3 path/to/input.zarr

    $ zarr --verbose remove-metadata v2 path/to/input.zarr
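
``--verbose`` must come first because it is an option of the top-level ``zarr`` command, not of the subcommands. The real CLI is built with Typer (see the ``cli`` extra in ``pyproject.toml``). The sketch below reproduces the same ordering rule with the standard library's ``argparse`` instead. The parser layout, its program name and ``configure_logging`` are illustrative assumptions, not the actual implementation.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: a global --verbose flag plus two subcommands."""
    parser = argparse.ArgumentParser(prog="zarr-sketch")
    # Defined on the top-level parser, so it is only recognised before the subcommand.
    parser.add_argument("--verbose", action="store_true")
    sub = parser.add_subparsers(dest="command", required=True)

    migrate = sub.add_parser("migrate")
    migrate.add_argument("zarr_format", choices=["v3"])
    migrate.add_argument("input_store")
    migrate.add_argument("output_store", nargs="?")  # omitted for an in-place migration

    remove = sub.add_parser("remove-metadata")
    remove.add_argument("zarr_format", choices=["v2", "v3"])
    remove.add_argument("store")
    return parser


def configure_logging(args: argparse.Namespace) -> None:
    # Assumed mapping: show INFO-level messages only when --verbose is given.
    logging.basicConfig(level=logging.INFO if args.verbose else logging.WARNING)
```

With this layout, ``["--verbose", "migrate", "v3", "in.zarr"]`` parses. ``["migrate", "v3", "in.zarr", "--verbose"]`` is rejected because everything after ``migrate`` is handed to the subcommand's parser, which does not define ``--verbose``.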


Equivalent functions
--------------------
All features of the command-line interface are also available via functions under
:mod:`zarr.metadata`.


1 change: 1 addition & 0 deletions docs/user-guide/index.rst
@@ -13,6 +13,7 @@ User guide
storage
config
v3_migration
cli

Advanced Topics
---------------
6 changes: 5 additions & 1 deletion pyproject.toml
@@ -68,6 +68,7 @@ remote = [
gpu = [
"cupy-cuda12x",
]
cli = ["typer"]
# Development extras
test = [
"coverage>=7.10",
@@ -113,6 +114,9 @@ docs = [
'pytest'
]

[project.scripts]
zarr = "zarr._cli.cli:app"


[project.urls]
"Bug Tracker" = "https://github.com/zarr-developers/zarr-python/issues"
@@ -163,7 +167,7 @@ deps = ["minimal", "optional"]

[tool.hatch.envs.test.overrides]
matrix.deps.dependencies = [
{value = "zarr[remote, remote_tests, test, optional]", if = ["optional"]}
{value = "zarr[remote, remote_tests, test, optional, cli]", if = ["optional"]}
]

[tool.hatch.envs.test.scripts]
58 changes: 58 additions & 0 deletions src/zarr/__init__.py
@@ -1,3 +1,7 @@
import functools
import logging
from typing import Literal

from zarr._version import version as __version__
from zarr.api.synchronous import (
    array,
@@ -37,6 +41,8 @@
# in case setuptools scm screw up and find version to be 0.0.0
assert not __version__.startswith("0.0.0")

_logger = logging.getLogger(__name__)


def print_debug_info() -> None:
    """
@@ -85,6 +91,58 @@ def print_packages(packages: list[str]) -> None:
    print_packages(optional)


# The decorator ensures this always returns the same handler (and it is only
# attached once).
@functools.cache
def _ensure_handler() -> logging.Handler:
    """
    The first time this function is called, attach a `StreamHandler` using the
    same format as `logging.basicConfig` to the Zarr-Python root logger.

    Return this handler every time this function is called.
    """
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(logging.BASIC_FORMAT))
    _logger.addHandler(handler)
    return handler


def set_log_level(
    level: Literal["NOTSET", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
) -> None:
    """Set the logging level for Zarr-Python.

    Zarr-Python uses the standard library `logging` framework under the root
    logger 'zarr'. This is a helper function to:

    - set Zarr-Python's root logger level
    - set the root logger handler's level, creating the handler
      if it does not exist yet

    Parameters
    ----------
    level : str
        The logging level to set.
    """
    _logger.setLevel(level)
    _ensure_handler().setLevel(level)


def set_format(log_format: str) -> None:
    """Set the format of logging messages from Zarr-Python.

    Zarr-Python uses the standard library `logging` framework under the root
    logger 'zarr'. This sets the format of log messages from the root logger's StreamHandler.

    Parameters
    ----------
    log_format : str
        A string determining the log format (as defined in the standard library's `logging` module
        for logging.Formatter)
    """
    _ensure_handler().setFormatter(logging.Formatter(fmt=log_format))


__all__ = [
"Array",
"AsyncArray",
Empty file added src/zarr/_cli/__init__.py