URL syntax for icechunk

I'm in the process of implementing support for ZEP 8-style URL syntax in Neuroglancer, and in conjunction with that I'm also planning to add support for the icechunk format.

Here are some examples of existing URLs that are supported or will be supported:

```
gs://bucket/path/to/array/|zarr3:
gs://bucket/path/to/group/|zarr3:path/to/array/
gs://bucket/path/to/file.zip|zip:path/to/array/|zarr3:
gs://bucket/path/to/file.zip|zip:path/to/nested.zip|zip:path/to/array/|zarr3:
gs://bucket/path/to/ocdbt/|ocdbt:path/to/array/|zarr3:
```

Note that the URL consists of a "pipeline" of `|`-separated components: the first component must be a base kvstore protocol (e.g. gs, s3, http), followed by zero or more kvstore adapter schemes (e.g. zip), followed by a data format scheme (e.g. zarr2, zarr3, precomputed, n5). There is also format auto-detection, which can add the necessary kvstore adapter schemes and the final data format scheme automatically. For example, if you type just `gs://bucket/path/to/file.zip`, it is first completed to `gs://bucket/path/to/file.zip|zip:` (based on the content, not the filename); then, if there is a zarr array or group at the root of the zip file, it is further completed to `gs://bucket/path/to/file.zip|zip:|zarr3:`.
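To make the pipeline structure concrete, here is a minimal Python sketch that splits such a URL into (scheme, rest) components. It is only an illustration, not Neuroglancer's actual parser: the names `Component` and `parse_pipeline` are hypothetical, and it assumes that `|` never appears unescaped inside a component and that every component has a `scheme:` prefix.

```python
from typing import NamedTuple


class Component(NamedTuple):
    scheme: str  # e.g. "gs", "zip", "zarr3"
    rest: str    # everything after the first ":" in the component


def parse_pipeline(url: str) -> list[Component]:
    """Split a |-separated URL pipeline into (scheme, rest) components.

    Assumption: "|" is never part of a component, and a component
    without a "scheme:" prefix is treated as an error.
    """
    components = []
    for i, part in enumerate(url.split("|")):
        scheme, sep, rest = part.partition(":")
        if not sep or not scheme:
            raise ValueError(f"component {i} has no scheme: {part!r}")
        components.append(Component(scheme, rest))
    return components


for c in parse_pipeline("gs://bucket/path/to/file.zip|zip:path/to/array/|zarr3:"):
    print(c.scheme, repr(c.rest))
```

The first component is the base kvstore, any middle components are adapters, and the last one is the data format; the sketch does not check which schemes are valid in which position.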

Note that whether some part of the path goes before or after the `|zarr3:` currently doesn't matter, but it would matter if a group storage transformer were used. I'm planning to normalize URLs so that the outer kvstore URL points to the topmost valid zarr v3 group. E.g. if we normalize to `gs://bucket/path/to/group/|zarr3:path/to/array/` then that means:
- gs://bucket/path/to/zarr.json does NOT exist, but
- gs://bucket/path/to/group/zarr.json does exist, and
- gs://bucket/path/to/group/path/zarr.json does exist and
- gs://bucket/path/to/group/path/to/zarr.json does exist and
- gs://bucket/path/to/group/path/to/array/zarr.json does exist
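The normalization rule illustrated by the list above can be sketched as follows. This is a rough Python illustration under stated assumptions, not the planned implementation: `normalize` and the `exists` callback are hypothetical names, and the sketch assumes the rule is "walk upward from the array for as long as the parent directory also contains a `zarr.json`".

```python
from typing import Callable


def normalize(base: str, path: str, exists: Callable[[str], bool]) -> str:
    """Move the kvstore/zarr3 split up to the topmost enclosing group.

    base   -- base kvstore URL ending in "/", e.g. "gs://bucket/"
    path   -- path of the array relative to base
    exists -- hypothetical callback: does this key exist in the kvstore?
    """
    parts = [p for p in path.split("/") if p]

    def metadata_key(n: int) -> str:
        # Key of zarr.json inside the directory formed by the first n parts.
        return "/".join(parts[:n] + ["zarr.json"])

    n = len(parts)
    if not exists(metadata_key(n)):
        raise ValueError("no zarr.json at the given path")
    # Walk upward while the parent directory is itself a zarr v3 node.
    while n > 0 and exists(metadata_key(n - 1)):
        n -= 1
    outer = "/".join(parts[:n]) + "/" if n else ""
    inner = "/".join(parts[n:]) + "/" if n < len(parts) else ""
    return f"{base}{outer}|zarr3:{inner}"


# Keys matching the example in the list above.
keys = {
    "path/to/group/zarr.json",
    "path/to/group/path/zarr.json",
    "path/to/group/path/to/zarr.json",
    "path/to/group/path/to/array/zarr.json",
}
print(normalize("gs://bucket/", "path/to/group/path/to/array/", keys.__contains__))
```

With those keys, the split lands at `path/to/group/` because `path/to/zarr.json` does not exist.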

With this background out of the way, for icechunk, there are two questions:
1. How to encode the branch / tag / snapshot in the URL
2. Whether to treat icechunk as a key-value store adapter, or as a final data format in place of zarr3.

Some possible options:

```
gs://bucket/path/to/icechunk_repo/|icechunk:branch.main/path/to/array/|
gs://bucket/path/to/icechunk_repo/|icechunk:branch.main@path/to/array/|
gs://bucket/path/to/icechunk_repo/|icechunk:branch.main|zarr3:path/to/array/|

gs://bucket/path/to/icechunk_repo/refs/branch.main/|icechunk:path/to/array/|
gs://bucket/path/to/icechunk_repo/refs/branch.main/|icechunk:|zarr3:path/to/array/|
```
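To compare the first two options, here is a small Python sketch of how the part after `icechunk:` might be split into a ref and a path. Everything here is an assumption for illustration: `split_ref_and_path` is a hypothetical helper, and the `tag.` and `snapshot.` prefixes are guessed by analogy with `branch.main`; only `branch.main` appears in the examples above.

```python
def split_ref_and_path(spec: str, sep: str) -> tuple[str, str, str]:
    """Split e.g. "branch.main/path/to/array/" into (kind, name, path).

    sep is the separator between the ref and the path: "/" for the
    first option above, "@" for the second.
    """
    ref, _, path = spec.partition(sep)
    kind, _, name = ref.partition(".")
    # Assumption: "tag." and "snapshot." mirror the "branch." prefix.
    if kind not in ("branch", "tag", "snapshot") or not name:
        raise ValueError(f"unrecognized ref: {ref!r}")
    return kind, name, path


print(split_ref_and_path("branch.main/path/to/array/", "/"))
print(split_ref_and_path("branch.main@path/to/array/", "@"))
```

One difference between the two separators: splitting on the first `/` only works if ref names can never contain `/`, whereas splitting on `@` only requires that ref names never contain `@`. Whether either restriction holds for icechunk is not something this sketch assumes.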

Unlike e.g. zarr-python, Neuroglancer needs a URL syntax in order to support icechunk at all, but it would be nice to standardize on a syntax that other tools will also support in the future.

Choosing a URL syntax that includes a final `|zarr3:` component corresponds most closely to the current zarr-python integration, where icechunk just behaves as a key-value store and translates the metadata and chunks back to the standard zarr v3 metadata encoding and key encoding. I think the right choice, though, depends on how you expect icechunk to evolve in the future.
