Conversation

@paraseba
Collaborator

No description provided.

@paraseba paraseba requested a review from aladinor July 31, 2025 18:20
@paraseba paraseba marked this pull request as draft July 31, 2025 20:03
@paraseba paraseba marked this pull request as ready for review August 21, 2025 21:33
@dcherian
Contributor

To experiment, I ran:

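import json

import icechunk as ic
import xarray as xr

# Public, anonymously readable ERA5 WeatherBench2 repository
storage = ic.s3_storage(
    bucket="icechunk-public-data",
    prefix="v1/era5_weatherbench2",
    region="us-east-1",
    anonymous=True,
)

repo = ic.Repository.open(storage=storage)
# Manifests referenced by the snapshot at the tip of main
manifests = repo.lookup_snapshot(repo.lookup_branch('main')).manifests
# inspect_manifest returns a JSON string, so parse it before displaying
json.loads(repo.inspect_manifest(manifests[-1].id))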
{'id': 'ZSBZ88JV8HKHNRQBYM40',
 'size_bytes': 455624,
 'total_chunk_refs': 8760,
 'arrays': [{'node_id': 'SQDGS52B4TAWG', 'num_chunk_refs': 8760}]}

Some comments:

  1. repo.inspect_manifest gives me a string. Since we are returning JSON, can we call json.loads on the Python side?
  2. Even with that, as a user, this is less useful because I don't know which array the NodeId refers to. Can we add the path too?
  3. As a user, I think I'm more interested in looking up the manifests for a particular array. So perhaps we need repo.lookup_manifests("path/to/array", snapshot_id=repo.lookup_snapshot('main')) -> list[ManifestInfo], or perhaps a full summary (total nbytes, total refs, number of manifests, list[ManifestInfo]).
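
To make points 1 and 3 concrete, here is a minimal sketch of a per-array rollup built only from the JSON shape shown above. `ArraySummary` and `summarize_by_array` are hypothetical names, not part of icechunk, and the input is assumed to be the strings that `repo.inspect_manifest` returns. Byte totals are left out because `size_bytes` is reported per manifest, not per array.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ArraySummary:
    """Hypothetical per-array rollup; not an icechunk type."""
    node_id: str
    total_chunk_refs: int = 0
    manifest_ids: list = field(default_factory=list)


def summarize_by_array(manifest_json_docs):
    """Group chunk-ref counts by node_id across inspect_manifest-style JSON strings."""
    summaries = {}
    for doc in manifest_json_docs:
        info = json.loads(doc)  # point 1: parse once, on the Python side
        for arr in info["arrays"]:
            node_id = arr["node_id"]
            summary = summaries.setdefault(node_id, ArraySummary(node_id))
            summary.total_chunk_refs += arr["num_chunk_refs"]
            summary.manifest_ids.append(info["id"])
    return summaries


# Stand-in for a real inspect_manifest result, copied from the output above.
sample = json.dumps({
    "id": "ZSBZ88JV8HKHNRQBYM40",
    "size_bytes": 455624,
    "total_chunk_refs": 8760,
    "arrays": [{"node_id": "SQDGS52B4TAWG", "num_chunk_refs": 8760}],
})

summary = summarize_by_array([sample])
print(summary["SQDGS52B4TAWG"])
```

The result is still keyed by node ID, so it does not address point 2; mapping node IDs to array paths would need information that the manifest JSON above does not carry.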

3 participants