In Rust, `ome_zarr_metadata` takes a similar strategy of representing and validating metadata without performing IO. However, a lot of OME-Zarr validation needs metadata from Node B in order to properly validate Node A (e.g. transformations expressed as paths to arrays).
The strategy I'm working on in a local branch is to allow all OME-Zarr objects to report any references they have, so that the caller can fetch those nodes' metadata however they deem appropriate and pass them back in for a second round of validation which addresses only the remote references. There's scope for a shallow version (where only direct references from the node of interest are gathered) or a deep version (where the graph keeps expanding whenever a referred-to node has its own references); a sketch of the deep variant follows the example below.
The API looks something like this:
```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path

# ZarrV3Metadata and OMEZarrMetadata stand in for whatever parsed-metadata
# types the implementation uses.


@dataclass
class ContextValue:
    zarr_metadata: ZarrV3Metadata
    ome: OMEZarrMetadata


@dataclass
class ValidationContext:
    node_path: Path
    """Absolute path to this Zarr node within the store."""

    references: dict[Path, ContextValue]
    """Paths must be absolute within the store and must contain `node_path`."""

    def index_relative(self, path: Path) -> ContextValue:
        abs_path = self.node_path / path
        return self.references[abs_path]


class ReferrerMixin(ABC):
    """API for an OME-Zarr metadata object which may refer to other Zarr nodes."""

    @abstractmethod
    def gather_references(self, paths: set[Path]):
        pass

    @abstractmethod
    def validate_references(self, context: ValidationContext):
        pass


class MultiscaleImageDataset(ReferrerMixin):
    def gather_references(self, paths: set[Path]):
        paths.add(self.path)

    def validate_references(self, context: ValidationContext):
        ctx = context.index_relative(self.path)
        # a dataset must point at a Zarr array, not a group
        assert ctx.zarr_metadata.is_array()


class MultiscaleImage(ReferrerMixin):
    def gather_references(self, paths: set[Path]):
        for ds in self.datasets:
            ds.gather_references(paths)
        ...

    def validate_references(self, context: ValidationContext):
        dtypes = set()
        for ds in self.datasets:
            ds.validate_references(context)
            ds_ctx = context.index_relative(ds.path)
            dtypes.add(ds_ctx.zarr_metadata.data_type)
        # all resolution levels must share a data type
        assert len(dtypes) <= 1
        ...

    ...


# and so on for other OME-Zarr metadata classes

store_url = "https://my.zarr.store"
node_root = Path("/node/of/interest")
node_of_interest = get_zarr_node(store_url, node_root)  # fetch metadata however you like
root_ome = parse_ome(node_of_interest.attributes)

# first pass: collect the paths this node refers to
refs: set[Path] = set()
root_ome.gather_references(refs)

# fetch the referenced nodes and build the validation context
context = ValidationContext(node_root, {node_root: ContextValue(node_of_interest, root_ome)})
for ref in refs:
    path = node_root / ref
    node = get_zarr_node(store_url, path)
    context.references[path] = ContextValue(node, parse_ome(node.attributes))
    # extend the graph if you want

# second pass: validate everything that needs remote metadata
root_ome.validate_references(context)
```
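
For the deep variant, the fetch loop above can be turned into a worklist that keeps expanding until no referenced node introduces anything new. The following is only a rough sketch under the same assumptions as the example: `get_zarr_node` and `parse_ome` are placeholder helpers, every referenced node's attributes parse into a `ReferrerMixin` object, and each node reports its references relative to itself.

```python
# rough sketch of the deep variant, continuing from the example above
pending: set[Path] = set()
root_ome.gather_references(pending)
pending = {node_root / p for p in pending}  # re-anchor the root's references to store paths

while pending:
    path = pending.pop()
    if path in context.references:
        continue  # already fetched and parsed
    node = get_zarr_node(store_url, path)
    ome = parse_ome(node.attributes)
    context.references[path] = ContextValue(node, ome)

    # a referred-to node may refer onwards; queue its references too
    child_refs: set[Path] = set()
    ome.gather_references(child_refs)
    pending.update(path / p for p in child_refs)

root_ome.validate_references(context)
```

A real implementation would also have to decide what happens when a referenced node carries no OME metadata at all, which this sketch glosses over.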