emmanuelmathot
Contributor

Introduce the Geo Multiscales Attribute Extension specification and schema as a Zarr extension to be registered.

…layout representation and enhance field descriptions
"version": "0.1.0",
"layout": [
{"group": "0"},
{"group": "1", "from_group": "0", "factors": [2, 2], "resampling_method": "average"},

I know I proposed adding the decimation factor to this, but I'm now afraid that it adds complexity.
I fear some people might prefer something like `"factors": {"x": 2, "y": 2}`.

Contributor

Which is more useful: knowing the relative scaling or the absolute scaling of an image? Relative scaling is tricky because it requires a reference to another image (the source), and this can lead to redundant representations:

[
 {"group": "a"},
 {"group": "b", "from_group": "a", "factors": [2,2]},
 {"group": "c", "from_group": "b", "factors": [2,2]}
]

conveys the same information as

[
 {"group": "a"},
 {"group": "b", "from_group": "a", "factors": [2,2]},
 {"group": "c", "from_group": "a", "factors": [4,4]}
]
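
To make the redundancy concrete, here is a minimal sketch that resolves each group's factors back to the root by following `from_group`. The function name `absolute_factors` is hypothetical and not part of the proposal; the sketch assumes that factors compose by per-axis multiplication and that the layout contains no cycles.

```python
def absolute_factors(layout):
    """Resolve each group's factors relative to the root (native) group.

    Assumption: factors compose multiplicatively along the from_group chain.
    No cycle detection is performed.
    """
    entries = {entry["group"]: entry for entry in layout}

    def resolve(name):
        entry = entries[name]
        if "from_group" not in entry:
            return None  # root group: native resolution, nothing to scale
        parent = resolve(entry["from_group"])
        if parent is None:
            return tuple(entry["factors"])
        return tuple(p * f for p, f in zip(parent, entry["factors"]))

    return {name: resolve(name) for name in entries}


chained = [
    {"group": "a"},
    {"group": "b", "from_group": "a", "factors": [2, 2]},
    {"group": "c", "from_group": "b", "factors": [2, 2]},
]
direct = [
    {"group": "a"},
    {"group": "b", "from_group": "a", "factors": [2, 2]},
    {"group": "c", "from_group": "a", "factors": [4, 4]},
]

# Both layouts place "c" at the same scale relative to "a".
print(absolute_factors(chained) == absolute_factors(direct))
```

Under these assumptions the two layouts resolve to the same scales; they differ only in which group each level was derived from.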

I feel like a client doing visualization would mainly care about getting the right data given its current FOV, which is defined in physical units, which means the multiscale data should probably be declared with physical units. This suggests a really simple multiscale declaration:

{
  "multiscale_key": {
    "downsampling_method": {....},
    "assets": [
      {"name": "image_name_0", "transform": {"scale": [1, 1], "translation": [0, 0]}},
      {"name": "image_name_1", "transform": {"scale": [2, 2], "translation": [0.5, 0.5]}}
    ]
  }
}

This is very schematic but the basic data model is that a "multiscale collection" is a JSON object with some keys that represent metadata about the downsampling process, and then a field that's a collection of references to data (zarr groups), where each one declares its transformation relative to the metadata in the "proj" field.

The only transformations we need to consider are scaling and translation, because that's what downsampling does. I don't know the proj metadata well enough yet to fully resolve how this should work, but I think a scheme where each image independently declares its location in space is better for clients than one where an image's spatial localization depends on the metadata of other images.
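
As a rough illustration of the per-image transform idea, the sketch below derives the scale and translation of a downsampled level and picks a level for a requested resolution. All function names are hypothetical. It assumes coordinates refer to pixel centers and that downsampling merges aligned windows of `factors` pixels, which is what yields the `[0.5, 0.5]` translation in the example above.

```python
def downsampled_transform(base_scale, base_translation, factors):
    """Scale/translation of a level downsampled by `factors` per axis.

    Assumption: pixel-center coordinates, aligned downsampling windows.
    """
    scale = [s * f for s, f in zip(base_scale, factors)]
    # The center of the first coarse pixel sits in the middle of the
    # window of fine pixels it was computed from.
    translation = [
        t + s * (f - 1) / 2
        for t, s, f in zip(base_translation, base_scale, factors)
    ]
    return {"scale": scale, "translation": translation}


def index_to_world(transform, index):
    """Map an array index to coordinates using scale then translation."""
    return [
        i * s + t
        for i, s, t in zip(index, transform["scale"], transform["translation"])
    ]


def pick_asset(assets, target_resolution):
    """Coarsest asset whose pixel size does not exceed the target.

    Falls back to the finest asset when every level is too coarse.
    """
    def pixel_size(asset):
        return max(asset["transform"]["scale"])

    candidates = [a for a in assets if pixel_size(a) <= target_resolution]
    if not candidates:
        return min(assets, key=pixel_size)
    return max(candidates, key=pixel_size)


assets = [
    {"name": "image_name_0", "transform": downsampled_transform([1, 1], [0, 0], [1, 1])},
    {"name": "image_name_1", "transform": downsampled_transform([1, 1], [0, 0], [2, 2])},
]
print(assets[1]["transform"])
print(pick_asset(assets, target_resolution=3)["name"])
```

With this model a client only needs the transform of the asset it is considering; it never has to read another asset's metadata to locate the data in space.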

Contributor Author

@vincentsarago I believe the factors (or scale) array would follow the dimensions and order of the Zarr array shape.
@d-v-b, actually, the first and second examples could lead to different results, because resampling from the native level and resampling from an already resampled intermediate dataset may differ. How would you trace the decimation?
I do like having the geometric transformations described, and this may indeed be useful for the client to know.
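
A toy 1-D sketch of why the source level can matter. The helper `downsample` and the sample values are made up for illustration, and this does not model real radar data. With a plain average over aligned, evenly dividing blocks, cascading 2 then 2 matches a direct factor of 4, but with a median the two paths diverge.

```python
from statistics import median


def average(values):
    """Arithmetic mean, always returned as a float."""
    return sum(values) / len(values)


def downsample(values, factor, reducer):
    """Reduce consecutive, non-overlapping blocks of `factor` samples."""
    return [
        reducer(values[i:i + factor])
        for i in range(0, len(values) - factor + 1, factor)
    ]


native = [1, 1, 1, 9, 1, 9, 9, 9]

# Median: downsampling from native differs from cascading 2x then 2x.
direct_median = downsample(native, 4, median)
cascaded_median = downsample(downsample(native, 2, median), 2, median)
print(direct_median, cascaded_median)

# Average: both paths agree here because the blocks align and divide evenly.
direct_average = downsample(native, 4, average)
cascaded_average = downsample(downsample(native, 2, average), 2, average)
print(direct_average == cascaded_average)
```

So whether `from_group` changes the result depends on the resampling method, which is one reason to record both together.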

Contributor

> actually, the first and second examples could lead to different results, because resampling from the native level and resampling from an already resampled intermediate dataset may differ.

That's true, but when is this important? I am genuinely ignorant of how this information gets used, so some context would be helpful here.

Contributor Author

In the case of radar, the intensity may contain noise. If you do not want to "replicate" that noise too much between levels, it is better to downsample every level from the native one.

Contributor

Is there a resource you can point me to that explains the priorities for consumers of the different scale levels? I'm coming from a bioimaging background, so some of my assumptions may be invalid here.

Contributor Author

No, sorry, there isn't. The purpose of the `from_group` field is to trace the downsampling source and method.

Contributor

Does this metadata need to support a case where the source is not available?

Contributor Author

I would say so. This is the default case for the "native" level.

Contributor

And are consumers more interested in the relative or the absolute coordinates of the different scale levels?

Contributor Author

emmanuelmathot commented Oct 3, 2025

Following our discussion on Wednesday, October 1st, at the GeoZarr meeting, I added a domain-agnostic version of multiscales that is composable with domain-specific attributes (such as geo/proj). This version includes a set of attributes capable of representing a geospatial pyramid of overviews. If received positively, it could replace the original proposal of a geo/multiscale.

@rabernat @d-v-b @felixcremer @christophenoel @maxrjones
