Clarify terminology across specification #89
base: main
Changes from all commits
5f8c6fb
bb05f3f
561edd9
f99d742
b8c988b
8cac80c
08caa63
4db26fb
1500a6d
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,6 +2,30 @@ | |
|
||
The GeoZarr Unified Data Model and Encoding Standard defines a conceptual and implementation framework for representing and encoding geospatial and scientific datasets using the Zarr format. The scope of this Standard includes the definition of a format-agnostic unified data model, the specification of its encoding into Zarr Version 2 and Version 3, and the establishment of extension points to support interoperability with external metadata and tiling standards. | ||
|
||
This Standard addresses the needs of Earth observation, environmental monitoring, and geospatial analysis applications that require efficient, scalable access to multidimensional datasets. It enables the harmonisation of existing data models, such as the Unidata Common Data Model (CDM) and the Climate and Forecast (CF) Conventions, with operational encoding formats suitable for cloud-native storage and analysis. | ||
These capabilities are necessary because Zarr does not provide semantic constructs for geospatial data interpretation. Applications need to understand not just array shapes and values, but coordinate meanings, projection parameters, and scientific metadata. GeoZarr fills this gap without compromising Zarr's performance characteristics. | ||
|
||
Typical use cases include the storage, transformation, discovery, and processing of raster and gridded data, data cubes with temporal or vertical dimensions, and catalogue-enabled datasets integrated with metadata standards such as STAC and OGC Tile Matrix Sets. | ||
=== Why GeoZarr Exists | ||
I think we may be missing an important clarification to justify the purpose of GeoZarr: there are already existing conventions for geospatial data in Zarr, as implemented in Xarray, NCZarr, and GDAL. Those conventions primarily translate aspects of the CF/NetCDF data model into Zarr encoding. However:
|
||
|
||
Zarr, by design, is a low-level container for storing n-dimensional arrays and metadata. While this simplicity is a strength for performance and interoperability, it means Zarr lacks higher-level concepts that geospatial applications require: | ||
|
||
* *Coordinate Systems:* No native way to associate spatial or temporal meaning with array dimensions | ||
* *Grid Mappings:* No standard mechanism for projection and coordinate reference system metadata | ||
* *Semantic Metadata:* No conventions for units, standard names, or scientific attributes | ||
* *Variable Relationships:* No formal distinction between coordinate variables and data variables | ||
|
||
These concepts are essential for geospatial workflows but must be layered on top of Zarr's array storage. GeoZarr provides this semantic layer through proven standards (Common Data Model and CF conventions) while preserving Zarr's cloud-native advantages. | ||
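As a minimal sketch of the semantic layer described above, the attribute dictionaries below show how CF-style metadata might be attached to otherwise plain Zarr arrays. The variable names (`temperature`, `crs`), the values, and the helper `describe` are illustrative assumptions rather than content of this Standard; `_ARRAY_DIMENSIONS` is the dimension-naming attribute that Xarray uses for Zarr Version 2.

```python
# Illustrative only: attribute dictionaries as they might appear in the
# metadata of two Zarr arrays. Names and values are assumptions.

temperature_attrs = {
    "_ARRAY_DIMENSIONS": ["time", "lat", "lon"],  # gives each axis a name
    "standard_name": "air_temperature",           # CF standard name
    "units": "K",                                 # CF units
    "grid_mapping": "crs",                        # name of the grid-mapping variable
}

crs_attrs = {
    "grid_mapping_name": "latitude_longitude",    # CF grid mapping for geographic coordinates
    "semi_major_axis": 6378137.0,                 # WGS 84 ellipsoid
    "inverse_flattening": 298.257223563,
}


def describe(attrs):
    """Summarise what an array means, using only its attributes."""
    dims = ", ".join(attrs["_ARRAY_DIMENSIONS"])
    return f"{attrs['standard_name']} [{attrs['units']}] over ({dims})"


print(describe(temperature_attrs))
```

Without these attributes a reader sees only a three-dimensional array of numbers; with them, the axes, units, and coordinate reference system can be recovered.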
|
||
=== Relationship to Zarr Core Concepts | ||
|
||
GeoZarr builds upon Zarr's foundational concepts of <<term-store,stores>> and <<term-hierarchy, hierarchies>>. A Zarr store provides the storage and retrieval interface (e.g., filesystem, cloud object storage), while a hierarchy defines the logical tree structure of groups and arrays within that store. GeoZarr specifies how to organize and structure hierarchies to support geospatial semantics, without modifying the underlying store interface. | ||
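The distinction between a store and a hierarchy can be sketched with a plain dictionary standing in for the store. This is an assumption-laden illustration: the node names are hypothetical, the metadata documents are trimmed to two fields (real Zarr Version 3 array metadata carries more, such as shape and data type), and `list_nodes` is a helper invented for this example.

```python
import json

# The store: a key-value mapping, here an in-memory stand-in for a
# filesystem or cloud object store. Keys follow the Zarr Version 3 layout,
# where each node is described by a "zarr.json" document.
store = {
    "zarr.json": json.dumps({"zarr_format": 3, "node_type": "group"}),
    "temperature/zarr.json": json.dumps({"zarr_format": 3, "node_type": "array"}),
    "lat/zarr.json": json.dumps({"zarr_format": 3, "node_type": "array"}),
    "lon/zarr.json": json.dumps({"zarr_format": 3, "node_type": "array"}),
}


def list_nodes(store):
    """Derive the hierarchy (node path -> node type) from the store's keys."""
    nodes = {}
    for key, value in store.items():
        if key == "zarr.json" or key.endswith("/zarr.json"):
            # Strip the metadata file name to obtain the node path.
            path = "/" + key[: -len("zarr.json")].rstrip("/")
            nodes[path] = json.loads(value)["node_type"]
    return nodes


print(list_nodes(store))
```

Swapping the dictionary for another backend would change how keys are fetched but not the tree of groups and arrays, which is the part GeoZarr constrains.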
|
||
=== Use Cases and Applications | ||
|
||
This Standard addresses the needs of Earth observation, environmental monitoring, and geospatial analysis applications that require efficient, scalable access to multidimensional datasets. It enables the harmonisation of existing data models with operational encoding formats suitable for cloud-native storage and analysis. | ||
|
||
Typical use cases include: | ||
* Storage and processing of raster and gridded data | ||
* Management of data cubes with temporal or vertical dimensions | ||
* Integration with catalogue systems through standardized metadata | ||
* Multi-resolution tiling for efficient visualization and analysis | ||
* Cloud-optimized access to large geospatial datasets |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -2,6 +2,9 @@ | |||||
|
||||||
=== Terms and definitions | ||||||
|
||||||
The GeoZarr specification inherits https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#concepts-and-terminology[concepts and terminology from the Zarr core specification]. | ||||||
The following terms add GeoZarr-specific meaning to the existing Zarr terminology. | ||||||
|
||||||
==== array | ||||||
|
||||||
A multidimensional, regularly spaced collection of values (e.g., raster data or gridded measurements), typically indexed by dimensions such as time, latitude, longitude, or spectral band. | ||||||
|
@@ -22,17 +25,21 @@ An array containing the primary geospatial or scientific measurements of interes | |||||
|
||||||
An index axis along which arrays are organised. Dimensions provide a naming and ordering scheme for accessing data in multidimensional arrays (e.g., `time`, `x`, `y`, `band`). | ||||||
|
||||||
==== group | ||||||
==== dataset | ||||||
|
||||||
A container for datasets, variables, dimensions, and metadata in Zarr. Groups may be nested to represent a logical hierarchy (e.g., for resolutions or collections). | ||||||
A group that contains one or more data variables along with their associated coordinate variables, having a consistent relationship between these components. A dataset represents a coherent set of related data arrays and follows the Unified Data Model. | ||||||
|
||||||
==== metadata | ||||||
|
||||||
Structured information describing the content, context, and semantics of datasets, variables, and attributes. GeoZarr metadata includes CF attributes, geotransform definitions, and links to STAC metadata where applicable. | ||||||
|
||||||
==== multiscale dataset | ||||||
==== multiscale group | ||||||
|
||||||
A group that contains child groups representing the same data at different resolutions, where each child group is a <<term-dataset,dataset>>. The multiscale group includes metadata describing the relationship between resolution levels. A multiscale group can be initialized with a single dataset and expanded with additional resolution levels over time. | ||||||
|
||||||
==== store | ||||||
|
||||||
A dataset that includes multiple representations of the same data variable at varying spatial resolutions. Each resolution level is associated with a tile matrix from an OGC Tile Matrix Set. | ||||||
A system that provides storage and retrieval operations for Zarr hierarchies, as defined in the https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#stores[Zarr core specification]. A store implements the abstract store interface and can be backed by various storage technologies such as filesystems, cloud object storage, or databases. GeoZarr hierarchies are stored within and accessed through Zarr stores. | ||||||
|
||||||
==== tile matrix set | ||||||
|
||||||
|
@@ -42,9 +49,9 @@ A spatial tiling scheme defined by a hierarchy of zoom levels and consistent gri | |||||
|
||||||
An affine transformation used to convert between grid coordinates and geospatial coordinates, typically defined using the GDAL GeoTransform convention. | ||||||
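A short worked example of the GDAL GeoTransform convention may help here. The six coefficients are, in order: the x coordinate of the upper-left corner, the pixel width, the row rotation, the y coordinate of the upper-left corner, the column rotation, and the pixel height (negative for north-up grids). The function name `pixel_to_geo` and the sample values are assumptions chosen for illustration.

```python
def pixel_to_geo(gt, col, row):
    """Map a grid position (col, row) to geospatial (x, y).

    `gt` is a six-element GDAL-style geotransform. The result refers to
    the upper-left corner of the addressed pixel.
    """
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


# Hypothetical north-up grid: 10 m pixels, origin at (500000, 4600000).
gt = (500000.0, 10.0, 0.0, 4600000.0, 0.0, -10.0)

print(pixel_to_geo(gt, 0, 0))  # upper-left corner of the grid
print(pixel_to_geo(gt, 2, 3))  # two columns east, three rows south
```

Because the pixel height is negative, increasing row indices move southward, which matches the usual top-down ordering of raster rows.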
|
||||||
==== unified data model (UDM) | ||||||
==== Unified Data Model (UDM) | ||||||
|
||||||
A conceptual model that defines how to structure geospatial data in Zarr using CDM-based constructs, including support for coordinate referencing, metadata integration, and multiscale representations. | ||||||
A conceptual model that defines how to structure geospatial data in Zarr using CDM-based constructs, including support for coordinate referencing, metadata integration, and multiscale representations. The Unified Data Model provides a standardized framework for expressing spatial relationships, coordinate systems, and scientific metadata. | ||||||
I believe the current definition is not ideal, since an abstract model should not be defined for a specific format. Instead, it should stand independently and be applicable across formats, with Zarr being one possible encoding of that model (as for CDM, CF abstract model, UDM, etc.).
Suggested change
|
||||||
|
||||||
=== Abbreviated Terms | ||||||
|
||||||
|
I like this introduction.