8 changes: 4 additions & 4 deletions standard/template/sections/clause_0_front_material.adoc
@@ -11,11 +11,11 @@ This Standard has been developed in collaboration with contributors from Earth o
[abstract]
== Abstract

The GeoZarr Unified Data Model and Encoding Standard specifies a conceptual and implementation framework for representing multidimensional, geospatial datasets using the Zarr format. This Standard builds upon the Unidata Common Data Model (CDM) and the Climate and Forecast (CF) Conventions, and introduces interoperable constructs for tiling, georeferencing, and metadata integration.
Zarr provides efficient chunked storage for n-dimensional arrays but does not provide the semantic constructs required for geospatial and scientific data workflows. The GeoZarr Unified Data Model and Encoding Standard addresses this gap by adding essential concepts (coordinate systems, grid mappings, temporal semantics, and CF-compliant metadata) on top of Zarr's storage foundation.

The model defines core elements—dimensions, coordinate variables, data variables, attributes—and optional extensions for multi-resolution overviews, affine geotransforms, and STAC metadata. Encoding guidance is provided for Zarr Version 2 and Zarr Version 3, including chunking, group hierarchy, and metadata conventions.
The Standard builds upon proven concepts from the Common Data Model (CDM) and Climate and Forecast (CF) Conventions to define core elements—dimensions, coordinate variables, data variables, and attributes—along with extensions for multi-resolution overviews, affine geotransforms, and STAC metadata. This layered approach ensures applications can work with semantically rich geospatial data while leveraging Zarr's cloud-optimized storage capabilities.

GeoZarr aims to bridge scientific and geospatial communities by enabling round-trip transformations with formats such as NetCDF and GeoTIFF, and supporting compatibility with tools in the scientific Python and geospatial ecosystems. This Standard enables scalable, standards-compliant, and semantically rich data structures for cloud-native Earth observation applications.
By providing a standardized framework for geospatial semantics, GeoZarr enables scientific and geospatial applications to fully utilize cloud-native storage architectures while maintaining the rich metadata and coordinate referencing required for Earth observation workflows. The result is a modern, scalable approach to storing and accessing geospatial data that meets the needs of both data providers and consumers.


I like this introduction.

== Submitters

@@ -29,4 +29,4 @@ All questions regarding this submission should be directed to the editor or the
|Brianna Pagán _(editor)_ | DevSeed
|Ryan Abernathey| EarthMover
| TBD | TBD
|===
28 changes: 26 additions & 2 deletions standard/template/sections/clause_1_scope.adoc
@@ -2,6 +2,30 @@

The GeoZarr Unified Data Model and Encoding Standard defines a conceptual and implementation framework for representing and encoding geospatial and scientific datasets using the Zarr format. The scope of this Standard includes the definition of a format-agnostic unified data model, the specification of its encoding into Zarr Version 2 and Version 3, and the establishment of extension points to support interoperability with external metadata and tiling standards.
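As an illustration of what encoding into both Zarr versions involves, the sketch below builds the metadata documents that Zarr Version 2 and Version 3 would store for the same CF-annotated array. It assumes a float32 array and uses only fields from the Zarr core specifications. Storing dimension names in the `_ARRAY_DIMENSIONS` attribute for Version 2 follows the xarray convention and is an assumption here, not a GeoZarr requirement. The function names and the example variable are hypothetical.

```python
import json


def zarr_v2_metadata(shape, chunks, dims, attrs):
    """Return the (.zarray, .zattrs) documents for a little-endian float32 array.

    Zarr v2 has no field for dimension names, so they are stored in the
    `_ARRAY_DIMENSIONS` attribute (the xarray convention, assumed here).
    """
    zarray = {
        "zarr_format": 2,
        "shape": list(shape),
        "chunks": list(chunks),
        "dtype": "<f4",
        "compressor": None,
        "fill_value": "NaN",
        "order": "C",
        "filters": None,
    }
    zattrs = {"_ARRAY_DIMENSIONS": list(dims), **attrs}
    return zarray, zattrs


def zarr_v3_metadata(shape, chunks, dims, attrs):
    """Return the zarr.json document for the same array; v3 names dimensions natively."""
    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": list(shape),
        "data_type": "float32",
        "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": list(chunks)}},
        "chunk_key_encoding": {"name": "default", "configuration": {"separator": "/"}},
        "fill_value": "NaN",
        "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
        "attributes": dict(attrs),
        "dimension_names": list(dims),
    }


# Hypothetical monthly air-temperature array with CF attributes.
cf_attrs = {"standard_name": "air_temperature", "units": "K"}
shape, chunks, dims = (12, 180, 360), (1, 180, 360), ("time", "lat", "lon")

v2_zarray, v2_zattrs = zarr_v2_metadata(shape, chunks, dims, cf_attrs)
v3_doc = zarr_v3_metadata(shape, chunks, dims, cf_attrs)
print(json.dumps(v3_doc, indent=2))
```

The difference that matters most for geospatial use is that Version 3 has a native `dimension_names` field, whereas Version 2 needs an attribute convention to carry the same information.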

This Standard addresses the needs of Earth observation, environmental monitoring, and geospatial analysis applications that require efficient, scalable access to multidimensional datasets. It enables the harmonisation of existing data models, such as the Unidata Common Data Model (CDM) and the Climate and Forecast (CF) Conventions, with operational encoding formats suitable for cloud-native storage and analysis.
These capabilities are necessary because Zarr does not provide semantic constructs for geospatial data interpretation. Applications need to understand not just array shapes and values, but coordinate meanings, projection parameters, and scientific metadata. GeoZarr fills this gap without compromising Zarr's performance characteristics.

Typical use cases include the storage, transformation, discovery, and processing of raster and gridded data, data cubes with temporal or vertical dimensions, and catalogue-enabled datasets integrated with metadata standards such as STAC and OGC Tile Matrix Sets.
=== Why GeoZarr Exists


I think we may be missing an important clarification to justify the purpose of GeoZarr: there are already existing conventions for geospatial data in Zarr, as implemented in Xarray, NCZarr, and GDAL, and those conventions primarily translate aspects of the CF/NetCDF data model into a Zarr encoding.

However:

  1. The CF/NetCDF data model itself may lack certain capabilities, such as support for multiscale overviews, affine transforms, etc.
  2. The current encoding conventions to Zarr – for example, mapping all NetCDF attributes into Zarr string attributes – may not be optimal and could be revisited.


Zarr, by design, is a low-level container for storing n-dimensional arrays and metadata. While this simplicity is a strength for performance and interoperability, it means Zarr lacks higher-level concepts that geospatial applications require:

* *Coordinate Systems:* No native way to associate spatial or temporal meaning with array dimensions
* *Grid Mappings:* No standard mechanism for projection and coordinate reference system metadata
* *Semantic Metadata:* No conventions for units, standard names, or scientific attributes
* *Variable Relationships:* No formal distinction between coordinate variables and data variables

These concepts are essential for geospatial workflows but must be layered on top of Zarr's array storage. GeoZarr provides this semantic layer through proven standards (Common Data Model and CF conventions) while preserving Zarr's cloud-native advantages.
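To make the missing variable relationships concrete, the following sketch classifies the variables of a group using CF/NetCDF rules: a coordinate variable is one-dimensional and shares its name with its dimension, and a grid-mapping variable is one named by another variable's `grid_mapping` attribute. It is a simplified illustration that handles only the single-name form of `grid_mapping`; the `Variable` class and the example names (`spatial_ref`, `b04`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    dims: tuple
    attrs: dict = field(default_factory=dict)


def classify(variables):
    """Split variables into coordinate, grid-mapping, and data variables.

    - coordinate variable: one-dimensional and named after its dimension
    - grid-mapping variable: named by another variable's `grid_mapping`
      attribute (only the simple single-name form is handled here)
    - data variable: everything else
    """
    names = {v.name for v in variables}
    coords = {v.name for v in variables if v.dims == (v.name,)}
    grid_mappings = {
        v.attrs["grid_mapping"]
        for v in variables
        if v.attrs.get("grid_mapping") in names
    }
    data = names - coords - grid_mappings
    return coords, grid_mappings, data


# Hypothetical single-band projected raster.
variables = [
    Variable("x", ("x",), {"standard_name": "projection_x_coordinate", "units": "m"}),
    Variable("y", ("y",), {"standard_name": "projection_y_coordinate", "units": "m"}),
    Variable("spatial_ref", (), {"grid_mapping_name": "transverse_mercator"}),
    Variable("b04", ("y", "x"), {"grid_mapping": "spatial_ref"}),
]
coords, grid_mappings, data = classify(variables)
print(coords, grid_mappings, data)
```

None of these roles can be inferred from Zarr's array metadata alone; they depend entirely on naming and attribute conventions layered on top.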

=== Relationship to Zarr Core Concepts

GeoZarr builds upon Zarr's foundational concepts of <<term-store,stores>> and <<term-hierarchy, hierarchies>>. A Zarr store provides the storage and retrieval interface (e.g., filesystem, cloud object storage), while a hierarchy defines the logical tree structure of groups and arrays within that store. GeoZarr specifies how to organize and structure hierarchies to support geospatial semantics, without modifying the underlying store interface.
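The separation between store and hierarchy can be sketched with a plain dictionary acting as a key-value store. Assuming the Zarr Version 3 key layout (one `zarr.json` document per node and the default `c/` chunk key prefix), the logical hierarchy can be recovered from metadata keys alone while chunk keys are ignored. The helper functions and the `measurements` group name are hypothetical, and the array metadata is abbreviated: real Version 3 array documents also require fields such as `shape`, `data_type`, and `chunk_grid`.

```python
import json


def write_node(store, path, node_type, attributes=None):
    """Write an abbreviated zarr.json document for a group or array at `path`.

    Real Zarr v3 array metadata also requires shape, data_type, chunk_grid,
    chunk_key_encoding, fill_value, and codecs; they are omitted in this sketch.
    """
    key = f"{path}/zarr.json" if path else "zarr.json"
    doc = {"zarr_format": 3, "node_type": node_type, "attributes": attributes or {}}
    store[key] = json.dumps(doc).encode("utf-8")


def hierarchy(store):
    """Rebuild the logical tree {node path: node type} from metadata keys alone."""
    tree = {}
    for key, value in store.items():
        if key == "zarr.json" or key.endswith("/zarr.json"):
            path = key[: -len("zarr.json")].rstrip("/")
            tree["/" + path] = json.loads(value)["node_type"]
    return tree


# A dict stands in for the store: keys are string paths, values are bytes.
store = {}
write_node(store, "", "group")
write_node(store, "measurements", "group")
write_node(store, "measurements/b04", "array")
# A chunk under the default v3 chunk key encoding ("c" prefix, "/" separator).
store["measurements/b04/c/0/0"] = bytes(16)

print(hierarchy(store))
# {'/': 'group', '/measurements': 'group', '/measurements/b04': 'array'}
```

Swapping the dictionary for a filesystem or object store would not change `hierarchy`, which mirrors how GeoZarr structures hierarchies without modifying the store interface.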

=== Use Cases and Applications

This Standard addresses the needs of Earth observation, environmental monitoring, and geospatial analysis applications that require efficient, scalable access to multidimensional datasets. It enables the harmonisation of existing data models with operational encoding formats suitable for cloud-native storage and analysis.

Typical use cases include:

* Storage and processing of raster and gridded data
* Management of data cubes with temporal or vertical dimensions
* Integration with catalogue systems through standardized metadata
* Multi-resolution tiling for efficient visualization and analysis
* Cloud-optimized access to large geospatial datasets
19 changes: 13 additions & 6 deletions standard/template/sections/clause_4_terms_and_definitions.adoc
@@ -2,6 +2,9 @@

=== Terms and definitions

The GeoZarr specification inherits the https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#concepts-and-terminology[concepts and terminology from the Zarr core specification].
The following terms add GeoZarr-specific meaning to the existing Zarr terminology.

==== array

A multidimensional, regularly spaced collection of values (e.g., raster data or gridded measurements), typically indexed by dimensions such as time, latitude, longitude, or spectral band.
@@ -22,17 +25,21 @@ An array containing the primary geospatial or scientific measurements of interes

An index axis along which arrays are organised. Dimensions provide a naming and ordering scheme for accessing data in multidimensional arrays (e.g., `time`, `x`, `y`, `band`).

==== group
==== dataset

A container for datasets, variables, dimensions, and metadata in Zarr. Groups may be nested to represent a logical hierarchy (e.g., for resolutions or collections).
A group that contains one or more data variables along with their associated coordinate variables, with a consistent relationship between these components. A dataset represents a coherent set of related data arrays and follows the Unified Data Model.

==== metadata

Structured information describing the content, context, and semantics of datasets, variables, and attributes. GeoZarr metadata includes CF attributes, geotransform definitions, and links to STAC metadata where applicable.

==== multiscale dataset
==== multiscale group

A group that contains child groups representing the same data at different resolutions, where each child group is a <<term-dataset,dataset>>. The multiscale group includes metadata describing the relationship between resolution levels. A multiscale group can be initialized with a single dataset and expanded with additional resolution levels over time.
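A multiscale group typically holds a sequence of progressively coarser datasets. The sketch below computes the grid shape and cell size of each level, assuming a decimation factor of 2 per level and stopping once both spatial dimensions fit within a 256-pixel tile. Both parameters are illustrative assumptions, and `overview_levels` is a hypothetical helper, not part of GeoZarr's multiscale metadata.

```python
import math


def overview_levels(shape, cell_size, factor=2, max_size=256):
    """Return (level, (height, width), cell_size) for each resolution level.

    Level 0 is the full-resolution dataset; each further level divides the
    grid by `factor` (rounding up) and multiplies the cell size by `factor`,
    until both dimensions fit within `max_size` pixels.
    """
    height, width = shape
    levels = [(0, (height, width), cell_size)]
    while height > max_size or width > max_size:
        height = math.ceil(height / factor)
        width = math.ceil(width / factor)
        cell_size *= factor
        levels.append((len(levels), (height, width), cell_size))
    return levels


# Hypothetical 10980 x 10980 grid with 10 m cells.
for level in overview_levels((10980, 10980), 10.0):
    print(level)
```

Rounding up keeps every coarser level covering the full extent of level 0; a scheme aligned to an OGC Tile Matrix Set would instead take shapes and cell sizes from the tile matrices.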

==== store

A dataset that includes multiple representations of the same data variable at varying spatial resolutions. Each resolution level is associated with a tile matrix from an OGC Tile Matrix Set.
A system that provides storage and retrieval operations for Zarr hierarchies, as defined in the https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#stores[Zarr core specification]. A store implements the abstract store interface and can be backed by various storage technologies such as filesystems, cloud object storage, or databases. GeoZarr hierarchies are stored within and accessed through Zarr stores.

==== tile matrix set

@@ -42,9 +49,9 @@ A spatial tiling scheme defined by a hierarchy of zoom levels and consistent gri

An affine transformation used to convert between grid coordinates and geospatial coordinates, typically defined using the GDAL GeoTransform convention.
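Under the GDAL GeoTransform convention, the six coefficients `GT0` to `GT5` map a pixel column and line to georeferenced coordinates as `X = GT0 + col * GT1 + row * GT2` and `Y = GT3 + col * GT4 + row * GT5`, where `(0, 0)` is the top-left corner of the top-left pixel. A minimal sketch of the forward mapping and its inverse follows. The function names and the example 10 m north-up grid are hypothetical, and how GeoZarr stores the coefficients is not shown here.

```python
def pixel_to_geo(gt, col, row):
    """Map pixel/line coordinates to georeferenced (x, y) with a GDAL geotransform."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def geo_to_pixel(gt, x, y):
    """Invert the 2x2 linear part of the geotransform to recover (col, row)."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = x - gt[0], y - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (gt[1] * dy - gt[4] * dx) / det
    return col, row


# Hypothetical north-up grid: 10 m pixels, origin at the top-left corner.
gt = (500000.0, 10.0, 0.0, 4600020.0, 0.0, -10.0)

center = pixel_to_geo(gt, 0.5, 0.5)  # center of the top-left pixel
print(center)                        # (500005.0, 4600015.0)
print(geo_to_pixel(gt, *center))     # (0.5, 0.5)
```

For north-up grids `GT2` and `GT4` are zero and `GT5` is negative, so rows increase southward; the rotation terms are what distinguish a general affine geotransform from a simple origin-plus-resolution description.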

==== unified data model (UDM)
==== Unified Data Model (UDM)

A conceptual model that defines how to structure geospatial data in Zarr using CDM-based constructs, including support for coordinate referencing, metadata integration, and multiscale representations.
A conceptual model that defines how to structure geospatial data in Zarr using CDM-based constructs, including support for coordinate referencing, metadata integration, and multiscale representations. The Unified Data Model provides a standardized framework for expressing spatial relationships, coordinate systems, and scientific metadata.


I believe the current definition is not ideal, since an abstract model should not be defined for a specific format. Instead, it should stand independently and be applicable across formats, with Zarr being one possible encoding of that model (as is the case for CDM, the CF abstract model, UDM, etc.).

Suggested change
A conceptual model that defines how to structure geospatial data in Zarr using CDM-based constructs, including support for coordinate referencing, metadata integration, and multiscale representations. The Unified Data Model provides a standardized framework for expressing spatial relationships, coordinate systems, and scientific metadata.
A conceptual model for structuring geospatial data using CDM-based constructs. It enables consistent representation of coordinate referencing, metadata integration, and multiscale data. The Unified Data Model provides a standard framework for describing spatial relationships, coordinate systems, and scientific metadata, which can then be encoded in formats such as Zarr.
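The idea that the model stands independently of any single format can be illustrated with a small sketch: the model is expressed as plain data classes with no Zarr concepts, and a separate function produces one possible encoding. The class names and `encode_zarr_v3` are hypothetical, not the Unified Data Model's normative definition, and the generated documents are abbreviated Zarr Version 3 metadata (required fields such as `data_type`, `chunk_grid`, and `codecs` are omitted).

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    name: str
    size: int


@dataclass
class Variable:
    name: str
    dims: tuple
    attrs: dict = field(default_factory=dict)


@dataclass
class Dataset:
    """Format-agnostic model: dimensions, variables, and attributes only."""
    dimensions: dict  # dimension name -> Dimension
    variables: list

    def validate(self):
        for var in self.variables:
            for dim in var.dims:
                if dim not in self.dimensions:
                    raise ValueError(f"{var.name!r} uses undeclared dimension {dim!r}")


def encode_zarr_v3(dataset):
    """One possible encoding: abbreviated Zarr v3 metadata documents keyed by path."""
    dataset.validate()
    docs = {"zarr.json": {"zarr_format": 3, "node_type": "group", "attributes": {}}}
    for var in dataset.variables:
        docs[f"{var.name}/zarr.json"] = {
            "zarr_format": 3,
            "node_type": "array",
            "shape": [dataset.dimensions[d].size for d in var.dims],
            "dimension_names": list(var.dims),
            "attributes": dict(var.attrs),
        }
    return docs


# Hypothetical 3 x 4 projected grid with one data variable.
ds = Dataset(
    dimensions={"y": Dimension("y", 3), "x": Dimension("x", 4)},
    variables=[
        Variable("y", ("y",), {"units": "m"}),
        Variable("x", ("x",), {"units": "m"}),
        Variable("b04", ("y", "x")),
    ],
)
docs = encode_zarr_v3(ds)
print(docs["b04/zarr.json"]["shape"])  # [3, 4]
```

Because validation lives in the model and not in the encoder, another encoder (for example, to NetCDF) could reuse the same `Dataset` unchanged.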


=== Abbreviated Terms
